Martins Blog

Trying to explain complex things in simple terms

Archive for September, 2013

Clusterware and listener management gotcha in 11.2

Posted by Martin Bach on September 10, 2013

I have come across an interesting situation recently and thought it was worth blogging about. My friend Doug Burns might like it, it has to do with consolidation.

Background

I have seen quite a few sites in my career where the separation (of duties/listeners/disk space/log destinations) was paramount-and for good reason! In fact Oracle propagate it as well as a quick search with your favourite search engine will show. In my example I came across a system that used different listeners per database, which is very common and prevents users from “accidentally” connecting to the wrong system. If you are using such a setup please read on. If you are not using Oracle Restart/Clusterware/RAC then this is not immediately relevant to your Oracle estate.

The case is easy to reproduce. All you need is a Grid Infrastructure setup and a database. In my case it’s Oracle Restart 11.2.0.3.0 plus a 11.2.0.3.0 single instance database. I would classify this behaviour you are going to see here as a bug but it probably will return from development as “not a bug”. It is actually fixed in 12.1 which a quick test revealed.

Test Case

My test case requires quite a few listeners. Oracle Restart comes with the default LISTENER on port 1521 without you having to do anything. My setup uses the same OS account for the Oracle Restart and RDBMS home. If you use the grid user for Oracle Restart the logfile to tail will be different but not the problem.

Back to the test case-which is not recommended to reproduce outside your lab VM!

Begin by creating a few more listeners, maybe like this:

[oracle@asmtest ~]$ for i in $(seq 2 9); do srvctl add listener -l lsnr$i -p "tcp:152$i"; done

In a different (screen?) session navigate to the oraagent log file. In a Grid Infrastructure environment the listener(s) and many others are so-called Resources. Each resource has a profile which tells Clusterware what to do with it. If you have never seen one, here is the resource for the ninth listener just created:

[oracle@asmtest oraagent_oracle]$ crsctl stat res ora.LSNR9.lsnr -p
NAME=ora.LSNR9.lsnr
TYPE=ora.listener.type
ACL=owner:oracle:rwx,pgrp:dba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=CRS resource type for the Listener
ENABLED=1
ENDPOINTS=TCP:1529
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1529
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_OPI=false
VERSION=11.2.0.3.0

There are lots of interesting things to see here, and once you understand that Clusterware is actually just a framework to deal with resource metadata then it becomes a lot less complex and you enter a land of possibilities!

Anyway, there is an AGENT_FILENAME attribute which points to the oraagent binary. In other words, the oraagent binary is responsible for managing the resource. Depending on your installation type the agent logs are either in

  • $ORACLE_HOME/log/$(hostname)/agent/ohasd/oraagent_oracle/ for Oracle Restart or
  • $ORACLE_HOME/log/$(hostname)/agent/crsd/oraagent_oracle/ for a clustered installation.

Should the grid owner be different you will see a directory oraagent_<grid_owner>. Since the listeners are controlled by the $GRID_HOME the grid owner owns the log files: simple!

Anyway, tail the oraagent_oracle.log file now in a separate session.

By now the creation of listeners 2-9 should have completed. Before we can see them in action they need to be started. Are you tailing that log file?

[oracle@asmtest ~]$ for i in $(seq 2 9); do srvctl start listener -l lsnr$i ; done

After a little while and a few hundred kb log file later you should see the listeners:

[oracle@asmtest ~]$ ps -ef | grep tnslsnr
oracle    3930     1  0 11:24 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LISTENER -inherit
oracle    5920     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR2 -inherit
oracle    5953     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR3 -inherit
oracle    5986     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR4 -inherit
oracle    6020     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR5 -inherit
oracle    6056     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR6 -inherit
oracle    6089     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR7 -inherit
oracle    6130     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR8 -inherit
oracle    6165     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR9 -inherit
oracle    6452  4786  0 11:35 pts/1    00:00:00 grep tnslsnr
[oracle@asmtest ~]$

Clusterware will also tell you they exist:

[oracle@asmtest ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): asmtest
Listener LSNR2 is enabled
Listener LSNR2 is running on node(s): asmtest
Listener LSNR3 is enabled
Listener LSNR3 is running on node(s): asmtest
Listener LSNR4 is enabled
Listener LSNR4 is running on node(s): asmtest
Listener LSNR5 is enabled
Listener LSNR5 is running on node(s): asmtest
Listener LSNR6 is enabled
Listener LSNR6 is running on node(s): asmtest
Listener LSNR7 is enabled
Listener LSNR7 is running on node(s): asmtest
Listener LSNR8 is enabled
Listener LSNR8 is running on node(s): asmtest
Listener LSNR9 is enabled
Listener LSNR9 is running on node(s): asmtest
[oracle@asmtest ~]$

Now here is the reason for the blog post: are you still tailing the log? It’s probably flying by right now :) I noticed the following too in the logfile:

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} Value of LOCAL_LISTENER is
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} sqlStmt = ALTER SYSTEM SET LOCAL_LISTENER='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1522))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1523))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1524))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1525))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1526))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1527))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1528))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1529))))' SCOPE=MEMORY SID='orcl' /* db agent *//* {0:0:3198} */
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} ORA-32021: parameter value longer than 255 characters

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} DbAgent::DedicatedThread::setLocalListener Exception OCIException
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} ORA-32021: parameter value longer than 255 characters

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} InstConnection:~InstConnection: this 3417da90
2013-09-10 11:36:48.673: [ USRTHRD][666883840] {0:0:3198} CrsCmd::ClscrsCmdData::stat entity 1 statflag 32 useFilter 0
2013-09-10 11:36:48.688: [ USRTHRD][666883840] {0:0:3198} setResAttrib: attr GEN_START_OPTIONS clsagfw attribVal nomount clscrs tmpAttrValue open
2013-09-10 11:36:48.688: [ USRTHRD][666883840] {0:0:3198} setResAttrib cmdid 0x0107 checktype 0xffff
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} setResAttrib modifyResource attr GEN_START_OPTIONS value nomount
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} setResAttrib clsagfw_modify_attribute attr GEN_START_OPTIONS value nomount retCode 1
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} DbAgent::DThread getConnxn
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} Utils::getCrsHome crsHome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} clsnInstConnection::makeConnectStr UsrOraEnv  m_oracleHome /u01/app/oracle/product/11.2.0/dbhome_1 Crshome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} Utils::getCrsHome crsHome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} clsnInstConnection::makeConnectStr LIBRARY_PATH1 ,LD_LIBRARY_PATH=
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle)(ARGV0=oracleorcl)(ENVS='ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1,ORACLE_SID=orcl,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=orcl)))
2013-09-10 11:36:48.697: [ USRTHRD][666883840] {0:0:3198} InstConnection::connectInt: server not attached
2013-09-10 11:36:48.715: [ USRTHRD][666883840] {0:0:3198} In DedicatedThread::generateLocalListener
2013-09-10 11:36:48.716: [ USRTHRD][666883840] {0:0:3198} m_crsHome = /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.747: [ USRTHRD][666883840] {0:0:3198} Add endpoint from ora.LISTENER.lsnr to LOCAL_LISTENER
2013-09-10 11:36:48.747: [ USRTHRD][666883840] {0:0:3198} CrsCmd::ClscrsCmdData::stat entity 1 statflag 32 useFilter 0

You need to scroll, sorry-check the sqlStmt. This is the agent’s way of telling me that the local listener of the database ORCL isn’t set and that it wants to set it. But it can’t, because the alter system command fails with ORA-32021: parameter value longer than 255 characters. Bummer, better set this manually! As you can see Clusterware tries to set the local listener to ALL the ports, 1521-1529 and fails. The same message will be repeated thousands of times.

So what?

Bug or feature? I don’t really care, but it is something to be aware of, at least until you migrate to 12.1. If your consolidation platform uses lots of listeners (and by lots I mean more than one) then please set the local_listener parameter in the databases to prevent this from happening. But also be aware that the listener resource profile dictates a check every 60 seconds. Each check will produce log information that is written to file. The amount of CPU spent performing this task is proportional to the number of listeners.

This is not necessarily a problem but might help you find out why a server has a high load average and why oraagent_{oracle|grid} ranks high in the “top” output.

Posted in 11g Release 2, Linux | 7 Comments »

RAC in KVM is possible without iSCSI

Posted by Martin Bach on September 3, 2013

(This post is for Jerry. He will know when he reads it)

I have been a great supporter of many flavours of virtualisation and my earliest experience with Xen goes back to Oracle VM 2 which was based on RHEL 4 and an early version of Xen. Why am I saying this? Because Xen is (was?) simple and elegant. Especially for building RAC systems: paravirtualised Linux was all you needed, and a dual-core machine: Xen is very lightweight even though recent achievements in processor architecture (nested page tables, single root IO virtualisation, others) make it more desirable to use hardware virtualisation with paravirtualised drivers. This is what this post is about!

Shared storage in Xen

As you know you need shared block devices for RAC for voting disks, OCR, data files, redo logs, the lot. In Xen that’s as straight forward as it gets:

disk = [
  'file:/var/lib/xen/images/node1/root.img,xvda,w',
   ...
  'file:/var/lib/xen/images/node1/ocr001.img,xvdd,w!'
]

The “w!” means the block device is shared. Simple. Make the file available to the second RAC node and you are done.

In 2009 (yikes!) I ask the question if you could have RAC on KVM with the same technique. Who would have known then that Karl would be a colleague on day? Fast forward a few years and I have implemented such a solution. And it works in the lab. Read the rest of this entry »

Posted in KVM, Linux, Xen | 3 Comments »