Martins Blog

Trying to explain complex things in simple terms

Clusterware and listener management gotcha in 11.2

Posted by Martin Bach on September 10, 2013

I have come across an interesting situation recently and thought it was worth blogging about. My friend Doug Burns might like it, it has to do with consolidation.

Background

I have seen quite a few sites in my career where the separation (of duties/listeners/disk space/log destinations) was paramount-and for good reason! In fact Oracle propagate it as well as a quick search with your favourite search engine will show. In my example I came across a system that used different listeners per database, which is very common and prevents users from “accidentally” connecting to the wrong system. If you are using such a setup please read on. If you are not using Oracle Restart/Clusterware/RAC then this is not immediately relevant to your Oracle estate.

The case is easy to reproduce. All you need is a Grid Infrastructure setup and a database. In my case it’s Oracle Restart 11.2.0.3.0 plus a 11.2.0.3.0 single instance database. I would classify this behaviour you are going to see here as a bug but it probably will return from development as “not a bug”. It is actually fixed in 12.1 which a quick test revealed.

Test Case

My test case requires quite a few listeners. Oracle Restart comes with the default LISTENER on port 1521 without you having to do anything. My setup uses the same OS account for the Oracle Restart and RDBMS home. If you use the grid user for Oracle Restart the logfile to tail will be different but not the problem.

Back to the test case-which is not recommended to reproduce outside your lab VM!

Begin by creating a few more listeners, maybe like this:

[oracle@asmtest ~]$ for i in $(seq 2 9); do srvctl add listener -l lsnr$i -p "tcp:152$i"; done

In a different (screen?) session navigate to the oraagent log file. In a Grid Infrastructure environment the listener(s) and many others are so-called Resources. Each resource has a profile which tells Clusterware what to do with it. If you have never seen one, here is the resource for the ninth listener just created:

[oracle@asmtest oraagent_oracle]$ crsctl stat res ora.LSNR9.lsnr -p
NAME=ora.LSNR9.lsnr
TYPE=ora.listener.type
ACL=owner:oracle:rwx,pgrp:dba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=CRS resource type for the Listener
ENABLED=1
ENDPOINTS=TCP:1529
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1529
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_OPI=false
VERSION=11.2.0.3.0

There are lots of interesting things to see here, and once you understand that Clusterware is actually just a framework to deal with resource metadata then it becomes a lot less complex and you enter a land of possibilities!

Anyway, there is an AGENT_FILENAME attribute which points to the oraagent binary. In other words, the oraagent binary is responsible for managing the resource. Depending on your installation type the agent logs are either in

  • $ORACLE_HOME/log/$(hostname)/agent/ohasd/oraagent_oracle/ for Oracle Restart or
  • $ORACLE_HOME/log/$(hostname)/agent/crsd/oraagent_oracle/ for a clustered installation.

Should the grid owner be different you will see a directory oraagent_<grid_owner>. Since the listeners are controlled by the $GRID_HOME the grid owner owns the log files: simple!

Anyway, tail the oraagent_oracle.log file now in a separate session.

By now the creation of listeners 2-9 should have completed. Before we can see them in action they need to be started. Are you tailing that log file?

[oracle@asmtest ~]$ for i in $(seq 2 9); do srvctl start listener -l lsnr$i ; done

After a little while and a few hundred kb log file later you should see the listeners:

[oracle@asmtest ~]$ ps -ef | grep tnslsnr
oracle    3930     1  0 11:24 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LISTENER -inherit
oracle    5920     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR2 -inherit
oracle    5953     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR3 -inherit
oracle    5986     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR4 -inherit
oracle    6020     1  0 11:33 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR5 -inherit
oracle    6056     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR6 -inherit
oracle    6089     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR7 -inherit
oracle    6130     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR8 -inherit
oracle    6165     1  0 11:34 ?        00:00:00 /u01/app/oracle/product/11.2.0/grid/bin/tnslsnr LSNR9 -inherit
oracle    6452  4786  0 11:35 pts/1    00:00:00 grep tnslsnr
[oracle@asmtest ~]$

Clusterware will also tell you they exist:

[oracle@asmtest ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): asmtest
Listener LSNR2 is enabled
Listener LSNR2 is running on node(s): asmtest
Listener LSNR3 is enabled
Listener LSNR3 is running on node(s): asmtest
Listener LSNR4 is enabled
Listener LSNR4 is running on node(s): asmtest
Listener LSNR5 is enabled
Listener LSNR5 is running on node(s): asmtest
Listener LSNR6 is enabled
Listener LSNR6 is running on node(s): asmtest
Listener LSNR7 is enabled
Listener LSNR7 is running on node(s): asmtest
Listener LSNR8 is enabled
Listener LSNR8 is running on node(s): asmtest
Listener LSNR9 is enabled
Listener LSNR9 is running on node(s): asmtest
[oracle@asmtest ~]$

Now here is the reason for the blog post: are you still tailing the log? It’s probably flying by right now :) I noticed the following too in the logfile:

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} Value of LOCAL_LISTENER is
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} sqlStmt = ALTER SYSTEM SET LOCAL_LISTENER='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1522))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1523))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1524))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1525))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1526))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1527))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1528))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.110)(PORT=1529))))' SCOPE=MEMORY SID='orcl' /* db agent *//* {0:0:3198} */
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} ORA-32021: parameter value longer than 255 characters

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} DbAgent::DedicatedThread::setLocalListener Exception OCIException
2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} ORA-32021: parameter value longer than 255 characters

2013-09-10 11:36:48.672: [ USRTHRD][666883840] {0:0:3198} InstConnection:~InstConnection: this 3417da90
2013-09-10 11:36:48.673: [ USRTHRD][666883840] {0:0:3198} CrsCmd::ClscrsCmdData::stat entity 1 statflag 32 useFilter 0
2013-09-10 11:36:48.688: [ USRTHRD][666883840] {0:0:3198} setResAttrib: attr GEN_START_OPTIONS clsagfw attribVal nomount clscrs tmpAttrValue open
2013-09-10 11:36:48.688: [ USRTHRD][666883840] {0:0:3198} setResAttrib cmdid 0x0107 checktype 0xffff
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} setResAttrib modifyResource attr GEN_START_OPTIONS value nomount
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} setResAttrib clsagfw_modify_attribute attr GEN_START_OPTIONS value nomount retCode 1
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} DbAgent::DThread getConnxn
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} Utils::getCrsHome crsHome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} clsnInstConnection::makeConnectStr UsrOraEnv  m_oracleHome /u01/app/oracle/product/11.2.0/dbhome_1 Crshome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} Utils::getCrsHome crsHome /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} clsnInstConnection::makeConnectStr LIBRARY_PATH1 ,LD_LIBRARY_PATH=
2013-09-10 11:36:48.696: [ USRTHRD][666883840] {0:0:3198} makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle)(ARGV0=oracleorcl)(ENVS='ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1,ORACLE_SID=orcl,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=orcl)))
2013-09-10 11:36:48.697: [ USRTHRD][666883840] {0:0:3198} InstConnection::connectInt: server not attached
2013-09-10 11:36:48.715: [ USRTHRD][666883840] {0:0:3198} In DedicatedThread::generateLocalListener
2013-09-10 11:36:48.716: [ USRTHRD][666883840] {0:0:3198} m_crsHome = /u01/app/oracle/product/11.2.0/grid
2013-09-10 11:36:48.747: [ USRTHRD][666883840] {0:0:3198} Add endpoint from ora.LISTENER.lsnr to LOCAL_LISTENER
2013-09-10 11:36:48.747: [ USRTHRD][666883840] {0:0:3198} CrsCmd::ClscrsCmdData::stat entity 1 statflag 32 useFilter 0

You need to scroll, sorry-check the sqlStmt. This is the agent’s way of telling me that the local listener of the database ORCL isn’t set and that it wants to set it. But it can’t, because the alter system command fails with ORA-32021: parameter value longer than 255 characters. Bummer, better set this manually! As you can see Clusterware tries to set the local listener to ALL the ports, 1521-1529 and fails. The same message will be repeated thousands of times.

So what?

Bug or feature? I don’t really care, but it is something to be aware of, at least until you migrate to 12.1. If your consolidation platform uses lots of listeners (and by lots I mean more than one) then please set the local_listener parameter in the databases to prevent this from happening. But also be aware that the listener resource profile dictates a check every 60 seconds. Each check will produce log information that is written to file. The amount of CPU spent performing this task is proportional to the number of listeners.

This is not necessarily a problem but might help you find out why a server has a high load average and why oraagent_{oracle|grid} ranks high in the “top” output.

About these ads

7 Responses to “Clusterware and listener management gotcha in 11.2”

  1. rsiz said

    Nice catch. If I understand you correctly though, the “by lots I mean more than one” is a little overstated, since you need to reach 255 characters in the string to make this fail. With each address_list being, what, 68 characters or so it does not take very many! Now you could make a few tokens of the OFA null and shorten “oracle” to “ora” and “product” to “p” /ora/p/11.2.0/grid is a lot easier to take. We needed real /u01 when disk partitions were small; we needed app when people mixed applications and data on the same drive. But while I like that being shorter some people don’t understand that is still an OFA string, it would not help in this case because of the enormous verbosity of the parameter names and how quickly it adds up. No way around the xxx.xxx.xxx.xxx part, but everything else could be much smaller. Relationally we might want a human reader translator with a -v flag to read these files in that case, but *mostly* they are for machines to read. Sigh. When CPU and memory is cheap people start thinking that wasting it just doesn’t matter. PROG101: If you’re going to stuff a variable length parameter in a string with a maximum length you’d better handle it, just like needing to be able to spill from memory to disk for a sort. Handling it as an error just isn’t good enough.

    • Martin Bach said

      Hi Mark,

      yes it’s slightly overstated, but let me tell you why: I found these two cases during site visits: either many listeners (standard states 1 per database perhaps), or just one.

      I really like your comment about the cheap availability of resources that lead to waste. I migrated a system that was literally falling apart from very old hardware to shiny new one. On the old system everyone had to be extra careful not to code inefficiently. The effect after the new system was in production was a lot more sloppiness-the system was so much more powerful one got away with.

      And this fits: I first did exactly the same test on 12c and now it works. I didn’t test the limit but it must be more than 255 characters. Until the new limit becomes to small too.

      Thanks for your comments as always, much appreciated.

  2. rsiz said

    only very slightly overstated, since 4*68 is 272. This is like the mother hen who was losing chicks and didn’t know it because she could only count 1, 2, 3, a lot. I didn’t do the fine string counting to see whether 3 will squeak in. Fortunately you have provided a root cause solution (set the local listener pre-emptively). I’m not aware of any downside to that and it arguably is good documentation that may help a DBA working on a specific database.

    Some friend said, circa 1988, something like “Don’t worry, how ever fast and cheap they make computers, we’ll always find a way to overload them.”

  3. Hi Martin,

    I hit this bug / feature trying to set up a Data Guard with RAC on both sides. I created a additional listeners for DG only – so I had 4 in total but then I need to setup a listener_networks to separate traffic between public and Data Guard. Easiest way was to use connection string as parameter of listener_networks but I hit a problem with parameter size. Workaround was simple – just create a tnsnames entry with required aliases and reduce size of parameter to bare minimum. It too late now to check – but I’m wondering if similar workaround will work in your case.

    regards,
    Marcin

    • Martin Bach said

      Hi Marcin,

      thanks for your comments!

      I am quite confident the workaround of using a TNSName would work. Until recently I thought of letting Clusterware set the local_listener as it does so based on the information in the OCR and is more dynamic in nature. Except of course for our cases where the listener information doesn’t fit into the 255 character limit in 11g. (Did I mention that this limit has been extended in 12c?)

      For those readers who prefer a little more background information. A good discussion (as always!) is available on Luca Canali’s blog:

      http://externaltable.blogspot.de/2012/07/listenerora-and-oraagent-in-rac-11gr2.html

  4. Sylvain Truchon said

    Hello Martin,

    I had the same problem. (more than 160 000 audit files in 2 or 3 days).

    Here is what I did to stop the production of audit files.

    1- Put a value in local_listener for all my databases.
    2- Kill the oraagent.bin. (it respawn in 1 sec)
    3- Clean the audit directories.
    4- Connect 1 time on each database with / as sysdba.
    5- Look in the audit directories if I see 1 audit file for the connection I did… (not 1000 audit files)

    Be carefull with this solution. I used it with a DEV server and the server is not in cluster.. only Oracle Restart.

    Good luck!

    Sylvain Truchon

  5. […] no more need to worry about local listener (unless you have many local listeners and are not on 12c), remote_listener and listener_networks. That’s a big relief, to see that it does work. I can […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 2,419 other followers

%d bloggers like this: