Martins Blog

Trying to explain complex things in simple terms

Tuning Log and Trace levels for Clusterware 11.2

Posted by Martin Bach on June 2, 2011

With the introduction of Clusterware 11.2 a great number of command line tools have either been deprecated ($ORA_CRS_HOME/bin/crs_* and others) or merged into other tools. This is especially true for crsctl, which is now the tool to access and manipulate low level resources in Clusterware.

This also implies that some of the notes on Metalink are no longer applicable to Clusterware 11.2, such as the one detailing how to get more detailed information in the logs. Not that the log information wasn’t already rather comprehensive if you asked me…

And here comes a warning: don’t change the log levels unless you have a valid reason, or under the instructions of support. Higher log levels than the defaults tend to generate too much data, filling up the GRID_HOME and potentially killing the node.

Log File Location

The location for logs in Clusterware hasn’t changed much since the unified log structure was introduced in 10.2 and documented in “CRS and 10g/11.1 Real Application Clusters (Doc ID 259301.1)”. It has been extended though, and quite dramatically so in 11.2, which is documented as well in one of the better notes from support: “11gR2 Clusterware and Grid Home – What You Need to Know (Doc ID 1053147.1)”

The techniques for getting debug and trace information as described for example in “Diagnosability for CRS / EVM / RACG (Doc ID 357808.1)” doesn’t really apply any more as the syntax changed.

Getting Log Levels in 11.2

If you are interested at which log level a specific Clusterware resource operates, you can use the crsctl get log resource resourceName call, as in this example:

# crsctl get log res ora.asm
Get Resource ora.asm Log Level: 1

Don’t forget to apply the “-init” flag when you want to query resources which are part of the lower stack:

[root@lonengbl312 ~]# crsctl get log res ora.cssd
CRS-4655: Resource ‘ora.cssd’ could not be found.
CRS-4000: Command Get failed, or completed with errors.
[root@lonengbl312 ~]# crsctl get log res ora.cssd –init
Get Resource ora.cssd Log Level: 1

Interestingly, most Clusterware daemons start leaving a detailed message in their log file as to which module uses what logging level. Take CSSD for example:

2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = CLSF, LogLevel = 0, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0

This doesn’t match the output of the previous command. Now how do you check this on the command line? crsctl to the rescue again-it now has a “log” option:

# crsctl get log -h
Usage:
  crsctl get {log|trace} {mdns|gpnp|css|crf|crs|ctss|evm|gipc} "<name1>,..."
 where
   mdns          multicast Domain Name Server
   gpnp          Grid Plug-n-Play Service
   css           Cluster Synchronization Services
   crf           Cluster Health Monitor
   crs           Cluster Ready Services
   ctss          Cluster Time Synchronization Service
   evm           EventManager
   gipc          Grid Interprocess Communications
   <name1>, ...    Module names ("all" for all names)

  crsctl get log res <resname>
 where
   <resname>     Resource name

That’s good information, but the crucial bit about the <name1> and other parameters is missing. The next question I asked myself was “how do I find out which sub-modules make are part of say, crsd. Apart from looking at the trace file”. That information used to be documented in the OTN reference, but it’s now something you can evaluate from Clusterware as well:

[root@lonengbl312 ~]# crsctl lsmodules -h
Usage:
  crsctl lsmodules {mdns|gpnp|css|crf|crs|ctss|evm|gipc}
 where
   mdns  multicast Domain Name Server
   gpnp  Grid Plug-n-Play Service
   css   Cluster Synchronization Services
   crf   Cluster Health Monitor
   crs   Cluster Ready Services
   ctss  Cluster Time Synchronization Service
   evm   EventManager
   gipc  Grid Interprocess Communications

Back to the question as to which modules make up CSSD, I got this answer from Clusterware, matching the log file output:

# crsctl lsmodules css
List CSSD Debug Module: CLSF
List CSSD Debug Module: CSSD
List CSSD Debug Module: GIPCCM
List CSSD Debug Module: GIPCGM
List CSSD Debug Module: GIPCNM
List CSSD Debug Module: GPNP
List CSSD Debug Module: OLR
List CSSD Debug Module: SKGFD

To get detailed information about the log level of CSSD:module, I can now finally use the crsctl get {log|trace} daemon:module command. My CSSD logfile stated the CSSD:OLR had log level 0 and trace level 0, confirmed by Clusterware:

# crsctl get log css “OLR”
Get CSSD Module: OLR  Log Level: 0

Note that the name of the module has to be in upper case. If you are lazy like me you could use the “all” keyword instead of the module name to list all the information in one command:

# crsctl get log css all
Get CSSD Module: CLSF  Log Level: 0
Get CSSD Module: CSSD  Log Level: 2
Get CSSD Module: GIPCCM  Log Level: 2
Get CSSD Module: GIPCGM  Log Level: 2
Get CSSD Module: GIPCNM  Log Level: 2
Get CSSD Module: GPNP  Log Level: 1
Get CSSD Module: OLR  Log Level: 0
Get CSSD Module: SKGFD  Log Level: 0

Setting Log Levels

Let’s have a look how to increase log levels for the critical components. I am picking CSSD here, simply because I want to know if Oracle records the chicken-and-egg problem when having voting files in ASM in the ocssd.log file. (CSSD needs to voting files to start, and ASM needs CSSD to run, but if the voting files are in ASM, then we have gone full circle). With all the information provided in the section above, that’s actually quite simple to do.

# crsctl set log -h
Usage:
  crsctl set {log|trace} {mdns|gpnp|css|crf|crs|ctss|evm|gipc} "<name1>=<lvl1>,..."
 where
   mdns          multicast Domain Name Server
   gpnp          Grid Plug-n-Play Service
   css           Cluster Synchronization Services
   crf           Cluster Health Monitor
   crs           Cluster Ready Services
   ctss          Cluster Time Synchronization Service
   evm           EventManager
   gipc          Grid Interprocess Communications
   <name1>, ...    Module names ("all" for all names)
   <lvl1>, ...     Module log/trace levels

  crsctl set log res <resname>=<lvl>
 where
 <resname>     Resource name
   <lvl>                  Agent log levels

The object I would like to research is the voting file discovery. I know that the ASM disks have an entry in the header indicating whether or not a disk contains a voting file, in the kfdhdb.vfstart and kfdhdb.vfend fields as shown here:

# kfed read /dev/oracleasm/disks/OCR1  | grep kfdhdb.vf
kfdhdb.vfstart:                     256 ; 0x0ec: 0x00000100
kfdhdb.vfend:                       288 ; 0x0f0: 0x00000120

With the default log information, I can see the voting disk discovery happening as shown in this cut down version of the ocssd.log:

2011-06-02 09:06:19.941: [    CSSD][1078901056]clssnmvDDiscThread: using discovery string  for initial discovery
2011-06-02 09:06:19.941: [   SKGFD][1078901056]Discovery with str::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]UFS discovery with ::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]OSS discovery with ::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str ::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]Fetching asmlib disk :ORCL:OCR1:

2011-06-02 09:06:19.942: [   SKGFD][1078901056]Handle 0x58d2a60 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR1:
2011-06-02 09:06:19.942: [   SKGFD][1078901056]Handle 0x58d3290 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR2:
2011-06-02 09:06:19.942: [   SKGFD][1078901056]Handle 0x58d3ac0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR3:

2011-06-02 09:06:19.953: [    CSSD][1078901056]clssnmvDiskVerify: discovered a potential voting file
2011-06-02 09:06:19.953: [   SKGFD][1078901056]Handle 0x5c22670 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk  :ORCL:OCR1:
2011-06-02 09:06:19.954: [    CSSD][1078901056]clssnmvDiskVerify: Successful discovery for disk ORCL:OCR1, UID 4700d9a1-93094ff5-bf8b96e1-7efdb883, Pending CIN 0:1301401766:0, Committed CIN 0:1301401766:0
2011-06-02 09:06:19.954: [   SKGFD][1078901056]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x5c22670 for disk :ORCL:OCR1:

2011-06-02 09:06:19.956: [    CSSD][1078901056]clssnmvDiskVerify: Successful discovery of 3 disks
2011-06-02 09:06:19.956: [    CSSD][1078901056]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2011-06-02 09:06:19.957: [    CSSD][1078901056]clssnmCompleteVFDiscovery: Completing voting file discovery

Now I’m setting the new log level for SKGFD, as this seems to be responsible for the disk discovery.

# crsctl set log css “SKGFD:5”
Set CSSD Module: SKGFD  Log Level: 5

Does it make a difference? Restarting Clusterware should give more clues. Looking at the logfile I could tell there was no difference. I tried using the trace level now for the same module:

# crsctl get trace css SKGFD
Get CSSD Module: SKGFD  Trace Level: 5

Interestingly that wasn’t picked up!

$ grep “Module = SKGFD” ocssd.log | tail -n1
2011-06-02 11:31:06.707: [    CSSD][427846944]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 2, TraceLevel = 0

So I decided to leave it at this point as I was running out of time for this test. I’d be interested if anyone has successfully raised the log level for this component. On the other hand, I wonder if it is ever necessary to do so. I can confirm however that raising the log level for the CSSD module created a lot more output, actually so much that it seems to equal a system wide SQL trace-so don’t change log levels unless support directs you to.

3 Responses to “Tuning Log and Trace levels for Clusterware 11.2”

  1. Jayakumar said

    Useful information. Thanks much Martin….

  2. Anand said

    Nice one !!! Thanks for Sharing :)

  3. Sridhar said

    Always wanted to know this. I have a documented note from Oracle support stating that there is no way to reduce or make the tracing any more verbose! Great stuff!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: