Martins Blog

Trying to explain complex things in simple terms

Beware of ACFS when upgrading to 11.2.0.3

Posted by Martin Bach on January 10, 2012

This post is about a potential pitfall when migrating from 11.2.0.x to the next point release. I stumbled over this particular problem on a two-node cluster.

The operating system is Oracle Linux 5.5 running 11.2.0.2.3, and I wanted to go to 11.2.0.3.0. As you know, Grid Infrastructure upgrades are out-of-place, in other words they require a separate Oracle home. This is also one of the reasons why I wouldn’t want less than 20G for the Grid Infrastructure mount points on anything but a lab environment …
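
If you want a rough idea of whether there is enough room for the out-of-place home, a quick df before you start does the trick (the mount point is specific to this environment):

$ df -h /u01        # the new 11.2.0.3 Grid home will live alongside the existing 11.2.0.2 one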

Now when you are upgrading from 11.2.0.x to 11.2.0.3 you need to apply a one-off patch first, and it has to be the correct one! Search for patch number 12539000 (11203:ASM UPGRADE FAILED ON FIRST NODE WITH ORA-03113) and apply the version that matches your release, paying close attention to the PSU level it was built for! There is also the obligatory OPatch update to be performed beforehand.
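
If you want to double-check those prerequisites before launching the upgrade, something along these lines does the job; the paths are those of this cluster and the OPatch output varies slightly between versions:

$ /u01/crs/11.2.0.2/OPatch/opatch version                        # must meet the minimum version stated in the patch README
$ /u01/crs/11.2.0.2/OPatch/opatch lsinventory | grep 12539000    # confirm the one-off is actually installed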

So much for the prerequisites. Oracle 11.2.0.3 is available as patch 10404530, and part 3 is for Grid Infrastructure, which has to be upgraded first. This post only covers the GI upgrade; the database part is usually quite uneventful in comparison…

Upgrading Grid Infrastructure

After unzipping the third patch file you start runInstaller, but not before having carefully unset all pointers to the current 11.2.0.2 GRID_HOME (ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, ORA_CRS_HOME, etc.)!
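
A minimal sketch of what that can look like in the session that will launch the installer, assuming your profile sets the usual suspects:

$ unset ORACLE_HOME ORACLE_SID ORA_CRS_HOME
$ unset LD_LIBRARY_PATH
$ env | grep -iE 'oracle|crs'        # nothing should point at the 11.2.0.2 Grid home any more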

Clicking through OUI is mostly a matter of “next”, “next”, “next”; the real action starts with the rootupgrade.sh script. Here’s the output from node1:

[root@node1 ~]# /u01/crs/11.2.0.3/rootupgrade.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME=  /u01/crs/11.2.0.3

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/crs/11.2.0.3/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation

ASM upgrade has started on first node.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
...
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node1' has completed
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
OLR initialization - successful
Replacing Clusterware entries in inittab
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
PRCA-1056 : Unable to upgrade ACFS from version 11.2.0.2.0 to version 11.2.0.3.0
PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil:
CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
srvctl upgrade model -first ... failed
Failed to perform first node tasks for cluster modeling upgrade at /u01/crs/11.2.0.3/crs/install/crsconfig_lib.pm line 9088.
/u01/crs/11.2.0.3/perl/bin/perl -I/u01/crs/11.2.0.3/perl/lib -I/u01/crs/11.2.0.3/crs/install /u01/crs/11.2.0.3/crs/install/rootcrs.pl execution failed

So that was not too great: my upgrade failed halfway through. Two facts make this bearable:

  1. rootupgrade.sh (and root.sh for that matter) has been restartable since at least 11.2.0.2
  2. A great deal of logging is available in $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_hostname.log, as shown below
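
Keeping a tail on that log file in a second session while rootupgrade.sh runs makes it easy to see exactly where things go wrong; on the first node of this cluster that would be:

$ tail -f /u01/crs/11.2.0.3/cfgtoollogs/crsconfig/rootcrs_node1.log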

Now advmutil was correct: there were no volumes in /dev/asm/* at all.
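
A quick listing is enough to confirm that (reconstructed here rather than copied from the session):

$ ls -l /dev/asm/
total 0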

An analysis of the rootcrs_node1.log file showed that the command that failed was this one:

2012-01-06 10:09:10: Executing cmd: /u01/crs/11.2.0.3/bin/srvctl upgrade model  -s 11.2.0.2.0 -d 11.2.0.3.0 -p first
2012-01-06 10:09:12: Command output:
>  PRCA-1056 : Unable to upgrade ACFS from version 11.2.0.2.0 to version 11.2.0.3.0
>  PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil: CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
>End Command output
2012-01-06 10:09:12:   "/u01/crs/11.2.0.3/bin/srvctl upgrade model  -s 11.2.0.2.0 -d 11.2.0.3.0 -p first" failed with status 1.
2012-01-06 10:09:12: srvctl upgrade model -first ... failed

Thinking Clearly

Thinking Clearly is an idea I thought I had adopted from Cary Millsap, but sadly I didn’t apply it here! Lesson learned: don’t assume, check!

However, I assumed that because the Clusterware stack had been shut down there wasn’t any Oracle software running on the node, and hence there couldn’t be an ADVM volume BY DEFINITION. Cluster down, ADVM down too.

Upon checking the log file again, I realised how wrong I was. Most of the lower-stack Clusterware daemons were actually running by the time the srvctl command failed to upgrade ACFS to 11.2.0.3, so the reason for the failure had to be a different one. It quickly turned out that ALL the ACFS volumes were disabled. A quick check with asmcmd verified this:

$ asmcmd volinfo -a

Volume Name: ORAHOMEVOL
Volume Device: /dev/asm/orahomevol-315
State: DISABLED
Size (MB): 15120
Resize Unit (MB): 256
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/oracle/product/11.2.0.2

OK, that explains it all: disabled volumes are obviously NOT presented in /dev/asm/. A call to “asmcmd volenable -a” sorted that problem.
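
For completeness, the fix and a quick sanity check look roughly like this when run as the Grid Infrastructure owner:

$ asmcmd volenable -a                  # enable all ADVM volumes
$ asmcmd volinfo -a | grep State       # every volume should now report ENABLED
$ ls /dev/asm/                         # and the orahomevol-315 device should be back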

Back to point 1: rootupgrade.sh is restartable. I switched back to the root session, started another attempt at running the script and (drum roll please) it worked. All that was left to do now was to run rootupgrade.sh on the second (and last) node, which completed successfully as well. By the way, this is where the one-off patch for the rolling ASM upgrade is actually needed: the rootcrs_lastnode.log file has these lines:

2012-01-10 09:44:10: Command output:
>  Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
>  Started to upgrade the CSS.
>  Started to upgrade the CRS.
>  The CRS was successfully upgraded.
>  Oracle Clusterware operating version was successfully set to 11.2.0.3.0
>End Command output
2012-01-10 09:44:10: /u01/crs/11.2.0.3/bin/crsctl set crs activeversion ... passed
2012-01-10 09:45:10: Rolling upgrade is set to 1
2012-01-10 09:45:10: End ASM rolling upgrade
2012-01-10 09:45:10: Executing as oracle: /u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2
2012-01-10 09:45:10: Running as user oracle: /u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2
2012-01-10 09:45:10:   Invoking "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2" as user "oracle"
2012-01-10 09:45:10: Executing /bin/su oracle -c "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2"
2012-01-10 09:45:10: Executing cmd: /bin/su oracle -c "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2"
2012-01-10 09:45:51: Command output:
>
>  ASM upgrade has finished on last node.
>
>End Command output
2012-01-10 09:45:51: end rolling ASM upgrade in last

Note the ROLLING UPGRADE!
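
With both nodes done it is worth confirming that the cluster really is on the new release; the usual checks are:

$ crsctl query crs activeversion       # cluster-wide active version, should report 11.2.0.3.0
$ crsctl query crs softwareversion     # software version of the local node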

Summary

If your rootupgrade.sh script bails out with an advmutil error, check whether your ACFS volumes are enabled: they most likely are not.
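
In other words, a 30-second check before kicking off rootupgrade.sh can save you the restart altogether; something along these lines, run as the Grid Infrastructure owner, is all it takes:

$ asmcmd volinfo -a | grep State       # any DISABLED volume here will trip up advmutil
$ asmcmd volenable -a                  # enable them before starting the upgrade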

3 Responses to “Beware of ACFS when upgrading to 11.2.0.3”

  1. Yoav Givon said

    Thanks Martin for the clear info. I assume this will save some extra heart peaks when the time comes…

  2. lishan said

    Hi Martin

    Have you seen the issue in 11.2.0.3 where, if files are open on ACFS mount points when the server is rebooted, GI hangs for 15 minutes trying to stop the ACFS resources?

    • Martin Bach said

      Hi Lishan,

      haven’t seen this in 11.2.0.3, but I’m aware of a problem with 11.2.0.2 where the kernel modules weren’t unloaded correctly, causing the ACFS layer to pause and think very hard for a long time. Which platform are you on? If Linux, are you on kernel-uek or not?

      Martin
