This post is about a potential pitfall when migrating from 11.2.0.x to the next point release. I stumbled over problem this one on a two node cluster.
The operating system is Oracle Linux 5.5 running 220.127.116.11.3 and I wanted to go to 18.104.22.168.0. As you know, Grid Infrastructure upgrades are out-of-place, in other words require a separate Oracle home. This is also one of the reasons I wouldn’t want less than 20G on a non-lab like environment for the Grid Infrastructure mount points …
Now when you are upgrading from 11.2.0.x to 22.214.171.124 you need to apply a one-off patch, but the correct one! Search for patch number 12539000 (11203:ASM UPGRADE FAILED ON FIRST NODE WITH ORA-03113) and apply the one that matches your version-and pay attention to these PSUs! There is the obvious required opatch update to be performed before again as well.
So much for the prerequisites. Oracle 126.96.36.199 is available as patch 10404530, and part 3 is for Grid Infrastructure which has to be done first. This post only covers the GI upgrade, the database part is usually quite uneventful in comparison…
Upgrading Grid Infrastructure
After unzipping the third patch file you start runInstaller. But not before having carefully unset all pointers to the current 188.8.131.52 GRID_HOME (ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, ORA_CRS_HOME, etc)!
Clicking through OUI is mostly a matter of “next”, “next”, “next”, the action starts with the rootupgrade.sh script. Here’s the output from node1:
[root@node1 ~]# /u01/crs/184.108.40.206/rootupgrade.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/crs/220.127.116.11/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
ASM upgrade has started on first node.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node1' has completed
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
OLR initialization - successful
Replacing Clusterware entries in inittab
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
PRCA-1056 : Unable to upgrade ACFS from version 18.104.22.168.0 to version 22.214.171.124.0
PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil:
CLSU-00100: Operating System function: open64 failed with error data: 2advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
srvctl upgrade model -first ... failed
Failed to perform first node tasks for cluster modeling upgrade at /u01/crs/126.96.36.199/crs/install/crsconfig_lib.pm line 9088.
/u01/crs/188.8.131.52/perl/bin/perl -I/u01/crs/184.108.40.206/perl/lib -I/u01/crs/220.127.116.11/crs/install /u01/crs/18.104.22.168/crs/install/rootcrs.pl execution failed
So that was not too great indeed-my update failed halfway through. Two facts make this bearable:
- rootupgrade.sh (and root.sh for that matter) are restartable since 22.214.171.124 at least
- A great deal of logging is available in $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_hostname.log
Now advmutil was correct-there were no volumes in /dev/asm/*
An analysis of the rootcrs_node1.log file showed that the command that failed was this one
2012-01-06 10:09:10: Executing cmd: /u01/crs/126.96.36.199/bin/srvctl upgrade model -s 188.8.131.52.0 -d 184.108.40.206.0 -p first
2012-01-06 10:09:12: Command output:
> PRCA-1056 : Unable to upgrade ACFS from version 220.127.116.11.0 to version 18.104.22.168.0
> PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil: CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
>End Command output
2012-01-06 10:09:12: "/u01/crs/22.214.171.124/bin/srvctl upgrade model -s 126.96.36.199.0 -d 188.8.131.52.0 -p first" failed with status 1.
2012-01-06 10:09:12: srvctl upgrade model -first ... failed
Thinking Clearly is an idea I thought I had adopted from Cary Millsap, but sadly I didn’t apply it here! Lesson learned: don’t assume, check!
I however assumed that because of the shutdown of the clusterware stack there wasn’t any Oracle software running on the node, hence there wouldn’t be an ADVM volume BY DEFINITION. Cluster down-ADVM down too.
Upon checking the log file again, I realised how wrong I was. Most of the lower stack Clusterware daemons were actually running by the time the srvctl command failed to upgrade ACFS to 184.108.40.206. So the reason for this failure had to be a different one. It quickly turned out that ALL the ACFS volumes were disabled. A quick check with asmcmd verified this:
$ asmcmd volinfo -a
Volume Name: ORAHOMEVOL
Volume Device: /dev/asm/orahomevol-315
Size (MB): 15120
Resize Unit (MB): 256
Stripe Columns: 4
Stripe Width (K): 128
OK, that explains it all-disabled volumes are obviously NOT presented in /dev/asm/. A call to “asmcmd volenable -a” sorted that problem.
Back to point 1 – rootupgrade.sh is restartable. I then switched back to the root session and started another attempt at running the script and: (drums please) it worked. Now all that was left to do was to run rootupgrade.sh on the second (and last) node. This completed successfully as well. The required patch for the ASM rolling upgrade by the way is needed there and then-the rootcrs_lastnode.log file has these lines:
2012-01-10 09:44:10: Command output:
> Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
> Started to upgrade the CSS.
> Started to upgrade the CRS.
> The CRS was successfully upgraded.
> Oracle Clusterware operating version was successfully set to 220.127.116.11.0
>End Command output
2012-01-10 09:44:10: /u01/crs/18.104.22.168/bin/crsctl set crs activeversion ... passed
2012-01-10 09:45:10: Rolling upgrade is set to 1
2012-01-10 09:45:10: End ASM rolling upgrade
2012-01-10 09:45:10: Executing as oracle: /u01/crs/22.214.171.124/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/126.96.36.199
2012-01-10 09:45:10: Running as user oracle: /u01/crs/188.8.131.52/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/184.108.40.206
2012-01-10 09:45:10: Invoking "/u01/crs/220.127.116.11/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/18.104.22.168" as user "oracle"
2012-01-10 09:45:10: Executing /bin/su oracle -c "/u01/crs/22.214.171.124/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/126.96.36.199"
2012-01-10 09:45:10: Executing cmd: /bin/su oracle -c "/u01/crs/188.8.131.52/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/184.108.40.206"
2012-01-10 09:45:51: Command output:
> ASM upgrade has finished on last node.
>End Command output
2012-01-10 09:45:51: end rolling ASM upgrade in last
Note the ROLLING UPGRADE!
If your rootupgrade.sh script bails out with ADVMUTIL, check if your ACFS volumes are enabled-they most likely are not.