Beware of ACFS when upgrading to 126.96.36.199
Posted by Martin Bach on January 10, 2012
This post is about a potential pitfall when migrating from 11.2.0.x to the next point release. I stumbled over problem this one on a two node cluster.
The operating system is Oracle Linux 5.5 running 188.8.131.52.3 and I wanted to go to 184.108.40.206.0. As you know, Grid Infrastructure upgrades are out-of-place, in other words require a separate Oracle home. This is also one of the reasons I wouldn’t want less than 20G on a non-lab like environment for the Grid Infrastructure mount points …
Now when you are upgrading from 11.2.0.x to 220.127.116.11 you need to apply a one-off patch, but the correct one! Search for patch number 12539000 (11203:ASM UPGRADE FAILED ON FIRST NODE WITH ORA-03113) and apply the one that matches your version-and pay attention to these PSUs! There is the obvious required opatch update to be performed before again as well.
So much for the prerequisites. Oracle 18.104.22.168 is available as patch 10404530, and part 3 is for Grid Infrastructure which has to be done first. This post only covers the GI upgrade, the database part is usually quite uneventful in comparison…
Upgrading Grid Infrastructure
After unzipping the third patch file you start runInstaller. But not before having carefully unset all pointers to the current 22.214.171.124 GRID_HOME (ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, ORA_CRS_HOME, etc)!
Clicking through OUI is mostly a matter of “next”, “next”, “next”, the action starts with the rootupgrade.sh script. Here’s the output from node1:
[root@node1 ~]# /u01/crs/126.96.36.199/rootupgrade.sh Performing root user operation for Oracle 11g The following environment variables are set as: ORACLE_OWNER= oracle ORACLE_HOME= /u01/crs/188.8.131.52 Enter the full pathname of the local bin directory: [/usr/local/bin]: The contents of "dbhome" have not changed. No need to overwrite. The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y Copying oraenv to /usr/local/bin ... The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y Copying coraenv to /usr/local/bin ... Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created Finished running generic part of root script. Now product-specific root actions will be performed. Using configuration parameter file: /u01/crs/184.108.40.206/crs/install/crsconfig_params Creating trace directory User ignored Prerequisites during installation ASM upgrade has started on first node. CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1' ... CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node1' has completed CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1' CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed CRS-4133: Oracle High Availability Services has been stopped. OLR initialization - successful Replacing Clusterware entries in inittab clscfg: EXISTING configuration version 5 detected. clscfg: version 5 is 11g Release 2. Successfully accumulated necessary OCR keys. Creating OCR keys for user 'root', privgrp 'root'.. Operation successful. PRCA-1056 : Unable to upgrade ACFS from version 220.127.116.11.0 to version 18.104.22.168.0 PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil: CLSU-00100: Operating System function: open64 failed with error data: 2advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315 srvctl upgrade model -first ... failed Failed to perform first node tasks for cluster modeling upgrade at /u01/crs/22.214.171.124/crs/install/crsconfig_lib.pm line 9088. /u01/crs/126.96.36.199/perl/bin/perl -I/u01/crs/188.8.131.52/perl/lib -I/u01/crs/184.108.40.206/crs/install /u01/crs/220.127.116.11/crs/install/rootcrs.pl execution failed
So that was not too great indeed-my update failed halfway through. Two facts make this bearable:
- rootupgrade.sh (and root.sh for that matter) are restartable since 18.104.22.168 at least
- A great deal of logging is available in $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_hostname.log
Now advmutil was correct-there were no volumes in /dev/asm/*
An analysis of the rootcrs_node1.log file showed that the command that failed was this one
2012-01-06 10:09:10: Executing cmd: /u01/crs/22.214.171.124/bin/srvctl upgrade model -s 126.96.36.199.0 -d 188.8.131.52.0 -p first 2012-01-06 10:09:12: Command output: > PRCA-1056 : Unable to upgrade ACFS from version 184.108.40.206.0 to version 220.127.116.11.0 > PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil: CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315 >End Command output 2012-01-06 10:09:12: "/u01/crs/18.104.22.168/bin/srvctl upgrade model -s 22.214.171.124.0 -d 126.96.36.199.0 -p first" failed with status 1. 2012-01-06 10:09:12: srvctl upgrade model -first ... failed
Thinking Clearly is an idea I thought I had adopted from Cary Millsap, but sadly I didn’t apply it here! Lesson learned: don’t assume, check!
I however assumed that because of the shutdown of the clusterware stack there wasn’t any Oracle software running on the node, hence there wouldn’t be an ADVM volume BY DEFINITION. Cluster down-ADVM down too.
Upon checking the log file again, I realised how wrong I was. Most of the lower stack Clusterware daemons were actually running by the time the srvctl command failed to upgrade ACFS to 188.8.131.52. So the reason for this failure had to be a different one. It quickly turned out that ALL the ACFS volumes were disabled. A quick check with asmcmd verified this:
$ asmcmd volinfo -a Volume Name: ORAHOMEVOL Volume Device: /dev/asm/orahomevol-315 State: DISABLED Size (MB): 15120 Resize Unit (MB): 256 Redundancy: UNPROT Stripe Columns: 4 Stripe Width (K): 128 Usage: ACFS Mountpath: /u01/app/oracle/product/184.108.40.206
OK, that explains it all-disabled volumes are obviously NOT presented in /dev/asm/. A call to “asmcmd volenable -a” sorted that problem.
Back to point 1 – rootupgrade.sh is restartable. I then switched back to the root session and started another attempt at running the script and: (drums please) it worked. Now all that was left to do was to run rootupgrade.sh on the second (and last) node. This completed successfully as well. The required patch for the ASM rolling upgrade by the way is needed there and then-the rootcrs_lastnode.log file has these lines:
2012-01-10 09:44:10: Command output: > Started to upgrade the Oracle Clusterware. This operation may take a few minutes. > Started to upgrade the CSS. > Started to upgrade the CRS. > The CRS was successfully upgraded. > Oracle Clusterware operating version was successfully set to 220.127.116.11.0 >End Command output 2012-01-10 09:44:10: /u01/crs/18.104.22.168/bin/crsctl set crs activeversion ... passed 2012-01-10 09:45:10: Rolling upgrade is set to 1 2012-01-10 09:45:10: End ASM rolling upgrade 2012-01-10 09:45:10: Executing as oracle: /u01/crs/22.214.171.124/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/126.96.36.199 2012-01-10 09:45:10: Running as user oracle: /u01/crs/188.8.131.52/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/184.108.40.206 2012-01-10 09:45:10: Invoking "/u01/crs/220.127.116.11/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/18.104.22.168" as user "oracle" 2012-01-10 09:45:10: Executing /bin/su oracle -c "/u01/crs/22.214.171.124/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/126.96.36.199" 2012-01-10 09:45:10: Executing cmd: /bin/su oracle -c "/u01/crs/188.8.131.52/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/184.108.40.206" 2012-01-10 09:45:51: Command output: > > ASM upgrade has finished on last node. > >End Command output 2012-01-10 09:45:51: end rolling ASM upgrade in last
Note the ROLLING UPGRADE!
If your rootupgrade.sh script bails out with ADVMUTIL, check if your ACFS volumes are enabled-they most likely are not.