Tag Archives: 11.2.0.3

Beware of ACFS when upgrading to 11.2.0.3

This post is about a potential pitfall when migrating from 11.2.0.x to the next point release. I stumbled over this problem on a two-node cluster.

The operating system is Oracle Linux 5.5, running Grid Infrastructure 11.2.0.2.3, and I wanted to go to 11.2.0.3.0. As you know, Grid Infrastructure upgrades are out-of-place; in other words, they require a separate Oracle home. This is also one of the reasons I wouldn’t want less than 20G for the Grid Infrastructure mount points on anything other than a lab environment …

Now when you are upgrading from 11.2.0.x to 11.2.0.3 you need to apply a one-off patch, and it has to be the correct one! Search for patch number 12539000 (11203:ASM UPGRADE FAILED ON FIRST NODE WITH ORA-03113) and apply the one that matches your version, paying attention to any PSUs you have installed. The obvious OPatch update has to be performed beforehand as well.
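
For reference, the preparation on each node boils down to something like this minimal sketch; the unzip directory is a placeholder, and the authoritative steps are in the patch readme:

# run as the Grid Infrastructure owner against the existing 11.2.0.2 home
export ORACLE_HOME=/u01/crs/11.2.0.2
$ORACLE_HOME/OPatch/opatch version       # confirm OPatch meets the readme's minimum version
cd /tmp/12539000                         # placeholder: the unzipped one-off patch
$ORACLE_HOME/OPatch/opatch lsinventory   # sanity-check the inventory first
$ORACLE_HOME/OPatch/opatch apply         # or opatch auto as root, depending on the readme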

So much for the prerequisites. Oracle 11.2.0.3 is available as patch 10404530; part 3 of the set is for Grid Infrastructure, which has to be upgraded first. This post only covers the GI upgrade; the database part is usually quite uneventful in comparison…

Upgrading Grid Infrastructure

After unzipping the third patch file you start runInstaller, but not before having carefully unset all pointers to the current 11.2.0.2 GRID_HOME (ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, ORA_CRS_HOME, etc.)!
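
In the session that starts the installer this amounts to something along these lines; the staging directory is a placeholder, and you should add any other Oracle-related variables your profile sets:

# make sure nothing points at the old 11.2.0.2 Grid home before launching OUI
unset ORACLE_HOME ORACLE_SID ORA_CRS_HOME LD_LIBRARY_PATH
cd /stage/11.2.0.3/grid      # placeholder: the unzipped part 3 of patch 10404530
./runInstaller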

Clicking through OUI is mostly a matter of “next”, “next”, “next”; the real action starts with the rootupgrade.sh script. Here’s the output from node1:

[root@node1 ~]# /u01/crs/11.2.0.3/rootupgrade.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME=  /u01/crs/11.2.0.3

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/crs/11.2.0.3/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation

ASM upgrade has started on first node.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
...
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node1' has completed
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
OLR initialization - successful
Replacing Clusterware entries in inittab
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
PRCA-1056 : Unable to upgrade ACFS from version 11.2.0.2.0 to version 11.2.0.3.0
PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil:
CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
srvctl upgrade model -first ... failed
Failed to perform first node tasks for cluster modeling upgrade at /u01/crs/11.2.0.3/crs/install/crsconfig_lib.pm line 9088.
/u01/crs/11.2.0.3/perl/bin/perl -I/u01/crs/11.2.0.3/perl/lib -I/u01/crs/11.2.0.3/crs/install /u01/crs/11.2.0.3/crs/install/rootcrs.pl execution failed

So that was not too great indeed: my upgrade failed halfway through. Two facts make this bearable:

  1. rootupgrade.sh (and root.sh for that matter) have been restartable since at least 11.2.0.2
  2. A great deal of logging is available in $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_hostname.log; a convenient way to follow it is shown below
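
For reference, this is how I keep an eye on the script from a second session on the node being upgraded (the path assumes the new Grid home used in this post):

# follow the root script's own log while rootupgrade.sh is running
tail -f /u01/crs/11.2.0.3/cfgtoollogs/crsconfig/rootcrs_$(hostname -s).log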

Now advmutil was correct: there were no volumes in /dev/asm/*

An analysis of the rootcrs_node1.log file showed that the command that failed was this one:

2012-01-06 10:09:10: Executing cmd: /u01/crs/11.2.0.3/bin/srvctl upgrade model  -s 11.2.0.2.0 -d 11.2.0.3.0 -p first
2012-01-06 10:09:12: Command output:
>  PRCA-1056 : Unable to upgrade ACFS from version 11.2.0.2.0 to version 11.2.0.3.0
>  PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil: CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
>End Command output
2012-01-06 10:09:12:   "/u01/crs/11.2.0.3/bin/srvctl upgrade model  -s 11.2.0.2.0 -d 11.2.0.3.0 -p first" failed with status 1.
2012-01-06 10:09:12: srvctl upgrade model -first ... failed

Thinking Clearly

Thinking Clearly is an idea I thought I had adopted from Cary Millsap, but sadly I didn’t apply it here! Lesson learned: don’t assume, check!

However, I assumed that because of the shutdown of the Clusterware stack there wasn’t any Oracle software running on the node, and hence there couldn’t be an ADVM volume BY DEFINITION. Cluster down, ADVM down too.

Upon checking the log file again, I realised how wrong I was. Most of the lower stack Clusterware daemons were actually running by the time the srvctl command failed to upgrade ACFS to 11.2.0.3. So the reason for this failure had to be a different one. It quickly turned out that ALL the ACFS volumes were disabled. A quick check with asmcmd verified this:

$ asmcmd volinfo -a

Volume Name: ORAHOMEVOL
Volume Device: /dev/asm/orahomevol-315
State: DISABLED
Size (MB): 15120
Resize Unit (MB): 256
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/oracle/product/11.2.0.2

OK, that explains it all: disabled volumes are obviously NOT presented in /dev/asm/. A call to “asmcmd volenable -a” sorted that problem.
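
In full, the fix and its verification look like this, run as the Grid software owner with the ASM environment set:

# re-enable all ADVM volumes and check that the device nodes are back
asmcmd volenable -a
asmcmd volinfo -a      # State should now report ENABLED
ls -l /dev/asm/        # the orahomevol-* device should have reappeared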

Back to point 1: rootupgrade.sh is restartable. I switched back to the root session, started another attempt at running the script and (drum roll please) it worked. Now all that was left to do was to run rootupgrade.sh on the second (and last) node, which completed successfully as well. The one-off patch for the ASM rolling upgrade, by the way, is needed there and then; the rootcrs_lastnode.log file has these lines:

2012-01-10 09:44:10: Command output:
>  Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
>  Started to upgrade the CSS.
>  Started to upgrade the CRS.
>  The CRS was successfully upgraded.
>  Oracle Clusterware operating version was successfully set to 11.2.0.3.0
>End Command output
2012-01-10 09:44:10: /u01/crs/11.2.0.3/bin/crsctl set crs activeversion ... passed
2012-01-10 09:45:10: Rolling upgrade is set to 1
2012-01-10 09:45:10: End ASM rolling upgrade
2012-01-10 09:45:10: Executing as oracle: /u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2
2012-01-10 09:45:10: Running as user oracle: /u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2
2012-01-10 09:45:10:   Invoking "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2" as user "oracle"
2012-01-10 09:45:10: Executing /bin/su oracle -c "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2"
2012-01-10 09:45:10: Executing cmd: /bin/su oracle -c "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2"
2012-01-10 09:45:51: Command output:
>
>  ASM upgrade has finished on last node.
>
>End Command output
2012-01-10 09:45:51: end rolling ASM upgrade in last

Note the ROLLING UPGRADE!
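
With the last node done, a quick check from any node confirms that the cluster is on the new version:

# active version is cluster-wide, software version is per node
/u01/crs/11.2.0.3/bin/crsctl query crs activeversion
/u01/crs/11.2.0.3/bin/crsctl query crs softwareversion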

Summary

If your rootupgrade.sh script bails out with advmutil errors, check whether your ACFS volumes are enabled; they most likely are not.

Adding another node for RAC 11.2.0.3 on Oracle Linux 6.1 with kernel-UEK

As I hinted at in my last post about installing Oracle 11.2.0.3 on Oracle Linux 6.1 with kernel UEK, I have planned another article about adding a node to a cluster.

I deliberately started the installation of my RAC system with only one node, to allow my moderately spec’d hardware to cope with adding a second cluster node. In previous versions of Oracle there was a problem with node additions: the $GRID_HOME/oui/bin/addNode.sh script performed prerequisite checks that used to fail when you had used ASMLib. Unfortunately, due to my setup I couldn’t test whether that has been solved (I didn’t use ASMLib).
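
For context, a silent node addition is typically kicked off from an existing node roughly as sketched below; the VIP name is an assumption, and the IGNORE_PREADDNODE_CHECKS variable is the commonly quoted workaround for false positives in the pre-add checks (such as the ASMLib one):

# run as the Grid software owner on an existing node
cd /u01/crs/11.2.0.3/oui/bin
# only set this if the pre-add checks fail with a known false positive
export IGNORE_PREADDNODE_CHECKS=Y
./addNode.sh -silent \
  "CLUSTER_NEW_NODES={rac11203node2}" \
  "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac11203node2-vip}"   # VIP name is a placeholder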

Cluvfy

As with many cluster operations on non-Exadata systems, you should use the cluvfy tool to ensure that the system you want to add to the cluster meets the requirements. Here’s an example session of the cluvfy output. Since I am about to add a node, the stage has to be “-pre nodeadd”. rac11203node1 is the active cluster node, and rac11203node2 is the one I want to add. Note that you run the command from (any) existing node, specifying the nodes to be added with the “-n” parameter. For convenience I have added the “-fixup” option to generate fixup scripts if needed. Also note that this is a lab environment; a real production environment would use dm-multipath for storage and a bonded pair of NICs for the public network. Since 11.2.0.2 you no longer need to bond your private NICs, as Oracle does that for you now.
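
Put together, the invocation is along these lines:

# run from the existing node rac11203node1 as the Grid software owner
cluvfy stage -pre nodeadd -n rac11203node2 -fixup -verbose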

Continue reading

Installing Grid Infrastructure 11.2.0.3 on Oracle Linux 6.1 with kernel UEK


Yesterday was the big day: the day Oracle released 11.2.0.3 for Linux x86 and x86-64. Time to download and experiment! The following assumes you have already configured RAC 11g Release 2 before; it’s not a step-by-step guide on how to do so. I expect those guides to spring up like mushrooms in the next few days, especially since the weekend allows people to do the same as I did!

The Operating System

I have prepared a Xen domU for 11.2.0.3, using the latest Oracle Linux 6.1 build I could find. In summary, I am using the following settings:

  • Oracle Linux 6.1 64-bit
  • Oracle Linux Server-uek (2.6.32-100.34.1.el6uek.x86_64)
  • Initially installed to use the “database server” package group
  • 3 NICs – 2 for the HAIP resource and the private interconnect with IP addresses in the ranges of 192.168.100.0/24 and 192.168.101.0/24. The public NIC is on 192.168.99.0/24
    • Node 1 uses 192.168.(99|100|101).129 for eth0, eth1 and eth2. The VIP uses 192.168.99.130
    • Node 2 uses 192.168.(99|100|101).131 for eth0, eth1 and eth2. The VIP uses 192.168.99.132
    • The SCAN is on 192.168.99.(133|134|135)
    • All name resolution is done via my dom0 bind9 server (a quick resolution check is sketched after this list)
  • I am using an 8GB virtual disk for the operating system, and a 20G LUN for the Oracle Grid and RDBMS homes. The 20G are subdivided into two logical volumes of 10G each, mounted to /u01/app/oracle and /u01/crs/11.2.0.3. Note that you now seem to need 7.5G for the GRID_HOME
  • All software is owned by the oracle user
  • Shared storage is provided by the xen blktap driver
    • 3 x 1G LUNs for +OCR containing OCR and voting disks
    • 1 x 10G for +DATA
    • 1 x 10G for +RECO
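
Before the installer runs its own checks it does not hurt to confirm that the VIPs and the SCAN resolve as expected. A minimal sketch, assuming hostnames along the lines of rac11203node1-vip, rac11203node2-vip and rac11203scan in the dom0 bind9 zone:

# hostnames are assumptions; substitute whatever your DNS zone actually contains
for name in rac11203node1-vip rac11203node2-vip rac11203scan; do
  echo "== ${name}"
  dig +short ${name}
done
# the SCAN should return all three addresses, 192.168.99.133 to .135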

Continue reading

Oracle 11.2.0.3-can’t be long now!

Update: well, it’s out, actually. See the comments below. However, the certification matrix hasn’t been updated, so it’s anyone’s guess whether Oracle Linux/Red Hat 6 are certified at this point in time.

Tanel Poder already announced it a few days ago: 11.2.0.3 must be ready for release very soon. It was even spotted on the “latest patchset” page on OTN, only to be removed quickly. After another tweet from Laurent Schneider, it was time to investigate what’s new. The easiest way is to point your browser to tahiti.oracle.com and type “11.2.0.3” into the search box. You are going to find a wealth of new information!

As a RAC person at heart, I am naturally interested in RAC features first. The new features I spotted in the Grid Infrastructure installation guide for Linux are listed here:

http://download.oracle.com/docs/cd/E11882_01/install.112/e22489/whatsnew.htm#CEGJBBBB

Additional information for this article was taken from the “New Features” guide:

http://download.oracle.com/docs/cd/E11882_01/server.112/e22487/chapter1_11203.htm#NEWFTCH1-c

So the question is: what’s in it for us?

ASM Cluster File System

As I expected, there is now support for ACFS and ADVM on Oracle’s own kernel. This has been overdue for a while. I remember how surprised I was when I installed RAC on Oracle Linux 5.5 with the UEK kernel, only to see that infamous “…not supported” output when the installer probed the kernel version. Supported kernels are UEK 2.6.32-100.34.1 and subsequent updates to the 2.6.32-100 kernels, on both OL5 and OL6.
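
Once an 11.2.0.3 Grid home is in place, driver support for the running kernel can be checked with the acfsdriverstate utility that ships with Grid Infrastructure; a quick sketch, assuming the home used in the earlier posts:

# check ACFS/ADVM driver support for the running kernel
/u01/crs/11.2.0.3/bin/acfsdriverstate supported
/u01/crs/11.2.0.3/bin/acfsdriverstate version
uname -r     # should report a 2.6.32-100.x UEK kernel on OL5/OL6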

A big surprise is support for ACFS on SLES; I thought that was pretty much dead in the water after all the messing around from Novell. ACFS has always worked on SLES 10 up to SP3, but it never did on SLES 11. The requirement is SLES 11 SP1, and it has to be 64-bit.

There are quite a few additional changes to ACFS. For example, it’s now possible to use ACFS replication and tagging on Windows.

Upgrade

If one of the nodes in the cluster fails while the rootupgrade.sh script is executing, the upgrade can now be completed on the remaining nodes with the new “force” flag.
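
According to the new features description this boils down to re-running the root script with the flag on a node that has already been upgraded; a sketch, to be verified against the 11.2.0.3 upgrade documentation before using it in anger (the path is the Grid home used in the posts above):

# run as root on an already upgraded node to force completion of the upgrade
/u01/crs/11.2.0.3/rootupgrade.sh -force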

Random Bits

I found the following note on MOS regarding the time zone file: Actions For DST Updates When Upgrading To Or Applying The 11.2.0.3 Patchset [ID 1358166.1]. I suggest you have a look at that note, as it mentions a new pre-upgrade script you need to download, as well as corrective actions for 11.2.0.1 and 11.2.0.2. I’m sure it’s going to be mentioned in the 11.2.0.3 patch readme as well.
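
Before working through the note it is worth knowing which time zone file the database currently uses; a minimal check, run as SYSDBA:

# report the time zone file version the database is currently on
sqlplus -s / as sysdba <<'EOF'
select filename, version from v$timezone_file;
EOF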

There are also changes expected for the dreaded mutex contention problem on busy systems; MOS note WAITEVENT: “library cache: mutex X” [ID 727400.1] lists 11.2.0.3 as the release in which many of the related problems are fixed. Time will tell whether they are…

Further enhancements are focused on Warehouse Builder and XML. SQL Apply and LogMiner have also been enhanced, which is good news for users of Streams and logical standby databases.

Summary

It is much too early to say anything else about the patchset. Quite a few important documents don’t have a “new features” section yet; that includes the Real Application Clusters Administration and Deployment Guide, which I’ll cover as soon as it’s out. From a quick glance at the still-unreleased patchset it seems to be less of a radical change than 11.2.0.2 was, which is a good thing. Unfortunately the certification matrix hasn’t been updated yet; I am very keen to see support for Oracle Linux/Red Hat 6.x.