Monthly Archives: January 2012

Oracle 11.2.0.3 client not relinking properly

One of the tasks I perform quite regularly is to deploy Oracle software in the form of an RPM. In a previous post I described how this process could work, based on a post by Frits Hoogland.

Employing the same method, I ran into problems with Oracle 11.2.0.x clients. A few facts to start with:

  • Oracle 11.2.0.3 client 64bit
  • Golden image created on Oracle Linux 5
  • Destination: SuSE Enterprise 10 SP4

The problem described here most likely applies to other Oracle clients as well, although I haven’t verified that.

The problem

After cloning the client to the SuSE server I couldn’t start SQL*Plus. Fair enough, I hadn’t set LD_LIBRARY_PATH. But even after setting it I still couldn’t launch sqlplus; it died with a segmentation fault. Well, if the clone wants to be difficult, I can always relink it all. To my great astonishment that didn’t solve the problem either: still the same error.

The relink operation writes its output into a logfile, $ORACLE_HOME/install/relink.log. Please note that this isn’t the RDBMS home; whenever I refer to ORACLE_HOME in this article I specifically mean the client home!
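
For reference, the relink itself boils down to the commands below. This is only a sketch: the client home path is an example and not taken from the actual system.

# environment for the cloned client home (example path, adjust as required)
$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/client
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/bin:$PATH

# relink all binaries in the client home and check the log for errors
$ relink all
$ grep -i error $ORACLE_HOME/install/relink.log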

Continue reading

Did you know the cluvfy healthcheck?

While delivering a three-day seminar in Switzerland recently, I came across this new option in cluvfy.

Normally you’d run cluvfy in preparation for the installation of Grid Infrastructure or a set of RAC binaries, to ensure everything is ready for the next step in the RAC install process. Beginning with 11.2.0.3 there is another option that has been sneaked in without too much advertisement: the healthcheck.

Part of the “comp” checks, it takes the following options:

cluvfy comp healthcheck [-collect {cluster|database}] [-db db_unique_name] [-bestpractice|-mandatory] [-deviations] [-html] [-save [-savedir directory_path]]
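
For example, to collect cluster-wide best practice findings and save the result as an HTML report, an invocation along these lines should do; the save directory and the database name are made up for illustration:

# cluster-wide best practice check, saved as an HTML report (example directory)
$ cluvfy comp healthcheck -collect cluster -bestpractice -html -save -savedir /tmp/healthcheck

# mandatory checks for a single (hypothetical) database called RON
$ cluvfy comp healthcheck -collect database -db RON -mandatory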

Continue reading

RAC One Node on Oracle Enterprise Manager 12c

One of the promises from Oracle for OEM 12c was improved support for Oracle RAC One Node. I have spent quite a bit of time researching RON and wrote a little two-part article about it, which you can find here:

One of my complaints with it was the limited support in OEM 11.1. At the time I was working on a major consolidation project which would have used OEM for database management.

OEM 11.1

Unfortunately OEM 11.1 didn’t have support for RAC One Node. Why is that a problem? RON is a cluster database running on just one node. The interesting bit is that the ORACLE_SID is your normal ORACLE_SID with an underscore and a number appended. Under normal circumstances that number is _1, as in RON_1. But as soon as you relocate the database using srvctl relocate database -d, a second instance RON_2 is started and stays up until all sessions have failed over.

OEM obviously doesn’t know about RON_2: it was never discovered. Furthermore, the strict mapping of instance name to host no longer holds (the same applies to policy-managed databases, by the way!). A few weeks and a few relocation operations later you could be running RON_2 on racnode1.
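
To illustrate, relocating a hypothetical RAC One Node database named RON from racnode1 to racnode2 looks roughly like this (database and node names are made up):

# online relocation of the hypothetical database RON to racnode2;
# while sessions drain, both RON_1 and RON_2 are running
$ srvctl relocate database -d RON -n racnode2

# afterwards only the instance on the target node remains
$ srvctl status database -d RON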

As a consequence, the poor on-call DBA is paged about a database that has supposedly gone down, when in fact it hasn’t: it’s up and running. As a DBA, I wouldn’t want that. After discussions with Oracle they promised to fix the problem, but the fix hasn’t made it into 11.1, hence this blog post about OEM 12c.

Continue reading

Beware of ACFS when upgrading to 11.2.0.3

This post is about a potential pitfall when migrating from 11.2.0.x to the next point release. I stumbled over this one on a two-node cluster.

The operating system is Oracle Linux 5.5 running 11.2.0.2.3, and I wanted to go to 11.2.0.3.0. As you know, Grid Infrastructure upgrades are out-of-place, in other words they require a separate Oracle home. This is also one of the reasons why I wouldn’t want less than 20G for the Grid Infrastructure mount points in anything but a lab environment …

Now when you are upgrading from 11.2.0.x to 11.2.0.3 you need to apply a one-off patch, and it has to be the correct one! Search for patch number 12539000 (11203:ASM UPGRADE FAILED ON FIRST NODE WITH ORA-03113) and apply the version that matches your patch level, paying attention to the PSUs! As usual, the required OPatch update has to be performed beforehand as well.
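
The preparation on the existing 11.2.0.2 Grid Infrastructure home therefore looks roughly like this. Treat it as an outline only: the staging directory is an example, and the exact apply procedure for a GI home is the one documented in the patch README.

# check (and if necessary update) OPatch in the existing GI home first
$ $ORACLE_HOME/OPatch/opatch version

# apply the one-off patch matching your 11.2.0.2 patch level, following the README
$ cd /u01/stage/12539000
$ $ORACLE_HOME/OPatch/opatch apply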

So much for the prerequisites. Oracle 11.2.0.3 is available as patch 10404530; part 3 of the patch set is for Grid Infrastructure, which has to be installed first. This post only covers the GI upgrade; the database part is usually quite uneventful in comparison…

Upgrading Grid Infrastructure

After unzipping the third patch file you start runInstaller, but not before having carefully unset all pointers to the current 11.2.0.2 GRID_HOME (ORACLE_HOME, ORACLE_SID, LD_LIBRARY_PATH, ORA_CRS_HOME, etc.)!
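
In the installation session that boiled down to something along these lines; the staging location of the unzipped patch is an example only:

# make sure nothing points at the old 11.2.0.2 Grid home
$ unset ORACLE_HOME ORACLE_SID ORA_CRS_HOME
$ unset LD_LIBRARY_PATH
$ env | grep -iE 'ora|crs'

# then start the installer from the staged 11.2.0.3 media (example path)
$ /u01/stage/11.2.0.3/grid/runInstaller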

Clicking through OUI is mostly a matter of “next”, “next”, “next”; the action starts with the rootupgrade.sh script. Here’s the output from node1:

[root@node1 ~]# /u01/crs/11.2.0.3/rootupgrade.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME=  /u01/crs/11.2.0.3

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/crs/11.2.0.3/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation

ASM upgrade has started on first node.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
...
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node1' has completed
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
OLR initialization - successful
Replacing Clusterware entries in inittab
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
PRCA-1056 : Unable to upgrade ACFS from version 11.2.0.2.0 to version 11.2.0.3.0
PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil:
CLSU-00100: Operating System function: open64 failed with error data: 2advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
srvctl upgrade model -first ... failed
Failed to perform first node tasks for cluster modeling upgrade at /u01/crs/11.2.0.3/crs/install/crsconfig_lib.pm line 9088.
/u01/crs/11.2.0.3/perl/bin/perl -I/u01/crs/11.2.0.3/perl/lib -I/u01/crs/11.2.0.3/crs/install /u01/crs/11.2.0.3/crs/install/rootcrs.pl execution failed

So that was not too great indeed: my upgrade failed halfway through. Two facts make this bearable:

  1. rootupgrade.sh (and root.sh for that matter) are restartable since 11.2.0.2 at least
  2. A great deal of logging is available in $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_hostname.log

Now advmutil was correct: there were no volumes in /dev/asm/*.
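
A quick look at the operating system level confirmed it; the directory simply didn’t contain any volume devices:

$ ls /dev/asm/     # returned nothing - no ADVM volume devices present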

An analysis of the rootcrs_node1.log file showed that the command that failed was this one:

2012-01-06 10:09:10: Executing cmd: /u01/crs/11.2.0.3/bin/srvctl upgrade model  -s 11.2.0.2.0 -d 11.2.0.3.0 -p first
2012-01-06 10:09:12: Command output:
>  PRCA-1056 : Unable to upgrade ACFS from version 11.2.0.2.0 to version 11.2.0.3.0
>  PRCT-1011 : Failed to run "advmutil". Detailed error: advmutil: CLSU-00100: Operating System function: open64 failed with error data: 2|advmutil: CLSU-00101: Operating System error message: No such file or directory|advmutil: CLSU-00103: error location: OOF_1|advmutil: CLSU-00104: additional error information: open64 (/dev/asm/orahomevol-315)|advmutil: ADVM-09006: Error opening volume /dev/asm/orahomevol-315
>End Command output
2012-01-06 10:09:12:   "/u01/crs/11.2.0.3/bin/srvctl upgrade model  -s 11.2.0.2.0 -d 11.2.0.3.0 -p first" failed with status 1.
2012-01-06 10:09:12: srvctl upgrade model -first ... failed

Thinking Clearly

Thinking Clearly is an idea I thought I had adopted from Cary Millsap, but sadly I didn’t apply it here! Lesson learned: don’t assume, check!

However, I assumed that because the Clusterware stack had been shut down there wasn’t any Oracle software running on the node, and hence there couldn’t be an ADVM volume BY DEFINITION. Cluster down, ADVM down too.

Upon checking the log file again, I realised how wrong I was. Most of the lower-stack Clusterware daemons were actually running again by the time the srvctl command failed to upgrade ACFS to 11.2.0.3, so the reason for the failure had to be a different one. It quickly turned out that ALL the ACFS volumes were disabled. A quick check with asmcmd verified this:

$ asmcmd volinfo -a

Volume Name: ORAHOMEVOL
Volume Device: /dev/asm/orahomevol-315
State: DISABLED
Size (MB): 15120
Resize Unit (MB): 256
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/oracle/product/11.2.0.2

OK, that explains it all: disabled volumes are obviously NOT presented in /dev/asm/. A call to “asmcmd volenable -a” sorted that problem.
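
In other words, the fix was as simple as the following, run as the Grid Infrastructure owner against the local ASM instance:

# enable all ADVM volumes again
$ asmcmd volenable -a

# verify: the state should now read ENABLED and the device
# /dev/asm/orahomevol-315 should have reappeared
$ asmcmd volinfo -a
$ ls /dev/asm/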

Back to point 1: rootupgrade.sh is restartable. I switched back to the root session and started another attempt at running the script, and (drum roll please) it worked. All that was left to do was to run rootupgrade.sh on the second (and last) node, which completed successfully as well. By the way, this is where the one-off patch for the ASM rolling upgrade is needed; the rootcrs_lastnode.log file has these lines:

2012-01-10 09:44:10: Command output:
>  Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
>  Started to upgrade the CSS.
>  Started to upgrade the CRS.
>  The CRS was successfully upgraded.
>  Oracle Clusterware operating version was successfully set to 11.2.0.3.0
>End Command output
2012-01-10 09:44:10: /u01/crs/11.2.0.3/bin/crsctl set crs activeversion ... passed
2012-01-10 09:45:10: Rolling upgrade is set to 1
2012-01-10 09:45:10: End ASM rolling upgrade
2012-01-10 09:45:10: Executing as oracle: /u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2
2012-01-10 09:45:10: Running as user oracle: /u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2
2012-01-10 09:45:10:   Invoking "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2" as user "oracle"
2012-01-10 09:45:10: Executing /bin/su oracle -c "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2"
2012-01-10 09:45:10: Executing cmd: /bin/su oracle -c "/u01/crs/11.2.0.3/bin/asmca -silent -upgradeLocalASM -lastNode /u01/crs/11.2.0.2"
2012-01-10 09:45:51: Command output:
>
>  ASM upgrade has finished on last node.
>
>End Command output
2012-01-10 09:45:51: end rolling ASM upgrade in last

Note the ROLLING UPGRADE!

Summary

If your rootupgrade.sh script bails out with an advmutil error, check whether your ACFS volumes are enabled; they most likely are not.