Martins Blog

Trying to explain complex things in simple terms

The tale of restoring the OCR and voting files on Linux for RAC 11.2.0.2

Posted by Martin Bach on October 13, 2011

As part of a server move from one data centre to another I enjoyed working in the depths of Clusterware. This one was a rather simple case though: the public IP addresses were the only part of the package to change. One caveat was the recreation of the OCR disk group I am using for the OCR and 3 copies of the voting file. I decided to rely on the backups I took before the server move.

Once the kit had been rewired in the new data centre, it was time to get active. The /etc/multipath.conf file had to be edited to add the new LUNs for my +OCR disk group. I have described the process in a number of articles, for example here:

https://martincarstenbach.wordpress.com/2011/01/14/adding-storage-dynamically-to-asm-on-linux/

A few facts before we start:

  • Oracle Enterprise Linux 5.5 64bit
  • device-mapper-multipath-0.4.7
  • Grid Infrastructure 11.2.0.2.2 (actually it is Oracle Database SAP Bundle Patch 11.2.0.2.2)
  • ASMLib

I have already described how to restore the OCR and voting files in 11.2.0.1 in “Pro Oracle Database RAC 11g on Linux”, but the procedure has changed slightly since then, so I thought I’d add this here. The emphasis is on “slightly”. In this blog post I’ll describe what you need to do if you lose the disk group containing OCR and voting disks on a Linux system using ASMLib. Before the server move I recorded the location of the OCR/voting disk group:

SQL> select d.name, d.path, dg.name as dg_name
     from v$asm_disk d, v$asm_diskgroup dg
     where d.group_number = dg.group_number
     and dg.name = 'OCR'
     /

NAME       PATH                 DG_NAME
---------- -------------------- ----------
OCR0001    ORCL:OCR0001         OCR
OCR0002    ORCL:OCR0002         OCR
OCR0003    ORCL:OCR0003         OCR

SQL>

After the server had come back on the network, I first ensured everything was stopped:

[root@node1 cluster01]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'node1'
CRS-2673: Attempting to stop 'ora.crf' on 'node1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.crf' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

The next step is to start the cluster in exclusive mode. In 11.2.0.1 it was enough to use crsctl start crs -excl; from 11.2.0.2 onwards you also have to add the -nocrs flag. If you don’t, crsd will try to start but can’t find a voting file, and everything spins/hangs until Clusterware runs out of retries and the command fails. Here’s the example output with the correct command syntax:

[root@node1 cluster01]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'node1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
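
The version-dependent choice of flags can be captured in a small helper. This is only a sketch: the function name excl_start_args is made up for illustration, and it assumes you pass in the Grid Infrastructure version by hand as a dotted string rather than having it detected automatically.

```shell
# Hypothetical helper: print the flags for an exclusive-mode start.
# Assumes $1 is a dotted Grid Infrastructure version such as 11.2.0.2.2.
excl_start_args() {
    echo "$1" | awk -F. '{
        # Weight the first four components so versions compare as numbers.
        n = $1 * 1000000 + $2 * 10000 + $3 * 100 + $4
        if (n >= 11020002)
            print "-excl -nocrs"   # 11.2.0.2 onwards
        else
            print "-excl"          # 11.2.0.1
    }'
}
```

As root you could then run something like crsctl start crs $(excl_start_args 11.2.0.2.2), which expands to the command used above.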

As I said, I have created the disk group with exactly the same name as the one lost. This is very important, or the restore won’t work. The $GRID_HOME/log/`hostname`/client directory contains logs in case you have to troubleshoot. Inside the $GRID_HOME/cdata/<cluster name> directory you find all the relevant backups. Ensure you are using the latest backup, which is on the OCR master node. Check each backup directory on the cluster nodes to find the most recent backup. Once the most recent backup has been located, restore it:

[root@node1 cluster01]# ocrconfig -restore backup00.ocr

This worked, as shown in the alert<hostname>.log file in $GRID_HOME/log/`hostname`/:

[/u01/crs/product/11.2.0.2/bin/oraagent.bin(32015)]CRS-5019:All OCR locations are on ASM disk groups [OCR], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/crs/product/11.2.0.2/log/node1/agent/ohasd/oraagent_oracle/oraagent_oracle.log".
2011-10-06 11:03:38.877 [client(1444)]CRS-1002:The OCR was restored from file backup00.ocr.
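
Finding the most recent backup in a node's backup directory, as described before the restore, can be scripted. The following is a minimal sketch, not part of the original procedure: newest_ocr_backup is a hypothetical name, and it assumes that the file modification time is a good enough indicator of which backup is the latest.

```shell
# Hypothetical helper: print the most recently modified *.ocr file in
# the backup directory given as $1, or nothing if there is none.
newest_ocr_backup() {
    # ls -t sorts newest first; head keeps only the first entry.
    ls -t "$1"/*.ocr 2>/dev/null | head -n 1
}
```

Run it on each cluster node against the $GRID_HOME/cdata/<cluster name> directory and compare the timestamps of the files it reports before deciding which one to pass to ocrconfig -restore.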

All right, that sorts the OCR out. Now it’s time to restore the voting disks:

[root@node1 cluster01]# crsctl query css votedisk
Located 0 voting disk(s).
[root@node1 cluster01]# crsctl replace votedisk +OCR
Successful addition of voting disk 361e36921dd64f89bfd63cdbade79651.
Successful addition of voting disk 58f769be54e74fbcbfc655afe290268d.
Successful addition of voting disk af6c1890bb594f72bf39ef626b8fcc8f.
Successfully replaced voting disk group with +OCR.
CRS-4266: Voting file(s) successfully replaced
[root@node1 cluster01]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   361e36921dd64f89bfd63cdbade79651 (ORCL:OCR0001) [OCR]
2. ONLINE   58f769be54e74fbcbfc655afe290268d (ORCL:OCR0002) [OCR]
3. ONLINE   af6c1890bb594f72bf39ef626b8fcc8f (ORCL:OCR0003) [OCR]
Located 3 voting disk(s).

This, too, is recorded, for example in the CSSD log file:

2011-10-06 11:04:59.169
[cssd(32100)]CRS-1605:CSSD voting file is online: ORCL:OCR0001; details in /u01/crs/product/11.2.0.2/log/node1/cssd/ocssd.log.
2011-10-06 11:04:59.169
[cssd(32100)]CRS-1605:CSSD voting file is online: ORCL:OCR0002; details in /u01/crs/product/11.2.0.2/log/node1/cssd/ocssd.log.
2011-10-06 11:04:59.169
[cssd(32100)]CRS-1605:CSSD voting file is online: ORCL:OCR0003; details in /u01/crs/product/11.2.0.2/log/node1/cssd/ocssd.log.
2011-10-06 11:04:59.170
[cssd(32100)]CRS-1626:A Configuration change request completed successfully
2011-10-06 11:04:59.179
[cssd(32100)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
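
If you want to verify the result from a script rather than by eye, the output of crsctl query css votedisk can be parsed. This is a sketch under the assumption that the output keeps the layout shown above (a number followed by a dot, then the state); count_online_votedisks is a name invented for this example.

```shell
# Hypothetical helper: count voting files reported as ONLINE.
# Reads the output of "crsctl query css votedisk" on standard input.
count_online_votedisks() {
    # Data rows look like "1. ONLINE <id> (<path>) [<disk group>]".
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

For the disk group in this post, crsctl query css votedisk | count_online_votedisks should print 3 once the replace has succeeded.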

Now all that remains is to get the cluster back into “normal” mode: stop it and start it, as shown here:

[root@node1 cluster01]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.asm' on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@node1 cluster01]# crsctl start cluster
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Start failed, or completed with errors.
[root@node1 cluster01]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Now all you need to do is wait, and check the cluster status:

[root@node2 ~]# crsctl check cluster -all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

All is well that ends well: the procedure is much the same as it always was. Just remember the “-nocrs” flag.

Reference

How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

2 Responses to “The tale of restoring the OCR and voting files on Linux for RAC 11.2.0.2”

  1. jarneil said

    Hi Martin,

    Nice!

    Just curious, did you need the -f in the

    crsctl stop crs -f,

    when you are stopping after all the restoring is done?

    jason.

  2. Another option would be to put a copy of the OCR to another disk group (e.g. DATA) with ocrconfig and drop the old one. After that you might want to exchange the voting disks one after each other or replace them completely.
