One of the projects I was working on recently couldn’t be finished as it should have been due to a lack of switch ports in the SAN. The QA database servers therefore only had single pathing to the storage whereas the production kit (thank god!) had multipathing enabled and configured correctly.
UPDATE 221219: this document refers to software that has since reached its End Of Life date. This article is now archived and shouldn’t be referred to.
This was far from ideal but couldn’t be helped, and it was also out of my control. You could argue that the project managers should have been aware of this and factored it in, but I digress. On top of that I had a strange situation where the 3 nodes mapped the LUNs differently: what was seen as /dev/sda on node 1 was /dev/sdf on the second node (but /dev/sda again on the third). The presentation of the LUNs to the database servers on the array was identical for each of the 3 nodes. How did I find out? I asked the storage admin to provide the WWIDs as seen on the management console and compared these against /dev/disk/by-id/. This only works with single pathing, as explained further down.
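If you want to repeat that comparison yourself, it boils down to something like the following (the WWID is the one from this cluster; /dev/sda is just an example device, and on RHEL 5 scsi_id takes the sysfs path of the block device):

# grep the by-id symlinks for the WWID the storage admin gave you
ls -l /dev/disk/by-id/ | grep -i 3600508b4000123ca0000e00002190000
# or ask the device itself for its WWID (RHEL 5 syntax for scsi_id)
scsi_id -g -u -s /block/sda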
The Linux kernel documentation states that the enumeration of devices isn’t static, and unfortunately there is no direct link between controller, target, disk and slice as there is in Solaris. These were RHEL 5.3 64-bit systems, by the way, running Oracle 10.2.0.4.1 RAC.
There are 2 ways to get around this problem:
- Metalink note 414897.1 describes how to set up udev for this purpose (and it includes 11.2!); a rough sketch of that approach follows after this list
- Use /dev/disk/by-id and /etc/rc.local
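For completeness, binding raw devices through udev on RHEL 5 generally takes the shape below. This is only a rough sketch, not copied from the note, and the device names are examples; in practice you would want to match on something stable such as the scsi_id/WWID rather than the kernel name, since the whole problem here is that the sd names move around.

# /etc/udev/rules.d/60-raw.rules (sketch; adjust device names to your own partitions)
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"
# ownership and permissions on the resulting raw devices
KERNEL=="raw[12]", OWNER="oracle", GROUP="oinstall", MODE="0640"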
Since the problem was promised to be rectified soon I decided not to bother with udev (let’s say it’s neither very user friendly nor intuitive). So all I needed for Clusterware’s installation session to get going were 2 raw devices, mapped to partitions on a 5 GB shared LUN. This LUN was partitioned into 5 equal slices, intended to store 3 voting disks and 2 OCR copies; initially I decided to map only 2 of these for 1 voting disk and 1 OCR, which is sufficient to install Clusterware. The plan is to move these off raw devices into the 20th century and onto block devices, something that is possible without this quirky workaround in 11.1. Maybe in another blog entry!
For the time being, the workaround was to enter the following into /etc/rc.local on each node:
raw /dev/raw/raw1 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1
raw /dev/raw/raw2 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part2
sleep 5
chmod 644 /dev/raw/raw*
chown oracle:oinstall /dev/raw/raw*
This worked beautifully until said change (the promised fix to the SAN pathing) was implemented without really notifying the DBAs, and as it happened I was on call. Somehow Murphy’s law always strikes me; I seem to magically attract that kind of trouble. After the sys admins and storage admins finished their work, the partition table looked strikingly different from the last reboot… and CRS didn’t start either. Time for some serious troubleshooting. The first signs of trouble were visible straight after logging in:
[oracle@nodeb ~]$ ps -ef | grep smon
oracle   17770 17733  0 15:15 pts/0    00:00:00 grep smon
There should have been an ASM instance and a database, hmm. With the Oracle instances not started, something must have been wrong with Clusterware. Usually this kind of problem occurs when the raw devices are bound but the permissions aren’t set correctly, so I double-checked:
[root@nodec ~]# raw -qa
/dev/raw/raw1:  bound to major 67, minor 1
/dev/raw/raw2:  bound to major 67, minor 2
[root@nodec ~]# ls -l /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1 -> ../../sdat1
[root@nodec ~]# ls -l /dev/sdat1
brw-r----- 1 root disk 66, 209 Nov 28 11:35 /dev/sdat1
I then changed the raw device permissions and ownership, just to be sure, and started CRS:
[root@nodec ~]# chmod 644 /dev/raw/raw*
[root@nodec ~]# chown oracle:oinstall /dev/raw/raw*
[oracle@nodea ~]$ ls -l /dev/raw/raw*
crw-r--r-- 1 oracle oinstall 162, 1 Nov 28 11:36 /dev/raw/raw1
crw-r--r-- 1 oracle oinstall 162, 2 Nov 28 11:36 /dev/raw/raw2
[root@nodec ~]# /u01/crs/oracle/product/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
...
[root@nodec ~]# /u01/crs/oracle/product/crs/bin/crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
That didn’t seem to have done the trick. At this stage I noticed that the partition table had lots of entries it previously didn’t have, and it dawned on me that the multipathing was now working, but /etc/multipath.conf (we use device mapper multipath) wasn’t configured. I assume that the pointer to /dev/disk/by-id/ didn’t work because udev (which maintains these symlinks) can’t cope with multipathing, but correct me if I’m wrong. Digging further, I saw these lines in /var/log/messages:
Nov 28 15:28:07 nodea logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.14288.
Nov 28 15:28:07 nodea logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.14369.
Nov 28 15:28:07 nodea logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.14384.
The contents of the files in /tmp were identical:
[root@nodea nodea]# cat /tmp/crsctl.14384
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage
Operating System error [Device or resource busy] [16]
Neither Metalink nor a Google search turned up anything useful on this “[16]” (btw, PROC-26 … [13] indicates permission problems on the raw/block devices accessed for Clusterware!). The alert_hostname.log file in $ORA_CRS_HOME/log/hostname hadn’t even been modified since the server was rebooted, and the processes on the OS level also indicated trouble:
[root@nodea nodea]# ps -ef | grep -E 'init|d.bin|ocls|oprocd|diskmon|evmlogger|sleep|PID'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:33 ?        00:00:04 init [3]
root      3542 14384  0 16:08 ?        00:00:00 /bin/sleep 60
root      3543 14288  0 16:08 ?        00:00:00 /bin/sleep 60
root      3544 14369  0 16:08 ?        00:00:00 /bin/sleep 60
root      3652   514  0 16:08 pts/1    00:00:00 grep -E init|d.bin|ocls|oprocd|diskmon|evmlogger|sleep|PID
root     14216     1  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run
root     14218     1  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root     14219     1  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run
root     14288 14216  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     14369 14219  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     14384 14218  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
[root@nodea nodea]#
Notice the “startcheck” parameter to init.cssd? Funnily enough, asmlib had discovered all the disks:
[root@nodea nodea]# /etc/init.d/oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
FRA1
FRA2
LOGCTL1
The final proof that there was a problem with the symlinks in /dev/disk/by-id then turned up:
[root@nodea nodea]# ls -l /dev/disk/by-id/*3600508b4000123ca0000e00002190000*
lrwxrwxrwx 1 root root  9 Nov 28 11:34 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000 -> ../../sdq
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1 -> ../../sdaw1
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part2 -> ../../sdaw2
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part3 -> ../../sdaw3
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part4 -> ../../sdaw4
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part5 -> ../../sdaw5
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part6 -> ../../sdaw6
/dev/sdq and /dev/sdawx the same disk? I don’t think so. With that evidence I got the previously reluctant on-call sys admin to update /etc/multipath.conf on all nodes to match production, i.e. to use /dev/mapper/voting-ocrpx for the LUN. I bound the first two partitions to the raw devices and voila!
raw /dev/raw/raw1 /dev/mapper/voting-ocrp1
raw /dev/raw/raw2 /dev/mapper/voting-ocrp2
chmod 644 /dev/raw/raw*
chown oracle:oinstall /dev/raw/raw*
[root@nodec ~]# /u01/crs/oracle/product/crs/bin/crsctl start crs
The CRS stack will be started shortly
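For reference, the kind of multipath.conf stanza that yields /dev/mapper names like these looks roughly as follows. This is only a sketch with the WWID from above, not the actual production file; the alias name is illustrative, and whether the per-partition maps come out as voting-ocrp1, voting-ocrp2 (with the “p”) depends on the kpartx/multipath settings.

# /etc/multipath.conf (sketch) -- an alias for the OCR/voting LUN;
# kpartx creates the matching per-partition maps under /dev/mapper
multipaths {
    multipath {
        wwid   3600508b4000123ca0000e00002190000
        alias  voting-ocr
    }
}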
Sure enough, the startup was visible in the alert_hostname.log file:
[crsd(1050)]CRS-1205:Auto-start failed for the CRS resource . Details in nodea.
2009-11-28 16:46:46.091
[crsd(1050)]CRS-1205:Auto-start failed for the CRS resource . Details in nodea.
2009-11-28 16:46:46.085
[crsd(1050)]CRS-1201:CRSD started on node nodea.
2009-11-28 16:46:44.805
[crsd(1050)]CRS-1012:The OCR service started on node nodea.
2009-11-28 16:46:41.777
cle/product/crs/log/nodea/crsd/crsd.log.
[crsd(1050)]CRS-1005:The OCR upgrade was completed. ....
2009-11-28 16:46:41.776
[evmd(986)]CRS-1401:EVMD started on node nodea.
2009-11-28 16:46:41.693
[cssd(1737)]CRS-1601:CSSD Reconfiguration complete. Active nodes are nodea .
[cssd(1737)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.....
2009-11-28 16:43:20.846
[cssd(12924)]CRS-1601:CSSD Reconfiguration complete. Active nodes are nodea nodeb nodec .
Now there was only one thing missing: the change to /etc/sysconfig/oracleasm. If you don’t instruct asmlib to ignore the individual devices in /dev/sdxx, it will still show them via listdisks but can’t open them. The most visible symptom is in v$asm_disk:
SQL> select state,MOUNT_STATUS,HEADER_STATUS,library,name from v$asm_disk

STATE    MOUNT_S HEADER_STATU LIBRARY                                                  NAME
-------- ------- ------------ -------------------------------------------------------- ------
NORMAL   CLOSED  FOREIGN      System
NORMAL   CLOSED  FOREIGN      System
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
Pay attention to the header_status and name columns: asmlib didn’t recognise these disks. Also, the alert_+asm.log shows lots of repetitions of the following error:
Sat Nov 28 17:02:37 2009
ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]
Sat Nov 28 17:02:37 2009
ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]
Sat Nov 28 17:02:37 2009
ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]
Now for the fix. Edit /etc/sysconfig/oracleasm and change the following 2 environment variables on all nodes of the cluster:
ORACLEASM_SCANORDER="dm-"
ORACLEASM_SCANEXCLUDE="sd"
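The two values are name prefixes which asmlib matches against the device names it sees in /proc/partitions: “dm-” pulls the multipath maps to the front of the scan, while “sd” excludes the individual paths underneath them. A quick sanity check that the dm devices are actually there (my own check, not part of the original fix):

# the dm-N entries are the multipath maps asmlib should now be scanning
grep dm- /proc/partitions
# multipath -ll shows which sd paths sit underneath each map
multipath -ll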
For this to take effect, shut down the oracle stack, then as root, stop and start asmlib:
# /etc/init.d/oracleasm stop
# /etc/init.d/oracleasm start
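On a 10.2 RAC node the full sequence around that bounce could look roughly like this; the database and instance names below are placeholders, not the ones from this cluster:

# stop the database instance and ASM on this node first (names are placeholders)
srvctl stop instance -d PROD -i PROD1
srvctl stop asm -n nodea
# bounce asmlib so it re-reads /etc/sysconfig/oracleasm
/etc/init.d/oracleasm stop
/etc/init.d/oracleasm start
# bring ASM and the database instance back up
srvctl start asm -n nodea
srvctl start instance -d PROD -i PROD1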
With ASM restarted you should see success:
SQL> select state,MOUNT_STATUS,HEADER_STATUS,library,name from v$asm_disk;

STATE    MOUNT_S HEADER_STATU LIBRARY                                                          NAME
-------- ------- ------------ ---------------------------------------------------------------- ------------------------------
NORMAL   CLOSED  FOREIGN      System
NORMAL   CLOSED  FOREIGN      System
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA1
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA2
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA3
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA4
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA5
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA6
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA7
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            DATA8
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            FRA2
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            FRA3
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)            LOGCTL1

13 rows selected.
Names are correct, and header status reflects the member state of the disk as well. Good! Confirmation from the alert.log:
Sat Nov 28 17:21:07 2009
NOTE: start heartbeating (grp 2)
NOTE: cache opening disk 0 of grp 2: DATA1 label:DATA1
Sat Nov 28 17:21:07 2009
NOTE: F1X0 found on disk 0 fcn 0.1606219
NOTE: cache opening disk 1 of grp 2: DATA2 label:DATA2
NOTE: cache opening disk 2 of grp 2: DATA3 label:DATA3
NOTE: cache opening disk 3 of grp 2: DATA4 label:DATA4
NOTE: cache opening disk 4 of grp 2: DATA5 label:DATA5
...
So this concludes the war story of what happens when multipathing is added to RAC ad hoc and nobody tells the DBAs. In an ideal world that wouldn’t happen, but where I currently work it’s the rule rather than the exception.
Suggested background reading
How to Dynamically Add and Remove SCSI Devices on Linux [ID 603868.1]
How to Setup UDEV Rules for RAC OCR & Voting devices on SLES10, RHEL5, OEL5 [ID 414897.1]