Martins Blog

Trying to explain complex things in simple terms

Let there be multipathing (when there wasn’t before)

Posted by Martin Bach on December 4, 2009

One of the projects I was working on recently couldn’t be finished as it should have been due to a lack of switch ports in the SAN. The QA database servers therefore only had single pathing to the storage whereas the production kit (thank god!) had multipathing enabled and configured correctly.

This was far from ideal but couldn’t be helped, and it was also out of my control. You could argue that the project managers could have been aware of this and factor it in but I digress. On top of that I had a strange situation where the 3 nodes mapped the LNUs differently-what was seen as /dev/sda on node 1was /dev/sdf on the second node (but /dev/sda again on the third). The presentation of the LUNs to the database servers on the array was identical for each of the 3 nodes. How did I find out? I asked the storage admin to provide the WWIDs as seen on the management console and compared these against /dev/disk/by-id/. This only works with single pathing as explained further down.

The Linux kernel documentation states that the enumeration of devices isn’t static, and unfortunately there is no direct link between controller, target, disk and slice as in Solaris. These are RHEL 5.3 64bit system by the way, and Oracle 10.2.0.4.1 RAC.

There are 2 ways to get around this problem:

  • Metalink note 414897.1 describes how to set up udev for this purpose (and it includes 11.2!)
  • Use /dev/disk/by-id and /etc/rc.local

Since the problem was promised to be rectified soon I decided not to bother with udev (let’s say it’s neither very user friendly nor intuitive). So all I needed for Clusterware’s installation session to get going were 2 raw devices, mapped to partitions on a 5 GB shared LUN. This LUN was partitioned into 5 equal slices, intended to store 3 voting disks and 2 OCR copies; initially I decided to map only 2 of these for 1 vote disk and 1 OCR which is sufficient to install Clusterware. The plan is to move these off raw devices into the 20th century and onto block devices, something that is possible without this quirky workaround in 11.1. Maybe in another blog entry!

For the time being, the workaround was to enter the following into /etc/rc.local on each node:

raw /dev/raw/raw1 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1
raw /dev/raw/raw2 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part2

sleep 5

chmod 644 /dev/raw/raw*
chown oracle:oinstall /dev/raw/raw*

This worked beautifully until said change was implemented without really notifying the DBAs, and as it happened I was on call. Somehow Murphy’s law always strikes me, I seem to magically attract that kind of trouble. After the sys admins and storage admins finished their work, the partition table looked strikingly different from the last reboot… And CRS didn’t start either. Time for some serious troubleshooting. The first sign of problems were visible straight after logging in:

[oracle@nodeb~]$ ps -ef | grep smon
oracle   17770 17733  0 15:15 pts/0    00:00:00 grep smon

There should have been an ASM instance and a database, hmm. With the Oracle instances not started, something must have been wrong with clusterware. Usually the problem occurs when the raw devices are bound, but the permissions aren’t set correctly. I double checked:

[root@nodec ~]# raw -qa
/dev/raw/raw1:bound to major 67, minor 1
/dev/raw/raw2:bound to major 67, minor 2
[root@nodec ~]# ls -l /dev/disk/by-id/scsi-3600508b4000123ca0000e0000219 0000-part1
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1 -> ../../sdat1
[root@nodec ~]# ls -l /dev/sdat1
brw-r----- 1 root disk 66, 209 Nov 28 11:35 /dev/sdat1

I then changed the raw device permissions and ownership, just to be sure and started CRS:

[root@nodec ~]# chmod 644 /dev/raw/raw*
[root@nodec ~]# chown oracle:oinstall /dev/raw/raw*

[oracle@nodea ~]$ ls -l /dev/raw/raw*
crw-r--r-- 1 oracle oinstall 162, 1 Nov 28 11:36 /dev/raw/raw1
crw-r--r-- 1 oracle oinstall 162, 2 Nov 28 11:36 /dev/raw/raw2

[root@nodec ~]# /u01/crs/oracle/product/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
...
[root@nodec ~]# /u01/crs/oracle/product/crs/bin/crsctl start crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

That didn’t seem to have done the trick. At this stage I noticed that the partition table had lots of entries it previously didn’t have, and it dawned to me that the multipathing was now working, but /etc/multipath.conf (we use device mapper multipath) wasn’t configured. I assume that the pointer to the /dev/disk/by-id/ didn’t work because udev (which maintains these symlinks) can’t cope with multipathing, but correct me if I’m wrong. Digging further, I saw these lines in /var/log/messages:

Nov 28 15:28:07 nodea logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.14288.
Nov 28 15:28:07 nodea logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.14369.
Nov 28 15:28:07 nodea logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.14384.

The contents of the files in /tmp were identical:

[root@nodea nodea]# cat /tmp/crsctl.14384
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Device or resource busy] [16]

I found nothing suitable on this “[16]” in either Metalink or google search-btw, PROC-26 … [13] indicates permission problems on the raw/block devices accessed for Clusterware! The alert_hostname.log file in $ORA_CRS_HOME/log/hostname wasn’t even modified since the server was rebooted, and the processes on the OS level also indicated trouble:

[root@nodea nodea]# ps -ef | grep -E 'init|d.bin|ocls|oprocd|diskmon|evmlogger|sleep|PID'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:33 ?        00:00:04 init [3]                                  
root      3542 14384  0 16:08 ?        00:00:00 /bin/sleep 60
root      3543 14288  0 16:08 ?        00:00:00 /bin/sleep 60
root      3544 14369  0 16:08 ?        00:00:00 /bin/sleep 60
root      3652   514  0 16:08 pts/1    00:00:00 grep -E init|d.bin|ocls|oprocd|diskmon|evmlogger|sleep|PID
root     14216     1  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.evmd run
root     14218     1  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root     14219     1  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.crsd run
root     14288 14216  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     14369 14219  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root     14384 14218  0 11:36 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
[root@nodea nodea]#

Notice the “startcheck” parameter to init.cssd? Funny enough, asmlib has discovered all the disks:

[root@nodea nodea]# /etc/init.d/oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
FRA1
FRA2
LOGCTL1

Final proof that there was a problem with the symlinks in /dev/disk/by-id was found then:

[root@nodea nodea]# ls -l /dev/disk/by-id/*3600508b4000123ca0000e00002190000*
lrwxrwxrwx 1 root root  9 Nov 28 11:34 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000 -> ../../sdq
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part1 -> ../../sdaw1
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part2 -> ../../sdaw2
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part3 -> ../../sdaw3
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part4 -> ../../sdaw4
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part5 -> ../../sdaw5
lrwxrwxrwx 1 root root 11 Nov 28 11:35 /dev/disk/by-id/scsi-3600508b4000123ca0000e00002190000-part6 -> ../../sdaw6

/dev/sdq and /dev/sdawx should be the same disk? Don’t think so. With that evidence I got the previously reluctant sys admin on-call to update /etc/multipath.conf file on all nodes to match production, ie. use /dev/mapper/voting_orcpx for the LUN. I bound the first two partitions to the raw devices and voila!

raw /dev/raw/raw1 /dev/mapper/voting-ocrp1
raw /dev/raw/raw2 /dev/mapper/voting-ocrp2
chmod 644 /dev/raw/raw*
chown oracle:oinstall /dev/raw/raw*
[root@nodec ~]# /u01/crs/oracle/product/crs/bin/crsctl start crs
The CRS stack will be started shortly

Sure enough, the start was visible in the alert_hostname.log file

[crsd(1050)]CRS-1205:Auto-start failed for the CRS resource . Details in nodea.
2009-11-28 16:46:46.091
[crsd(1050)]CRS-1205:Auto-start failed for the CRS resource . Details in nodea.
2009-11-28 16:46:46.085
[crsd(1050)]CRS-1201:CRSD started on node nodea.
2009-11-28 16:46:44.805
[crsd(1050)]CRS-1012:The OCR service started on node nodea.
2009-11-28 16:46:41.777
cle/product/crs/log/nodea/crsd/crsd.log.
[crsd(1050)]CRS-1005:The OCR upgrade was completed. ....
2009-11-28 16:46:41.776
[evmd(986)]CRS-1401:EVMD started on node nodea.
2009-11-28 16:46:41.693
[cssd(1737)]CRS-1601:CSSD Reconfiguration complete. Active nodes are nodea .
[cssd(1737)]CRS-1605:CSSD voting file is online: /dev/raw/raw2.....
2009-11-28 16:43:20.846
[cssd(12924)]CRS-1601:CSSD Reconfiguration complete. Active nodes are nodea nodeb nodec .

Now there was only one thing missing, which was the change in /etc/sysconfig/oracleasm. If you don’t instruct asmlib to ignore the individual devices in /dev/sdxx then it will show them via listdisks but can’t open them. The most visible symptom is in v$asm_disk

SQL> select state,MOUNT_STATUS,HEADER_STATUS,library,name from v$asm_disk

STATE    MOUNT_S HEADER_STATU LIBRARY                                                  NAME
-------- ------- ------------ -------------------------------------------------------- ------
NORMAL   CLOSED  FOREIGN      System
NORMAL   CLOSED  FOREIGN      System
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
NORMAL   CLOSED  UNKNOWN      ASM Library - Generic Linux, version 2.0.4 (KABI_V2)

Pay attention to the header_status and name colums: asmlib didn’t recognise these. Also, the alert_+asm.log shows lots of repetitions of the following error:

Sat Nov 28 17:02:37 2009
ORA-15186: ASMLIB error function = [asm_open],  error = [1],  mesg = [Operation not permitted]
Sat Nov 28 17:02:37 2009
ORA-15186: ASMLIB error function = [asm_open],  error = [1],  mesg = [Operation not permitted]
Sat Nov 28 17:02:37 2009
ORA-15186: ASMLIB error function = [asm_open],  error = [1],  mesg = [Operation not permitted]

Now for the fix. Edit /etc/sysconfig/oracleasm and change the following 2 environment variables on all nodes of the cluster:

ORACLEASM_SCANORDER="dm-"
ORACLEASM_SCANEXCLUDE="sd"

For this to take effect, shut down the oracle stack, then as root, stop and start asmlib:

# /etc/init.d/oracleasm stop
# /etc/init.d/oracleasm start

Restart ASM and you should see success!

SQL> select state,MOUNT_STATUS,HEADER_STATUS,library,name from v$asm_disk;

STATE    MOUNT_S HEADER_STATU LIBRARY                                                          NAME
-------- ------- ------------ ---------------------------------------------------------------- ------------------------------
NORMAL   CLOSED  FOREIGN      System
NORMAL   CLOSED  FOREIGN      System
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA1
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA2
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA3
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA4
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA5
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA6
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA7
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             DATA8
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             FRA2
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             FRA3
NORMAL   CACHED  MEMBER       ASM Library - Generic Linux, version 2.0.4 (KABI_V2)             LOGCTL1

13 rows selected.

Names are correct, and header status reflects the member state of the disk as well. Good! Confirmation from the alert.log:

Sat Nov 28 17:21:07 2009
NOTE: start heartbeating (grp 2)
NOTE: cache opening disk 0 of grp 2: DATA1 label:DATA1
Sat Nov 28 17:21:07 2009
NOTE: F1X0 found on disk 0 fcn 0.1606219
NOTE: cache opening disk 1 of grp 2: DATA2 label:DATA2
NOTE: cache opening disk 2 of grp 2: DATA3 label:DATA3
NOTE: cache opening disk 3 of grp 2: DATA4 label:DATA4
NOTE: cache opening disk 4 of grp 2: DATA5 label:DATA5
...

So this concludes the war story about how to deal with ad hoc addition of multipathing to RAC when you don’t tell the DBAs. In an ideal world that wouldn’t have happened, but where I work currently that’s rather the rule than the exception.

Suggested background reading

How to Dynamically Add and Remove SCSI Devices on Linux [ID 603868.1]
How to Setup UDEV Rules for RAC OCR & Voting devices on SLES10, RHEL5, OEL5 [ID 414897.1]

About these ads

One Response to “Let there be multipathing (when there wasn’t before)”

  1. [...] 11-Diagnosing and solving a multipath problem with ASM. ([Device or resource busy] [16]) Martin Bach-Let there be multipathing (when there wasn’t before) [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 2,313 other followers

%d bloggers like this: