4k sector size and Grid Infrastructure 11.2 installation gotcha

Some days are just too good to be true :) I ran into an interesting problem trying to install Grid Infrastructure for a two-node cluster. The storage was presented via iSCSI, which turned out to be a blessing and the inspiration for this blog post. I haven’t yet found out how to create “shareable” LUNs in KVM the same way I did successfully with Xen. I wouldn’t recommend general-purpose iSCSI for anything besides lab setups though. If you want network-based storage, use 10 Gbit/s Ethernet with either FCoE or (direct) NFS.

Here is my setup. Storage is presented in 3 targets using tgtd on the host:

  1. Target 1 contains 3×2 GB LUNs for OCR and voting disks in normal redundancy.
  2. Target 2 contains 3×10 GB LUNs for +DATA.
  3. Target 3 contains 3×10 GB LUNs for +RECO.

The iSCSI initiators are Oracle Linux 6.4 guests on KVM, with the host running OpenSuSE 12.3 and providing the iSCSI targets. Yes, I know I’m probably the only Oracle DBA running SuSE, but in my defence I have a similar system with Oracle Linux 6.4 throughout and both work.

So besides the weird host OS there is nothing special. Since I’m lazy sometimes and don’t particularly like udev, I decided to use ASMLib for device name persistence on the iSCSI LUNs. This turned out to be crucial, otherwise I would never have written this post.

So much for the introduction

And here’s the problem. While installing Grid Infrastructure, OUI allowed me to fill out all the wizard screens and proceeded to install the binaries on all hosts. If you have installed RAC before, that’s not the interesting part of the installation. It gets far more interesting when you run root.sh! Normally root.sh simply completes if you paid attention to the prerequisites. I _had_ paid attention to them, yet still the script failed on node 1! I don’t have the exact output on screen any more, but the script bailed out trying to create the voting files.

Whenever something goes wrong with the installation of Grid Infrastructure you can turn to $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_$(hostname).log. The relevant section in the file contained this:

2013-04-23 14:11:12: Creating voting files
2013-04-23 14:11:12: Creating voting files in ASM diskgroup OCR
2013-04-23 14:11:12: Executing crsctl replace votedisk '+OCR'
2013-04-23 14:11:12: Executing /u01/app/11.2.0/grid/bin/crsctl replace votedisk '+OCR'
2013-04-23 14:11:12: Executing cmd: /u01/app/11.2.0/grid/bin/crsctl replace votedisk '+OCR'
2013-04-23 14:11:12: Command output:
>  Failed to create voting files on disk group OCR.
>  Change to configuration failed, but was successfully rolled back.
>  CRS-4000: Command Replace failed, or completed with errors.
>End Command output
2013-04-23 14:11:12: Voting file add failed

Ooops. Not good. Why would that fail? The ASM instance was up and the OCR had already been created. Most Clusterware commands leave a trace in $GRID_HOME/log/$(hostname)/client. I checked the last file in there but it didn’t help much:

2013-04-23 15:07:49.079: [  CRSCTL][3108575008]crsctl_format_diskgroup: diskgroup OCR creation with status 1. Please check the alert log file for ASM
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: === clsssConfigUnlock ===
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: lock(0x1cb70d0), version(1), size(1088)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: offsets(24), activever(186647296)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: id(14), instantiation(12), incarn(1)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: mapoff(28), configoff(548), mapsize(512)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: state(0), holders(0), waiters(0)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: committimestamp(0), commitstate(0)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: === Map (of first 7 entries) ===
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: 000 000 000 000 000 000 000

Except for line 1: that is a very obvious pointer. Maybe there is something in the ASM instance’s alert.log?

2013-04-23 15:07:48.733000 +01:00
NOTE: [crsctl.bin@rac11gr2node1.example.com (TNS V1-V3) 13092] opening OCR file
NOTE: updated gpnp profile ASM diskstring:
NOTE: Creating voting files in diskgroup OCR
NOTE: Voting File refresh pending for group 1/0x20cf9032 (OCR)
NOTE: Attempting voting file creation in diskgroup OCR
ERROR: Could not create voting files. It spans across 161 AUs (max supported is 64 AUs)
ERROR: Voting file allocation failed for group OCR
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_13100.trc:
ORA-15303: Voting files could not be created in diskgroup OCR due to small Allocation Unit size
2013-04-23 15:07:51.278000 +01:00
NOTE: Attempting voting file refresh on diskgroup OCR
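
In a long alert log these lines are easy to miss. Here is a minimal sketch for pulling them out with grep. The function name is made up for illustration, and the alert log file name in the usage comment is an assumption based on the usual ADR naming in the trace directory shown above.

```shell
# Print lines (with line numbers) that mention the voting file failure.
# Returns non-zero if nothing matches, just like grep itself.
voting_file_errors() {
    grep -n -E 'ORA-15303|Could not create voting files' "$1"
}

# Usage (file name assumed, adjust to your ASM instance):
#   voting_file_errors /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
```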

That was it! But what’s so special about the disk group? Something started to dawn on me…I have recently spent quite some time on 4k sector size disks and their implications. And my hard disks have recently been replaced … here’s what I got from ASM:

SQL> select name,sector_size from v$asm_diskgroup;

NAME                           SECTOR_SIZE
------------------------------ -----------
OCR                                   4096

So sure enough, my disk group uses 4k sectors, even though at no point did I ask it to. In fact, up to today I had struggled to create a disk group with a 4k sector size for lack of supporting hardware! So this is the first time I have seen one; the hard disks in my lab server must be pretty new then. There are many ways to check the block size of your LUN. This time I chose fdisk (this only works for LUNs that use the MBR format; if the LUN has been initialised with a GPT you need parted instead):

[root@rac11gr2node1 client]# fdisk -lu /dev/sda

Disk /dev/sda: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders, total 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1          531712     4191385     1829837   83  Linux
[root@rac11gr2node1 client]#

Notice this line:

Sector size (logical/physical): 512 bytes / 4096 bytes
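
If you would rather not parse fdisk output, the kernel exposes the same two values in sysfs under /sys/block/<device>/queue. The following is a minimal sketch, not something I used during the installation. The function name and the optional second argument (an alternative sysfs root, only there to make the function easy to test) are made up for illustration.

```shell
# Print logical and physical sector size of a block device, read from sysfs.
# Usage: sector_sizes sda
sector_sizes() {
    dev=$1
    sysfs=${2:-/sys}
    queue="$sysfs/block/$dev/queue"
    if [ ! -r "$queue/logical_block_size" ] || [ ! -r "$queue/physical_block_size" ]; then
        echo "cannot read sector sizes for $dev" >&2
        return 1
    fi
    logical=$(cat "$queue/logical_block_size")
    physical=$(cat "$queue/physical_block_size")
    echo "$dev logical=$logical physical=$physical"
    # Flag devices that report a smaller logical than physical sector size,
    # or any other mismatch between the two.
    if [ "$logical" != "$physical" ]; then
        echo "$dev: logical and physical sector sizes differ"
    fi
}
```

For the disk shown above this should report the same mismatch as fdisk: 512 bytes logical on top of 4096 bytes physical.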

And therein lies the problem. It seems ASMLib in my version cannot deal with a 4k sector size properly. The problem is known; just search for ORA-15303 in My Oracle Support. It seems particular to LUNs/disks with different logical and physical block sizes though. Refer to the MOS note for more information. For your reference, here are my versions:

[oracle@rac11gr2node1 ~]$ rpm -qa| grep oracleasm
[oracle@rac11gr2node1 ~]$

The strange thing is that the ASM instance with the single disk group has been created successfully, as shown by the log (note the disk list in line 1):

2013-04-23 14:10:54: Executing as oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1
2013-04-23 14:10:54: Running as user oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1
2013-04-23 14:10:54:   Invoking "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1" as user "oracle"
2013-04-23 14:10:54: Executing /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1"
2013-04-23 14:10:54: Executing cmd: /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1"
2013-04-23 14:11:09: Command output:
>  ASM created and started successfully.
>  Disk Group OCR created successfully.
>End Command output

After spending a little while thinking what to do I decided to bite the bullet and start from scratch. The deinstall scripts again worked really well in my case.

Instead of ASMLib I used udev for the second attempt. The trick with udev is to find something to map. The scsi_id command, for example, is a great help in determining disk attributes, but there are others too. In my case, all I wanted to achieve was to change the permissions of the iSCSI disks to oracle:dba and 0660. In my lab environment I didn't use multiple paths to the iSCSI target, and I didn't care about symlinks. You get it, this isn't 100% realistic ... Note that in KVM, disks are named vd* (/dev/vda for the first one, and so on) if you are using virtio, the para-virtualised drivers. The iSCSI disks ended up being called /dev/sd*, which makes it easy to define an asm_diskstring.

[root@rac11gr2node1 sys]# /sbin/scsi_id --whitelisted --replace-whitespace --page=0x80 --device=/dev/sdg --export

So using the vendor string and the model attribute from SYSFS will most likely work. Please note that this is a rather simplistic model and wouldn't necessarily work with Fibre Channel attached disks or multipathing. The resulting udev rule in /etc/udev/rules.d/99-asm.rules was:

KERNEL=="sd[a-z]*", BUS=="scsi", SYSFS{vendor}=="IET", SYSFS{model}=="VIRTUAL-DISK", OWNER="oracle", GROUP="dba", MODE="0660"
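
The vendor and model strings the rule matches on can be read straight from sysfs. A minimal sketch, assuming the usual /sys/block/<device>/device layout for SCSI disks; the function name and the optional sysfs root argument are made up for illustration.

```shell
# Print the SCSI vendor and model strings of a disk as found in sysfs.
# Usage: scsi_vendor_model sda
scsi_vendor_model() {
    dev=$1
    sysfs=${2:-/sys}
    devdir="$sysfs/block/$dev/device"
    if [ ! -r "$devdir/vendor" ] || [ ! -r "$devdir/model" ]; then
        echo "no SCSI vendor/model attributes for $dev" >&2
        return 1
    fi
    # The attributes may be padded with trailing blanks; strip them.
    vendor=$(sed 's/[[:space:]]*$//' "$devdir/vendor")
    model=$(sed 's/[[:space:]]*$//' "$devdir/model")
    echo "vendor=$vendor model=$model"
}
```

If the output doesn't show the values used in the rule above, the rule will not match and the device keeps its default ownership.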

I also deconfigured ASMLib on all nodes just to be sure:

[root@rac11gr2node1 u01]# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface [oracle]:
Default group to own the driver interface [dba]:
Start Oracle ASM library driver on boot (y/n) [y]: n
Scan for Oracle ASM disks on boot (y/n) [y]: n
Writing Oracle ASM library driver configuration: done
Dropping Oracle ASMLib disks:                              [  OK  ]
Shutting down the Oracle ASMLib driver:                    [  OK  ]
[root@rac11gr2node1 u01]#

Running udevadm trigger loaded the new rules and indeed, the /dev/sd*1 devices now were owned by oracle:dba.
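
Since the check has to be repeated on every node, it can be scripted. This is a minimal sketch using GNU stat; the function name is made up for illustration. Note that stat prints the mode without the leading zero, so 0660 shows up as 660.

```shell
# Succeeds only if every given path has the expected "owner:group mode".
# Usage: check_perms 'oracle:dba 660' /dev/sd*1
check_perms() {
    want=$1
    shift
    rc=0
    for p in "$@"; do
        have=$(stat -c '%U:%G %a' "$p") || { rc=1; continue; }
        if [ "$have" != "$want" ]; then
            echo "$p: expected '$want', found '$have'" >&2
            rc=1
        fi
    done
    return $rc
}
```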

Then I ran through OUI again and wanted to assign the previously used disks again, but they didn't appear as candidates :( OK, so the deconfig didn't zero out the disk headers. Not a problem, oracleasm has a "deletedisk" command that can be used for this exact purpose. You just need to be really sure you are zeroing out the correct disk, otherwise someone else will surely complain (you have been warned!).

The rest was simple. The ASM instance has again been created successfully, note the difference in disk names this time (my asm_diskstring was set to /dev/sd*1):

2013-04-23 16:29:29: Executing as oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1
2013-04-23 16:29:29: Running as user oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1
2013-04-23 16:29:29:   Invoking "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1" as user "oracle"
2013-04-23 16:29:29: Executing /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1"
2013-04-23 16:29:29: Executing cmd: /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1"

And this time the voting disks were created as well:

CRS-2676: Start of 'ora.diskmon' on 'rac11gr2node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac11gr2node2' succeeded

ASM created and started successfully.

Disk Group OCR created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 9d9a9577a66a4f6fbf267b125dc7f4a3.
Successful addition of voting disk 26e5ff2544084ff0bfb9e753b090ec22.
Successful addition of voting disk 38a820cd4b424f4ebf514370168fa499.
Successfully replaced voting disk group with +OCR.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   9d9a9577a66a4f6fbf267b125dc7f4a3 (/dev/sda1) [OCR]
 2. ONLINE   26e5ff2544084ff0bfb9e753b090ec22 (/dev/sdb1) [OCR]
 3. ONLINE   38a820cd4b424f4ebf514370168fa499 (/dev/sdc1) [OCR]
Located 3 voting disk(s).
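
The same listing can be requested at any time with "crsctl query css votedisk". If you want to check the result from a script, here is a minimal sketch that counts the ONLINE entries; the function name is made up for illustration and the parsing simply assumes the column layout shown above.

```shell
# Count ONLINE voting files in "crsctl query css votedisk" output read from stdin.
# Usage: /u01/app/11.2.0/grid/bin/crsctl query css votedisk | count_online_votedisks
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

With normal redundancy, as in my OCR disk group, I would expect this to print 3.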

Very nice, job done. You should refer back to the relevant MOS note to read more about the problem, but for now be advised that 4k sector sizes can cause surprises.


5 thoughts on “4k sector size and Grid Infrastructure 11.2 installation gotcha”

  1. flashdba

    Oracle’s implementation of 4k support is … well let’s just say there are a few holes in it. The ASMlib issue is one of them (although I believe they are working on a complete new version of ASMlib) and it causes numerous oddities – this is one variant I haven’t seen before, so thanks for sharing. I’ll add a link to this from my 4k page at http://flashdba.com/4k-sector-size/

    Let’s hope 12c has all of the issues resolved eh? :-)

  2. kevinclosson

    Hi Martin,

    Glad to see others feel the same headaches :-) For EMC XtremIO customers we will just recommend customers forgo ASMLib for a “primary” DG–called something like SYSTEMDG on a LUN formatted 512/4096. In this case SYSTEMDG will be where one stores OCR/CSS. All other luns are of 4096/4096 and suitable for ASMLib (if folks feel inclined).

  3. Cyrill

    Hi Martin
    I first opened this “bug” for ASMLib more than a year ago. After upgrading the firmware on our storage we received an error that the DG couldn’t be mounted. After some research (as always in the middle of the night) we found that bypassing ASMLib for the devices does the trick. After a lot of discussion with the storage vendor’s support, Oracle support and Novell support, Oracle started fixing it. After a while another customer faced the same issue, so after one year it went a bit faster. It was quite annoying, as we are using SLES and Novell builds the kernel module from Oracle’s sources. In the meantime ASMLib has been fixed: there is a new parameter for choosing whether you want ASMLib to use the logical block size. We are now testing the new kernel module, and the first impression looks good. So if somebody else is also using the new module, it would be nice to receive any feedback about it.

    1. Martin Bach Post author

      Hi Matt!

      It’s quite an honour for me to see you visiting my blog, thank you.

