Adding storage dynamically to ASM on Linux

Note: This discussion applies to OEL 5.x and RHEL 5.x; I haven’t been able to verify that it works the same way on other Linux distributions, although I would assume so. Before starting with the article, here are the facts about my environment:

  • OEL/RHEL 5.5 64bit
  • Oracle 11.2.0.2
  • native multipathing: device-mapper-multipath

The question I have asked myself many times is: how can I dynamically add a LUN to ASM without having to stop any component of the stack? Having mocked “reboot-me” operating systems like Windows, I soon fell quiet when the discussion turned to adding a LUN to ASM on Linux. Today I learned how to do it, by piecing together information I got from Angus Thomas, a great Red Hat system administrator I had the pleasure to work with in 2009 and 2010. And since I have a short-lived memory, I decided to write it down.

I’ll describe the process from top to bottom: from the addition of the LUN to the server, all the way up to the addition of the ASM disk to the disk group.

Adding the storage to the cluster nodes

The first step, obviously, is to get the LUN assigned to the server(s). This is the easy part, and it is outside the control of the Linux/Oracle admin: the storage team will provision a new LUN to the hosts in question. At this stage, Linux has no idea about the new storage; to make it available, the system administrator has to rescan the SCSI bus. A proven and tested way to do this in RHEL 5 is to issue the following command:

[root@node1 ~]# for i in `ls -1 /sys/class/scsi_host`; do
> echo "- - -" > /sys/class/scsi_host/${i}/scan
> done

The new, unpartitioned LUN will appear in /proc/partitions. If it doesn’t, then there’s probably something wrong on the SAN side: check /var/log/messages and talk to your storage administrator. If it’s not a misconfiguration, you may have no option but to reboot the node.
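A quick way to check, for example (the device names you’ll see depend entirely on your setup):

[root@node1 ~]# grep sd /proc/partitions
[root@node1 ~]# tail -50 /var/log/messages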

Configure Multipathing

So far so good; the next step is to add the device to the multipath configuration. First of all, you need to find out the WWID of the new device. In my case that’s simple: the last new line in /proc/partitions is usually a giveaway. If you are unsure, ask someone with a console to the storage array to check the WWID. It’s important to get this right at this stage :)
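If, for example, the new LUN came in as /dev/sdj (a name I’m making up for illustration), RHEL 5’s scsi_id utility prints the WWID directly:

[root@node1 ~]# scsi_id -g -u -s /block/sdj
360000970000294900664533030344239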

To add the new disk to the multipath.conf file, all you need to do is to add a new section, as in the following example:


multipaths {
    ...
    multipath {
        wwid                 360000970000294900664533030344239
        alias                ACFS0001
        path_grouping_policy failover
    }
    ...
}

By the way, I have written a more detailed article about configuring multipathing in a previous blog post here. Don’t forget to replicate the changes to the other cluster nodes!
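One way to do that, assuming a second node called node2 and working root ssh equivalence:

[root@node1 ~]# scp /etc/multipath.conf node2:/etc/multipath.conf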

Now reload multipathd using /etc/init.d/multipathd reload on each node, and voilà: you should see the device in /dev/mapper/ – my ACFS disk appeared as /dev/mapper/ACFS0001.
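For example:

[root@node1 ~]# /etc/init.d/multipathd reload
[root@node1 ~]# multipath -ll ACFS0001

If the alias doesn’t show up with all its paths, double-check the WWID in multipath.conf before going any further.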

Now the tricky bit is to partition the LUN, if you need to: partitioning is no longer mandatory with 11.1 and newer, although some software, such as EMC’s Replication Manager, still requires it. I succeeded by locating the device in /dev/disk/by-id and then running fdisk against it, as in this example:

...
# fdisk /dev/disk/by-id/scsi-360000970000294900664533030344239
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 23251.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
 e   extended
 p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (32-47619839, default 32): 128
Last sector or +size or +sizeM or +sizeK (129-47619839, default 47619839):
Using default value 47619839

Command (m for help): p

Disk /dev/disk/by-id/scsi-360000970000294900664533030344239: 24.3 GB, 24381358080 bytes
64 heads, 32 sectors/track, 23251 cylinders, total 47619840 sectors
Units = sectors of 1 * 512 = 512 bytes

 Device Boot                                                         Start  End         Blocks     Id  System
/dev/disk/by-id/scsi-360000970000294900664533030344239p1             128    47619839    23809855+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Once you are in fdisk, the commands are identical to those for single-pathed storage: type “n” to create a new partition, “p” for a primary partition, and specify the start and end sectors as needed. Type “p” to print the partition table, and if you are happy with it, use “w” to write it. You might wonder why I added an offset and changed the display unit to sectors (“u”): this is due to the EMC storage this site uses. The EMC® Host Connectivity Guide for Linux (P/N 300-003-865 REV A23) suggests a 64k offset. Don’t simply repeat this in your environment – check with your storage team first.

Before creating the partitions on ACFS0001 and ACFS0002 I had 107 lines in /proc/partitions:


[root@node1 ~]# wc -l /proc/partitions
107 /proc/partitions

Interestingly, /proc/partitions looks exactly the same after the two fdisk commands have completed – the new partitions don’t show up yet:


[root@node1 ~]# wc -l /proc/partitions
 107 /proc/partitions

But when you check /dev/mapper now, you still don’t see the partitions. The naming convention is to append pN to the device name, i.e. /dev/mapper/ACFS0001p1 for the first partition, and so on.

kpartx to the rescue! This superb utility can read the partition table of a device and create the corresponding device-mapper entries. Initially my setup was as follows:


[root@node1 ~]# ls -l /dev/mapper/ACFS*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002

Usually I would have rebooted the node at this stage, as I didn’t know how to update the partition table otherwise. But with kpartx (“yum install kpartx” to install), this is no longer needed. Consider the example below:

[root@node1 ~]# kpartx -l /dev/mapper/ACFS0001
ACFS0001p1 : 0 47619711 /dev/mapper/ACFS0001 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0001
[root@node1 ~]# kpartx -l /dev/mapper/ACFS0002
ACFS0002p1 : 0 47619711 /dev/mapper/ACFS0002 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0002

[root@node1 ~]# ls -l /dev/mapper/ACFS000*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 36 Jan 18 10:13 /dev/mapper/ACFS0001p1
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002
brw-rw---- 1 root disk 253, 37 Jan 18 10:13 /dev/mapper/ACFS0002p1

“kpartx -l” lists the partitions found on the device, and “kpartx -a” adds the corresponding partition mappings, as the example shows. No more need to reboot! However, as has been pointed out in the comments section (see below), kpartx only acts on the active path, so you should run the partprobe command to add the missing paths:


[root@node1 ~]# partprobe
[root@node1 ~]# wc -l /proc/partitions
109 /proc/partitions

See how there are now 109 entries listed instead of the 107 from before: the two missing paths have been added (one for each device).

Add disks to ASM

With this done, you can add the disks to ASM. I personally like the intermediate step of creating an ASMLib disk first.
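A minimal sketch of that step, assuming the oracleasm driver is already configured on every node: stamp the disks once on the first node, then rescan on the remaining nodes.

[root@node1 ~]# /etc/init.d/oracleasm createdisk ACFS0001 /dev/mapper/ACFS0001p1
[root@node1 ~]# /etc/init.d/oracleasm createdisk ACFS0002 /dev/mapper/ACFS0002p1
[root@node2 ~]# /etc/init.d/oracleasm scandisks

With the ASMLib disks in place on all nodes, connect to ASM as SYSASM and add them using the alter diskgroup command: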

SQL> alter diskgroup ACFSDG add disk 'ORCL:ACFS0002', 'ORCL:ACFS0001';

Now just wait for the rebalance operation to complete.
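You can watch the progress from the same SYSASM session – EST_MINUTES in v$asm_operation gives a rough estimate of the remaining time:

SQL> select group_number, operation, state, power, est_minutes from v$asm_operation;

Once the view returns no rows, the rebalance has finished.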

Responses

  1. A couple of caveats (some loose ones I admit).

    1) The SCSI rescan typically works, but not always – it depends on your driver/array combo, in which case you will have no choice but to reboot to make sure all paths in a multipath environment get updated with the new LUNs and partition tables.

    2) If your driver/array combo allows everything to be done online, then after the kpartx -a you might also want to consider running partprobe to update ALL paths in a multipath environment.

    3) In a RAC cluster, propagate the /var/lib/multipath/bindings file to all nodes along with the multipath.conf file so that all servers map the LUNs in the same way.

    1. Hi Rui,

      Thanks for your comments!

      re 1) So far it has always worked at the sites I have been to, but if it doesn’t, you have no choice but to reboot. Before doing that you should update the multipath.conf file on all cluster nodes (I implicitly assumed that in the post).

      re 2) I’m not sure I understand which paths you are referring to – multipath -ll shows 2 paths for the old and new devices here without a partprobe…

      re 3) On my current system this directory is empty. I fully agree that multipath.conf has to be consistent across the cluster!

      1. partprobe updates the partition table of the LUN on all paths. When you partition a LUN, you are doing so using only one path (the active path). The secondary path, since its info is cached on the host, is not aware of the updated partition table of the LUN unless you use partprobe to force a refresh of that path. kpartx -a uses the information from the active path and does not validate it against the passive path.

      2. Thanks for the clarification!

  2. Thank you for this elegant solution!!!
    The alias function of the device mapper would have saved me a headache a couple of months ago… and it would have spared the client the use of ASMLib…

    Now the question: why do you use ASMLib if you already have a consistent name for your devices on all nodes?
    What are the advantages?

    Personally I see far more drawbacks but I’d really like to hear your opinion.

    1. Hi,

      ASMLib will only be available for Oracle’s own kernel from Oracle Linux 6 onwards… Also, Exadata doesn’t use ASMLib… Moving forward I personally wouldn’t use ASMLib in multipathed environments, but I will keep it in my lab. The downside to ASMLib is always that you risk your sysadmin updating the Linux kernel and you losing visibility of your ASM disks. In my lab I’m sysadmin, storage admin and DBA in one person, and I know what I’m doing (mostly). I also like ASMLib for iSCSI – udev works as well though.

      Hope this helps,

      Martin

      1. Thank you.
        I was able to get rid of ASMlib without reinstalling my cluster:
        http://www.addalab.it/node/54

        Your article was really inspiring!

  3. What is the ORCL in the command alter diskgroup ACFSDG add disk ‘ORCL:ACFS0002’, ‘ORCL:ACFS0001’;? Is it the OS user name or the database name?

    1. Hi Deshray,

      The ORCL: prefix is part of the fully qualified disk string used when ASM discovers disks through ASMLib – it is neither an OS user name nor a database name.

      Martin
