Configuring device mapper multipath on OEL5 update 5

I have always wondered how to configure the device mapper multipath package for a Linux system. I knew how to do it in principle, but had never been involved in a configuration from scratch. Today I got the chance to work on this. The system is used for a lab test and is not a production box (otherwise I probably wouldn’t have been allowed on it). It is actually part of a two node cluster.

So the first step is to find out which partitions are visible to the system. The Linux kernel presents this information in the /proc/partitions table, as in the following example:


[root@node1 ~]# cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1     152586 cciss/c0d0p1
 104     2   71497282 cciss/c0d0p2
 8     0       2880 sda
 8    16  190479360 sdb
 8    32   23809920 sdc
 8    48   23809920 sdd
 8    64   23809920 sde
 8    80   23809920 sdf
 8    96   23809920 sdg
 8   112    1048320 sdh
 8   128    1048320 sdi
 8   144    1048320 sdj
 8   160       2880 sdk
 8   176  190479360 sdl
 8   192   23809920 sdm
 8   208   23809920 sdn
 8   224   23809920 sdo
 8   240   23809920 sdp
 65     0   23809920 sdq
 65    16    1048320 sdr
 65    32    1048320 sds
 65    48    1048320 sdt
 253     0    5111808 dm-0
 253     1   25591808 dm-1
 253     2   10223616 dm-2
 253     3    1015808 dm-3
 253     4   16777216 dm-4
[root@node1 ~]#

With a keen eye you can see that sdk is the same size as sda, which probably means we have two paths to sda through sdj. We’ll confirm this later. The more HBAs and paths you have, the more entries you are going to see here. This is where the multipathing software comes into play: it abstracts from the physical paths and presents a single logical device, and it offers additional goodies such as path failover and limited load balancing.
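Before going any further it is worth checking that the multipath package is actually installed; on OEL 5 it is called device-mapper-multipath, and yum can pull it in if it is missing:

[root@node1 ~]# rpm -q device-mapper-multipath || yum install -y device-mapper-multipath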

Before proceeding I checked the status of the multipath daemon:

[root@node1 ~]# service multipathd status
multipathd is stopped
[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@node1 ~]# chkconfig multipathd on


As you can see it was not running, nor would it come up after a reboot, so it was necessary to enable the service at boot time using the chkconfig command. This automatically creates the links in /etc/rc.d/rcX.d that start and stop the service. As an additional benefit chkconfig respects the dependencies the authors of the init script have defined and creates the {K,S}xxmultipathd links accordingly.
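A quick check confirms that the service is now registered for the usual runlevels; the exact Sxx/Kxx numbers depend on the chkconfig header in the init script, so don’t worry if yours differ:

[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@node1 ~]# ls /etc/rc.d/rc3.d/ | grep multipathd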

I next loaded the necessary modules, dm-multipath and dm-round-robin:

[root@node1 ~]# modprobe dm-multipath
[root@node1 ~]# modprobe dm-round-robin
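
To double-check that both modules made it into the kernel (lsmod reports them with underscores rather than dashes):

[root@node1 ~]# lsmod | grep -E 'dm_multipath|dm_round_robin'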

With the groundwork in place, I need to get the WWIDs of all attached devices. At some point the WWIDs start repeating: that is where the second set of paths begins, and where you stop creating meta devices. Let’s have a look at the output first. You have to change directory to /sys, as the device paths passed to scsi_id -s are relative to it.

[node1 sys]# for i in `cat /proc/partitions | awk '{print $4}' |grep sd`; do echo "### $i: `scsi_id -g -u -s /block/$i`"; done
### sda: 360000970000294900664533030303238
### sdb: 360000970000294900664533030344133
### sdc: 360000970000294900664533030344142
### sdd: 360000970000294900664533030344143
### sde: 360000970000294900664533030344144
### sdf: 360000970000294900664533030344239
### sdg: 360000970000294900664533030344241
### sdh: 360000970000294900664533030344244
### sdi: 360000970000294900664533030344245
### sdj: 360000970000294900664533030344246
### sdk: 360000970000294900664533030303238
### sdl: 360000970000294900664533030344133
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246
[node1 sys]#

Here you see again that sda and sdk have the same WWID. I like to assign alias names to the multipath devices; that makes it much easier to see what they are used for. To do so I now have to get the disk sizes and map them to their intended use.

Getting disk sizes:

[node1 sys]# fdisk -l 2>/dev/null | grep ^Disk
Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes        local
Disk /dev/sda: 2 MB, 2949120 bytes                ignore
Disk /dev/sdb: 195.0 GB, 195050864640 bytes
Disk /dev/sdc: 24.3 GB, 24381358080 bytes
Disk /dev/sdd: 24.3 GB, 24381358080 bytes
Disk /dev/sde: 24.3 GB, 24381358080 bytes
Disk /dev/sdf: 24.3 GB, 24381358080 bytes
Disk /dev/sdg: 24.3 GB, 24381358080 bytes
Disk /dev/sdh: 1073 MB, 1073479680 bytes
Disk /dev/sdi: 1073 MB, 1073479680 bytes
Disk /dev/sdj: 1073 MB, 1073479680 bytes
Disk /dev/sdk: 2 MB, 2949120 bytes                ignore
Disk /dev/sdl: 195.0 GB, 195050864640 bytes
Disk /dev/sdm: 24.3 GB, 24381358080 bytes
Disk /dev/sdn: 24.3 GB, 24381358080 bytes
Disk /dev/sdo: 24.3 GB, 24381358080 bytes
Disk /dev/sdp: 24.3 GB, 24381358080 bytes
Disk /dev/sdq: 24.3 GB, 24381358080 bytes
Disk /dev/sdr: 1073 MB, 1073479680 bytes
Disk /dev/sds: 1073 MB, 1073479680 bytes
Disk /dev/sdt: 1073 MB, 1073479680 bytes

The cleaned-up, consolidated view of the storage:

### sdb: 360000970000294900664533030344133    195G
### sdc: 360000970000294900664533030344142    24.3G
### sdd: 360000970000294900664533030344143    24.3G   
### sde: 360000970000294900664533030344144    24.3G   
### sdf: 360000970000294900664533030344239    24.3G   
### sdg: 360000970000294900664533030344241    24.3G   
### sdh: 360000970000294900664533030344244    1G
### sdi: 360000970000294900664533030344245    1G
### sdj: 360000970000294900664533030344246    1G

### sdl: 360000970000294900664533030344133    repeat - second path
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246

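Rather than cross-referencing the two listings by hand, WWID and size can be collected in one pass and sorted by WWID so that the path pairs line up next to each other. A quick sketch, again run from /sys:

[node1 sys]# for i in $(awk '$4 ~ /^sd/ {print $4}' /proc/partitions); do echo "$i $(scsi_id -g -u -s /block/$i) $(blockdev --getsize64 /dev/$i)"; done | sort -k2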
Finally here’s the mapping I will use:

  • sdb    DATA001
  • sdc    REDO001
  • sdd    FRA001
  • sde    FRA002
  • sdf    ACFS001
  • sdg    ACFS002
  • sdh, sdi, sdj    VOTINGOCR{1,2,3}

The mapping between WWID and alias happens in the /etc/multipath.conf file. The defaults section has been taken from MOS note 555603.1. The devnode_blacklist section has to be set up according to your storage configuration; in my case I ignore IDE devices and the internal RAID adapter. Note that this is the most generic version of the file: consult your storage vendor for advice on how to set up device-mapper multipath optimised for your particular array (I’m thinking about the devices {} section here).
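Purely to illustrate the shape of such a section, a devices {} entry follows the pattern below. The values are placeholders rather than EMC’s recommendations, so take the real settings from your vendor documentation:

devices {
 device {
  vendor                  "EMC"
  product                 "SYMMETRIX"
  path_grouping_policy    multibus
  getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
  path_checker            tur
  no_path_retry           6
 }
}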

[root@node1 ~]# cat /etc/multipath.conf
defaults {
 udev_dir                /dev
 polling_interval        10
 selector                "round-robin 0"
 path_grouping_policy    multibus
 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
 prio_callout            /bin/true
 path_checker            readsector0
 rr_min_io               100
 rr_weight               priorities
 failback                immediate
 no_path_retry           fail
 user_friendly_names     no
}

devnode_blacklist {
 devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
 devnode "^hd[a-z]"
 devnode "^cciss!c[0-9]d[0-9]*"
}

multipaths {
 multipath {
 wwid 360000970000294900664533030344133
 alias data001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344142
 alias redo001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344143
 alias fra001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344144
 alias fra002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344239
 alias acfs001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344241
 alias acfs002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344244
 alias votingocr001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344245
 alias votingocr002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344246
 alias votingocr003
 path_grouping_policy failover
 }
}

The mapping is really simple: for each device you use, create a “multipath” subsection, enter the WWID, an alias and a path grouping policy. Done! See if that worked by starting the multipath daemon:

[root@node1 ~]# service multipathd start

As always, /var/log/messages is a good place to check:

Nov 16 16:34:58 loninengblc204 kernel: device-mapper: table: 253:5: multipath: error getting device
Nov 16 16:34:58 loninengblc204 kernel: device-mapper: ioctl: error adding target to table
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: load table [0 5760 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:160 1000]
Nov 16 16:34:58 loninengblc204 multipathd: data001: load table [0 380958720 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:176 1000]
Nov 16 16:34:58 loninengblc204 multipathd: redo001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:192 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:48 1000 round-robin 0 1 1 8:208 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:64 1000 round-robin 0 1 1 8:224 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:240 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 1 1 65:0 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:112 1000 round-robin 0 1 1 65:16 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:128 1000 round-robin 0 1 1 65:32 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:144 1000 round-robin 0 1 1 65:48 1000]
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: data001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: redo001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: path checkers start up
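
The two errors at the top most likely relate to the small 2 MB device (sda/sdk), which I did not give an alias. If, as in this case, it is an array control device you never intend to use, adding its WWID to the blacklist keeps it out of the multipath tables altogether (the exact blacklist syntax can vary slightly between multipath-tools versions):

devnode_blacklist {
 ...
 wwid 360000970000294900664533030303238
}

The aliased devices themselves all loaded cleanly.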

Great – are all paths working?

[root@node1 ~]# multipath -ll | head
fra002 (360000970000294900664533030344144) dm-9 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:4 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:4 sdo 8:224 [active][ready]
fra001 (360000970000294900664533030344143) dm-8 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:3 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:3 sdn 8:208 [active][ready]
acfs002 (360000970000294900664533030344241) dm-11 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:6 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:6 sdq 65:0  [active][ready]
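
The head-limited output only shows a few of the maps. A quick way to make sure nothing is missing is to count the active/ready path lines, which with nine aliased devices and two paths each should come to 18:

[root@node1 ~]# multipath -ll | grep -c '\[active\]\[ready\]'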

Congratulations – distribute the working multipath.conf to all cluster nodes and start multipathd.
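On a two node cluster that boils down to something like this, with node2 standing in for whatever the second node is really called:

[root@node1 ~]# scp /etc/multipath.conf node2:/etc/multipath.conf
[root@node1 ~]# ssh node2 'chkconfig multipathd on; modprobe dm-multipath; modprobe dm-round-robin; service multipathd start'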

The beauty of this over a solution such as PowerPath is that the device names are consistent across the cluster. With PowerPath I have come across a situation where /dev/rdsk/emcpower1a on node1 was /dev/rdsk/emcpower4a on node2 and yet another device on the remaining nodes. Not really user friendly, but not a big issue with ASM either: it reads the information from the disk headers anyway. It was more of a problem pre 11.2, when you had to use block devices to store the OCR and voting files.

Responses

  1. Hi,

    You can also use the command ‘multipath -v2’ to rebuild the aliases (i.e. re-read the config file) instead of stopping/starting the daemon.

    regds
    /M

  2. Great post Martin!

    I know what you mean about EMC PowerPath, but you can use the emcpadm tool to export the mappings from
    one node and then import them on the other nodes to ensure consistent naming.

    e.g

    emcpadm export_mappings -f

    and then


    emcpadm import_mappings -f

    Thanks again for sharing your experience.

    Cheers
    Neil

  3. As Neil stated on EMC PowerPath you can also achieve the same result using:
    On node 1
    emcpadm getused
    and then
    emcpadm rename -s emcpowera -t emcpowere
    emcpadm rename -s emcpowerb -t emcpowerf
    emcpadm rename -s emcpowerc -t emcpowerg

    On node 2
    emcpadm getused
    and then
    emcpadm rename -s emcpowerb -t emcpowere
    emcpadm rename -s emcpowera -t emcpowerf
    emcpadm rename -s emcpowerc -t emcpowerg

    where emcpowera and emcpowerb on nodes were inverted…

    Bye,
    MarcoV

    1. Hi Marco,

      can this be scripted or otherwise automated? I personally would prefer to have device name stability across the cluster without having to rename devices each time a system reboots?

      Martin

      1. Hi Martin,

        I believe that once you have configured it the way you want, issue a powermt save and then the namings should be permanent.

        Neil

  4. You could also use “service multipathd reload” instead of stopping/starting to reload the multipath configuration.
