I have always wondered how to configure the device mapper multipath package for a Linux system. I knew how to do it in principle, but had never been involved in the configuration from the start. Today I got the chance to work on this. The system is used for lab tests and is not a production box (otherwise I probably wouldn’t have been allowed on). It is actually part of a two-node cluster.
So the first step is to find out which partitions are visible to the system. The Linux kernel presents this information in the /proc/partitions table, as in the following example:
[root@node1 ~]# cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1     152586 cciss/c0d0p1
 104     2   71497282 cciss/c0d0p2
   8     0       2880 sda
   8    16  190479360 sdb
   8    32   23809920 sdc
   8    48   23809920 sdd
   8    64   23809920 sde
   8    80   23809920 sdf
   8    96   23809920 sdg
   8   112    1048320 sdh
   8   128    1048320 sdi
   8   144    1048320 sdj
   8   160       2880 sdk
   8   176  190479360 sdl
   8   192   23809920 sdm
   8   208   23809920 sdn
   8   224   23809920 sdo
   8   240   23809920 sdp
  65     0   23809920 sdq
  65    16    1048320 sdr
  65    32    1048320 sds
  65    48    1048320 sdt
 253     0    5111808 dm-0
 253     1   25591808 dm-1
 253     2   10223616 dm-2
 253     3    1015808 dm-3
 253     4   16777216 dm-4
[root@node1 ~]#
With a keen eye you can see that sdk is the same size as sda, which probably means that we have two paths to each of the devices sda to sdj. We’ll confirm this later. The more HBAs and paths you have, the more partitions you are going to see. This is where the multipathing software comes into play: it abstracts from the physical paths and presents a single logical device, and it offers some additional goodies such as path failover and limited load balancing.
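The "keen eye" step can be roughly automated. The following is only a sketch: the function name same_size_disks is made up for illustration, and it assumes the /proc/partitions layout shown above. It lists every size that is shared by more than one sd device. Equal size is just a hint (several distinct LUNs here are 23809920 blocks each), so the WWID remains the real test.

```shell
# Print each size (in 1 KiB blocks) that is shared by more than one sd*
# device, followed by the devices of that size.
# Reads /proc/partitions-formatted text on stdin.
same_size_disks() {
    awk '$4 ~ /^sd[a-z]+$/ { devs[$3] = devs[$3] " " $4; n[$3]++ }
         END { for (s in n) if (n[s] > 1) print s ":" devs[s] }' | sort -n
}

# Usage on a live system:
#   same_size_disks < /proc/partitions
```

Whole disks only are considered here; partitions such as sda1 are skipped by the pattern.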
Before proceeding I checked the status of the multipath daemon:
[root@node1 ~]# service multipathd status
multipathd is stopped
[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@node1 ~]# chkconfig multipathd on
As you can see it was not started, and wouldn’t start with a reboot – it was necessary to enable the service at boot time using the chkconfig command. This will automatically create links in /etc/rc.d/rcx.d to start and stop the service. As an additional benefit this command will respect dependencies the authors of the startup script have defined and create the {K,S}xxmultipathd links accordingly.
I next loaded the necessary modules, dm-multipath and dm-round-robin:
[root@node1 ~]# modprobe dm-multipath
[root@node1 ~]# modprobe dm-round-robin
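To double-check that both modules really are loaded, something along these lines could be used. It is a sketch rather than part of the original procedure: module_loaded is a hypothetical helper name, and it relies on the kernel listing module names with underscores in /proc/modules.

```shell
# Succeed if the named module appears in /proc/modules-style text on stdin.
# The kernel reports names with underscores, so dashes are translated first.
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on a live system:
#   module_loaded dm-multipath   < /proc/modules && echo "dm-multipath loaded"
#   module_loaded dm-round-robin < /proc/modules && echo "dm-round-robin loaded"
```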
With the multipathing nearly done, I need to get the WWIDs of all attached devices. At some point the WWID is going to be repeated – this is where you stop creating meta devices. Let’s have a look at the output of this first. You have to change directory to /sys, as the scsi_id commands are relative to it.
[node1 sys]# for i in `cat /proc/partitions | awk '{print $4}' |grep sd`; do echo "### $i: `scsi_id -g -u -s /block/$i`"; done
### sda: 360000970000294900664533030303238
### sdb: 360000970000294900664533030344133
### sdc: 360000970000294900664533030344142
### sdd: 360000970000294900664533030344143
### sde: 360000970000294900664533030344144
### sdf: 360000970000294900664533030344239
### sdg: 360000970000294900664533030344241
### sdh: 360000970000294900664533030344244
### sdi: 360000970000294900664533030344245
### sdj: 360000970000294900664533030344246
### sdk: 360000970000294900664533030303238
### sdl: 360000970000294900664533030344133
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246
[node1 sys]#
Here you see again that sda and sdk have the same WWID. I like to assign alias names to the multipathing devices; that makes it easier to find out what they are used for. I now have to get the disk sizes and map these to their intended use.
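Spotting repeated WWIDs by eye gets tedious with more LUNs. As a sketch only (paths_per_wwid is a made-up name, and the input is assumed to be the "### sdX: WWID" lines produced by the loop above), the paths can be grouped per WWID like this:

```shell
# Read lines of the form "### sdX: WWID" on stdin and print one line per
# WWID, listing every device (path) that reported it.
paths_per_wwid() {
    awk '$1 == "###" { sub(/:$/, "", $2); p[$3] = p[$3] " " $2 }
         END { for (w in p) print w ":" p[w] }' | sort
}

# Usage: save the output of the loop above to a file (wwids.txt is just an
# example name) and run
#   paths_per_wwid < wwids.txt
```

Every WWID that ends up with two devices on its line has two paths.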
Getting disk sizes:
[node1 sys]# fdisk -l 2>/dev/null | grep ^Disk
Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes    local
Disk /dev/sda: 2 MB, 2949120 bytes                  ignore
Disk /dev/sdb: 195.0 GB, 195050864640 bytes
Disk /dev/sdc: 24.3 GB, 24381358080 bytes
Disk /dev/sdd: 24.3 GB, 24381358080 bytes
Disk /dev/sde: 24.3 GB, 24381358080 bytes
Disk /dev/sdf: 24.3 GB, 24381358080 bytes
Disk /dev/sdg: 24.3 GB, 24381358080 bytes
Disk /dev/sdh: 1073 MB, 1073479680 bytes
Disk /dev/sdi: 1073 MB, 1073479680 bytes
Disk /dev/sdj: 1073 MB, 1073479680 bytes
Disk /dev/sdk: 2 MB, 2949120 bytes                  ignore
Disk /dev/sdl: 195.0 GB, 195050864640 bytes
Disk /dev/sdm: 24.3 GB, 24381358080 bytes
Disk /dev/sdn: 24.3 GB, 24381358080 bytes
Disk /dev/sdo: 24.3 GB, 24381358080 bytes
Disk /dev/sdp: 24.3 GB, 24381358080 bytes
Disk /dev/sdq: 24.3 GB, 24381358080 bytes
Disk /dev/sdr: 1073 MB, 1073479680 bytes
Disk /dev/sds: 1073 MB, 1073479680 bytes
Disk /dev/sdt: 1073 MB, 1073479680 bytes
The cleaned-up, consolidated view of the storage:
### sdb: 360000970000294900664533030344133 195G
### sdc: 360000970000294900664533030344142 24.3G
### sdd: 360000970000294900664533030344143 24.3G
### sde: 360000970000294900664533030344144 24.3G
### sdf: 360000970000294900664533030344239 24.3G
### sdg: 360000970000294900664533030344241 24.3G
### sdh: 360000970000294900664533030344244 1G
### sdi: 360000970000294900664533030344245 1G
### sdj: 360000970000294900664533030344246 1G
### sdl: 360000970000294900664533030344133 repeat - second path
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 36000097000029490066453303034424
Finally here’s the mapping I will use:
- sdb DATA001
- sdc REDO001
- sdd FRA001
- sde FRA002
- sdf ACFS001
- sdg ACFS002
- sdh, sdi, sdj VOTINGOCR{1,2,3}
The mapping between WWID and alias happens in the /etc/multipath.conf file. The defaults section has been taken from MOS note 555603.1. The devnode_blacklist section has to be set up according to your storage configuration; in my case I ignore IDE devices and the internal RAID adapter. Note that this is the most generic version of this file. Consult your storage vendor for more information about how to set up dev-mapper multipath optimised for your storage array (I’m thinking about the devices {} section here).
[root@node1 ~]# cat /etc/multipath.conf
defaults {
    udev_dir                /dev
    polling_interval        10
    selector                "round-robin 0"
    path_grouping_policy    multibus
    getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout            /bin/true
    path_checker            readsector0
    rr_min_io               100
    rr_weight               priorities
    failback                immediate
    no_path_retry           fail
    user_friendly_names     no
}
devnode_blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^cciss!c[0-9]d[0-9]*"
}
multipaths {
    multipath {
        wwid                    360000970000294900664533030344133
        alias                   data001
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344142
        alias                   redo001
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344143
        alias                   fra001
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344144
        alias                   fra002
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344239
        alias                   acfs001
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344241
        alias                   acfs002
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344244
        alias                   votingocr001
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344245
        alias                   votingocr002
        path_grouping_policy    failover
    }
    multipath {
        wwid                    360000970000294900664533030344246
        alias                   votingocr003
        path_grouping_policy    failover
    }
}
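The multipaths section above is repetitive, so with many LUNs it may be easier to generate it from a list of "WWID alias" pairs. This is just a sketch under that assumption: emit_multipaths is a hypothetical helper, the input format is my own choice, and the failover policy is hard-coded to match the file above.

```shell
# Print a multipaths { } section with one multipath { } stanza for each
# "WWID ALIAS" pair read from stdin. Blank lines are skipped.
emit_multipaths() {
    echo "multipaths {"
    while read -r wwid name; do
        [ -n "$wwid" ] || continue
        printf '    multipath {\n'
        printf '        wwid                    %s\n' "$wwid"
        printf '        alias                   %s\n' "$name"
        printf '        path_grouping_policy    failover\n'
        printf '    }\n'
    done
    echo "}"
}

# Usage: put one "WWID alias" pair per line into a file (mapping.txt is
# just an example name), then review the output before pasting it into
# /etc/multipath.conf:
#   emit_multipaths < mapping.txt
```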
The mapping is really simple: for each device you use, create a “multipath” section and enter the WWID, an alias and a path policy. Done! See if that worked by starting the multipath daemon:
[root@node1 ~]# service multipathd start
As always, /var/log/messages is a good place to check:
Nov 16 16:34:58 loninengblc204 kernel: device-mapper: table: 253:5: multipath: error getting device
Nov 16 16:34:58 loninengblc204 kernel: device-mapper: ioctl: error adding target to table
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: load table [0 5760 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:160 1000]
Nov 16 16:34:58 loninengblc204 multipathd: data001: load table [0 380958720 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:176 1000]
Nov 16 16:34:58 loninengblc204 multipathd: redo001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:192 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:48 1000 round-robin 0 1 1 8:208 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:64 1000 round-robin 0 1 1 8:224 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:240 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 1 1 65:0 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:112 1000 round-robin 0 1 1 65:16 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:128 1000 round-robin 0 1 1 65:32 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:144 1000 round-robin 0 1 1 65:48 1000]
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: data001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: redo001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: path checkers start u
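With this much output it is easy to miss the two error lines at the top. A small filter can pull them out; this is only a sketch, multipath_errors is a made-up name, and the keywords it looks for are my own guess at what is worth flagging.

```shell
# Print device-mapper / multipathd log lines that mention an error or a
# failed operation. Reads syslog-formatted text on stdin.
multipath_errors() {
    grep -E 'device-mapper|multipathd' | grep -Ei 'error|failed'
}

# Usage on a live system:
#   multipath_errors < /var/log/messages
```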
Great – are all paths working?
[root@node1 ~]# multipath -ll | head
fra002 (360000970000294900664533030344144) dm-9 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:4 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:4 sdo 8:224 [active][ready]
fra001 (360000970000294900664533030344143) dm-8 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:3 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:3 sdn 8:208 [active][ready]
acfs002 (360000970000294900664533030344241) dm-11 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:6 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:6 sdq 65:0  [active][ready]
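Rather than reading through every map, the path lines can be counted. The sketch below assumes the bracketed output format shown above (newer multipath-tools versions print a different layout), and unhealthy_paths is a hypothetical name.

```shell
# Count path lines (those naming an sd device with its major:minor number)
# that are not in the [active][ready] state. Reads `multipath -ll` output
# in the bracketed format on stdin and prints the count.
unhealthy_paths() {
    awk '/ sd[a-z]+ +[0-9]+:[0-9]+ / && !/\[active\]\[ready\]/ { bad++ }
         END { print bad + 0 }'
}

# Usage on a live system; 0 means every path is active and ready:
#   multipath -ll | unhealthy_paths
```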
Congratulations – distribute the working multipath.conf to all cluster nodes and start multipathd.
The beauty of this over a solution such as PowerPath is that the device names are consistent across the cluster. With PowerPath I have come across a situation where /dev/rdsk/emcpower1a on node1 was /dev/rdsk/emcpower4a on node2, and yet another device on the other nodes. That is not really user friendly, but it is not a big issue with ASM either: it reads the information from the disk headers anyway. It was more of a problem before 11.2, when you had to use block devices to store the OCR and voting files.
Responses
Hi,
You can also use the command ‘multipath -v2’ to rebuild the aliases (i.e. re-read the config file), instead of stopping and starting the daemon.
regds
/M
Great post Martin!
I know what you mean about EMC PowerPath, but you can use the emcpadm tool to export the mappings from
one node and then import them on the other nodes to ensure consistent naming.
e.g.
emcpadm export_mappings -f
and then
emcpadm import_mappings -f
Thanks again for sharing your experience.
Cheers
Neil
As Neil stated on EMC PowerPath you can also achieve the same result using:
On node 1
emcpadm getused
and then
emcpadm rename -s emcpowera -t emcpowere
emcpadm rename -s emcpowerb -t emcpowerf
emcpadm rename -s emcpowerc -t emcpowerg
On node 2
emcpadm getused
and then
emcpadm rename -s emcpowerb -t emcpowere
emcpadm rename -s emcpowera -t emcpowerf
emcpadm rename -s emcpowerc -t emcpowerg
where emcpowera and emcpowerb were swapped between the two nodes…
Bye,
MarcoV
Hi Marco,
Can this be scripted or otherwise automated? I personally would prefer to have device name stability across the cluster without having to rename devices each time a system reboots.
Martin
Hi Martin,
I believe that once you have configured it the way you want, you can issue a powermt save and the names should then be permanent.
Neil
You could also use “service multipathd reload” to reload the multipath configuration, instead of stopping and starting the daemon.