Before I start with this post, first a few words of warning (and this time I mean it!): even though the method described here works for me, it is neither supported nor even mentioned as possible in the Oracle documentation or on Metalink. Use it at your own risk, and never with production data. I will not take responsibility for corrupted disk headers :) Also, this post is for educational purposes only.
Now with that said, here is some background to this post. During a project I was involved in recently it was decided to upgrade the estate from 10.2.0.3 32bit on RHEL 3 to 10.2.0.4.1 on 64bit RHEL 5. The storage layer changed as well, from OCFS v1 (don’t ask!) to ASM. However, some of the processes that used to be possible, such as cloning database LUNs on the SAN via a BCV copy, would now cause problems when the cloned LUNs are presented to the same host as the source LUNs.
How come? It’s a design feature that ASM writes meta information into its disk headers, which is great in general. Understandably, “there can only be one” LUN in use by ASM with the same meta information, i.e. having 2 disks named “DATA1” in diskgroup “DATA” mounted by +ASM won’t work. When cloning, though, the source LUN is duplicated 1:1 and ASM would end up “seeing” 2 identical disks, and I assume that would confuse the hell out of it. In 11.2, Oracle introduced a command called renamedg which can deal with such a situation, but prior to that there is no officially supported way. I have written about the renamedg command in an earlier post, by the way. LUN cloning is the fastest way to duplicate the 2TB system, therefore its owners decided that it had to be possible.
To clarify: this problem only exists when you clone the LUNs and present the clones to the same database server; the presentation of cloned LUNs to a different server than the source machine will not cause the problem described here.
The setup is rather simple: a primary database (a 3 node RAC system) with 2 standby databases, one of which is single instance and local to the data centre. That standby is located on a different storage array to avoid a potential performance impact during the execution of the snapclone process. The production array is an HP EVA 8100 whereas the standby database is on an EVA 5000 series.
The production database uses 3 ASM disk groups: DATA, FRA and CTRLLOG for data files, flash recovery area and online redo logs/control files. The environment cloned from it will use disk groups DATB, FRB and CTRLLGG. This looks as follows in ASM prior to the cloning process (output trimmed to fit on page):
ASMCMD> lsdg
State    Type    AU       Total_MB  Free_MB  Name
MOUNTED  EXTERN  1048576  20479     17398    CTRLLOG/
MOUNTED  EXTERN  1048576  2047968   356521   DATA/
MOUNTED  EXTERN  1048576  511992    450918   FRA/
ASMCMD>
The following disks are defined in ASM prior to the snapclone – note we are using ASMLib, therefore the paths are prefixed “ORCL”:
09:51:49 SYS@+ASM AS SYSDBA> r
  1  select name,path from v$asm_disk
  2* order by path

NAME                           PATH
------------------------------ ----------------------------------------
DATA1                          ORCL:DATA1
DATA2                          ORCL:DATA2
DATA3                          ORCL:DATA3
DATA4                          ORCL:DATA4
DATA5                          ORCL:DATA5
DATA6                          ORCL:DATA6
DATA7                          ORCL:DATA7
DATA8                          ORCL:DATA8
FRA1                           ORCL:FRA1
FRA2                           ORCL:FRA2
LOGCTL1                        ORCL:LOGCTL1
So far so good. Now about the presentation of these (source) LUNs to the host. This is a single instance box, but the setup doesn’t differ from a RAC setup. The company decided to use device-mapper multipathing; the setup is as follows:
[root@devbox ~]# ls -l /dev/mapper
total 0
crw------- 1 root root 10,  63 Jan  7 13:51 control
brw-rw---- 1 root disk 253,  0 Jan  7 13:52 VolGroup01-rootvol
brw-rw---- 1 root disk 253,  3 Jan  7 13:51 VolGroup01-swapvol
brw-rw---- 1 root disk 253,  4 Jan  7 13:52 VolGroup01-u01vol
brw-rw---- 1 root disk 253,  2 Jan  7 13:52 VolGroup01-usrvol
brw-rw---- 1 root disk 253,  1 Jan  7 13:52 VolGroup01-varvol
brw-rw---- 1 root disk 253,  8 Jan  7 14:10 standby_data1
brw-rw---- 1 root disk 253, 19 Jan  7 14:10 standby_data1p1
brw-rw---- 1 root disk 253,  9 Jan  7 14:10 standby_data2
brw-rw---- 1 root disk 253, 20 Jan  7 14:10 standby_data2p1
brw-rw---- 1 root disk 253, 10 Jan  7 14:10 standby_data3
brw-rw---- 1 root disk 253, 26 Jan  7 14:10 standby_data3p1
brw-rw---- 1 root disk 253, 11 Jan  7 14:10 standby_data4
brw-rw---- 1 root disk 253, 21 Jan  7 14:10 standby_data4p1
brw-rw---- 1 root disk 253, 12 Jan  7 14:10 standby_data5
brw-rw---- 1 root disk 253, 27 Jan  7 14:10 standby_data5p1
brw-rw---- 1 root disk 253, 13 Jan  7 14:10 standby_data6
brw-rw---- 1 root disk 253, 22 Jan  7 14:10 standby_data6p1
brw-rw---- 1 root disk 253, 14 Jan  7 14:10 standby_data7
brw-rw---- 1 root disk 253, 28 Jan  7 14:10 standby_data7p1
brw-rw---- 1 root disk 253, 15 Jan  7 14:10 standby_data8
brw-rw---- 1 root disk 253, 23 Jan  7 14:10 standby_data8p1
brw-rw---- 1 root disk 253, 16 Jan  7 14:10 standby_fra1
brw-rw---- 1 root disk 253, 24 Jan  7 14:10 standby_fra1p1
brw-rw---- 1 root disk 253, 17 Jan  7 14:10 standby_fra2
brw-rw---- 1 root disk 253, 25 Jan  7 14:10 standby_fra2p1
brw-rw---- 1 root disk 253, 18 Jan  7 14:10 standby_log_ctl
brw-rw---- 1 root disk 253, 29 Jan  7 14:10 standby_log_ctlp1
It’s important to use user-friendly names instead of mpathn (where n is just a number), as you’ll see in a bit. The file /etc/multipath.conf has to be edited accordingly.
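The standby_* aliases shown above are defined in the multipaths section of /etc/multipath.conf. A minimal sketch of one such entry, with a purely hypothetical WWID (use the ones reported for your LUNs):

multipaths {
        multipath {
                wwid    3600508b4000a1b2c0000900001230000
                alias   standby_data1
        }
        # one multipath {} stanza per LUN; the clones get their own entries later
}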
The example will use the machine “devbox” which is home to the standby database as a source, the disk groups DATA, FRA, CTRLLOG will be cloned (on the array through an SSSU script) and presented to the same server.
The actual process of cloning is rather simple: shut down all Oracle processes (database and ASM) and, if you are using ASMLib, stop it with /etc/init.d/oracleasm disable to completely unload the kernel modules. For RAC, it makes sense to disable the ASM instances and database, otherwise a reboot might mess up the carefully crafted scenario. To be on the very safe side, I usually disable ASMLib as well and change the variables in /etc/sysconfig/oracleasm:
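The two variables that matter for this step are ORACLEASM_ENABLED and ORACLEASM_SCANBOOT; while the clone is taken they should look something like this (your file will carry more entries):

# /etc/sysconfig/oracleasm while the clone is taken
ORACLEASM_ENABLED=false        # oracleasm enable/start would otherwise load the kernel modules
ORACLEASM_SCANBOOT=false       # don't scan for ASM disks should the box reboot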
Then run the SSSU script to clone your source LUNs on the array. Although this example references the HP utility, any BCV copy tool should do the same. If the source system can’t be shut down for a consistent copy, you might consider running “alter database suspend” (and later “alter database resume”) for a short period of time instead.
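Just to illustrate the suspend/resume alternative (run as SYSDBA on the source database; the actual snapclone happens in between):

alter database suspend;
-- kick off the SSSU snapclone on the array now, then
alter database resume;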
The end result will hopefully be a duplicate set of ASM disks for all your source LUNs. When doing this for the first time, some extra work is necessary. The Storage System Scripting Utility (SSSU), a command line interface to the storage array (an HP EVA 5000 series in this case), initially assigns random SCSI WWIDs to the cloned LUNs. The next time, you can pass these WWIDs to the add copy command and everything will be reproducible.
Said WWIDs need to be read and added to the /etc/multipath.conf file once the snapclone procedure finishes. Also, the host might need some encouragement to detect the newly created LUNs. In RHEL 5 up to 5.3, you rescan the SCSI bus as follows:
for i in `ls /sys/class/scsi_host/` ; do echo "- - -" > /sys/class/scsi_host/$i/scan ; done
RHEL 5.4 and newer ships a shell script, /usr/bin/rescan-scsi-bus.sh, which does the same.
- http://kbase.redhat.com/faq/docs/DOC-3942 (How do I rescan the SCSI bus to add or remove a SCSI device without rebooting the computer?)
Be careful with the /dev/disk/by-id/ directory: I found that the WWIDs shown there don’t necessarily match up with the devices when multipathing is enabled; udev sometimes seems to get a bit confused.
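One way to read the WWIDs directly, as a sketch for RHEL 5 (sdz is a placeholder for one of the freshly detected paths, so substitute your own device):

[root@devbox ~]# /sbin/scsi_id -g -u -s /block/sdz       # RHEL 5 syntax: -s takes the sysfs block path
[root@devbox ~]# multipath -ll                            # alternatively, multipathd reports the wwid per map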
At the end of the above steps, with an updated multipath.conf, you should see something like this in /dev/mapper for your cloned LUNs:
[root@devbox backup]# ls -l /dev/mapper/*clone*
brw-rw---- 1 root disk 253, 30 Jan  7 15:09 /dev/mapper/clone_data1
brw-rw---- 1 root disk 253, 44 Jan  7 15:09 /dev/mapper/clone_data1p1
brw-rw---- 1 root disk 253, 33 Jan  7 15:05 /dev/mapper/clone_data2
brw-rw---- 1 root disk 253, 37 Jan  7 15:05 /dev/mapper/clone_data2p1
brw-rw---- 1 root disk 253, 31 Jan  7 15:05 /dev/mapper/clone_data3
brw-rw---- 1 root disk 253, 45 Jan  7 15:05 /dev/mapper/clone_data3p1
brw-rw---- 1 root disk 253, 34 Jan  7 15:05 /dev/mapper/clone_data4
brw-rw---- 1 root disk 253, 39 Jan  7 15:05 /dev/mapper/clone_data4p1
brw-rw---- 1 root disk 253, 32 Jan  7 15:05 /dev/mapper/clone_data5
brw-rw---- 1 root disk 253, 46 Jan  7 15:05 /dev/mapper/clone_data5p1
brw-rw---- 1 root disk 253, 35 Jan  7 15:05 /dev/mapper/clone_data6
brw-rw---- 1 root disk 253, 47 Jan  7 15:05 /dev/mapper/clone_data6p1
brw-rw---- 1 root disk 253, 36 Jan  7 15:05 /dev/mapper/clone_data7
brw-rw---- 1 root disk 253, 43 Jan  7 15:05 /dev/mapper/clone_data7p1
brw-rw---- 1 root disk 253, 38 Jan  7 15:05 /dev/mapper/clone_data8
brw-rw---- 1 root disk 253, 48 Jan  7 15:05 /dev/mapper/clone_data8p1
brw-rw---- 1 root disk 253, 40 Jan  7 15:05 /dev/mapper/clone_fra1
brw-rw---- 1 root disk 253, 50 Jan  7 15:05 /dev/mapper/clone_fra1p1
brw-rw---- 1 root disk 253, 41 Jan  7 15:05 /dev/mapper/clone_fra2
brw-rw---- 1 root disk 253, 51 Jan  7 15:05 /dev/mapper/clone_fra2p1
brw-rw---- 1 root disk 253, 42 Jan  7 15:05 /dev/mapper/clone_log_ctl
brw-rw---- 1 root disk 253, 49 Jan  7 15:05 /dev/mapper/clone_log_ctlp1
Don’t start ASM or ASMLib yet: the disk headers of source and clone are still identical. This is where the unsupported part begins: we have to use the kfed utility, which isn’t linked by default in 10.2 (it is from 11.1 onwards). Link kfed as follows:
[oracle@devbox ~]$ cd $ORACLE_HOME/rdbms/lib
[oracle@devbox lib]$ make -f ins_rdbms.mk ikfed
Linking KFED utility (kfed)
rm -f /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/kfed
gcc -o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/kfed -L/u01/app/oracle/product/10.2.0/db_2/rdbms/lib/ -L/u01/app/oracle/product/10.2.0/db_2/lib/ -L/u01/app/oracle/product/10.2.0/db_2/lib/stubs/ /u01/app/oracle/product/10.2.0/db_2/lib/s0main.o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/sskfeded.o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/skfedpt.o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/defopt.o -ldbtools10 -lclntsh `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/sysliblist` -Wl,-rpath,/u01/app/oracle/product/10.2.0/db_2/lib -lm `cat /u01/app/oracle/product/10.2.0/db_2/lib/sysliblist` -ldl -lm -L/u01/app/oracle/product/10.2.0/db_2/lib
mv -f /u01/app/oracle/product/10.2.0/db_2/bin/kfed /u01/app/oracle/product/10.2.0/db_2/bin/kfedO
mv: cannot stat `/u01/app/oracle/product/10.2.0/db_2/bin/kfed': No such file or directory
make: [ikfed] Error 1 (ignored)
mv /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/kfed /u01/app/oracle/product/10.2.0/db_2/bin/kfed
chmod 751 /u01/app/oracle/product/10.2.0/db_2/bin/kfed
[oracle@devbox lib]$ find $ORACLE_HOME/bin/ -name kfed
The Big Picture
The concept now is as follows:
- Read the ASM disk header from every cloned ASM disk and store it in a file
- Modify the files, changing the disk group name attribute “kfdhdb.grpname” from DATA to DATB in my example. There seems to be a dependency on the length of the disk group name somewhere else in the header, so I decided to play it safe and just change the last letter. Do so for every single cloned LUN.
- Write the modified headers back
The kfed read operation dumps the header to stdout; you can use this to store it in a file:
[root@devbox backup]# kfed read /dev/mapper/clone_data2p1 > header_clone_data2p1
[...]
To be safe I also copied the original header information to a backup location, and I dd’d the first 4k of each disk and stored that elsewhere, just in case. As far as I know the first 4k of each ASM disk contains the header.
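In shell terms the backup and the header edit can look roughly like this; the sed expression is only a sketch based on the usual kfed dump layout (a line starting with kfdhdb.grpname: followed by the group name), so inspect the file by hand before merging anything back:

# keep the original first 4k of the cloned disk, just in case
dd if=/dev/mapper/clone_data2p1 of=clone_data2p1.first4k bs=4096 count=1

# change the group name in the dumped header, DATA -> DATB (same length, only the last letter differs)
sed -i 's/^\(kfdhdb.grpname: *\)DATA/\1DATB/' header_clone_data2p1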
Once all the new header information is ready, it’s time to bite the bullet and write it back to the device. Remember that I said earlier that it was important to have user-friendly names? Write the changed header information to a source LUN and your source database won’t start anymore: what a mess! And don’t go to Oracle Support, because they won’t help you either. Be extra, extra careful with this step! Again, I won’t take responsibility if you get it wrong… The command this time is kfed merge, as in the following example:
[root@devbox backup]# kfed merge /dev/mapper/clone_data1p1 text=header_clone_data1p1
[...]
The final step involves force-renaming the disks: the clones still have their original ASM disk names. First, re-enable ASMLib by changing the /etc/sysconfig/oracleasm file; make sure to have these values set:
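Roughly, the file should end up looking like this; the SCANORDER/SCANEXCLUDE pair is what makes ASMLib stamp the multipath devices rather than the underlying sd paths (adjust to your environment):

# /etc/sysconfig/oracleasm after the clone
ORACLEASM_ENABLED=true
ORACLEASM_SCANORDER="dm"       # prefer the device-mapper devices
ORACLEASM_SCANEXCLUDE="sd"     # ignore the single-path sd devices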
Execute “/etc/init.d/oracleasm enable” to load the kernel modules. (I had disabled ASMLib before the snapclone, which not only stops it but also unloads the kernel modules.) Again a lot of typing, but well worth it:
/etc/init.d/oracleasm force-renamedisk /dev/mapper/clone_data1p1 DATB1
[...]
/etc/init.d/oracleasm force-renamedisk /dev/mapper/clone_data8p1 DATB8
[root@devbox backup]# /etc/init.d/oracleasm scandisks
If all goes well, you’ll get output similar to the following:
[root@devbox backup]# /etc/init.d/oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATB1
DATB2
DATB3
DATB4
DATB5
DATB6
DATB7
DATB8
FRA1
FRA2
FRB1
FRB2
LOGCTL1
LOGCTL2
[root@devbox backup]#
Good stuff! Now start ASM and the source database. In my case that’s a standby, so I have to mount it and put it into managed recovery. Also set ORACLEASM_SCANBOOT to true so ASMLib will automatically scan for disks the next time the server boots. If this is the first execution, modify the ASM pfile to add the cloned disk groups to asm_diskgroups (on all nodes of the cluster in a RAC scenario).
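A sketch of that pfile change, using the disk group names from this post (the init file name depends on your ASM SID):

# init+ASM.ora
asm_diskgroups='DATA','FRA','CTRLLOG','DATB','FRB','CTRLLGG'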
The rest isn’t different from any other database duplication: you back up the source controlfile to trace, edit it (using the resetlogs case), change the CREATE CONTROLFILE line to SET DATABASE newdb, and change paths to point to the new disk groups (DATA->DATB, FRA->FRB) etc. Make sure no pointers to the source system exist in the spfile/pfile/controlfile. Then you do the triple step “startup nomount, create controlfile, open resetlogs”. In the case of RAC you have to add online logfiles for the other threads (instances); also ensure that cluster_database is set to false for the create controlfile command.
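For illustration only, the edited controlfile script might end up looking like the skeleton below; every name, size and path is a placeholder based on this post’s naming scheme, not the real script:

STARTUP NOMOUNT
CREATE CONTROLFILE SET DATABASE "NEWDB" RESETLOGS
  LOGFILE
    GROUP 1 '+CTRLLGG' SIZE 512M,        -- was +CTRLLOG
    GROUP 2 '+CTRLLGG' SIZE 512M
  DATAFILE
    '+DATB/newdb/datafile/system01.dbf', -- was +DATA/...
    '+DATB/newdb/datafile/sysaux01.dbf'  -- ...and so on for every data file
;
ALTER DATABASE OPEN RESETLOGS;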