Rename cloned diskgroup in Oracle 10.2
Posted by Martin Bach on January 12, 2010
Before I start with this post, first a few words of warning (and this time I mean it!): even though the method described here works for me, it is neither supported nor even mentioned as possible in the Oracle documentation or on Metalink. Use at your own risk, and never with production data. I will not take responsibility for corrupted disk headers here :) Also, this post is for educational purposes only.
Now with that said, I’ll give you some background to this post. During a project I was involved in recently it was decided to upgrade the estate from 10.2.0.3 32bit on RHEL 3 to 10.2.0.4.1 on 64bit RHEL 5. The storage layer changed as well, from OCFS v1 (don’t ask!) to ASM. However, some of the processes that used to be possible such as cloning database LUNs on the SAN via a BCV copy would cause problems now when the cloned LUNs are to be presented to the same host as the source LUNs.
How come? It’s a design feature that ASM writes meta information into its disk headers, which is great in general. Understandably “there can only be one” LUN in use by ASM with the same meta information, i.e. having 2 disks named “DATA1” in diskgroup “DATA” mounted by +ASM won’t work. When cloning though, the source LUN is duplicated 1:1 and ASM would end up “seeing” 2 identical disks, and I assume that would confuse the hell out of it. In 11.2, Oracle introduced a command called renamedg which can deal with such a situation, but prior to that there is no officially supported way. (wink) I have written about the renamedg command in an earlier post by the way. LUN cloning is the fastest way to duplicate the 2TB system, therefore its owners decided that it had to be possible.
To clarify: this problem only exists when you clone the LUNs and present the clones to the same database server; the presentation of cloned LUNs to a different server than the source machine will not cause the problem described here.
Setup
The setup is rather simple: a primary database, a 3 node RAC system with 2 standby databases, one of which is single instance and local to the data centre. It is located on a different storage array to avoid a potential performance impact during the execution of the snapclone process. The production array is an HP EVA 8100 whereas the standby database is on an EVA 5000 series.
The production database uses 3 ASM disk groups: DATA, FRA and CTRLLOG for data files, flash recovery area and online redo logs/control files. The environment cloned from it will use disk groups DATB, FRB and CTRLLGG. This looks as follows in ASM prior to the cloning process (output trimmed to fit on page):
ASMCMD> lsdg
State    Type    AU       Total_MB  Free_MB  Name
MOUNTED  EXTERN  1048576     20479    17398  CTRLLOG/
MOUNTED  EXTERN  1048576   2047968   356521  DATA/
MOUNTED  EXTERN  1048576    511992   450918  FRA/
ASMCMD>
The following disks are defined in ASM prior to the snapclone – note we are using ASMLib, therefore the paths are prefixed “ORCL”:
09:51:49 SYS@+ASM AS SYSDBA> r
  1  select name,path from v$asm_disk
  2* order by path

NAME                           PATH
------------------------------ ----------------------------------------
DATA1                          ORCL:DATA1
DATA2                          ORCL:DATA2
DATA3                          ORCL:DATA3
DATA4                          ORCL:DATA4
DATA5                          ORCL:DATA5
DATA6                          ORCL:DATA6
DATA7                          ORCL:DATA7
DATA8                          ORCL:DATA8
FRA1                           ORCL:FRA1
FRA2                           ORCL:FRA2
LOGCTL1                        ORCL:LOGCTL1
So far so good. Now about the presentation of these (source) LUNs to the host. This is a single instance box, but the setup doesn’t differ from a RAC setup. The company decided to use device mapper for multipathing, the setup is as follows:
[root@devbox ~]# ls -l /dev/mapper
total 0
crw------- 1 root root 10, 63 Jan 7 13:51 control
brw-rw---- 1 root disk 253, 0 Jan 7 13:52 VolGroup01-rootvol
brw-rw---- 1 root disk 253, 3 Jan 7 13:51 VolGroup01-swapvol
brw-rw---- 1 root disk 253, 4 Jan 7 13:52 VolGroup01-u01vol
brw-rw---- 1 root disk 253, 2 Jan 7 13:52 VolGroup01-usrvol
brw-rw---- 1 root disk 253, 1 Jan 7 13:52 VolGroup01-varvol
brw-rw---- 1 root disk 253, 8 Jan 7 14:10 standby_data1
brw-rw---- 1 root disk 253, 19 Jan 7 14:10 standby_data1p1
brw-rw---- 1 root disk 253, 9 Jan 7 14:10 standby_data2
brw-rw---- 1 root disk 253, 20 Jan 7 14:10 standby_data2p1
brw-rw---- 1 root disk 253, 10 Jan 7 14:10 standby_data3
brw-rw---- 1 root disk 253, 26 Jan 7 14:10 standby_data3p1
brw-rw---- 1 root disk 253, 11 Jan 7 14:10 standby_data4
brw-rw---- 1 root disk 253, 21 Jan 7 14:10 standby_data4p1
brw-rw---- 1 root disk 253, 12 Jan 7 14:10 standby_data5
brw-rw---- 1 root disk 253, 27 Jan 7 14:10 standby_data5p1
brw-rw---- 1 root disk 253, 13 Jan 7 14:10 standby_data6
brw-rw---- 1 root disk 253, 22 Jan 7 14:10 standby_data6p1
brw-rw---- 1 root disk 253, 14 Jan 7 14:10 standby_data7
brw-rw---- 1 root disk 253, 28 Jan 7 14:10 standby_data7p1
brw-rw---- 1 root disk 253, 15 Jan 7 14:10 standby_data8
brw-rw---- 1 root disk 253, 23 Jan 7 14:10 standby_data8p1
brw-rw---- 1 root disk 253, 16 Jan 7 14:10 standby_fra1
brw-rw---- 1 root disk 253, 24 Jan 7 14:10 standby_fra1p1
brw-rw---- 1 root disk 253, 17 Jan 7 14:10 standby_fra2
brw-rw---- 1 root disk 253, 25 Jan 7 14:10 standby_fra2p1
brw-rw---- 1 root disk 253, 18 Jan 7 14:10 standby_log_ctl
brw-rw---- 1 root disk 253, 29 Jan 7 14:10 standby_log_ctlp1
It’s important to use user friendly names instead of mpathn where n is just a number as you’ll see in a bit. The file /etc/multipath.conf has to be edited correctly.
The example will use the machine “devbox” which is home to the standby database as a source, the disk groups DATA, FRA, CTRLLOG will be cloned (on the array through an SSSU script) and presented to the same server.
The Process
The actual process of cloning is rather simple: shut down all Oracle processes (database + ASM) and, if you are using ASMLib, stop it with /etc/init.d/oracleasm disable, which also unloads the kernel modules completely. For RAC, it makes sense to disable the ASM instances and the database, otherwise a reboot might mess up the carefully crafted scenario. To be on the very safe side, I usually disable ASMLib as well and change the following variables in /etc/sysconfig/oracleasm:
ORACLEASM_ENABLED=false
ORACLEASM_SCANBOOT=false
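If you prefer not to edit the file by hand, the two variables can be flipped with sed. This is only a minimal sketch: the helper name set_oracleasm_flag is made up for illustration, it relies on GNU sed's -i option, and it takes the file path as an argument so you can try it on a copy of /etc/sysconfig/oracleasm first.

```shell
#!/bin/bash
# Hypothetical helper: set one KEY=value line in an oracleasm sysconfig file.
# Usage: set_oracleasm_flag FILE KEY VALUE
set_oracleasm_flag() {
    local file=$1 key=$2 value=$3
    # Replace the existing assignment in place; lines for other keys stay untouched.
    sed -i "s/^${key}=.*/${key}=${value}/" "$file"
}

# Example (as root, against the real file):
#   set_oracleasm_flag /etc/sysconfig/oracleasm ORACLEASM_ENABLED false
#   set_oracleasm_flag /etc/sysconfig/oracleasm ORACLEASM_SCANBOOT false
```

The same helper can set the values back to true once the headers have been rewritten.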
Then run the SSSU script to clone your source LUNs on the array. Although this example references the HP utility, any BCV copy tool should do the same. If the source system can’t be shut down for a consistent copy, you might consider running the “alter system suspend” command for a short period of time instead, followed by “alter system resume” once the copy has been taken.
The end result will hopefully be a duplicate set of ASM disks for all your source LUNs. When doing this for the first time, some extra work is necessary. The Storage System Scripting Utility SSSU, a command line interface to the storage array (HP EVA 5000 series in this case), initially assigns random SCSI WWIDs to the cloned LUNs. Next time, you can pass these WWIDs to the add copy command and everything will be reproducible.
Said WWIDs need to be read and added to the /etc/multipath.conf file once the snapclone procedure finishes. Also, the host might need some encouragement to detect the newly created LUNs. In RHEL 5 up to 5.3, you rescan the SCSI bus as follows:
for host in /sys/class/scsi_host/host*; do echo "- - -" > "$host/scan"; done
RHEL 5.4 and newer ship a shell script called /usr/bin/rescan-scsi-bus.sh that does the same.
Reference:
- http://kbase.redhat.com/faq/docs/DOC-3942 (How do I rescan the SCSI bus to add or remove a SCSI device without rebooting the computer?)
- http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/html/Online_Storage_Reconfiguration_Guide/scanning-storage-interconnects.html
Be careful with the /dev/disk/by-id/ directory: I found that the WWIDs shown there don’t necessarily match up with the devices when multipathing is enabled. udev sometimes seems to get a bit confused.
At the end of the above steps, with updated multipath.conf you should see this in /dev/mapper for your cloned LUNs:
[root@devbox backup]# ls -l /dev/mapper/*clone*
brw-rw---- 1 root disk 253, 30 Jan 7 15:09 /dev/mapper/clone_data1
brw-rw---- 1 root disk 253, 44 Jan 7 15:09 /dev/mapper/clone_data1p1
brw-rw---- 1 root disk 253, 33 Jan 7 15:05 /dev/mapper/clone_data2
brw-rw---- 1 root disk 253, 37 Jan 7 15:05 /dev/mapper/clone_data2p1
brw-rw---- 1 root disk 253, 31 Jan 7 15:05 /dev/mapper/clone_data3
brw-rw---- 1 root disk 253, 45 Jan 7 15:05 /dev/mapper/clone_data3p1
brw-rw---- 1 root disk 253, 34 Jan 7 15:05 /dev/mapper/clone_data4
brw-rw---- 1 root disk 253, 39 Jan 7 15:05 /dev/mapper/clone_data4p1
brw-rw---- 1 root disk 253, 32 Jan 7 15:05 /dev/mapper/clone_data5
brw-rw---- 1 root disk 253, 46 Jan 7 15:05 /dev/mapper/clone_data5p1
brw-rw---- 1 root disk 253, 35 Jan 7 15:05 /dev/mapper/clone_data6
brw-rw---- 1 root disk 253, 47 Jan 7 15:05 /dev/mapper/clone_data6p1
brw-rw---- 1 root disk 253, 36 Jan 7 15:05 /dev/mapper/clone_data7
brw-rw---- 1 root disk 253, 43 Jan 7 15:05 /dev/mapper/clone_data7p1
brw-rw---- 1 root disk 253, 38 Jan 7 15:05 /dev/mapper/clone_data8
brw-rw---- 1 root disk 253, 48 Jan 7 15:05 /dev/mapper/clone_data8p1
brw-rw---- 1 root disk 253, 40 Jan 7 15:05 /dev/mapper/clone_fra1
brw-rw---- 1 root disk 253, 50 Jan 7 15:05 /dev/mapper/clone_fra1p1
brw-rw---- 1 root disk 253, 41 Jan 7 15:05 /dev/mapper/clone_fra2
brw-rw---- 1 root disk 253, 51 Jan 7 15:05 /dev/mapper/clone_fra2p1
brw-rw---- 1 root disk 253, 42 Jan 7 15:05 /dev/mapper/clone_log_ctl
brw-rw---- 1 root disk 253, 49 Jan 7 15:05 /dev/mapper/clone_log_ctlp1
Don’t start ASM or ASMLib yet: the disk headers of source and clone are still identical. This is where the unsupported part begins: we have to use the kfed utility, which isn’t linked by default in 10.2 (but is from 11.1 onwards). Link kfed as follows:
[oracle@devbox ~]$ cd $ORACLE_HOME/rdbms/lib
[oracle@devbox lib]$ make -f ins_rdbms.mk ikfed
Linking KFED utility (kfed)
rm -f /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/kfed
gcc -o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/kfed -L/u01/app/oracle/product/10.2.0/db_2/rdbms/lib/ -L/u01/app/oracle/product/10.2.0/db_2/lib/ -L/u01/app/oracle/product/10.2.0/db_2/lib/stubs/ /u01/app/oracle/product/10.2.0/db_2/lib/s0main.o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/sskfeded.o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/skfedpt.o /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/defopt.o -ldbtools10 -lclntsh `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_2/lib/sysliblist` -Wl,-rpath,/u01/app/oracle/product/10.2.0/db_2/lib -lm `cat /u01/app/oracle/product/10.2.0/db_2/lib/sysliblist` -ldl -lm -L/u01/app/oracle/product/10.2.0/db_2/lib
mv -f /u01/app/oracle/product/10.2.0/db_2/bin/kfed /u01/app/oracle/product/10.2.0/db_2/bin/kfedO
mv: cannot stat `/u01/app/oracle/product/10.2.0/db_2/bin/kfed': No such file or directory
make: [ikfed] Error 1 (ignored)
mv /u01/app/oracle/product/10.2.0/db_2/rdbms/lib/kfed /u01/app/oracle/product/10.2.0/db_2/bin/kfed
chmod 751 /u01/app/oracle/product/10.2.0/db_2/bin/kfed
[oracle@devbox lib]$ find $ORACLE_HOME/bin/ -name kfed
/u01/app/oracle/product/10.2.0/db_2/bin/kfed
The Big Picture
The concept now is as follows:
- Read the ASM disk headers from all cloned ASM disks and store it as a file
- Modify the files and change the disk group name attribute “kfdhdb.grpname” from DATA to DATB in my example. There seems to be a dependency on the length of the disk group name somewhere else in the header so I decided to play safe and just change the last letter. Do so for every single cloned LUN.
- Write the modified headers back
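The modify step in the list above can be scripted once the headers have been dumped to text files. The following is a minimal sketch, not a tested procedure: the helper name rename_grpname is made up, it relies on GNU sed, and it assumes the dump contains a line that starts with kfdhdb.grpname: followed by the group name surrounded by whitespace. Check a real dump before trusting that assumption.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the disk group name in a text header dump.
# Usage: rename_grpname HEADER_FILE OLD_NAME NEW_NAME
rename_grpname() {
    local file=$1 old=$2 new=$3
    # Keep the name length identical, in line with the caution above.
    if [ "${#old}" -ne "${#new}" ]; then
        echo "refusing: $old and $new differ in length" >&2
        return 1
    fi
    # Only touch the kfdhdb.grpname line, and only a whitespace-delimited value.
    sed -i "/^kfdhdb\.grpname:/ s/\([[:space:]]\)${old}\([[:space:]]\)/\1${new}\2/" "$file"
}

# Example:
#   rename_grpname header_clone_data1p1 DATA DATB
```

Because the substitution is limited to the kfdhdb.grpname line, other fields that happen to contain the old name are left alone.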
The kfed read operation dumps the header to stdout, so you can redirect it into a file:
[root@devbox backup]# kfed read /dev/mapper/clone_data2p1 > header_clone_data2p1
[...]
To be safe I also copied the original header information to a backup location. I also dd’d the first 4k of each disk and stored that elsewhere, just in case. As far as I know the first 4k of each ASM disk contains the header.
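The dd safety copy mentioned above might look like the sketch below. The helper name backup_first_4k and the backup directory in the example are made up for illustration; the 4k size is taken from the text, not from any Oracle documentation.

```shell
#!/bin/bash
# Hypothetical helper: copy the first 4 KB of a device (or file) to a backup file.
# Usage: backup_first_4k SOURCE BACKUP_FILE
backup_first_4k() {
    local src=$1 dest=$2
    # bs=4096 count=1 reads exactly one 4 KB block from the start of SOURCE.
    dd if="$src" of="$dest" bs=4096 count=1 2>/dev/null
}

# Example loop over the cloned partitions (directory name is a placeholder):
#   for dev in /dev/mapper/clone_*p1; do
#       backup_first_4k "$dev" "/some/backup/dir/$(basename "$dev").4k"
#   done
```

Restoring would be the same dd call with source and destination swapped, which is exactly why the backup files should live somewhere you cannot confuse with the devices.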
Once all the new header information is ready, it’s time to bite the bullet and write it back to the device. Remember that I said earlier that it was important to have user friendly names? Write the changed header information to a source LUN and your source database won’t start anymore. What a mess! And don’t go to Oracle support because they won’t help you either. Be extra extra careful with this step! Again, I won’t take responsibility if you get it wrong. The command this time is kfed merge, as in the following example:
[root@devbox backup]# kfed merge /dev/mapper/clone_data1p1 text=header_clone_data1p1
[...]
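Given how destructive a merge against the wrong device is, a small guard in front of the command can help. The sketch below only prints the kfed merge command and refuses anything that is not a clone alias. It assumes the clones are aliased /dev/mapper/clone_* and the header files are named header_<alias>, as in this post; the function names are made up.

```shell
#!/bin/bash
# Hypothetical guard: succeed only for multipath aliases that belong to the clone set.
is_clone_device() {
    case "$1" in
        /dev/mapper/clone_*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# Print (not run) the kfed merge command for one device, refusing non-clones.
merge_command() {
    local dev=$1
    if ! is_clone_device "$dev"; then
        echo "refusing to touch $dev: not a clone alias" >&2
        return 1
    fi
    echo "kfed merge $dev text=header_$(basename "$dev")"
}

# Example: review the printed commands before running any of them.
#   for dev in /dev/mapper/clone_*p1; do merge_command "$dev"; done
```

Printing instead of executing is a deliberate choice here: it leaves one last manual check between the script and the disk headers.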
The final step involves force-renaming the disks, because the clones still have their original disk names. First, re-enable ASMLib by changing the /etc/sysconfig/oracleasm file. Make sure to have these values set:
ORACLEASM_ENABLED=true
ORACLEASM_SCANBOOT=false
Execute “/etc/init.d/oracleasm enable” to load the kernel modules. I disabled ASMLib before the snapclone, which not only stops it but also unloads the kernel modules. Again a lot of typing, but well worth it:
/etc/init.d/oracleasm force-renamedisk /dev/mapper/clone_data1p1 DATB1
[...]
/etc/init.d/oracleasm force-renamedisk /dev/mapper/clone_data8p1 DATB8
[root@devbox backup]# /etc/init.d/oracleasm scandisks
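To save some of that typing, the rename commands for a numbered set of disks can be generated in a loop. This is a sketch that only prints the commands; the function name rename_commands is made up, and it assumes the alias naming used in this post (clone_data1p1 to clone_data8p1 and so on).

```shell
#!/bin/bash
# Print the force-renamedisk commands for one numbered set of cloned disks.
# Usage: rename_commands ALIAS_PREFIX NEW_NAME_PREFIX COUNT
rename_commands() {
    local alias_prefix=$1 new_prefix=$2 count=$3 i
    for ((i = 1; i <= count; i++)); do
        echo "/etc/init.d/oracleasm force-renamedisk /dev/mapper/${alias_prefix}${i}p1 ${new_prefix}${i}"
    done
}

# Review the output first, then run the lines as root if they look right:
#   rename_commands clone_data DATB 8
#   rename_commands clone_fra  FRB  2
```

The log/control file disk has no number in its alias, so it still needs its own hand-written command.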
If all goes well, you’ll get output similar to the following:
[root@devbox backup]# /etc/init.d/oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATB1
DATB2
DATB3
DATB4
DATB5
DATB6
DATB7
DATB8
FRA1
FRA2
FRB1
FRB2
LOGCTL1
LOGCTL2
[root@devbox backup]#
Good stuff! Now start ASM and the source database. In my case that’s a standby, so I have to mount it and put it into managed recovery. Also set ORACLEASM_SCANBOOT to true so it’ll automatically scan for disks next time the server boots. If this is the first execution, modify the ASM pfile to add the cloned diskgroups to asm_diskgroups (on all nodes of the cluster in a RAC scenario).
Remaining Tasks
The rest isn’t different from any other database duplication: you back up the source controlfile to trace, edit it (resetlogs case), change the create controlfile line to “set database newdb” and change the paths to point to the renamed disk groups (DATA->DATB, FRA->FRB) etc. Make sure no pointers to the source system exist in the spfile/pfile/control file. Then you do the triple step “startup nomount – create controlfile – open resetlogs”. In case of RAC you have to add online logfiles for the other threads (instances); also ensure that cluster_database is false for the create controlfile command.
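The path and name changes in the trace file lend themselves to a sed filter. What follows is a rough sketch under several assumptions: the trace script contains a line starting with CREATE CONTROLFILE REUSE DATABASE, file names are quoted and start with the disk group name, and the disk group names are the ones from this post. The function name retarget_trace and the database name in the example are made up. Always read the result before running it.

```shell
#!/bin/bash
# Hypothetical filter: read a controlfile trace script on stdin and write a
# version for the clone on stdout.
# Usage: retarget_trace NEW_DB_NAME < trace.sql > create_clone.sql
retarget_trace() {
    local newdb=$1
    # 1st expression: switch REUSE DATABASE "old" to SET DATABASE "new".
    # Remaining expressions: point quoted file names at the renamed disk groups.
    sed -e "s/^CREATE CONTROLFILE REUSE DATABASE \"[^\"]*\"/CREATE CONTROLFILE SET DATABASE \"${newdb}\"/" \
        -e "s|'+DATA/|'+DATB/|g" \
        -e "s|'+FRA/|'+FRB/|g" \
        -e "s|'+CTRLLOG/|'+CTRLLGG/|g"
}
```

The filter does not touch the pfile or spfile, so control_files, db_name and the db_create_* parameters still need to be changed separately.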
jessica said
great site. Great information. helped me alot thank you very much
Itayemi said
I have a configuration where full device-path-names were used to create ASM diskgroups (Oracle DB on Solaris 10).
Recently the LUN IDs were changed on the SAN storage and so the disk-device-names changed (on the Solaris 10 host) and so the ASM could not find/mount the disks for the diskgroups.
I have tried setting the asm_diskstring parameter to a wildcard but that didn’t work (as I suspected). Assuming asm_diskstring is set, the question is: does ASM actually search for the member disks of a diskgroup that was created using full disk-device-paths, or does ASM only look for those devices exactly as they were specified? The renamedg command in 11g has an asm_diskstring parameter that can be set at the diskgroup level and which causes a rediscovery of the disks in a diskgroup, but unfortunately I am on 10gR2. Is there a way to do something similar via ASM or some other Oracle utility? Thanks.
SQL> select adg.name dg_name, ad.name fg_name, path from v$asm_disk ad
right outer join v$ASM_DISKGROUP adg
on ad.group_number=adg.group_number;
DG_NAME  FG_NAME     PATH
-------  ----------  ----------------------------------
DATA1    DATA1_0001  /dev/rdsk/c2t50XXXXXXXXX94FC2d2s0
DATA1    DATA1_0002  /dev/rdsk/c2t50XXXXXXXXX94FC2d3s0
DATA1    DATA1_0000  /dev/rdsk/c2t50XXXXXXXXX94FC2d1s0
DATA1    DATA1_0003  /dev/rdsk/c2t50XXXXXXXXX94DECd11s0
DATA     DATA_0000   /dev/rdsk/c2t50XXXXXXXXX94DECd0s0
DATA     DATA_0001   /dev/rdsk/c2t50XXXXXXXXX94DECd1s0
FRAD     FRADISK5    /dev/rdsk/c2t50XXXXXXXXX94DECd10s0
FRAD     FRAD_0005   /dev/rdsk/c2t50XXXXXXXXX94DECd12s0
FRAD     FRADISK6    /dev/rdsk/c2t50XXXXXXXXX94FC2d0s0
FRAD     FRADISK3    /dev/rdsk/c2t50XXXXXXXXX94DECd9s0
FRAD     FRADISK1    /dev/rdsk/c2t50XXXXXXXXX94DECd2s0
FRAD     FRSDISK2    /dev/rdsk/c2t50XXXXXXXXX94DECd3s0
12 rows selected.
SQL>
The LUN ID is roughly the 1 or 2 digit number after the small “d” in the device (PATH) names. Some of them changed at the O/S level when LUN ID was changed on the SAN storage.
I was thinking maybe if I run the following sample command (let’s say the device-name for FRDISK6 changed for example):
/etc/init.d/oracleasm force-renamedisk /dev/rdsk/c2t50XXXXXXXXXXXFC2d5s0 FRDISK6
(where /dev/rdsk/c2t50XXXXXXXXXXXFC2d5s0 is the new O/S device for the disk)
Do you think this would update ASM as required (including whatever data structures/tables/views) such that the diskgroup(s) will be mountable?
Martin said
Hi Itayemi,
ASMLib is a Linux-only tool, so your suggestion wouldn’t work I’m afraid. For Solaris, why don’t you create symbolic links to the disks in question, like /dev/asmdisks/data1 etc, linking to the rdsk which is actually DATA1? This worked for me, but bear in mind you have to change your ASM_DISKSTRING.
Itayemi said
Hi Martin, I seem to have discovered the cause of the problem. I think even if the LUN ID changes, Oracle is still able to find the LUNs due to the headers it wrote on them. I have only tested this once in the lab, but it seems changing the LUN ID (possibly coupled with a reconfigure reboot) resets the ownership of the special files to root:sys on Solaris (this makes sense since the original device files become invalid and would have been removed and new ones created which will automatically belong to root).
So I reset the ownership back to oracle:oinstall (example) and it started OK. I have asked the “client” to repeat the test on their test infrastructure and let me know how it turns out (I was assisting remotely during the initial test and didn’t see the errors; maybe if I had seen something similar to the output below, I might have had a similar idea as to the cause of the problem).
SQL> startup
ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 1976920 bytes
Variable Size 102880680 bytes
ASM Cache 25165824 bytes
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "MYDATADG2"
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "FRADG1"
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATADG1"
drumeng said
Good info..I was able to change and mount a clone using kfed.
SQL> select name,group_number from v$asm_diskgroup;
NAME                           GROUP_NUMBER
------------------------------ ------------
DG1                                       1
DGCLONE                                   2
Problems arise afterwards though, trying to rename the ASM files:
SQL> alter diskgroup dgclone rename alias '+DGCLONE/orc1/datafile/system.256.751589979' to '+DGCLONE/cln1/datafile/system.256.751589979';
alter diskgroup dgclone rename alias '+DGCLONE/orc1/datafile/system.256.751589979' to '+DGCLONE/cln1/datafile/system.256.751589979'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15173: entry 'datafile' does not exist in directory 'cln1'
SQL> alter diskgroup DGCLONE add directory '+DGCLONE/cln1/DATAFILE';
Diskgroup altered.
SQL> alter diskgroup dgclone rename alias '+DGCLONE/orc1/datafile/system.256.751589979' to '+DGCLONE/cln1/datafile/system.256.751589979'
2 ;
alter diskgroup dgclone rename alias '+DGCLONE/orc1/datafile/system.256.751589979' to '+DGCLONE/cln1/datafile/system.256.751589979'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15177: cannot operate on system aliases
Any ideas on how to do this?
Martin Bach said
Hi,
as far as I know there is no way to do this, but I’ll check and update the post. The situation is the same as a storage clone of the LUNs for a dev/test/UAT refresh, and if memory serves me right you can’t rename ASM directories.
Martin