Martins Blog

Trying to explain complex things in simple terms

Offloading production backups to a different storage array

Posted by Martin Bach on May 24, 2011

For quite a while Oracle DBAs have performed split mirror backups using special devices called “Business Continuance Volumes” or BCVs for short. A BCV is a special mirror copy of a LUN on the same storage array as the primary copy.

In a BCV backup scenario, the storage administrator (usually) “splits” the mirror after the database has been put into hot backup mode. After the mirror is split, the database is taken out of hot backup mode and resumes normal operation. A new Oracle instance on a different host can then mount the split mirror copy of the database and take the backup. The use of this technology for refreshing a test environment is outside the scope of this article. The figure below demonstrates the idea:

The advantage of such an approach is that the backup operation, initiated from the mount host, should not impact the performance of the production database. Once the backup is complete, the BCV for the ARCH disk group should be re-synchronised with the source LUN, whereas the one for the DATA disk group should not. This allows us to quickly recover from problems with the primary LUNs; more on that in a later post.
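To make this a little more concrete, here is a minimal sketch of the sequence on the production side. The storage-level split itself is completely vendor specific and is only indicated as placeholder comments:

SQL> alter database begin backup;

Database altered.

SQL> -- storage team: split the mirror/BCV holding the DATA disk group
SQL> -- (vendor-specific command, placeholder only)

SQL> alter database end backup;

Database altered.

SQL> alter system archive log current;

System altered.

SQL> -- storage team: split the mirror/BCV holding the ARCH disk group
SQL> -- (vendor-specific command, placeholder only)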

One Step Further

On my current site this process has been refined further. One of the requirements was that SRDF should be used as the means for disaster recovery. I should say that SRDF, or Symmetrix Remote Data Facility, is my customer’s preferred method for DR, and I by no means want to advertise for EMC here; it just so happened that I was working on EMC storage for this project.

I should also note that Data Guard cannot be used due to an application constraint (an ACFS file system is an integral part of the application and must be backed up together with the database).

All storage is managed by ASM, which in many ways makes life easier for me. The ASM LUNs or “disks” carry all the required information in the “disk” header. So after the cloned LUNs have been presented to the mount host, all I need to do is make them available to the OS and optionally run an “oracleasm scandisks” as root to detect them. From then on I should be able to simply mount the disk group (either via SQL*Plus or srvctl in 11.2). The actual backup requires a few more steps, which are shown below.

Before going further into detail let’s have a look at the architecture first:

Two data centres are in use: the local one is used for production in normal operations, including backups. One of the design requirements was that backups can be taken in either data centre, with the DR scenario in mind.

Split mirror backups as shown in the above figure are taken in the local data centre during normal operations. In a DR situation, the remote data centre will be configured to take backups instead. For this to happen, it is necessary to clone the replicated LUNs (which would be activated in the DR event), much as it is done for the local data centre’s split mirror backups. As an added advantage, the clones can be used to create pre- and post-batch “backups” that would be activated in case of a catastrophic failure of the batch/end-of-year processing.

Taking the Backups

To be able to take a backup of the split mirror, a few things are necessary. Most of these are documented in MOS note “RMAN and Split Mirror Disk Backups [ID 302615.1]”. You certainly require a recovery catalogue database in the first place. As a first step you register the database in the catalogue. You perform this step connected to the production database and the recovery catalog, as shown in the example below:

$ rman target / catalog=rman/rmanPwd@rccat

Recovery Manager: Release 11.2.0.2.0 - Production on Fri May 20 10:39:00 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=1796248120)
connected to recovery catalog database

RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

RMAN> exit

If you like, you can configure defaults at this stage; I opted for the following (still connected to the primary database and recovery catalog):

RMAN> configure retention policy to redundancy 3;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO REDUNDANCY 3;
new RMAN configuration parameters are successfully stored

RMAN> CONFIGURE DEFAULT DEVICE TYPE TO SBT_TAPE;

new RMAN configuration parameters:
CONFIGURE DEFAULT DEVICE TYPE TO 'SBT_TAPE';
new RMAN configuration parameters are successfully stored

RMAN> configure controlfile autobackup off;

new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP OFF;
new RMAN configuration parameters are successfully stored

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE sbt_tape to '%F';

new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE 'SBT_TAPE' TO '%F';
new RMAN configuration parameters are successfully stored

RMAN> configure device type sbt_tape parallelism 4 backup type to backupset;

new RMAN configuration parameters:
CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 4 BACKUP TYPE TO BACKUPSET;
new RMAN configuration parameters are successfully stored

RMAN> exit

Now switch over to the mount host for some real work.

The high level steps for taking a backup of the “split mirror” are:

  1. Ensure that the cloned LUNs are presented to the mount host’s operating system, including appropriate zoning on the fabric
  2. Ensure that the multi-pathing solution of choice is correctly configured for all paths to the cloned LUNs
  3. Ensure that the ASM disks are known to the local ASM instance. That may include running an /etc/init.d/oracleasm scandisks as root, or putting the relevant rules into /etc/udev/rules.d/

These steps are quite generic and will depend on your OS and storage stack. I won’t go into detail here, but you might find some relevant bits and pieces on my blog if you are interested.
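By way of illustration only, on a Linux host using device-mapper multipathing and ASMLib, the last two items might boil down to something like this (host and disk names are made up for this example):

[root@mounthost01 ~]# multipath -ll
... check that every cloned LUN is visible with the expected number of paths ...
[root@mounthost01 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@mounthost01 ~]# /etc/init.d/oracleasm listdisks
ARCH01
DATA01
DATA02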

The next step is to mount the disk groups in ASM. Initially that has to be done on the command line; from 11.1 onwards you have to connect as SYSASM:

SQL> alter diskgroup DGName mount;

In Oracle 11.2 this automatically creates a resource for disk group DGName in the OCR, which is very convenient (especially in RAC environments). Next time, all you need to do is execute “srvctl start diskgroup -g DGName” as the grid software owner.
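For completeness, a quick example assuming a disk group called DATA (the host name in the output is made up):

$ srvctl start diskgroup -g DATA
$ srvctl status diskgroup -g DATA
Disk Group DATA is running on mounthost01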

For pre-11.2 environments you might want to consider updating the “asm_diskstring” initialisation parameter accordingly.
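A hypothetical example, executed against the ASM instance; the second discovery string obviously depends on how your cloned devices are named:

SQL> alter system set asm_diskstring='ORCL:*','/dev/mapper/clone*';

System altered.

If the ASM instance runs off a pfile, remember to persist the change there as well so it survives the next restart.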

Once the ASM disk groups your database requires are mounted, it’s time to mount the database. If you have not already done so, register the database in the OCR and make sure that you add the spfile option as well, as in this example (Oracle RAC users would additionally add instances):

$ srvctl add database -d PROD -o $ORACLE_HOME -c SINGLE -p '+data/PROD/spfilePROD.ora' \
> -s mount -a DATA,ARCH

As per the MOS note mentioned above, the database must be started on the mount host using a BACKUP controlfile. Otherwise you’d end up with these RMAN errors after the first backup:

RMAN-03014: Implicit resync of recovery catalog failed
RMAN-06038: Recovery catalog package detected an error
RMAN-20035: Invalid high RECID

To do so, create a backup controlfile on the primary database before splitting the mirror. The process is very much the same as you would use for a physical standby database:

SQL> alter database backup controlfile to '/tmp/backup.ctl';

Database altered.

This controlfile now needs to be made available to the mount host; an elegant way would be to use DBMS_FILE_TRANSFER to perform this task, or asmcmd’s copy command.
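If neither of those appeals, a plain copy over ssh does the job just as well; the host name and target directory below are purely illustrative:

$ scp /tmp/backup.ctl oracle@mounthost01:/tmp/PROD/backup.ctl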

Once it’s on the mount host, say in /tmp/<databaseName>/backup.ctl, it needs to be made available to Oracle. The easiest way is to use RMAN for this:

Connect to the mirror instance as SYSDBA; do not connect to the recovery catalog.

$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Fri May 20 10:53:40 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=1796248120, not open)

RMAN> shutdown immediate

using target database control file instead of recovery catalog
database dismounted
Oracle instance shut down

RMAN> startup nomount
connected to target database (not started)
Oracle instance started

Total System Global Area    9219969024 bytes

Fixed Size                     2234056 bytes
Variable Size               4630513976 bytes
Database Buffers            4563402752 bytes
Redo Buffers                  23818240 bytes

RMAN> restore controlfile from '/tmp/PROD/backup.ctl';

Starting restore at 20-MAY-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=200 instance=PROD device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=+ARCH/prod/controlfile/current.256.737725307
output file name=+DATA/prod/controlfile/current.256.737725307
Finished restore at 20-MAY-11

If you really wanted to back up the CURRENT controlfile (which is part of the clone), you could have done this prior to the restore of the backup controlfile. You must not connect to the recovery catalog in this case; see MOS note 1111369.1 for more information.
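Querying V$DATABASE.CONTROLFILE_TYPE should now return “BACKUP”; a quick sanity check on the mount instance (connected as SYSDBA) confirms this:

SQL> select controlfile_type from v$database;

CONTROL
-------
BACKUP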

With the setup completed, you are ready to back the database up from the cloned LUNs. Connect to RMAN again, this time using both the backup instance and the recovery catalog, and initiate the backup. For example:

run {
allocate channel t1 type sbt_tape parms 'ENV=(TDPO_OPTFILE=/u01/app/oracle/product/admin/PROD/tdp/tdpo_PROD.opt)';
allocate channel t2 type sbt_tape parms 'ENV=(TDPO_OPTFILE=/u01/app/oracle/product/admin/PROD/tdp/tdpo_PROD.opt)';
backup database;
}

This completes the backup of the database. One caveat exists: the control file is not backed up as part of this process, because RMAN will not back up a “backup” controlfile:

$ rman target / catalog rman/rmanPwd@rccat

Recovery Manager: Release 11.2.0.2.0 - Production on Tue May 24 12:51:22 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=1796248120, not open)
connected to recovery catalog database

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;

new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored

RMAN> run {
2> allocate channel t1 type sbt_tape parms
3> 'ENV=(TDPO_OPTFILE=/u01/app/oracle/product/admin/PROD/tdp/tdpo_PROD.opt)';
4> backup tablespace USERS;
5> }

allocated channel: t1
channel t1: SID=199 instance=PROD device type=SBT_TAPE
channel t1: Data Protection for Oracle: version 5.5.1.0

Starting backup at 24-MAY-11
channel t1: starting full datafile backup set
channel t1: specifying datafile(s) in backup set
input datafile file number=00006 name=+DATA/prod/datafile/users.264.737727697
channel t1: starting piece 1 at 24-MAY-11
channel t1: finished piece 1 at 24-MAY-11
piece handle=1cmd4ott_1_1 tag=TAG20110524T125316
comment=API Version 2.0,MMS Version 5.5.1.0
channel t1: backup set complete, elapsed time: 00:00:01
Finished backup at 24-MAY-11

RMAN-06497: WARNING: control file is not current, control file AUTOBACKUP skipped
released channel: t1

RMAN>  exit

It’s not difficult at all to get around this problem: as part of the regular archive log backups you are performing on the production database anyway, you simply add a “backup current controlfile” and a “resync catalog” command.
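A sketch of such a job, executed on the production host while connected to both the production database and the recovery catalog, reusing the same (assumed) Tivoli Data Protection channel configuration as above:

run {
allocate channel t1 type sbt_tape parms 'ENV=(TDPO_OPTFILE=/u01/app/oracle/product/admin/PROD/tdp/tdpo_PROD.opt)';
backup archivelog all delete input;
backup current controlfile;
}
resync catalog;

Whether you delete the archived logs after they have been backed up (“delete input”) obviously depends on your retention requirements.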

At this stage I should note that the backup process wouldn’t be different in principle, if additional backups were to be taken on the DR site (licensing questions aside). Instead of having to clone the primary LUNs on the local data centre, the storage admins would clone the replicated LUNs (the “R2s” in EMC-talk) and bring these clones up on the DR mount host.

Summary

The approach described above was interesting from my personal point of view as I had not used this concept before. I’m an Oracle DBA, and I feel most comfortable when I have things in hand; relying on another team for BAU database tasks is a new experience.

The process description deliberately left product names out unless they were part of the command output. The concept is quite universal and by no means tied to a specific vendor.

One of the requirements is that your tape solution can talk to the primary host(s) and the mount host in both the local and the remote data centre. I have seen sites (including rather large ones) where the DR site was not configured to take backups. You could argue that you shouldn’t operate from your DR site for long, but that’s by no means an excuse for not having backups. Suppose you run into block corruption: how would you recover from that without a backup? But then the storage team usually argues that block corruption doesn’t happen on their high-end arrays.

References

  • RMAN and Split Mirror Disk Backups [ID 302615.1]
  • RMAN Backup of Controlfile fails from Split Mirror / BCV Copy. Is there any way to take controlfile backup from Split Mirror / BCV so that the Production DB is not used for Controlfile backup? [ID 1111369.1]
  • Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]

4 Responses to “Offloading production backups to a different storage array”

  1. Thanks for the post Martin – nicely written.

    When doing this kind of backup before, I transferred the backup controlfile from the source host via the FRA LUNs, i.e. back up the file with a specific TAG and then restore it on the mount host using the same TAG. For example:

    startup nomount
    restore controlfile from tag 'bkp_ctl' device type disk;

    • Martin Bach said

      Thanks Neil – that makes it indeed easier!

      In a similar way I would like to test the use of block change tracking; the tracking file is included in my DATA disk group. For as long as it is unmodified, this should allow me to perform fast incremental backups. But that’s for another post …

      Martin

  2. Hi there, this is the first time I have happened across this blog and I have found it very interesting. We are doing a proof of concept with the scenario that you described above, using HP’s Data Protector. We have followed the MetaLink notes and everything is pretty much the same as described above: begin/end backup on the production database, split the mirror, mount the backup/split mirror database using a backup controlfile, RMAN backup of the datafiles to tape. Then on the prod DB, backup current controlfile and resync catalog.
    In our environment we currently only have one licensed server (the backup server) for RMAN to tape. This means our tape solution CANNOT talk to the production host. So my question is: how would we restore and recover a corrupt/missing datafile without this? Is it possible? Has anyone done this?
    Thanks Catherine

    • Martin Bach said

      Hi Catherine,

      my suggestion is to get that extra license before the auditors find out that you cannot restore data to your production host. That’s saving on the wrong end.

      Hope this helps,

      Martin
