Prompted by a recent interview I wanted to perform a little test with RMAN, and incrementally updated backups. I created a 11.2.0.2.5 database on my Linux (OEL 5.5) test system and refreshed my understanding of this most useful rman feature (which is for another post).
So after I was happy with the working of the incrementally updated image copies I decided to see if I could restore my database with those. Time to use “drop database”, which removes spfile, data files, temp files and the control files. Tabula rasa!
But it didn’t matter I thought: I have controlfile/spfile autobackups and fully recovered image copies plus all the archived logs in the FRA. What could possibly go wrong? Well it took me 30 minutes to get the database back.
ORA-4031
The theory (or let’s call it documentation) states that if you lost every part of your database, you connect into rman and startup nomount. Since there is no pfile, it will realise that and use some defaults. Unfortunately that didn’t work, and I get repeated ORA-4031 errors. This is documented in MOS note 1176443.1. The workaround is-unsurprisingly-to create your own dummy pfile which I would have done anyway. All I needed was the db_name, the db_recovery_file_dest/_size and the sga_target of sufficient size.
With those settings, the startup nomount worked.
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
OK cool, so the next step is to restore the spfile from the autobackup. Simple enough, doesn’t take long at all. Or so I thought. I had the location from a backup log:
Starting Control File and SPFILE Autobackup at 13-APR-12
piece handle=+RECO/DEV/autobackup/2012_04_13/s_780484423.265.780484423 comment=NONE
Finished Control File and SPFILE Autobackup at 13-APR-12
A quick check in ASMCMD revealed that the file existed, but I was in for a surprise:
RMAN> restore controlfile from autobackup; Starting restore at 13.04.2012 10:06:24 using channel ORA_DISK_1 recovery area destination: +RECO database name (or database unique name) used for search: DEV channel ORA_DISK_1: no AUTOBACKUPS found in the recovery area AUTOBACKUP search with format "%F" not attempted because DBID was not set RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of restore command at 04/13/2012 10:06:25 RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece RMAN>
The DBID strictly speaking is not needed when the FRA is used as the controlfile/spfile autobackup location (see doc ID 227060.1, case (4) for 10g). So what’s going wrong here? Did I lose all my DBA skills? Maybe a little more force was needed! So I told RMAN explicitly which file to restore.
RMAN> restore controlfile from '+RECO/dev/autobackup/2012_04_13/s_780484423.265.780484423'; Starting restore at 13.04.2012 10:04:54 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=101 device type=DISK RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of restore command at 04/13/2012 10:04:57 RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
Oh dear. I then tried to figure out the DBID but was unlucky until I tried to copy the autobackup out of ASM and into the file system-and that provided the missing link:
RMAN> restore controlfile from '/tmp/s_780484423.265.780484423'; Starting restore at 13.04.2012 10:26:04 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=101 device type=DISK channel ORA_DISK_1: restoring control file RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of restore command at 04/13/2012 10:26:11 ORA-19870: error while restoring backup piece /tmp/s_780484423.265.780484423 ORA-19504: failed to create file "+RECO" ORA-17502: ksfdcre:4 Failed to create file +RECO ORA-15001: diskgroup "RECO" does not exist or is not mounted ORA-15204: database version 11.2.0.0.0 is incompatible with diskgroup RECO
OK, now there is a helpful error message! I quickly reverted back to my dummy pfile and added a compatible = 11.2.0.2 and voila!
$ rman target / Recovery Manager: Release 11.2.0.2.0 - Production on Fri Apr 13 10:27:45 2012 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. connected to target database: DEV (not mounted) RMAN> shutdown immediate using target database control file instead of recovery catalog startup nomount Oracle instance shut down RMAN> connected to target database (not started) Oracle instance started Total System Global Area 4275781632 bytes Fixed Size 2233336 bytes Variable Size 822086664 bytes Database Buffers 3422552064 bytes Redo Buffers 28909568 bytes RMAN> restore spfile from autobackup; Starting restore at 13.04.2012 10:28:05 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=101 device type=DISK recovery area destination: +RECO database name (or database unique name) used for search: DEV channel ORA_DISK_1: AUTOBACKUP +reco/DEV/AUTOBACKUP/2012_04_13/s_780484582.274.780484583 found in the recovery area AUTOBACKUP search with format "%F" not attempted because DBID was not set channel ORA_DISK_1: restoring spfile from AUTOBACKUP +reco/DEV/AUTOBACKUP/2012_04_13/s_780484582.274.780484583 channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete Finished restore at 13.04.2012 10:28:14 RMAN>
Now that was simple. The question was: why hasn’t Oracle provided the information about the problematic “compatible” setting in the first place? It probably would have been too helpful… The rest of the restore went without problems by the way.
Hi Martin,
RMAN never was very helpful in terms of proper error message.
Sometimes I think that it is using /dev/random to generate a error number ;)
Marcin
Hi,
I had similar issues last week. Another workaround for the ORA-4031 is described in MOS [ID 1311984.1] – ORA_RMAN_SGA_TARGET :
export ORA_RMAN_SGA_TARGET=500
rman target /
SET DBID 887707758;
….
This way there is no need of manual dummy pfile creation.
You said “The theory (or let’s call it documentation) states that if you lost every part of your database, you connect into rman and startup nomount. Since there is no pfile, it will realise that and use some defaults. Unfortunately that didn’t work, and I get repeated ORA-4031 errors.”.
I am not sure what went went in your case, but it works for me, with alert.
RMAN> startup nomount;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file ‘/oradata/oracle/product/11g/dbs/initXYZ.ora’
starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started
http://ora600tom.wordpress.com/2011/08/18/build-an-oracle-parameter-file-a-quick-way/
Hi ora600tom,
that’s the intended behaviour. But as I said there is a bug (refer to MOS note above) that causes a system to crash with ORA-4031 if the shared pool allocated with the dummy pfile is too small. Happens more no >24 core systems where oracle’s algorithm dictates a larger SGA has to be used.
Regards,
Martin
Thanks Martin for clarifying.
Thomas