This is pretty much a note to myself on how to set up the Data Guard broker for RAC 11.2.0.2+.
UPDATE: Please note that a lot of this has changed in 12.1, as described in a different blog post of mine (Little things worth knowing: Data Guard Broker Setup changes in 12c). If you are looking for information on how to implement this with 12.1, then please continue on the other post. If you are on Oracle 11.2, then please read on :)
The Test Environment
The tests have been performed on Oracle Linux 5.5 with the Red Hat kernel; the Oracle version was 11.2.0.2. Sadly my lab server didn’t support more than 2 RAC nodes, so everything has been done on the same cluster. It shouldn’t make a difference though; if it does, please let me know.
WARNING: there are some rather deep changes to the cluster here, be sure to have proper change control around making such amendments as it can cause outages! Nuff said.
Unfortunately I didn’t take notes of the configuration as it was before, so the post is going to be a lot shorter and less dramatic, but it’s useful as a reference (I hope) nevertheless. Now what’s the situation? Imagine you have a two node RAC cluster with separation of duties in place: “grid” owns the GRID_HOME, while “oracle” owns the RDBMS binaries. Imagine further you have two RAC databases, ORCL and STDBY. STDBY has only just been duplicated for standby, so there is nothing in place which links the two together.
You then begin by configuring the Data Guard broker, unless of course you know the switchover commands by heart and think that using the broker is for people who can’t remember syntax. In which case you can safely skip to another article on the Internet….
Oh you stayed on :) Read on then.
Register the standby database in Clusterware if that hasn’t happened yet:
$ srvctl add database -d STDBY -p '+DATA/stdby/spfileSTDBY.ora' -o $ORACLE_HOME -s mount -r physical_standby
$ srvctl add instance -d STDBY -i STDBY1 -n node1
$ srvctl add instance -d STDBY -i STDBY2 -n node2
$ srvctl start database -d STDBY -o mount
I’m only specifying “mount” mode here; if you are lucky enough to have a license for Active Data Guard you can remove that particular flag.
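A quick way to confirm the registration took effect is to query Clusterware (an illustrative check; the exact output format varies slightly between 11.2 patch levels, but you should see the physical_standby role and the mount start option reflected):

$ srvctl config database -d STDBY
$ srvctl status database -d STDBY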
After the database has been started you can proceed with the broker configuration.
The broker requires its configuration files to be on shared storage for RAC; how would other nodes know about state and configuration changes otherwise? If you are on RAC you are most likely using ASM, like me, so change the configuration accordingly. For ORCL, the current primary database, you could use these:
SQL> alter system set dg_broker_config_file1 = '+DATA/ORCL/dr1ORCL.dat' scope=both sid="*";
SQL> alter system set dg_broker_config_file2 = '+DATA/ORCL/dr2ORCL.dat' scope=both sid="*";
Conversely for STDBY, you use:
SQL> alter system set dg_broker_config_file1 = '+DATA/STDBY/dr1STDBY.dat' scope=both sid="*";
SQL> alter system set dg_broker_config_file2 = '+DATA/STDBY/dr2STDBY.dat' scope=both sid="*";
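To double-check that the instances picked up the new values, a quick sanity check on each database (output sketched for STDBY; yours will show the paths you chose):

SQL> show parameter dg_broker_config

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1               string      +DATA/STDBY/dr1STDBY.dat
dg_broker_config_file2               string      +DATA/STDBY/dr2STDBY.dat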
After Oracle confirmed these changes, start the broker on all databases:
SQL> alter system set dg_broker_start = true scope=both sid="*";
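Once dg_broker_start is true, every instance spawns the broker monitor background process DMON. You can verify it is running on each node (a quick check; background process names follow the usual ora_<process>_<SID> pattern):

$ ps -ef | grep -i ora_dmon | grep -v grep
oracle   ...   ora_dmon_ORCL1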
Now comes the hard bit, the part that actually caught me out at first. As is my habit, I instruct dbca to create my databases. As a result there will be a tnsnames.ora entry similar to this one:
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(PORT = 1521)(HOST = prodscan))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ORCL)
    )
  )
...
We will shortly see why that has become a problem; for now note it references the SCAN, and the generic service name for the database. Before we can perform any further configuration changes, there is a need to add db_unique_name_DGMGRL.db_domain as the GLOBAL_DBNAME, with the corresponding $ORACLE_HOME and $ORACLE_SID, to the listener. For each node on the cluster, edit listener.ora and statically register the service. Please pay attention to the ORACLE_SID: it needs to match the ORACLE_SID of the instance on that host. The static registration is needed during the switchover, when the database is shut down and cannot register itself. My listener.ora has been amended with these lines on node 1:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = ORCL_DGMGRL.example.com)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0.2)
      (SID_NAME = ORCL1)
    )
    (SID_DESC =
      (GLOBAL_DBNAME = STDBY_DGMGRL.example.com)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0.2)
      (SID_NAME = STDBY1)
    )
  )
The file is identical on node2, except that the SID_NAMEs are different of course. Now it’s time to reload the listener (in non-play-environments you don’t do this without proper change control!):
$ srvctl stop listener
$ srvctl start listener
You should verify that the DGMGRL services are visible in the output of “lsnrctl status” on each node. If not, go back and fix!
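A one-liner saves scrolling through the full status output (a sketch; the domain and service names are from this example):

$ lsnrctl status | grep -i dgmgrl
Service "ORCL_DGMGRL.example.com" has 1 instance(s).
Service "STDBY_DGMGRL.example.com" has 1 instance(s).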
With all that prep-work done, you can now create the configuration using dgmgrl. Connect as sys to the primary database and create the configuration. I like to have all instances up and running: ORCL1 and ORCL2 for the primary database, STDBY1 and STDBY2 for the standby.
$ dgmgrl
DGMGRL for Linux: Version 11.2.0.2.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/sys@aande
Connected.
DGMGRL> create configuration martin as
> primary database is ORCL
> connect identifier is ORCL;
That creates the first database with all its instances. The next step is to add the standby database.
DGMGRL> add database STDBY as connect identifier is stdby maintained as physical;
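At this point the configuration exists but has not yet been enabled, and “show configuration” reflects that (a sketch of the 11.2 output; MaxPerformance is the default protection mode):

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Databases:
    ORCL  - Primary database
    STDBY - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
DISABLED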
You can then browse the database configuration using “show database verbose ORCL”, “show instance verbose 'ORCL1'” etc. For example, my setup had the following properties:
DGMGRL> show database STDBY

Database - STDBY

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    STDBY1 (apply instance)
    STDBY2

Database Status:
SUCCESS

DGMGRL> show instance verbose "STDBY1"

Instance 'STDBY1' of database 'STDBY'

  Host Name: node1.example.com
  PFILE:
  Properties:
    SidName                 = 'STDBY1'
    StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.99.70)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STDBY_DGMGRL.example.com)(INSTANCE_NAME=STDBY1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation  = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation       = ''
    LogArchiveTrace         = '0'
    LogArchiveFormat        = '%t_%s_%r.dbf'
    TopWaitEvents           = '(monitor)'

Instance Status:
SUCCESS

...
When you are happy, enable the configuration:
DGMGRL> enable configuration;
Enabled.
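It can take a little while after enabling before the broker reports a healthy state; checking with “show configuration” until the status reads SUCCESS is a good habit (output abbreviated):

DGMGRL> show configuration
...
Configuration Status:
SUCCESS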
Well done: you can now tweak your configuration, change the way redo is shipped, change standby file management, and so on; in short, everything you normally do when creating a broker configuration. The trouble started when I wanted to test the configuration. I don’t hand over infrastructure unless I’m happy that it works! Consider this:
DGMGRL> switchover to STDBY
Performing switchover NOW, please wait...
New primary database "STDBY" is opening...
Operation requires shutdown of instance "ORCL1" on database "ORCL"
Shutting down instance "ORCL1"...
ORACLE instance shut down.
Operation requires startup of instance "ORCL1" on database "ORCL"
Starting instance "ORCL1"...
ORACLE instance started.
Database mounted.
Unable to connect to database
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Failed.
Warning: You are no longer connected to ORACLE.
Switchover succeeded, new primary is "STDBY"
DGMGRL>
Ouch! Where did that come from? A little bit of investigation pointed me to the SCAN listeners. Fair enough: there was no ORCL_DBG, nor any other ORCL service, registered with them! Interesting. The services were, however, registered locally. I found note “11gR2 RAC DB switchover using DG broker [ID 880017.1]” on Metalink, which isn’t really worth reading as it’s so confusing. However, something occurred to me while reading it. Remember when I said initially that the SCAN in the connect identifier would become a problem? What if the registration of the DGB services with the SCAN listeners simply happens too slowly to be useful? I could point everything to the local listener and be done! I also read that there might be problems when using port 1521, so I quickly changed the listener to listen on 1555 (as grid):
$ srvctl modify listener -p 1555
$ srvctl stop listener
$ srvctl start listener
Again, be warned: don’t do this without change control!
So I changed my tnsnames.ora file once more, but this time made sure that the connections went locally. The ORCL and STDBY entries were changed to read:
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server1-vip.example.com)(PORT = 1555))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ORCL)
      (INSTANCE_NAME = ORCL1)
    )
  )
...
My local listener only listens on the VIP, hence only one entry for “ADDRESS” here. Notice the VIP and changed port. I am also connecting to the first instance. On the second node, that became:
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server2-vip.example.com)(PORT = 1555))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ORCL)
      (INSTANCE_NAME = ORCL2)
    )
  )
...
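Before recreating the broker configuration it is worth verifying that the alias now resolves to the VIP and the new port (output abbreviated; host names are from this example and the timing will obviously differ):

$ tnsping ORCL
...
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = server2-vip.example.com)(PORT = 1555)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = ORCL) (INSTANCE_NAME = ORCL2)))
OK (10 msec)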
This was pure laziness: I could ^H^H should have created new service names ORCLDGB and STDBYDGB, but since I am Chuck Norris on my lab server I didn’t care. People might rely on your server’s tnsnames.ora; they shouldn’t, but they might nevertheless.
So did that make a difference? Indeed! After a cycle of the databases and the recreation of my configuration to reflect the new local listener port I could switch over successfully! I was tempted to simply change each instance’s StaticConnectIdentifier, but as per note 1387859.1 that’s not such a good idea: Oracle then assumes you always want to manage it yourself, and if you change the port again everything comes tumbling down on you!
Hope this helps! As always, test before you use this, and let me know should there be any problems.