This is going to be the last part of this series, however long it might end up being in the end. In the previous articles you read how to create a physical standby database from a RAC One database.
Networks (refresher)
To make it easier to follow without going back to the previous articles, here are the networks I’m using, listed for your convenience.
- 192.168.100/24: Client network
- 192.168.102/24: Dedicated Data Guard network
Data Guard Broker Configuration
I said it before and I say it again: I like the Data Guard broker interface for managing standby databases. It’s the ultimate simplification and so easy to implement that it’s quite safe to operate even in large organisations. As the added bonus you get OEM integration as well. OEM relies on a Broker configuration. The first step in managing the standby database therefore is to create the Data Guard configuration. And by the way, the documentation was correct, and the parameter “listener_networks” is already set! When the standby database starts you can see it being set by the Oracle agent process. The alert.log reads:
Starting background process DMON DMON started with pid=34, OS id=11023 ORACLE_BASE from environment = /u01/app/oracle Using default pga_aggregate_limit of 2048 MB ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.58)(PORT=1521))' SCOPE=MEMORY SID='sby_2'; 2014-04-28 05:11:47.874000 -04:00 ALTER SYSTEM SET remote_listener=' ron12csby-scan.example.com:1521' SCOPE=MEMORY SID='sby_2'; ALTER SYSTEM SET listener_networks='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.58) (PORT=1522)))))','((NAME=net2)(REMOTE_LISTENER=ron12csby-dgscan.dg.example.com:1521))' SCOPE=MEMORY SID='sby_2'; 2014-04-28 05:12:45.107000 -04:00 Decreasing number of real time LMS from 1 to 0
So no more need to worry about local listener (unless you have many local listeners and are not on 12c), remote_listener and listener_networks. That’s a big relief, to see that it does work. I can now create the configuration. I like the broker command line interface more than the OEM interface, especially since it is a lot easier to show here in the post.
Before you can work with the broker you need to enabled it. Set the following init.ora parameters to values similar to these below. It is important that these values are set identical on all instances. You also need to ensure the broker configuration files are in ASM.
SQL> show parameter dg_broker NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ dg_broker_config_file1 string +DATA/ron/d1.dat dg_broker_config_file2 string +RECO/ron/d2.dat dg_broker_start boolean TRUE SQL>
Make sure to make the settings on both clusters! Now connect using the command line interface and create the configuration. Note that I’m connecting using ronprinet2, which is the TNS name for the primary database using the Broker Network (192.168.102/24)
[oracle@ron12cprinode1 ~]$ dgmgrl sys/xxx@ronprinet2 DGMGRL for Linux: Version 12.1.0.1.0 - 64bit Production Copyright (c) 2000, 2012, Oracle. All rights reserved. Welcome to DGMGRL, type "help" for information. Connected as SYSDBA. DGMGRL> help create configuraiton Creates a broker configuration Syntax: CREATE CONFIGURATION <configuration name> AS PRIMARY DATABASE IS <database name> CONNECT IDENTIFIER IS <connect identifier>; DGMGRL> create configuration dgtest as primary database is "RON" connect identifier is "ronprinet2"; Configuration "dgtest" created with primary database "RON" DGMGRL> show database verbose "RON" Database - RON Role: PRIMARY Intended State: OFFLINE Instance(s): pri_2 Properties: DGConnectIdentifier = 'ronprinet2' ObserverConnectIdentifier = '' LogXptMode = 'ASYNC' RedoRoutes = '' DelayMins = '0' Binding = 'optional' MaxFailure = '0' MaxConnections = '1' ReopenSecs = '300' NetTimeout = '30' RedoCompression = 'DISABLE' LogShipping = 'ON' PreferredApplyInstance = '' ApplyInstanceTimeout = '0' ApplyLagThreshold = '0' TransportLagThreshold = '0' TransportDisconnectedThreshold = '30' ArchiveLagTarget = '0' LogArchiveMaxProcesses = '4' LogArchiveMinSucceedDest = '1' FastStartFailoverTarget = '' InconsistentProperties = '(monitor)' InconsistentLogXptProps = '(monitor)' SendQEntries = '(monitor)' LogXptStatus = '(monitor)' RecvQEntries = '(monitor)' StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.51) (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=ron_DGMGRL)(INSTANCE_NAME=pri_2)(SERVER=DEDICATED)))' StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' TopWaitEvents = '(monitor)' Database Status: DISABLED DGMGRL>
A few more things worth mentioning here: first of all the database does not know about the other instance. That’s not to worry about: you will see how this is addressed during the online relocation. The StaticConnectIdentifier is also pointing to the wrong network. Let’s have a look at the only instance:
DGMGRL> show instance verbose pri_2 Instance 'pri_2' of database 'RON' Host Name: ron12cprinode1.example.com PFILE: Properties: StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.51)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=ron_DGMGRL)(INSTANCE_NAME=pri_2)(SERVER=DEDICATED)))' StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' TopWaitEvents = '(monitor)' Instance Status: DISABLED
Unsurprisingly the StaticConnectIdentifier is pointing to the wrong interface again. Let’s correct this.
DGMGRL> edit instance pri_2 set property StaticConnectIdentifier= > '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.51)(PORT=1522)) > (CONNECT_DATA=(SERVICE_NAME=ron_DGMGRL)(INSTANCE_NAME=pri_2)(SERVER=DEDICATED)))'; Property "staticconnectidentifier" updated
This command updated the setting on the database and instance level. OK, let’s move on. I need to add the standby database.
DGMGRL> add database "RONDG" AS CONNECT IDENTIFIER IS ronsbynet2 MAINTAINED AS PHYSICAL; Database "RONDG" added DGMGRL> show database verbose "RONDG" Database - RONDG Role: PHYSICAL STANDBY Intended State: OFFLINE Transport Lag: (unknown) Apply Lag: (unknown) Apply Rate: (unknown) Real Time Query: OFF Instance(s): sby_2 Properties: DGConnectIdentifier = 'ronsbynet2' ObserverConnectIdentifier = '' LogXptMode = 'ASYNC' RedoRoutes = '' DelayMins = '0' Binding = 'optional' MaxFailure = '0' MaxConnections = '1' ReopenSecs = '300' NetTimeout = '30' RedoCompression = 'DISABLE' LogShipping = 'ON' PreferredApplyInstance = '' ApplyInstanceTimeout = '0' ApplyLagThreshold = '0' TransportLagThreshold = '0' TransportDisconnectedThreshold = '30' ApplyParallel = 'AUTO' StandbyFileManagement = 'AUTO' ArchiveLagTarget = '0' LogArchiveMaxProcesses = '4' LogArchiveMinSucceedDest = '1' DbFileNameConvert = '' LogFileNameConvert = '' FastStartFailoverTarget = '' InconsistentProperties = '(monitor)' InconsistentLogXptProps = '(monitor)' SendQEntries = '(monitor)' LogXptStatus = '(monitor)' RecvQEntries = '(monitor)' StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.58) (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_2)(SERVER=DEDICATED)))' StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' TopWaitEvents = '(monitor)' Database Status: DISABLED DGMGRL> show instance verbose sby_2 Instance 'sby_2' of database 'RONDG' Host Name: ron12csbynode1.example.com PFILE: Properties: StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.58)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_2)(SERVER=DEDICATED)))' StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' TopWaitEvents = '(monitor)' Instance Status: DISABLED
Again let’s fix the static connection identifier.
DGMGRL> edit instance sby_2 set property StaticConnectIdentifier = > '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.58)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL) (INSTANCE_NAME=sby_2)(SERVER=DEDICATED)))'; Property "staticconnectidentifier" updated
Now everything is in place, I can enable the configuration.
DGMGRL> enable configuration Enabled. DGMGRL> show configuration Configuration - dgtest Protection Mode: MaxPerformance Databases: RON - Primary database RONDG - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS DGMGRL>
If you haven’t set the standby file management property to auto yet, please do so. I really don’t like MRP0 aborting because a datafile UNNAMED0000010 has been added. This causes confusion that’s better avoided.
If you get errors about missing standby redo logs (SRLs), please add them. I have dones so on the standby and primary, but didn’t include the output here.
Switchover
That should be all that’s needed in preparation. Since I’m impatient I wanted to see if the switchover works. First I check the configuration:
DGMGRL> show configuration Configuration - dgtest Protection Mode: MaxPerformance Databases: RON - Primary database RONDG - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS
Success means: ready to switch over. Let’s try:
DGMGRL> switchover to "RONDG" Performing switchover NOW, please wait... Operation requires a connection to instance "sby_2" on database "RONDG" Connecting to instance "sby_2"... Connected as SYSDBA. New primary database "RONDG" is opening... Operation requires startup of instance "pri_2" on database "RON" Starting instance "pri_2"... ORACLE instance started. Database mounted. Switchover succeeded, new primary is "RONDG" DGMGRL> show configuration Configuration - dgtest Protection Mode: MaxPerformance Databases: RONDG - Primary database RON - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS
Great so that worked. It was important that the StaticConnectIdentifiers are set correctly. During testing I found that relying on DGConnectIdentifier did not work, when it pointed to the second SCAN listener. The SCAN listeners do not “know” about the statically registered %_DGMGRL services, and you receive the dreaded “listener does not know about service name in connect descriptor” error. I also had to pick a listener for the StaticConnectIdentifier. I wanted to have both instances as potential targets for starting the instance, but that failed too. So in a way I map node 1 to instance sby_2 by pointing the broker to its listener.
Online Relocation
Now on to part 2: what happens to the standby when I perform an online relocation on the primary. Let’s try.
[oracle@ron12csbynode1 admin]$ srvctl status database -d rondg Instance sby_2 is running on node ron12csbynode1 Online relocation: INACTIVE [oracle@ron12csbynode1 admin]$ srvctl status service -d rondg Service ron12c is running on instance(s) sby_2 [oracle@ron12csbynode1 admin]$ srvctl relocate database -d rondg -timeout 2 -verbose -node ron12csbynode2 Configuration updated to two instances Instance sby_1 started Services relocated Waiting for up to 2 minutes for instance sby_2 to stop ... Instance sby_2 stopped Configuration updated to one instance [oracle@ron12csbynode1 admin]$
In the meantime I connected to the primary (while it was relocating) and switched a few logs. This works, as you can see (RON is the standby database now)
2014-05-23 10:30:16.943000 -04:00 Recovery of Online Redo Log: Thread 1 Group 6 Seq 119 Reading mem 0 Mem# 0: +DATA/RON/ONLINELOG/group_6.280.846066567 Mem# 1: +RECO/RON/ONLINELOG/group_6.330.846066593 2014-05-23 10:30:42.897000 -04:00 Archived Log entry 226 added for thread 1 sequence 119 ID 0x64c27dbf dest 1: 2014-05-23 10:30:49.482000 -04:00 Media Recovery Waiting for thread 1 sequence 120 (in transit) 2014-05-23 10:30:52.043000 -04:00 Primary database is in MAXIMUM PERFORMANCE mode Re-archiving standby log 8 thread 2 sequence 97 2014-05-23 10:31:02.879000 -04:00 Recovery of Online Redo Log: Thread 1 Group 5 Seq 120 Reading mem 0 Mem# 0: +DATA/RON/ONLINELOG/group_5.279.846066497 Mem# 1: +RECO/RON/ONLINELOG/group_5.329.846066535 2014-05-23 10:31:12.400000 -04:00 RFS[8]: Assigned to RFS process (PID:13182) RFS[8]: Selected log 9 for thread 2 sequence 98 dbid 1681008808 branch 838874793 2014-05-23 10:31:14.987000 -04:00 Archived Log entry 227 added for thread 2 sequence 97 ID 0x64c27dbf dest 1: 2014-05-23 10:31:50.408000 -04:00 Media Recovery Waiting for thread 2 sequence 98 (in transit) 2014-05-23 10:32:02.193000 -04:00 RFS[8]: Selected log 8 for thread 2 sequence 99 dbid 1681008808 branch 838874793 2014-05-23 10:32:05.872000 -04:00 Recovery of Online Redo Log: Thread 2 Group 9 Seq 98 Reading mem 0 Mem# 0: +DATA/RON/ONLINELOG/group_9.283.846066895 Mem# 1: +RECO/RON/ONLINELOG/group_9.333.846066947
The logs are from 2 threads-the relocation temporarily changes the single instance database to a cluster database by starting instances on both hosts.
Eventually instance sby_2 stops and causes TNS errors in the standby’s alert.log:
*********************************************************************** Fatal NI connect error 12528, connecting to: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ron12csby-dgscan.dg.example.com)(PORT=1521)) (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=RONDG_DGB)(INSTANCE_NAME=sby_2)(CID=(PROGRAM=oracle) (HOST=ron12cprinode1.example.com)(USER=oracle)))) VERSION INFORMATION: TNS for Linux: Version 12.1.0.1.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 12.1.0.1.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 12.1.0.1.0 - Production Time: 23-MAY-2014 10:34:25 Tracing not turned on. Tns error struct: ns main err code: 12564 ...
Nothing to be alarmed about, this is normal, and shown in the output of the relocate command.
The good news is that the primary receives more redo:
2014-05-23 10:35:40.756000 -04:00 RFS[9]: Assigned to RFS process (PID:13329) RFS[9]: Selected log 9 for thread 2 sequence 100 dbid 1681008808 branch 838874793 2014-05-23 10:36:08.256000 -04:00 Archived Log entry 231 added for thread 2 sequence 100 ID 0x64c27dbf dest 1: 2014-05-23 10:36:14.754000 -04:00 Media Recovery Waiting for thread 2 sequence 101 (in transit) 2014-05-23 10:36:15.972000 -04:00 RFS[7]: Opened log for thread 2 sequence 101 dbid 1681008808 branch 838874793 2014-05-23 10:36:20.162000 -04:00 Archived Log entry 232 added for thread 2 sequence 101 rlc 838874793 ID 0x64c27dbf dest 2: 2014-05-23 10:36:29.656000 -04:00 Media Recovery Log +RECO/RON/ARCHIVELOG/2014_05_23/thread_2_seq_101.393.848313375
And after instance 2 is shut down, you can see redo from thread 1 being transferred.
2014-05-23 10:46:58.257000 -04:00 RFS[6]: Selected log 5 for thread 1 sequence 122 dbid 1681008808 branch 838874793 Archived Log entry 233 added for thread 1 sequence 121 ID 0x64c27dbf dest 1: 2014-05-23 10:47:00.479000 -04:00 Media Recovery Waiting for thread 1 sequence 122 (in transit) 2014-05-23 10:47:03.686000 -04:00 Recovery of Online Redo Log: Thread 1 Group 5 Seq 122 Reading mem 0 Mem# 0: +DATA/RON/ONLINELOG/group_5.279.846066497 Mem# 1: +RECO/RON/ONLINELOG/group_5.329.846066535
If you are unsure about the thread to instance mapping, check v$thread, this example is from the primary:
SQL> select thread#, status, enabled, instance from v$thread; THREAD# STATUS ENABLED INSTANCE ---------- ------ -------- ------------------------------ 1 OPEN PUBLIC sby_1 2 CLOSED PUBLIC sby_2
In summary my test showed that an online relocation does not seem to cause trouble for the standby database. I need to run a benchmark like Swingbench against it to see how the online relocation behaves when the system is under load though, the database was pretty much idle during the online relocation.
Amending the Broker configuration
Circling back to the introduction: what happens to the DG broker configuration after the database has been a RAC database (albeit briefly)? Here is the output from the configuration after the first relocation:
DGMGRL> show database verbose "RONDG" Database - RONDG Role: PRIMARY Intended State: TRANSPORT-ON Instance(s): sby_1 sby_2 Properties: DGConnectIdentifier = 'ronsbynet2' ObserverConnectIdentifier = '' LogXptMode = 'ASYNC' RedoRoutes = '' DelayMins = '0' Binding = 'optional' MaxFailure = '0' MaxConnections = '1' ReopenSecs = '300' NetTimeout = '30' RedoCompression = 'DISABLE' LogShipping = 'ON' PreferredApplyInstance = '' ApplyInstanceTimeout = '0' ApplyLagThreshold = '0' TransportLagThreshold = '0' TransportDisconnectedThreshold = '30' ApplyParallel = 'AUTO' StandbyFileManagement = 'AUTO' ArchiveLagTarget = '0' LogArchiveMaxProcesses = '4' LogArchiveMinSucceedDest = '1' DbFileNameConvert = '' LogFileNameConvert = '' FastStartFailoverTarget = '' InconsistentProperties = '(monitor)' InconsistentLogXptProps = '(monitor)' SendQEntries = '(monitor)' LogXptStatus = '(monitor)' RecvQEntries = '(monitor)' StaticConnectIdentifier(*) StandbyArchiveLocation(*) AlternateLocation(*) LogArchiveTrace(*) LogArchiveFormat(*) TopWaitEvents(*) (*) - Please check specific instance for the property value Database Status: SUCCESS
Notice how the second instance (sby_1) has been added. The first question I had: what’s it done to the connection identifiers? As expected it’s set to the wrong network for the newly added instance:
DGMGRL> show instance verbose sby_1; Instance 'sby_1' of database 'RONDG' Host Name: ron12csbynode2.example.com PFILE: Properties: StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.60)(PORT=1521)) (CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_1)(SERVER=DEDICATED)))' StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' TopWaitEvents = '(monitor)' Instance Status: SUCCESS
So that needs to be changed as well. It would make sense to systematically perform a relocation as part of the build of the Data Guard environment on primary and standby to register all instances and update them. Not that it’s a big issue, the switchover is most likely not going to fail, but you have to manually start the database if the connection information is wrong.
Summary
It’s been quite a journey! This article series explained how to add a second SCAN to the environment including all the other infrastructure such as VIPs and listeners. You then saw how to duplicate a database for Data Guard and finally
Hello Martin,
Broker is not reading LISTENER_NETWORKS and updating the StaticConnectIdentifier with correct settings. That is really bad. For me Oracle has to improve/fix this. Any manual update is feasible for tests, but not for any HA-solution. It works with local-listener/remote, so it must also work with LISTENER_NETWORKS.
Markus