Category Archives: Data Guard

Using dbca to create a physical standby database

While investigating new dbca options for the previous article I noticed that it’s now possible to use Oracle’s Database Creation Assistant (dbca) to create a physical standby database using the -createDuplicateDB flag.

I wanted to know how easily this can be done on my Oracle 18.5.0 system. I have two VMs, server3 and server4, running Oracle Linux 7.6. I created the primary database on server3. The database name is NCDB with the db_unique_name set to SITEA. I also use SITEA as the database SID to prevent confusion. My physical standby database with db_unique_name SITEB will reside on server4.

The setup is kept as simple as possible to avoid any unintended complications. In other words, this is a lab setup, not something you’d entertain using for any other purpose. As always, this is a tech post and not a piece about licensing. It is up to you to ensure you aren’t performing any tasks that you aren’t licensed for.

Creating the primary database

The primary database can be created using dbca as always. Unlike the last post, this one uses ASM rather than a file system, to show how easy it is to do both.

The primary database is a non-Container Database (NCDB). I intend to create the primary instance as SITEA (Site A) with the standby named SITEB. This way no one gets confused when the database runs in the primary role in the designated disaster recovery location. I haven’t seen gdbName <> SID/db_unique_name in many single-instance deployments and appreciate it might be unfamiliar, but bear with me. I’ll try to make this as transparent as possible.

Invoking dbca

Enough talking, let’s create the database:

[oracle@server3 ~]$ dbca -silent -createDatabase \
> -gdbName NCDB \
> -templateName lab_template.dbc \
> -sid SITEA \
> -enableArchive true -archiveLogMode auto \
> -memoryMgmtType auto_sga \
> -createAsContainerDatabase false \
> -recoveryAreaDestination +RECO \
> -totalMemory 2048 \
> -datafileDestination +DATA \
> -initParams db_unique_name=SITEA,dg_broker_start=true \
> -customScripts add_srls_dbca.sql

[...]

Prepare for db operation
10% complete
Registering database with Oracle Restart
14% complete
Copying database files
43% complete
Creating and starting Oracle instance
45% complete
49% complete
53% complete
56% complete
62% complete
Completing Database Creation
68% complete
70% complete
71% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
/u01/app/oracle/cfgtoollogs/dbca/SITEA.
Database Information:
Global Database Name:SITEA
System Identifier(SID):SITEA
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/SITEA/SITEA1.log" for further details.

A few words of explanation: as I said, I use SITEA as db_unique_name and the database’s SID. In Oracle Managed Files (OMF) deployments a file name references the db_unique_name, and I wanted my data files, temp files and log files all to be in +DATA/SITEA/. db_unique_name defaults to db_name, and not changing it would have resulted in the database being created in +DATA/NCDB, and that’s not what I wanted.

I decided to include starting the Data Guard broker as part of the initialisation parameters. Archiving is enabled right from the start as well.

The custom script you see referenced creates the necessary number of standby redo logs. This will save me a bit of work, because RMAN is smart: when duplicating the database for use as a standby, it creates standby redo logs (SRL) for me if it finds SRLs in the target database. One less thing for me to worry about.
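For reference, a minimal version of such a script might look like the example below; treat it as a sketch. The group numbers and the 200 MB size are assumptions for this lab (n+1 SRLs for three online log groups, sized to match them), and with OMF there is no need to specify file names:

-- add_srls_dbca.sql: hypothetical lab version of the custom script
alter database add standby logfile group 4 size 209715200;
alter database add standby logfile group 5 size 209715200;
alter database add standby logfile group 6 size 209715200;
alter database add standby logfile group 7 size 209715200;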

Implications of using gdbname and sid (+ db_unique_name) in dbca

If you haven’t used a SID <> gdbName combination before, you might feel unsure about how Oracle creates things. First of all, dbca creates the database with the db_name you passed as gdbname; in this post that’s NCDB. Normally, if you grep’d the operating system’s process table for NCDB you would find all the instance’s background processes. With my dbca command however (where an Oracle SID <> gdbname is provided) you won’t find anything. Grep for the ORACLE_SID “SITEA” instead. Likewise, if you want to create a bequeath connection to the database, set your ORACLE_SID to SITEA.
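To illustrate, here is the kind of check I mean (a sketch, output omitted):

[oracle@server3 ~]$ ps -ef | grep [N]CDB    # comes back empty: no process carries the db_name
[oracle@server3 ~]$ ps -ef | grep [S]ITEA   # lists ora_pmon_SITEA, ora_smon_SITEA and friends
[oracle@server3 ~]$ export ORACLE_SID=SITEA # required for a bequeath connection
[oracle@server3 ~]$ sqlplus / as sysdba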

Since this is Oracle Restart the database is registered with Grid Infrastructure under its db_unique_name/ORACLE_SID combination in the RDBMS home:

[oracle@server3 ~]$ srvctl config database
SITEA
[oracle@server3 ~]$ srvctl config database -db SITEA
Database unique name: SITEA
Database name: NCDB
Oracle home: /u01/app/oracle/product/18.0.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/SITEA/PARAMETERFILE/spfile.273.1003226559
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA,RECO
Services:
OSDBA group:
OSOPER group:
Database instance: SITEA
[oracle@server3 ~]$

Interacting with the database via srvctl requires you to use the db_unique_name SITEA as shown.
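A couple of typical calls for illustration; note that the -db argument is the db_unique_name, not the db_name NCDB:

[oracle@server3 ~]$ srvctl status database -db SITEA
[oracle@server3 ~]$ srvctl stop database -db SITEA
[oracle@server3 ~]$ srvctl start database -db SITEA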

This is my preferred way to set things up these days. I’d like to invite you to critically reflect on the approach I took. I prefer it over other designations such as “PROD” and “STDBY”.

Creating the standby database

With the primary database created, archivelog mode enabled plus a working Fast Recovery Area I can move on to creating the physical standby database.
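If you like to double-check these prerequisites before duplicating, a quick sanity check on the primary might look like this (output omitted):

[oracle@server3 ~]$ export ORACLE_SID=SITEA
[oracle@server3 ~]$ sqlplus / as sysdba

SQL> archive log list
SQL> show parameter db_recovery_file_dest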

Before I do this a few small changes have to be made to my network configuration. I always add TNS entries for both the primary and the standby database to the “primary” host’s $ORACLE_HOME/network/admin/tnsnames.ora file and ship the file across to the “standby” host; they are needed later.
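Here is a minimal sketch of what these entries look like in my lab; host names, port and service names are specific to this setup, so adapt them to your environment:

SITEA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server3)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SITEA)
    )
  )

SITEB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server4)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SITEB)
    )
  )

With the preparation done, I proceed to calling dbca: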

[oracle@server4 ~]$ dbca -silent -createDuplicateDB \
> -gdbName NCDB \
> -sid SITEB \
> -initParams db_create_file_dest=+DATA,db_recovery_file_dest=+RECO,dg_broker_start=true \
> -primaryDBConnectionString server3:1521/SITEA \
> -createAsStandby \
> -dbUniqueName SITEB

[ ... ]

Prepare for db operation
22% complete
Listener config step
44% complete
Auxiliary instance creation
67% complete
RMAN duplicate
89% complete
Post duplicate database operations
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/SITEB/SITEB1.log" for further details.

So far so good. What did dbca do? And more importantly, how did it create the standby database? I didn’t have a backup of NCDB/SITEA at the time I called dbca on server4 so it must have performed a duplicate … from active database. Looking at the dbca trace in $ORACLE_BASE/cfgtoollogs/dbca/SITEB I can confirm this. After wading through a lot of messages, here’s the relevant bit:

INFO: Mar 11, 2019 12:14:04 PM oracle.assistants.dbca.driver.backend.steps.RmanDuplicateStep executeImpl
INFO: Running rman duplicate command:run {
set newname for datafile 1 to new;
set newname for datafile 3 to new;
set newname for datafile 4 to new;
set newname for datafile 7 to new;
duplicate target database
for standby
from active database
dorecover
nofilenamecheck;
}

This might be an issue for some users because it requires a direct connection to the source database. I don’t know if RMAN might use the “from service … using backupset …” clause if you provided an offline backup.

The files I care about are all in the right place:

SQL> select name from v$datafile union all
2 select name from v$tempfile union all
3 select member from v$logfile union all
4 select value from v$parameter2 where name = 'control_files';

NAME
-------------------------------------------------------
+DATA/SITEB/DATAFILE/system.268.1003228125
+DATA/SITEB/DATAFILE/sysaux.273.1003228127
+DATA/SITEB/DATAFILE/undotbs1.272.1003228129
+DATA/SITEB/DATAFILE/users.271.1003228129
+DATA/SITEB/TEMPFILE/temp.262.1003228149
+DATA/SITEB/ONLINELOG/group_3.266.1003228137
+RECO/SITEB/ONLINELOG/group_3.310.1003228137
+DATA/SITEB/ONLINELOG/group_2.267.1003228137
+RECO/SITEB/ONLINELOG/group_2.258.1003228137
+DATA/SITEB/ONLINELOG/group_1.270.1003228135
+RECO/SITEB/ONLINELOG/group_1.259.1003228135
+DATA/SITEB/ONLINELOG/group_4.265.1003228137
+RECO/SITEB/ONLINELOG/group_4.311.1003228137
+DATA/SITEB/ONLINELOG/group_5.264.1003228139
+RECO/SITEB/ONLINELOG/group_5.312.1003228139
+DATA/SITEB/ONLINELOG/group_6.274.1003228139
+RECO/SITEB/ONLINELOG/group_6.313.1003228141
+DATA/SITEB/ONLINELOG/group_7.263.1003228147
+RECO/SITEB/ONLINELOG/group_7.314.1003228147
+DATA/SITEB/CONTROLFILE/current.260.1003228117
+RECO/SITEB/CONTROLFILE/current.260.1003228117

21 rows selected.

As reported by other bloggers, dbca creates a “temporary” listener that it doesn’t remove after the database creation process is completed. I specifically wanted to see if this is the case with Oracle Restart as well and yes, it is:

[oracle@server4 SITEB]$ ps -ef | grep tnslsnr
oracle 1125 1 0 10:27 ? 00:00:00 /u01/app/oracle/product/18.0.0/dbhome_1/bin/tnslsnr LISTENER20190318102721 -inherit
grid 5837 1 0 09:01 ? 00:00:00 /u01/app/grid/product/18.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
oracle 7588 31809 0 10:44 pts/1 00:00:00 grep --color=auto tnslsnr

The extra listener didn’t survive a reboot, and the processes should disappear eventually. dbca persists the configuration into $ORACLE_HOME/network/admin/listener.ora though, and those entries should probably be removed. This is especially true with Oracle Restart (and RAC for that matter) because I want all listeners to be controlled from the GRID home, not the RDBMS home.
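If you would rather not wait for a reboot, a manual cleanup might look like this (the listener name is taken from the process listing above):

[oracle@server4 ~]$ lsnrctl stop LISTENER20190318102721
[oracle@server4 ~]$ vi $ORACLE_HOME/network/admin/listener.ora   # remove the generated entries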

There are other leftover files in $ORACLE_HOME/dbs as well, which I hope dbca will remove one day.

On the plus side dbca registers the database in Oracle Restart for me as part of the work it did:

[oracle@server4 ~]$ srvctl config database -db SITEB
Database unique name: SITEB
Database name: NCDB
Oracle home: /u01/app/oracle/product/18.0.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/SITEB/PARAMETERFILE/spfile.275.1003228201
Password file: +DATA/SITEB/orapwsiteb
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA,RECO
Services:
OSDBA group:
OSOPER group:
Database instance: SITEB

It’s good to see that I don’t need to worry about storing the password file and spfile in ASM myself; that job is already taken care of. A few other things need to be changed though. The database is still considered a primary (which I know it isn’t), and the start option is therefore wrong, too. There are licensing implications to opening a standby read-only rather than just mounting it when turning on redo apply. Refer to the documentation for more details.

Out of habit I changed the role to PHYSICAL_STANDBY and the start option to MOUNT.
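In my lab that boiled down to a single srvctl call; YMMV, and it depends on how you are licensed:

[oracle@server4 ~]$ srvctl modify database -db SITEB -role PHYSICAL_STANDBY -startoption MOUNT

This is what I ended up with: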

[oracle@server4 ~]$ srvctl config database -db SITEB
Database unique name: SITEB
Database name: NCDB
Oracle home: /u01/app/oracle/product/18.0.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/SITEB/PARAMETERFILE/spfile.275.1003228201
Password file: +DATA/SITEB/orapwsiteb
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Disk Groups: DATA,RECO
Services:
OSDBA group:
OSOPER group:
Database instance: SITEB

Creating the Broker configuration

The standby redo logs have been created by RMAN, and with dg_broker_start set to true by dbca for both databases, I can now create a Data Guard broker configuration. With Oracle Restart I don’t have to statically register the databases with the listener anymore: Clusterware takes care of starting/stopping them during role changes.

[oracle@server3 ~]$ dgmgrl
DGMGRL for Linux: Release 18.0.0.0.0 - Production on Mon Mar 18 11:11:55 2019 Version 18.5.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@SITEA
Password:
Connected to "SITEA"
Connected as SYSDBA.
DGMGRL> create configuration autoconf
> primary database is 'SITEA'
> connect identifier is 'SITEA';
Configuration "autoconf" created with primary database "SITEA"
DGMGRL> add database 'SITEB' as connect identifier is 'SITEB';
Database "SITEB" added
DGMGRL> edit database 'SITEA' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> edit database 'SITEB' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - autoconf

Protection Mode: MaxPerformance
Members:
SITEA - Primary database
SITEB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS (status updated 17 seconds ago)
DGMGRL>

You might want to increase the protection level from MaxPerformance to something that better suits your requirements. There might be additional changes you want to make within the broker as well. The code output you just saw is the bare minimum I put into place.
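For example, moving to MaxAvailability might look like the sketch below; both members need synchronous redo transport first, and you should understand the performance implications before making this change:

DGMGRL> edit database 'SITEA' set property LogXptMode='SYNC';
DGMGRL> edit database 'SITEB' set property LogXptMode='SYNC';
DGMGRL> edit configuration set protection mode as MaxAvailability;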

Summary

All in all this has been pretty straightforward, without too many hiccups along the way. If you want to, you can put a lot of those steps into your favourite automation toolkit and you’ll end up with a standby database reasonably quickly, provided your environment supports this kind of setup.

Enhanced “validate” commands in Oracle’s Data Guard Broker 18c

If you are using an Oracle Database Enterprise Edition chances are that there is at least one environment in your estate making use of Data Guard. And if you are using Data Guard, why not use the broker? I have been using Data Guard broker for a long time now, and it has definitely improved a lot over the first releases, back in the day. I like it so much these days that I feel hard done by if I can’t make use of it. This is of course a matter of personal preference, and I might be exaggerating a little :)

One of the nice additions to the broker in Oracle 12.1 was the ability to validate a database before a role change. This is documented in the Data Guard broker documentation. I certainly don’t rely solely on the output of the command; I have my own checks that go over and above what a validate can do.

During research I ran the command on my 12.2 system and noticed much-enhanced output. This is super exciting and worth blogging about. Well, to me at least; I hope you find this post useful. I love discovering little changes like this: they aren’t always advertised on slide 1 of “What’s new in release X” presentations, but they are nevertheless great productivity boosts.

This post was nearly complete, but then rested in my drafts folder for just a little too long and 18c has been released. Time for an update!

The environment

Before moving on, here’s the stack in case you find this via a search engine:

  • Oracle Linux 7.4 powering 2 VMs: server1 and server2
  • Oracle 18.3.0, single instance, no Oracle Restart
  • Data Guard Broker configuration managing 2 databases: NCDBA and NCDBB

The broker is quite happy with my setup, at least for now.

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxAvailability
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

DGMGRL> 

This is my setup; YMMV as always. Refer to your standards or other relevant documentation for more details about your configuration.

New things to validate in 18c

With 12.2 it was possible to validate a datafile in addition to validating the database. Oracle 18c enhances the validate command quite a bit more:

DGMGRL> help validate

Performs an exhaustive set of validations for a member

Syntax:

  VALIDATE DATABASE [VERBOSE] <database name>;

  VALIDATE DATABASE [VERBOSE] <database name> DATAFILE <datafile number>
    OUTPUT=<file name>;

  VALIDATE DATABASE [VERBOSE] <database name> SPFILE;

  VALIDATE FAR_SYNC [VERBOSE] <far_sync name>
    [WHEN PRIMARY IS <database name>];

  VALIDATE NETWORK CONFIGURATION FOR { ALL | <member name> };

  VALIDATE STATIC CONNECT IDENTIFIER FOR { ALL | <database name> };

DGMGRL> 

In this post I am going to focus on the verbose output generated by validate database, if I can find the time I’ll write about the other new options as well.

Validate database in Oracle 18c

Let’s have a look at the output of validate database verbose … Looking at the configuration status, NCDBB is currently running in recovery mode with NCDBA acting as the primary database. It makes sense to start with the validation of NCDBB first.

The output of the command is rather comprehensive, as you will see. I have decided to annotate it so you don’t have to scroll up and down as much.

DGMGRL> validate database verbose 'NCDBB'

  Database Role:     Physical standby database
  Primary Database:  NCDBA

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    NCDBA :  Off
    NCDBB :  Off

  Capacity Information:
    Database  Instances        Threads
    NCDBA      1               1
    NCDBB      1               1

  Managed by Clusterware:
    NCDBA :  NO
    NCDBB:  NO
    Validating static connect identifier for database NCDBA...
    The static connect identifier allows for a connection to database "NCDBA".

The first part of the output is related to the database’s role and status. Oracle reckons my standby database is ready for a role change (which I’ll double-check using my own tools and scripts). You can see that flashback database is not enabled (for reasons that don’t matter for this post).

Since both members are single instance databases it makes sense for them to have a single redo thread.

Another important piece of information can be found in the “managed by Clusterware” section. In releases prior to 12.1.0.2 you always had to statically register your databases with the listener for use with the broker. This has changed in 12.1.0.2: another one of these productivity boosters :) In modern releases you don’t need to statically register your databases with the listener provided Clusterware manages them. See MOS DocID 1387859.1 for all the details.

You read in the introduction that I’m using single-instance Oracle databases without any Grid Infrastructure at all, so this shiny new feature does not apply. There’s something else that might help though: Oracle validates the static connect identifier for you. It doesn’t simply ping the TNS alias; the broker actually establishes a connection to the database. I noticed this in the listener.log: a program named dgmgrl connects to the database using the static connect identifier (as per “show database memberName staticconnectidentifier”). Output is formatted for readability:

2018-08-14 10:54:16.377000 +01:00
14-AUG-2018 10:54:16 * (CONNECT_DATA=(SERVICE_NAME=NCDBB_DGMGRL)(INSTANCE_NAME=NCDBB)(SERVER=DEDICATED)
(STATIC_SERVICE=TRUE)(CID=(PROGRAM=dgmgrl)(HOST=server1)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)
(HOST=192.168.100.21)(PORT=27049)) * establish * NCDBB_DGMGRL * 0

Let’s continue with the output of the validate database command:

  Temporary Tablespace File Information:
    NCDBA TEMP Files:   1
    NCDBB TEMP Files:   1

  Data file Online Move in Progress:
    NCDBA:  No
    NCDBB:  No

This little section compares the number of temp files and warns you of any online data file move operations.

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds (computed 0 seconds ago)
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds (computed 0 seconds ago)
    Transport Status:  Success


  Log Files Cleared:
    NCDBA Standby Redo Log Files:  Cleared
    NCDBB Online Redo Log Files:   Not Cleared
    NCDBB Standby Redo Log Files:  Available

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBA)                 (NCDBB)
    1         2                       3                       Sufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBB)                 (NCDBA)
    1         2                       3                       Sufficient SRLs

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBA)                    (NCDBB)
    1          200 MBytes                200 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBB)                   (NCDBA)
    1          200 MBytes                200 MBytes

This section is quite interesting as well, as it allows you to have a gander at the transport lag and apply lag respectively. I can also see that I have standby redo logs both for my primary as well as the standby database. Sometimes the broker doesn’t get the number of standby redo logs right in “future log file groups configuration”. I found this to rectify itself after a switchover and switch-back.
Another reassuring fact is presented in the current and future configuration log file sizes: I followed the documentation carefully and created all my (online and standby) redo logs the exact same size.

  Apply-Related Property Settings:
    Property                        NCDBA Value              NCDBB Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO
    ApplyInstances                  0                        0

  Transport-Related Property Settings:
    Property                        NCDBA Value              NCDBB Value
    LogXptMode                      sync                     sync
    Dependency                                        
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

The section just above compares apply- and transport-related settings between the primary and standby databases. These are stored in Data Guard broker properties that eventually map back to database initialisation parameters.
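As a quick illustration of that mapping, consider the following sketch: bumping a transport-related property for the standby should be reflected in the corresponding log_archive_dest_n parameter on the primary (the destination number here is an assumption, check your own setup):

DGMGRL> edit database 'NCDBB' set property NetTimeout=60;
Property "nettimeout" updated

SQL> select value from v$parameter where name = 'log_archive_dest_2';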

  Automatic Diagnostic Repository Errors:
    Error                       NCDBA    NCDBB
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

DGMGRL> 

And finally, it appears as if the Automatic Diagnostic Repository (ADR) didn’t show any issues. This doesn’t mean there aren’t any; maybe there is still something looming in your configuration that hasn’t been detected yet.

Summary

The validate database command is, in my opinion, really quite useful for performing basic checks before role changes. It doesn’t relieve you of the responsibility for ensuring the role change will be a success, but I found it to be a useful first indicator.

Creating a RAC 12.1 Data Guard Physical Standby environment (3b)

Huh, what is this I hear you ask? Part 3b? Oracle 12.1? Well, there’s a bit of a story to this post. Back in December 2016 I started to write a series of blog posts (part 1 | part 2 | part 3 | part 4) about how I created a standby database on RAC 12.1. For some reason I forgot to post this part. Up until now the step where I am creating the broker configuration was missing. Thanks to a friend I discovered my mistake. I decided to post this article to complete the series. There’s a pretty big logical gap that needed filling :)

I also deliberately decided against making changes to my notes, so this is written in the same style as the articles preceding this one. Also bear in mind that this is Oracle 12.1!

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small-scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Here is the original blog post

In a last-minute decision I decided to split part 3 in half: first the database duplication, followed by the Data Guard configuration. It really got a bit too long (even by my standards) and would have put readers off the article. I appreciate your patience though…

What’s the situation?

At the end of part 3 I had a physical standby database ready for use, including its registration in the OCR. It was looking promising, but there was more work to do.

At the moment, the standby database is out of sync with the primary, a fact that needs to be rectified by creating a Data Guard configuration. I prefer to do this on the command line. If you are so inclined you can of course push the mouse around and do it with OEM.

Step 1: Data Guard Broker pre-requisites

Before creating the configuration you should change the location of the broker configuration files to shared storage. This needs to be done on each side – primary and every standby – before you can start the broker.

SQL> alter system set dg_broker_config_file1='+DATA/NCDBA/dr1NCDBA.dat' scope=both sid='*';

System altered.

SQL> alter system set dg_broker_config_file2='+DATA/NCDBA/dr2NCDBA.dat' scope=both sid='*';

System altered.

SQL> show parameter dg_broker

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1               string      +DATA/NCDBA/dr1NCDBA.dat
dg_broker_config_file2               string      +DATA/NCDBA/dr2NCDBA.dat
dg_broker_start                      boolean     FALSE

SQL> alter system set dg_broker_start=true scope=both sid='*';

System altered.

SQL> 

I had to repeat the step on NCDBB, of course with adapted paths and file names.

Step 2: Creating a Data Guard Broker configuration

Once that’s done, you create the configuration using dgmgrl, the Data Guard Broker Manager Line Mode tool. I named my configuration RACTEST and added both databases. Here is an example of my session:

[oracle@rac12pri1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdba
Password:
Connected as SYSDBA.
DGMGRL> create configuration ractest as primary database is 'NCDBA' connect identifier is 'NCDBA';
Configuration "ractest" created with primary database "NCDBA"
DGMGRL> add database 'NCDBB' as connect identifier is 'NCDBB';
Database "NCDBB" added
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
DISABLED

As a first step I connect as SYSDBA before creating the configuration. You can usually type help <command> to remind yourself of the exact syntax. After the configuration is created you can start adding (standby) databases. In my case there is only one (physical standby) database, but you can think of environments where you have more than one standby environment, although that is less common. When specifying the connect identifier you use valid Net*8 names found in the default tnsnames.ora (the one in the RDBMS home).

At this point it is important to set standby_file_management to auto. Otherwise managed recovery can abort, and you will have to do extra work managing data files that could easily have been avoided. The broker syntax for doing so involves changing database properties (type show database verbose 'NCDBA' to show all of them). Many Data Guard broker properties are not easily mapped back to the corresponding initialisation parameter. In the case of standby_file_management it is not hard, and the documentation helps with the rest.

Eventually I enable the configuration and check the status. I have found that it can take a couple of minutes at times to get the SUCCESS message.

DGMGRL> edit database 'NCDBA' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> edit database 'NCDBB' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

There are other things you must consider in accordance with your company’s (data protection/availability/business continuity/…) standards and other documentation – again my example is rather basic, but deliberately so. It’s my lab playground after all.

For the sake of completeness, here is the current configuration of NCDBA and NCDBB:

DGMGRL> show database 'NCDBA'

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 223.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> exit

Please note that the broker detected the fact that my systems are cluster databases and lists both instances for each database. It also shows that NCDBB2 is the apply instance. I was positively surprised about the broker’s ability to detect both instances automatically. I seem to remember cases in earlier releases where said instance detection occasionally required a little nudging to get it right.

This example is deliberately kept as minimalist as possible: it works. Most likely it won’t satisfy your requirements. Review your Data Guard (and any other relevant) standards and change the configuration accordingly.

Summary

… back to present time :) This post completes the series, all my notes are now transcribed to the blog. I hope you find this useful.

back to part 3 | on to part 4

Little things worth knowing: redo transport in Data Guard 12.2 part 2

In the first part of this article I looked at a number of views and some netstat output to show how redo is transported from the primary database to its standby systems. Long story short: TT02 (“async ORL multi”) was found sending redo to CDB3 asynchronously, whilst NSS2 (“sync”) transferred redo to the synchronised target, CDB2. Contrary to what v$dataguard_process wanted me to believe, it really wasn’t LGWR sending redo over the network.

In this little article I would like to show you how the standby databases CDB2 and CDB3 receive redo and how you can map this back to the primary database, closing the loop.

How does CDB2 receive redo?

First I’m looking at CDB2, which receives redo in synchronous mode. I should be able to narrow down the communication between primary and standby by referring to the LGWR and TT02 process IDs in the CLIENT_PID column on the standby. As a quick reminder, 14986 is the PID for LGWR, 15029 belongs to NSS2, and 15252 maps to TT02. Let’s try:

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
CDB2                           PHYSICAL STANDBY

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process where client_pid in (14986,15029,15252);

NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   5517       RFS sync                IDLE              14986 log writer               95     174565          0

So it would appear the process responsible for shipping redo to “SYNC” destinations is the log writer. Actually, the output of v$dataguard_process is quite interesting, which is why I’m adding it here for the sake of completeness:

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process order by action;

NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   5517       RFS sync                IDLE              14986 log writer               95     229446          0
rfs   5350       RFS archive             IDLE              15224 archive gap               0          0          1
rfs   5346       RFS ping                IDLE              15124 gap manager              95          0          0
rfs   5354       RFS archive             IDLE              15233 archive gap               0          0          1
MRP0  5348       managed recovery        IDLE                  0 none                      0          0          0
rfs   5352       RFS archive             IDLE              15240 archive gap               0          0          1
LGWR  5207       log writer              IDLE                  0 none                      0          0          0
TT01  5259       redo transport timer    IDLE                  0 none                      0          0          0
TT00  5255       gap manager             IDLE                  0 none                      0          0          0
ARC1  5263       archive redo            IDLE                  0 none                      0          0          0
ARC2  5265       archive redo            IDLE                  0 none                      0          0          0
ARC3  5267       archive redo            IDLE                  0 none                      0          0          0
TMON  5242       redo transport monitor  IDLE                  0 none                      0          0          0
ARC0  5257       archive local           IDLE                  0 none                      0          0          0

14 rows selected.

This view tells me that LGWR is attached to the RFS sync process. But now I know better than that, and it is similar to what I saw on the primary. Looking a little closer, I can see that, strictly speaking, the RFS process is connected to NSS2:

[root@server2 ~]# netstat -tunalp | egrep 'Active|Proto|5517'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 192.168.100.22:1521     192.168.100.21:15515    ESTABLISHED 5517/oracleCDB2   

I am repeating the values for the primary here so you don’t have to go back to the previous article:

[root@server1 ~]# ps -ef | egrep 'lgwr|lg0|nss'
oracle   14986     1  0 09:58 ?        00:01:19 ora_lgwr_CDB1
oracle   14990     1  0 09:58 ?        00:00:00 ora_lg00_CDB1
oracle   14994     1  0 09:58 ?        00:00:00 ora_lg01_CDB1
oracle   15029     1  0 09:58 ?        00:00:43 ora_nss2_CDB1

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '14986|14990|14994|15029'
tcp        0      0 server1.example.com:15515 server2.example.com:1521 ESTABLISHED oracle     16400768   15029/ora_nss2_CDB1

You will notice that port 15515 on server1 belongs to ora_nss2_CDB1.

Going back a little to v$dataguard_process, it seems a bit weird to see MRP0 as “idle” when the database is in managed recovery mode using real-time apply. Trying something else, I queried v$managed_standby and voilà: MRP0 is said to be applying logs:

SQL> select process,pid,status,client_process,client_pid,sequence#,block# 
  2  from v$managed_standby order by status;

PROCESS   PID        STATUS       CLIENT_P CLIENT_PID                                SEQUENCE#     BLOCK#
--------- ---------- ------------ -------- ---------------------------------------- ---------- ----------
DGRD      5255       ALLOCATED    N/A      N/A                                               0          0
DGRD      5259       ALLOCATED    N/A      N/A                                               0          0
MRP0      5348       APPLYING_LOG N/A      N/A                                              95     246625
ARCH      5257       CLOSING      ARCH     5257                                             92       4096
ARCH      5263       CLOSING      ARCH     5263                                             93       2048
ARCH      5265       CLOSING      ARCH     5265                                             94     342016
ARCH      5267       CONNECTED    ARCH     5267                                              0          0
RFS       5350       IDLE         UNKNOWN  15224                                             0          0
RFS       5354       IDLE         UNKNOWN  15233                                             0          0
RFS       5352       IDLE         UNKNOWN  15240                                             0          0
RFS       5517       IDLE         LGWR     14986                                            95     246626
RFS       5346       IDLE         Archival 15124                                             0          0

12 rows selected.

I guess that’s true, as the system is in constant recovery using the standby logfiles.

And what about CDB3?

On the other hand, CDB3 – to which redo is shipped asynchronously – lists TT02 as its counterpart:

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
CDB3                           PHYSICAL STANDBY

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id 
  2   from v$dataguard_process order by action;

NAME  PID                      ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ------------------------ ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   14803                    RFS ping                IDLE              15124 gap manager              96          0          0
rfs   14809                    RFS archive             IDLE              15233 archive gap               0          0          0
rfs   14811                    RFS async               IDLE              15252 async ORL multi          96      34674          0
MRP0  11825                    managed recovery        IDLE                  0 none                      0          0          0
ARC0  11776                    archive local           IDLE                  0 none                      0          0          0
ARC2  11786                    archive redo            IDLE                  0 none                      0          0          0
TT00  11774                    gap manager             IDLE                  0 none                      0          0          0
ARC3  11788                    archive redo            IDLE                  0 none                      0          0          0
TMON  11706                    redo transport monitor  IDLE                  0 none                      0          0          0
LGWR  11676                    log writer              IDLE                  0 none                      0          0          0
ARC1  11784                    archive redo            IDLE                  0 none                      0          0          0
TT01  11778                    redo transport timer    IDLE                  0 none                      0          0          0

12 rows selected.

Unlike the case with CDB2, the local RFS process is indeed connecting to TT02 on server1:

[root@server2 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '^Active|^Proto|14811'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp6       0      0 server2.example.com:1521 server1.example.com:12936 ESTABLISHED oracle     4658198    14811/oracleCDB3    

… and on server1:

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '^Active|^Proto|12936'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address            State       User       Inode      PID/Program name    
tcp        0      0 server1.example.com:12936 server2.example.com:1521 ESTABLISHED oracle     15820538   15252/ora_tt02_CDB1 

This should be enough evidence, I rest my case :)

Summary

So I guess that answers my question: On my small VMs in the lab, NSSn is responsible for shipping redo to targets in “SYNC” mode. The redo transport server processes TTnn ship redo to destinations that are defined for asynchronous transport.

Creating a RAC 12.1 Data Guard Physical Standby environment (4)

In the previous three parts of this series a lot of preparation work, needed for the configuration of Data Guard, was performed. In this part of the mini-series they all come to fruition. Using the Data Guard broker a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:

When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.

I have always wanted to test that in a quiet moment…

I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Now let’s get to it.

Step 1: Check the status of the configuration

In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).

The following commands are somewhat sensitive to network availability – you should protect your sessions against any type of network failure! I am using screen(1) for that purpose; there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…
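A minimal screen workflow might look like this:

[oracle@rac12sec1 ~]$ screen -S switchover    # start a named session for the role change
# ... if the network drops, the session keeps running on the server ...
[oracle@rac12sec1 ~]$ screen -dr switchover   # reattach from a new terminal and carry on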

[oracle@rac12sec1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdbb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 55 seconds ago)

DGMGRL> validate database 'NCDBB'
...

The command to check for switchover readiness is new in 12c as well and is called “validate database”. I don’t have screen output from the situation at this point, so just take my word for it that I was ready :) Don’t switch over if you have any concerns or doubts that the operation will succeed! “Validate database” does not relieve you of your duty to check for switchover readiness – follow your procedures.

Step 2: Switch Over

Finally, the big moment has come! It takes just one line to perform the switchover:

DGMGRL> switchover to 'NCDBB'
Performing switchover NOW, please wait...
New primary database "NCDBB" is opening...
Oracle Clusterware is restarting database "NCDBA" ...
Switchover succeeded, new primary is "NCDBB"
DGMGRL> 

DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 9.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    NCDBA1
    NCDBA2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBB1
    NCDBB2

Database Status:
SUCCESS

DGMGRL> 

Well that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’ ? I like it.

If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?

Validating the new standby database shows no issues. I haven’t noticed it before but “validate database” allows you to get more verbose output:

DGMGRL> validate database verbose 'NCDBA';

  Database Role:     Physical standby database
  Primary Database:  NCDBB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    NCDBB     2                2
    NCDBA     2                2

  Temporary Tablespace File Information:
    NCDBB TEMP Files:  1
    NCDBA TEMP Files:  1

  Flashback Database Status:
    NCDBB:  On
    NCDBA:  Off

  Data file Online Move in Progress:
    NCDBB:  No
    NCDBA:  No

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds (computed 1 second ago)
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds (computed 1 second ago)
    Transport Status:  Success

  Log Files Cleared:
    NCDBB Standby Redo Log Files:  Cleared
    NCDBA Online Redo Log Files:   Cleared
    NCDBA Standby Redo Log Files:  Available

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBB)                 (NCDBA)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBA)                 (NCDBB)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBB)                   (NCDBA)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBA)                   (NCDBB)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Apply-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO

  Transport-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    LogXptMode                      ASYNC                    ASYNC
    RedoRoutes                      <empty>                  <empty>
    Dependency                      <empty>                  <empty>
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

  Automatic Diagnostic Repository Errors:
    Error                       NCDBB    NCDBA
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

DGMGRL> 

Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could think of (eg online datafile move!).

Interestingly the broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of standby redo logs in the past and blogged about it in “Interesting observation about standby redo logs in Data Guard”.

Summary

After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.

In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.

Creating a RAC 12.1 Data Guard Physical Standby environment (2)

In the first part of this mini-series you saw me define the environment as well as creating a primary database. With that out of the way it’s time to think about the standby. Before the standby can be created, a few preparations are necessary both on the primary as well as the standby cluster.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Preparing the Creation of the Standby Database

It’s easier to follow the discussion if you keep in mind that my primary database is NCDBA and the standby is NCDBB.

Step 1: Listener Configuration

The first step to be completed is the network setup. I am planning on using active database duplication, which requires the node listeners on the cluster to “know” about my database instances. Thankfully, since RAC 12.1 we don’t need to register the *_DGMGRL service (used for switchover operations by the broker) anymore, making life a bit easier. The static registration of databases may not be necessary if you are using a different RMAN duplicate approach.

By default Oracle does not statically register databases, so it’s up to you to do so. On rac12sec1 I added a SID_LIST_LISTENER section (note: only the node listener!) to /u01/app/12.1.0.2/grid/network/admin/listener.ora. Remember that the node listeners reside in the GRID_HOME.

...
SID_LIST_LISTENER =
  (SID_LIST =
...
    (SID_DESC =
      (GLOBAL_DBNAME = NCDBB)
      (ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME = NCDBB1)
    )
...
  )
...

Likewise, I added a static registration to rac12sec2 – identical to the previous example except for the ORACLE_SID, which is NCDBB2 of course – after which I reloaded the listeners on both nodes.

Step 2: Service Name Configuration

The next file to change is the RDBMS home’s tnsnames.ora. To make sure I cover all eventualities I created the following tnsnames.ora on all cluster nodes, primary and standby RAC:

[oracle@rac12pri1 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora 
# tnsnames.ora Network Configuration File: /u01/app/oracle/product/12.1.0.2/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

NCDBA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = NCDBA)
    )
  )

NCDBB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec-scan)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBB)
    )
  )

# only used for RMAN duplicate ... from active database
NCDBA_DUP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBA)
    )
  )

NCDBB_DUP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sec2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = NCDBB)
    )
  )

The first two entries are self-explanatory. The second pair requires a little more explanation. Since I have statically registered my standby database with the node listeners only (eg not with the SCAN listeners; I never tried that and don’t like the thought of it), the duplicate would fail trying to restart the auxiliary instance. I haven’t captured the output of the failed RMAN duplicate, but it’s possible to show you the effect of using the “wrong” tnsname with a simple test (the database I want to connect to is of course down):

[oracle@rac12pri1 ~]$ sqlplus sys@NCDBB as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 14:36:49 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor


Enter user-name: 
[oracle@rac12pri1 ~]$ sqlplus sys@NCDBB_DUP as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 14:36:59 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Connected to an idle instance.

SQL> 

So it seems to be better to use the net service name where I can be sure the database can be controlled from a remote session :)

NB: this test required a password file for the standby database which doesn’t exist yet and will be created in part 3.

Step 3: Modify the oratab file

You should also change the oratab and add NCDBB on both standby hosts, as shown here for rac12sec2.

[oracle@rac12sec2 ~]$ grep NCDBB /etc/oratab
NCDBB:/u01/app/oracle/product/12.1.0.2/dbhome_1:N

Step 4: Prepare the Primary Database: Archivelog mode, Force Logging, and maybe even Flashback Database

There are a number of tasks to be completed on the primary database before I can start duplicating it for a standby environment. I need to make sure it’s in archivelog mode, and that force logging is enabled. If I can afford the overhead, I enable Flashback Database as well to help reinstate a failed primary should that need arise. Of course, you don’t just enable Flashback Database, because it can have a negative impact on performance, and the flashback logs will use space. Your FRA space usage calculation can be derailed if you are not careful. Thorough testing is needed to assess whether it is OK to operate with Flashback Database enabled.
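Should you decide to enable Flashback Database, the command itself is simple; a sketch, to be run once you have confirmed there is enough space in the FRA:

SQL> alter database flashback on;

Database altered.

SQL> select flashback_on from v$database;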

The first task in my scenario is to enable Archivelog mode and to enforce logging. (A note on the listings that follow: sq is my shell alias for sqlplus / as sysdba.)

[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 09:52:11 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     15
Current log sequence           16
SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -d NCDBA
[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 10:00:34 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             838863984 bytes
Database Buffers         2365587456 bytes
Redo Buffers               13844480 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database force logging;

Database altered.

SQL>  alter database open;

Database altered.

SQL> exit

It is easy to check if that was a success:

[oracle@rac12pri1 ~]$ srvctl start instance -db NCDBA -node rac12pri2

[oracle@rac12pri1 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 15:04:28 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     15
Next log sequence to archive   16
Current log sequence           16

Step 5: Add Standby Redo Logs

In order to use Real Time Apply you need Standby Redo Logs (SRLs). If you create them now (i.e. before the RMAN duplicate) you save time twice. First of all, you don’t need to create SRLs post switchover, when NCDBA operates as a standby database. Secondly, if there are SRLs in place, they are automatically created on the standby database as part of the RMAN duplicate.

You need n+1 SRLs per thread. You can work the number out by looking at v$log. My tiny database has 2 threads with 2 redo log groups per thread (2 members each) – ok for the lab, almost certainly not ok for anything serious. It helps me keep the storage footprint in the lab low though.

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME     CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
         1          1         15   52428800        512          2 YES INACTIVE               1621242 14-DEC-16      1621618 14-DEC-16          0
         2          1         16   52428800        512          2 NO  CURRENT                1621618 14-DEC-16   2.8147E+14                    0
         3          2          1   52428800        512          2 YES INACTIVE               1620677 14-DEC-16      1628674 14-DEC-16          0
         4          2          2   52428800        512          2 NO  CURRENT                1629454 14-DEC-16   2.8147E+14 14-DEC-16          0

SQL> 

With that information I can create 3 SRLs per thread. Did I mention that I love Oracle Managed Files?

SQL> alter database add standby logfile thread 1 size 52428800;

Database altered.

SQL> /

Database altered.

SQL> /

Database altered.

SQL> alter database add standby logfile thread 2 size 52428800;

Database altered.

SQL> /

Database altered.

SQL> /

Database altered.

SQL> select group#,thread#,sequence#,bytes,status from v$standby_log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------
         5          1          0   52428800 UNASSIGNED
         6          1          0   52428800 UNASSIGNED
         7          1          0   52428800 UNASSIGNED
         8          2          0   52428800 UNASSIGNED
         9          2          0   52428800 UNASSIGNED
        10          2          0   52428800 UNASSIGNED

6 rows selected.

The primary database should be in relatively good shape now. Flashback Database remains off, simulating the case where its overhead on my (fictional) workload was deemed too high.

SQL> select name,db_unique_name,database_role,log_mode,force_logging,flashback_on from v$database;

NAME      DB_UNIQUE_NAME  DATABASE_ROLE    LOG_MODE     FORCE_LOGGING  FLASHBACK_ON
--------- --------------- ---------------- ------------ -------------- ------------------
NCDBA     NCDBA           PRIMARY          ARCHIVELOG   YES            NO

SQL>
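Had the testing verdict been different, enabling Flashback Database at this point would have been simple. A sketch only: archivelog mode (enabled above) and a Fast Recovery Area are prerequisites, and the retention target (in minutes) is an assumption, not a recommendation.

SQL> alter system set db_flashback_retention_target=1440 scope=both sid='*';

System altered.

SQL> alter database flashback on;

Database altered.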

Summary

This concludes part 2. In this blog post I shared my notes about configuring the network for RMAN duplicate … from active database. I also configured the primary database in preparation for the initial duplicate command.

Note that I’m not setting any initialisation parameters related to Data Guard: over time I have become a great fan of the Data Guard Broker and plan on using that instead of a manual configuration.

Creating a RAC 12.1 Data Guard Physical Standby environment (1)

I have just realised that the number of posts about RAC 12c Release 1 on this blog is rather too small. And since I’m a great fan of RAC this has to change :) In this mini-series I am going to share my notes about creating a Data Guard setup on my 2 node 12.1.0.2.161018 RAC primary + identical 2 node RAC standby system in the lab.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

The lab Environment

My environment consists of the following entities:

  • Primary Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12pri1 and rac12pri2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Non-container database NCDBA is administrator managed and running on both nodes
  • Standby Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12sec1 and rac12sec2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Standby database NCDBB is administrator managed and running on both nodes

I won’t focus on the creation of the RAC systems; I may have covered some of it in earlier blog posts, and of course in the RAC Book.

I have deliberately kept it simple. Although most systems in real life use a dedicated (set of) NICs for Data Guard traffic, I decided against it: I didn’t want attention drawn away from the Big Picture. Similarly I am not touching on the option to create a second SCAN, which Oracle allows from 12.1 onwards. If you are interested in these topics kindly refer to my other blog posts.

Creation of the Primary Database

After both RAC systems are set up it’s time to start with the creation of the primary database. This is easy:

dbca -silent -createDatabase -templateName RACDB.dbc \
-gdbName NCDBA -sysPassword ... -systemPassword ... -storageType ASM \
-diskGroupName DATA -recoveryGroupName RECO -sampleSchema true \
-totalMemory 4096 -dbsnmpPassword ... -nodeinfo rac12pri1,rac12pri2

The template referenced in “-templateName” is my own – I always create templates to be license compliant. I covered how to create your custom database template on this blog as well.

I won’t go into detail here about the naming of my databases in a Data Guard configuration. What I learned the hard way was not to use a DB_UNIQUE_NAME that reflects the role. Imagine everyone’s surprise when they connect to a database named STDBY operating in the primary role after a switchover… For lack of better ideas I went ahead and enumerated the databases: my primary database is NCDBA and the standby is NCDBB.

After the database is created, it is started automatically by DBCA.

[oracle@rac12pri1 ~]$ srvctl status database -db NCDBA
Instance NCDBA1 is running on node rac12pri1
Instance NCDBA2 is running on node rac12pri2
[oracle@rac12pri1 ~]$

However, the newly created database isn’t patched (this is a known issue documented on Mike Dietrich’s blog for example).

SQL> select name from v$database;

NAME
---------
NCDBA

SQL> select count(*) from dba_registry_sqlpatch;

  COUNT(*)
----------
         0

No way around it – time to call datapatch:

SQL> alter system set cluster_database=false scope=spfile sid='*';

System altered.

SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -db NCDBA
[oracle@rac12pri1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 13:39:04 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             771755120 bytes
Database Buffers         2432696320 bytes
Redo Buffers               13844480 bytes
Database mounted.
Database opened.
SQL>  exit

...

[oracle@rac12pri1 ~]$ cd $ORACLE_HOME/OPatch/
[oracle@rac12pri1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Wed Dec 14 13:08:51 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_16313_2016_12_14_13_08_51/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016)):
  Installed in the binary registry only
Bundle series DBBP:
  ID 161018 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016))
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))

Installing patches...
Patch installation complete.  Total patches installed: 2

Validating logfiles...
Patch 24315824 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24315824/20676808/24315824_apply_NCDBA_2016Dec14_13_09_26.log (no errors)
Patch 24340679 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20713212/24340679_apply_NCDBA_2016Dec14_13_09_30.log (no errors)
SQL Patching tool complete on Wed Dec 14 13:15:08 2016
[oracle@rac12pri1 OPatch]$
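Before calling it a day, the temporary change to cluster_database needs to be reverted and the database handed back to Clusterware. A sketch of the obvious steps (the database was started manually for datapatch, hence the shutdown in SQL*Plus):

SQL> alter system set cluster_database=true scope=spfile sid='*';
SQL> shutdown immediate

...

[oracle@rac12pri1 ~]$ srvctl start database -db NCDBA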

This concludes part 1 – the database is now set up and running on the primary cluster. In the next part I’m going to describe how to prepare the primary and standby cluster for the Data Guard setup.

Little things worth knowing: Data Guard Broker Setup changes in 12c

One of the problems I have seen when deploying Data Guard for systems such as RAC One Node and policy managed databases was the static listener configuration you needed in 11.2. This has changed with 12c for the better if you are using Grid Infrastructure.

http://docs.oracle.com/database/121/DGBKR/install.htm

In the section about static listener registration a little addendum can be found (thanks to Patrick Hurley/@phurley for pointing this out to me!):

“A static service needs to be defined and registered only if Oracle Clusterware or Oracle Restart is not being used.”

This is good news, so let’s put it to the test; I’m a great fan of Oracle Restart. If I ever find the time I’d like to repeat this test with clustered Grid Infrastructure. I think the quote mentioned earlier holds true there as well, but I would like to see it with my own eyes.

The Setup

To start with I used a dbca-created database named “NCDB” on my server named “server1”. It uses the DATA and RECO disk groups and is a non-CDB (although I’m quite sure that doesn’t matter). The patch level is current at the time of writing, I’m on 12.1.0.2.4 (which is the OJVM and PSU bundle patch for July 2015):

[oracle@server1 ~]$  /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch/opatch lspatches
21068507;Database PSU 12.1.0.2.4, Oracle JavaVM Component (Jul2015)
20831113;OCW Patch Set Update : 12.1.0.2.4 (20831113)
20831110;Database Patch Set Update : 12.1.0.2.4 (20831110)

OPatch succeeded.

[oracle@server1 ~]$  /u01/app/oracle/product/12.1.0.2/grid/OPatch/opatch lspatches
20831113;OCW Patch Set Update : 12.1.0.2.4 (20831113)
20831110;Database Patch Set Update : 12.1.0.2.4 (20831110)
20299018;ACFS Patch Set Update : 12.1.0.2.3 (20299018)
19872484;WLM Patch Set Update: 12.1.0.2.2 (19872484)

OPatch succeeded.

The future standby database will be called STDBY. This is for demonstration purposes only; I’m not a fan of a db_unique_name implying a role. Imagine you switch over to STDBY and run it as the primary database for an extended period of time: that would be potentially confusing.

With that in mind, I created/updated a common tnsnames.ora on server1 and server2:

[oracle@server1 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora
# tnsnames.ora Network Configuration File: 
# /u01/app/oracle/product/12.1.0.2/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

NCDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server1.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = NCDB)
    )
  )

STDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server2.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  )

The duplicate command was a simple “duplicate target database for standby”, issued after a basic pfile had been created and auxiliary structures (audit_dump_dest, large pages etc.) were in place. For this to work you need a standby controlfile backup as well as a backup of the primary database.
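In case you don’t have these yet, they can be taken on the primary along these lines. A sketch only: the destination matches the backup pieces you’ll see referenced in the duplicate output, but the exact format string is my assumption.

[oracle@server1 ~]$ rman target /

RMAN> backup format '/u01/oraback/NCDB/%U' database plus archivelog;
RMAN> backup current controlfile for standby format '/u01/oraback/NCDB/%U';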

RMAN> duplicate target database for standby;

Starting Duplicate Db at 27-JUL-15
using channel ORA_AUX_DISK_1
using channel ORA_AUX_DISK_2

contents of Memory Script:
{
   sql clone "alter system set  control_files =
  ''+DATA/STDBY/CONTROLFILE/current.258.886176617'', 
  ''+RECO/STDBY/CONTROLFILE/current.257.886176617'' comment=
 ''Set by RMAN'' scope=spfile";
   restore clone standby controlfile;
}
executing Memory Script

sql statement: alter system set  control_files =   ''+DATA/STDBY/CONTROLFILE/current.258.886176617'', 
 ''+RECO/STDBY/CONTROLFILE/current.257.886176617'' comment= ''Set by RMAN'' scope=spfile

Starting restore at 27-JUL-15
using channel ORA_AUX_DISK_1
using channel ORA_AUX_DISK_2

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: reading from backup piece +RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583
channel ORA_AUX_DISK_1: ORA-19870: error while restoring backup piece +RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583
ORA-19505: failed to identify file "+RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583"
ORA-17503: ksfdopn:2 Failed to open file +RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583
ORA-15012: ASM file '+RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583' does not exist

failover to previous backup

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: reading from backup piece /u01/oraback/NCDB/09qd3uq4_1_1
channel ORA_AUX_DISK_1: piece handle=/u01/oraback/NCDB/09qd3uq4_1_1 tag=TAG20150727T160940
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
output file name=+DATA/STDBY/CONTROLFILE/current.259.886176619
output file name=+RECO/STDBY/CONTROLFILE/current.258.886176619
Finished restore at 27-JUL-15
contents of Memory Script:
{
   sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database

contents of Memory Script:
{
   set newname for clone tempfile  1 to new;
   switch clone tempfile all;
   set newname for clone datafile  1 to new;
   set newname for clone datafile  3 to new;
   set newname for clone datafile  4 to new;
   set newname for clone datafile  5 to new;
   set newname for clone datafile  6 to new;
   restore
   clone database
   ;
}
executing Memory Script

executing command: SET NEWNAME

renamed tempfile 1 to +DATA in control file

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 27-JUL-15
using channel ORA_AUX_DISK_1
using channel ORA_AUX_DISK_2

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00004 to +DATA
channel ORA_AUX_DISK_1: restoring datafile 00005 to +DATA
channel ORA_AUX_DISK_1: restoring datafile 00006 to +DATA
channel ORA_AUX_DISK_1: reading from backup piece /u01/oraback/NCDB/05qd3uas_1_1
channel ORA_AUX_DISK_2: starting datafile backup set restore
channel ORA_AUX_DISK_2: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_2: restoring datafile 00001 to +DATA
channel ORA_AUX_DISK_2: restoring datafile 00003 to +DATA
channel ORA_AUX_DISK_2: reading from backup piece /u01/oraback/NCDB/06qd3uat_1_1
channel ORA_AUX_DISK_1: piece handle=/u01/oraback/NCDB/05qd3uas_1_1 tag=TAG20150727T160132
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:15
channel ORA_AUX_DISK_2: piece handle=/u01/oraback/NCDB/06qd3uat_1_1 tag=TAG20150727T160132
channel ORA_AUX_DISK_2: restored backup piece 1
channel ORA_AUX_DISK_2: restore complete, elapsed time: 00:00:15
Finished restore at 27-JUL-15

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script
datafile 1 switched to datafile copy
input datafile copy RECID=6 STAMP=886176640 file name=+DATA/STDBY/DATAFILE/system.261.886176625
datafile 3 switched to datafile copy
input datafile copy RECID=7 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/sysaux.263.886176625
datafile 4 switched to datafile copy
input datafile copy RECID=8 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/undotbs1.262.886176625
datafile 5 switched to datafile copy
input datafile copy RECID=9 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/example.260.886176625
datafile 6 switched to datafile copy
input datafile copy RECID=10 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/users.264.886176625
Finished Duplicate Db at 27-JUL-15

RMAN>

That’s a working standby database. I will have to register it with Grid Infrastructure next.

[oracle@server2 ~]$ srvctl add database -db STDBY -oraclehome /u01/app/oracle/product/12.1.0.2/dbhome_1 \
> -role physical_standby -startoption MOUNT -policy automatic -diskgroup data,reco

Now all I need to do is set standby_file_management to auto, add standby redo logs to both databases, and enable the broker. I won’t show the full output here, but the steps are sketched below.
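For the record, this is roughly what these steps look like on each database. A sketch, where the SRL size (which must match the online redo logs) is an assumption, and the add standby logfile command is repeated until there is one SRL group more than there are online log groups:

SQL> alter system set standby_file_management=auto scope=both;
SQL> alter database add standby logfile size 52428800;
SQL> alter system set dg_broker_start=true scope=both;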

Broker Configuration

The next step in the deployment of my standby database is the creation of a Data Guard Broker configuration. I always create it using dgmgrl.

[oracle@server1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdb
Password:
Connected as SYSDBA.

DGMGRL> CREATE CONFIGURATION brokertest AS
>  PRIMARY DATABASE IS 'NCDB'
>  CONNECT IDENTIFIER IS 'NCDB';
Configuration "brokertest" created with primary database "NCDB"

DGMGRL>  add database 'STDBY' as connect identifier is 'STDBY';
Database "STDBY" added

DGMGRL> enable configuration;
Enabled.
DGMGRL> show configuration;

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  NCDB  - Primary database
    STDBY - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 5 seconds ago)

That’s looking good so far! I can see archived redo logs being exchanged, and I can see MRP0 applying data in real time on the standby.
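If you want to see this for yourself, one way to check on the standby is a quick query against v$managed_standby (a sketch; there are other ways, such as v$dataguard_stats):

SQL> select process, status, thread#, sequence# from v$managed_standby where process = 'MRP0';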

Recap

This is a reference for the settings in the Data Guard Broker. I have not modified any listener.ora file on any host; all I did was ensure that the tnsnames.ora file has identical contents on both sides. I have listed the database configuration here for later reference:

DGMGRL> show database verbose 'NCDB';

Database - NCDB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDB

  Properties:
    DGConnectIdentifier             = 'NCDB'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=server1.example.com)
           (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=NCDB_DGMGRL)
           (INSTANCE_NAME=NCDB)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

DGMGRL> show database verbose 'STDBY';

Database - STDBY

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 5.00 KByte/s
  Active Apply Rate:  0 Byte/s
  Maximum Apply Rate: 0 Byte/s
  Real Time Query:    OFF
  Instance(s):
    STDBY

  Properties:
    DGConnectIdentifier             = 'STDBY'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.52)
          (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STDBY_DGMGRL)(INSTANCE_NAME=STDBY)
          (SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

DGMGRL> show configuration verbose;

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  NCDB  - Primary database
    STDBY - Physical standby database 

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Now the big question is: will this allow me to switch over? For the record I am still using my SSH connection (in screen!) on server1, but am now connected to the standby database in dgmgrl.

DGMGRL> validate database 'STDBY'

  Database Role:     Physical standby database
  Primary Database:  NCDB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    NCDB:   Off
    STDBY:  Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDB)                  (STDBY)
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (STDBY)                 (NCDB)
    1         3                       0                       Insufficient SRLs
    Warning: standby redo logs not configured for thread 1 on NCDB
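Outside the lab I would act on the “Insufficient SRLs” warnings before switching over. A sketch of the fix, with the size an assumption that must match the online redo logs:

SQL> alter database add standby logfile size 52428800;

Run this four times on NCDB (three online log groups plus one), and top STDBY up from two to four SRL groups the same way.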

DGMGRL> switchover to 'STDBY';
Performing switchover NOW, please wait...
New primary database "STDBY" is opening...
Oracle Clusterware is restarting database "NCDB" ...
Switchover succeeded, new primary is "STDBY"
DGMGRL> show configuration

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    NCDB  - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 13 seconds ago)

DGMGRL>

Well that seems to have worked!

It seems straightforward though: the SSH connection was made to the future standby host, which could have helped Oracle Restart with restarting the database. What if I try to switch back, but this time connect to server1 (currently in the standby role) and issue the switchover command? Here is the output:

[oracle@server1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    NCDB  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)

DGMGRL> validate database 'NCDB';

  Database Role:     Physical standby database
  Primary Database:  STDBY

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    STDBY:  Off
    NCDB:   Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status       
              (STDBY)                 (NCDB)                               
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status       
              (NCDB)                  (STDBY)                              
    1         3                       2                       Insufficient SRLs

Ready to switch over:

DGMGRL> switchover to 'NCDB'
Performing switchover NOW, please wait...
New primary database "NCDB" is opening...
Oracle Clusterware is restarting database "STDBY" ...
Switchover succeeded, new primary is "NCDB"
DGMGRL> show configuration

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  NCDB  - Primary database
    STDBY - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 54 seconds ago)

DGMGRL> 

OK, that worked too. Very nice. It’s safe to say that, at least under my lab conditions, restarting the databases works pretty well.

Little things worth knowing: Is there a penalty in establishing a connection to Oracle using the MAA connection string?

Sorry for the long title!

During my session about “advanced RAC programming features” at the last Paris Oracle Meetup I had a question about the MAA connection string. I showed an example taken from the Application Continuity White Paper (http://www.oracle.com/technetwork/database/options/clustering/application-continuity-wp-12c-1966213.pdf). Someone from the audience asked me if I had experienced any problems with it, such as very slow connection timeouts. I haven’t, but wanted to double-check anyway. This is a simplified test using a sqlplus connection, since it is easier to time than the creation of a connection pool. If you know of a way to reliably do so in Java/UCP let me know and I’ll test it.

My system is 12.1.0.2 on Oracle Linux 7.1 with UEK (3.8.13-55.1.6.el7uek.x86_64) and the following patches on the RDBMS and Grid Infrastructure side:

[oracle@rac12sby1 ~]$ opatch lspatches -oh /u01/app/12.1.0.2/grid
19872484;WLM Patch Set Update: 12.1.0.2.2 (19872484)
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)
19769479;OCW Patch Set Update : 12.1.0.2.2 (19769479)
19769473;ACFS Patch Set Update : 12.1.0.2.2 (19769473)

[oracle@rac12sby1 ~]$ opatch lspatches -oh /u01/app/oracle/product/12.1.0.2/dbhome_1
19877336;Database PSU 12.1.0.2.2, Oracle JavaVM Component (Jan2015)
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)
19769479;OCW Patch Set Update : 12.1.0.2.2 (19769479)

This is the January System Patch by the way, 20132450. At first I wanted to blog about the patch application but it was so uneventful I decided against it.

I have two clusters, rac12pri and rac12sby, consisting of 2 nodes each. Database CDB is located on rac12pri, STDBY on rac12sby. Normally I am against naming databases by function: you might end up using STDBY in the primary role after a data centre migration, and people tend to find the use of STDBY as a primary database odd. In this case I hope it helps to understand the concept better, and it’s a lab environment anyway …

Creating the broker configuration

There is nothing special about creating the Broker Configuration; just remember to define the broker configuration files on shared storage and to enable automatic standby file management. I also recommend standby redo logs on both the primary and standby databases. Once you have the configuration in place, check the database(s) in verbose mode to get the broker connection string. You can copy/paste the connection string into sqlplus to ensure that every instance can be started thanks to its static listener registration (Data Guard restarts databases during switchover and failover operations, which is bound to fail unless the databases are statically registered with the listener). Here is what I mean:

DGMGRL> show instance verbose 'CDB1';

Instance 'CDB1' of database 'CDB'

  Host Name: rac12pri1
  PFILE:     
  Properties:
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.108)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB_DGMGRL)(INSTANCE_NAME=CDB1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Instance Status:
SUCCESS

Now I can take the static connection identifier and use it (be sure to specify the “as sysdba” at the end):

[oracle@rac12pri1 ~]$ sqlplus 

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:50:09 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter user-name: sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.108)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB_DGMGRL)(INSTANCE_NAME=CDB1)(SERVER=DEDICATED))) as sysdba
Enter password: 

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>

When this worked for all instances in the configuration you are a big step further in your Data Guard setup!

Service setup

I have created a new service named rootsrv on both databases.

[oracle@rac12sby1 ~]$ srvctl add service -d STDBY -s rootsrv -role primary -policy automatic -preferred STDBY1,STDBY2

[oracle@rac12pri1 ~]$ srvctl add service -d CDB -s rootsrv -role primary -policy automatic -preferred CDB1,CDB2

To better follow along here is the current status of the configuration (In real life you might want to use a different protection mode):

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    CDB   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 28 seconds ago)

As per its definition, the service is started on the primary but not on the standby. In fact, I can’t start it on the standby:

[oracle@rac12sby1 ~]$ srvctl status service -d STDBY -s rootsrv
Service rootsrv is running on instance(s) STDBY1,STDBY2

[oracle@rac12pri1 ~]$ srvctl status service -d CDB -s rootsrv
Service rootsrv is not running.
[oracle@rac12pri1 ~]$ srvctl start service -d CDB -s rootsrv
PRCD-1084 : Failed to start service rootsrv
PRCR-1079 : Failed to start resource ora.cdb.rootsrv.svc
CRS-2800: Cannot start resource 'ora.cdb.db' as it is already in the INTERMEDIATE state on server 'rac12pri1'
CRS-2632: There are no more servers to try to place resource 'ora.cdb.rootsrv.svc' on that would satisfy its placement policy
CRS-2800: Cannot start resource 'ora.cdb.db' as it is already in the INTERMEDIATE state on server 'rac12pri2'
[oracle@rac12pri1 ~]$

So there is no more need for database triggers that start and stop services depending on the database’s role (this was another question I had during my talk).

The MAA connection string

The MAA connection string as taken from the white paper and slightly adapted is this one:

MAA_TEST1 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
       (DESCRIPTION=
         (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Please refer to the white paper for an explanation of the various parameters.

Notice that both the primary and standby SCANs are referenced in there, both connecting to “rootsrv”, which is currently active on STDBY. To get some proper timing of the connection delay I use the following little snippet:


$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit
> EOF

Testing on the “primary cluster” first, the local database is operating in the standby role:

[oracle@rac12pri1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:41:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:53 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.432s
user    0m0.026s
sys     0m0.028s

[oracle@rac12pri1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:31:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:31:24 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.415s
user    0m0.023s
sys     0m0.025s

That was quick: not even a second. And as you can see, the connection is set up against database STDBY on the other cluster.

Next on the standby cluster:

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:33:26 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:22 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.613s
user    0m0.025s
sys     0m0.019s

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:33:52 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:50 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.199s
user    0m0.024s
sys     0m0.021s

So this was quick too. And finally, on a different machine altogether (my lab server):

[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv', 'instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 08:13:50 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 08:13:47 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.387s
user	0m0.020s
sys	0m0.018s

Does it change after a role reversal? I think I’ll cover that in a different post …

Appendix: setting up Data Guard for RAC

This is something I always forget so it’s here for reference. I use an identical tnsnames.ora file across all nodes, here it is:

CDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CDB)
    )
  ) 

STDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  ) 

# used for DG and RMAN duplicate
STDBY_NON_SCAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  )

CDB_NON_SCAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CDB)
    )
  )

The non-SCAN TNS entries are a quick workaround for the problem that I registered the SIDs statically with the node listeners only, not with the SCAN listeners. If there is a more elegant way of statically registering a database, please let me know! I couldn’t find any documentation on how to do this. I guess adding a SID_LIST_LISTENER_SCANx would do, too. Anyway, here is an example of the local node listener configuration (not the complete file contents). The ORACLE_SID and host names must be adapted for each node in the cluster.

LISTENER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
    )
  )

# for broker
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=STDBY)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME=STDBY1))
    (SID_DESC=
      (GLOBAL_DBNAME=STDBY_DGMGRL)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME=STDBY1))
  )

After changing the listener configuration you need to reload the local node listener.
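A sketch of that reload, assuming the default listener name and the Grid Infrastructure home shown earlier in this post:

[oracle@rac12sby1 ~]$ /u01/app/12.1.0.2/grid/bin/lsnrctl reload LISTENER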

RAC 12c enhancements: adding an additional SCAN-part 4

This is going to be the last part of this series, however long it might end up being in the end. In the previous articles you read how to create a physical standby database from a RAC One database.

Networks (refresher)

To make it easier to follow without going back to the previous articles, here are the networks I’m using, listed for your convenience.

  • 192.168.100/24: Client network
  • 192.168.102/24: Dedicated Data Guard network

Data Guard Broker Configuration

I said it before and I say it again: I like the Data Guard broker interface for managing standby databases. It’s the ultimate simplification, and so easy to implement that it’s quite safe to operate even in large organisations. As an added bonus you get OEM integration as well, since OEM relies on a Broker configuration. The first step in managing the standby database is therefore to create the Data Guard configuration. And by the way, the documentation was correct: the parameter “listener_networks” is already set! When the standby database starts you can see it being set by the Oracle agent process. The alert.log reads:

Starting background process DMON
DMON started with pid=34, OS id=11023
ORACLE_BASE from environment = /u01/app/oracle
Using default pga_aggregate_limit of 2048 MB
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.58)(PORT=1521))' SCOPE=MEMORY SID='sby_2';
2014-04-28 05:11:47.874000 -04:00
ALTER SYSTEM SET remote_listener=' ron12csby-scan.example.com:1521' SCOPE=MEMORY SID='sby_2';
ALTER SYSTEM SET listener_networks='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.58)
(PORT=1522)))))','((NAME=net2)(REMOTE_LISTENER=ron12csby-dgscan.dg.example.com:1521))' SCOPE=MEMORY SID='sby_2';
2014-04-28 05:12:45.107000 -04:00
Decreasing number of real time LMS from 1 to 0

So no more need to worry about local listener (unless you have many local listeners and are not on 12c), remote_listener and listener_networks. That’s a big relief, to see that it does work. I can now create the configuration. I like the broker command line interface more than the OEM interface, especially since it is a lot easier to show here in the post.

Before you can work with the broker you need to enable it. Set the following init.ora parameters to values similar to the ones below. It is important that these values are identical on all instances. You also need to ensure the broker configuration files are in ASM.

SQL> show parameter dg_broker

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1               string      +DATA/ron/d1.dat
dg_broker_config_file2               string      +RECO/ron/d2.dat
dg_broker_start                      boolean     TRUE
SQL>

Make sure to make the settings on both clusters! Now connect using the command line interface and create the configuration. Note that I’m connecting using ronprinet2, which is the TNS name for the primary database using the Broker Network (192.168.102/24)

[oracle@ron12cprinode1 ~]$ dgmgrl sys/xxx@ronprinet2
DGMGRL for Linux: Version 12.1.0.1.0 - 64bit Production

Copyright (c) 2000, 2012, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> help create configuration

Creates a broker configuration

Syntax:

  CREATE CONFIGURATION <configuration name> AS
    PRIMARY DATABASE IS <database name>
    CONNECT IDENTIFIER IS <connect identifier>;

DGMGRL> create configuration dgtest as primary database is "RON" connect identifier is "ronprinet2";
Configuration "dgtest" created with primary database "RON"

DGMGRL> show database verbose "RON"

Database - RON

  Role:              PRIMARY
  Intended State:    OFFLINE
  Instance(s):
    pri_2

  Properties:
    DGConnectIdentifier             = 'ronprinet2'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.51)
(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=ron_DGMGRL)(INSTANCE_NAME=pri_2)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
DISABLED

DGMGRL>

A few more things worth mentioning here: first of all, the database does not know about the other instance. That’s nothing to worry about: you will see how this is addressed during the online relocation. The StaticConnectIdentifier is also pointing to the wrong network. Let’s have a look at the only instance:

DGMGRL> show instance verbose pri_2

Instance 'pri_2' of database 'RON'

  Host Name: ron12cprinode1.example.com
  PFILE:
  Properties:
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.51)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=ron_DGMGRL)(INSTANCE_NAME=pri_2)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Instance Status:
DISABLED

Unsurprisingly the StaticConnectIdentifier is pointing to the wrong interface again. Let’s correct this.

DGMGRL> edit instance pri_2 set property StaticConnectIdentifier=
> '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.51)(PORT=1522))
> (CONNECT_DATA=(SERVICE_NAME=ron_DGMGRL)(INSTANCE_NAME=pri_2)(SERVER=DEDICATED)))';
Property "staticconnectidentifier" updated

This command updated the setting on both the database and the instance level. OK, let’s move on: I need to add the standby database.

DGMGRL> add database "RONDG" AS CONNECT IDENTIFIER IS ronsbynet2 MAINTAINED AS PHYSICAL;
Database "RONDG" added

DGMGRL> show database verbose "RONDG"

Database - RONDG

  Role:              PHYSICAL STANDBY
  Intended State:    OFFLINE
  Transport Lag:     (unknown)
  Apply Lag:         (unknown)
  Apply Rate:        (unknown)
  Real Time Query:   OFF
  Instance(s):
    sby_2

  Properties:
    DGConnectIdentifier             = 'ronsbynet2'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.58)
(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_2)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
DISABLED

DGMGRL> show instance verbose sby_2

Instance 'sby_2' of database 'RONDG'

  Host Name: ron12csbynode1.example.com
  PFILE:
  Properties:
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.58)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_2)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Instance Status:
DISABLED

Again let’s fix the static connection identifier.

DGMGRL> edit instance sby_2 set property StaticConnectIdentifier =
> '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.58)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)
(INSTANCE_NAME=sby_2)(SERVER=DEDICATED)))';
Property "staticconnectidentifier" updated

Now everything is in place, I can enable the configuration.

DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - dgtest

  Protection Mode: MaxPerformance
  Databases:
  RON   - Primary database
    RONDG - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL>

If you haven’t set the standby file management property to auto yet, please do so; the broker makes this a quick change, as sketched below. I really don’t like MRP0 aborting because a datafile UNNAMED0000010 has been added. This causes confusion that’s better avoided.
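A sketch of the change via the broker, once per database in the configuration:

DGMGRL> edit database "RONDG" set property StandbyFileManagement='AUTO';
Property "standbyfilemanagement" updated
DGMGRL> edit database "RON" set property StandbyFileManagement='AUTO';
Property "standbyfilemanagement" updated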

If you get errors about missing standby redo logs (SRLs), please add them; a sketch follows. I have done so on the standby and the primary, but didn’t include the output here.
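For reference, adding SRLs to a database with two redo threads looks roughly like this; the size is an assumption (it must match the online redo logs), and each statement is repeated until every thread has one more SRL group than it has online log groups:

SQL> alter database add standby logfile thread 1 size 52428800;
SQL> alter database add standby logfile thread 2 size 52428800;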

Switchover

That should be all that’s needed in preparation. Since I’m impatient I wanted to see if the switchover works. First I check the configuration:

DGMGRL> show configuration

Configuration - dgtest

  Protection Mode: MaxPerformance
  Databases:
  RON   - Primary database
    RONDG - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Success means: ready to switch over. Let’s try:

DGMGRL> switchover to "RONDG"
Performing switchover NOW, please wait...
Operation requires a connection to instance "sby_2" on database "RONDG"
Connecting to instance "sby_2"...
Connected as SYSDBA.
New primary database "RONDG" is opening...
Operation requires startup of instance "pri_2" on database "RON"
Starting instance "pri_2"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "RONDG"
DGMGRL> show configuration

Configuration - dgtest

  Protection Mode: MaxPerformance
  Databases:
  RONDG - Primary database
    RON   - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Great, so that worked. It was important that the StaticConnectIdentifiers were set correctly. During testing I found that relying on the DGConnectIdentifier did not work when it pointed to the second SCAN listener: the SCAN listeners do not “know” about the statically registered %_DGMGRL services, and you receive the dreaded ORA-12514 “listener does not currently know of service requested in connect descriptor” error. I also had to pick a single listener for the StaticConnectIdentifier; I wanted both node listeners as potential targets for starting the instance, but that failed too. So in a way I map node 1 to instance sby_2 by pointing the broker to its listener.

Online Relocation

Now on to part 2: what happens to the standby when I perform an online relocation on the primary? Let's try.

[oracle@ron12csbynode1 admin]$ srvctl status database -d rondg
Instance sby_2 is running on node ron12csbynode1
Online relocation: INACTIVE
[oracle@ron12csbynode1 admin]$ srvctl status service -d rondg
Service ron12c is running on instance(s) sby_2

[oracle@ron12csbynode1 admin]$ srvctl relocate database -d rondg -timeout 2 -verbose -node ron12csbynode2
Configuration updated to two instances
Instance sby_1 started
Services relocated
Waiting for up to 2 minutes for instance sby_2 to stop ...
Instance sby_2 stopped
Configuration updated to one instance
[oracle@ron12csbynode1 admin]$

In the meantime I connected to the primary (while it was relocating) and switched a few logs; the statement is sketched below. This works, as you can see from the standby's alert.log (RON is the standby database now).
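
Connected to the primary as SYSDBA, a statement like this archives the current log of all enabled threads:

SQL> alter system archive log current;

Here is the standby's alert.log while the relocation was in progress: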

2014-05-23 10:30:16.943000 -04:00
Recovery of Online Redo Log: Thread 1 Group 6 Seq 119 Reading mem 0
  Mem# 0: +DATA/RON/ONLINELOG/group_6.280.846066567
  Mem# 1: +RECO/RON/ONLINELOG/group_6.330.846066593
2014-05-23 10:30:42.897000 -04:00
Archived Log entry 226 added for thread 1 sequence 119 ID 0x64c27dbf dest 1:
2014-05-23 10:30:49.482000 -04:00
Media Recovery Waiting for thread 1 sequence 120 (in transit)
2014-05-23 10:30:52.043000 -04:00
Primary database is in MAXIMUM PERFORMANCE mode
Re-archiving standby log 8 thread 2 sequence 97
2014-05-23 10:31:02.879000 -04:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 120 Reading mem 0
  Mem# 0: +DATA/RON/ONLINELOG/group_5.279.846066497
  Mem# 1: +RECO/RON/ONLINELOG/group_5.329.846066535
2014-05-23 10:31:12.400000 -04:00
RFS[8]: Assigned to RFS process (PID:13182)
RFS[8]: Selected log 9 for thread 2 sequence 98 dbid 1681008808 branch 838874793
2014-05-23 10:31:14.987000 -04:00
Archived Log entry 227 added for thread 2 sequence 97 ID 0x64c27dbf dest 1:
2014-05-23 10:31:50.408000 -04:00
Media Recovery Waiting for thread 2 sequence 98 (in transit)

2014-05-23 10:32:02.193000 -04:00
RFS[8]: Selected log 8 for thread 2 sequence 99 dbid 1681008808 branch 838874793
2014-05-23 10:32:05.872000 -04:00
Recovery of Online Redo Log: Thread 2 Group 9 Seq 98 Reading mem 0
  Mem# 0: +DATA/RON/ONLINELOG/group_9.283.846066895
  Mem# 1: +RECO/RON/ONLINELOG/group_9.333.846066947

The log entries reference two redo threads: the relocation temporarily turns the single-instance database into a cluster database by starting instances on both hosts.
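
You can watch this happening while the relocation is in flight; a quick sketch:

SQL> select inst_id, instance_name, status from gv$instance;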

Eventually instance sby_2 stops and causes TNS errors in the standby’s alert.log:

***********************************************************************

Fatal NI connect error 12528, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ron12csby-dgscan.dg.example.com)(PORT=1521))
(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=RONDG_DGB)(INSTANCE_NAME=sby_2)(CID=(PROGRAM=oracle)
(HOST=ron12cprinode1.example.com)(USER=oracle))))

  VERSION INFORMATION:
TNS for Linux: Version 12.1.0.1.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 12.1.0.1.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.1.0.1.0 - Production
  Time: 23-MAY-2014 10:34:25
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12564
...

Nothing to be alarmed about: this is normal, and the instance stop is shown in the output of the relocate command.

The good news is that the standby keeps receiving redo:

2014-05-23 10:35:40.756000 -04:00
RFS[9]: Assigned to RFS process (PID:13329)
RFS[9]: Selected log 9 for thread 2 sequence 100 dbid 1681008808 branch 838874793
2014-05-23 10:36:08.256000 -04:00
Archived Log entry 231 added for thread 2 sequence 100 ID 0x64c27dbf dest 1:
2014-05-23 10:36:14.754000 -04:00
Media Recovery Waiting for thread 2 sequence 101 (in transit)
2014-05-23 10:36:15.972000 -04:00
RFS[7]: Opened log for thread 2 sequence 101 dbid 1681008808 branch 838874793
2014-05-23 10:36:20.162000 -04:00
Archived Log entry 232 added for thread 2 sequence 101 rlc 838874793 ID 0x64c27dbf dest 2:
2014-05-23 10:36:29.656000 -04:00
Media Recovery Log +RECO/RON/ARCHIVELOG/2014_05_23/thread_2_seq_101.393.848313375

And after instance sby_2 is shut down, you can see redo from thread 1 being transferred.

2014-05-23 10:46:58.257000 -04:00
RFS[6]: Selected log 5 for thread 1 sequence 122 dbid 1681008808 branch 838874793
Archived Log entry 233 added for thread 1 sequence 121 ID 0x64c27dbf dest 1:
2014-05-23 10:47:00.479000 -04:00
Media Recovery Waiting for thread 1 sequence 122 (in transit)
2014-05-23 10:47:03.686000 -04:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 122 Reading mem 0
  Mem# 0: +DATA/RON/ONLINELOG/group_5.279.846066497
  Mem# 1: +RECO/RON/ONLINELOG/group_5.329.846066535

If you are unsure about the thread-to-instance mapping, check v$thread; this example is from the primary:

SQL> select thread#, status, enabled, instance from v$thread;

   THREAD# STATUS ENABLED  INSTANCE
---------- ------ -------- ------------------------------
         1 OPEN   PUBLIC   sby_1
         2 CLOSED PUBLIC   sby_2

In summary, my test showed that an online relocation does not seem to cause trouble for the standby database. I still need to run a benchmark such as Swingbench against it to see how the online relocation behaves when the system is under load, though; the database was pretty much idle during this test.
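
When I get around to that test, transport and apply lag are easy to track on the standby; a quick sketch using v$dataguard_stats:

SQL> select name, value from v$dataguard_stats where name in ('transport lag', 'apply lag');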

Amending the Broker configuration

Circling back to the introduction: what happens to the DG broker configuration after the database has been a RAC database (albeit briefly)? Here is the output from the configuration after the first relocation:

DGMGRL> show database verbose "RONDG"

Database - RONDG

  Role:              PRIMARY
  Intended State:    TRANSPORT-ON
  Instance(s):
    sby_1
    sby_2

  Properties:
    DGConnectIdentifier             = 'ronsbynet2'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier(*)
    StandbyArchiveLocation(*)
    AlternateLocation(*)
    LogArchiveTrace(*)
    LogArchiveFormat(*)
    TopWaitEvents(*)
    (*) - Please check specific instance for the property value

Database Status:
SUCCESS

Notice how the second instance (sby_1) has been added. The first question I had: what has this done to the connection identifiers? As expected, the StaticConnectIdentifier of the newly added instance points to the wrong network:

DGMGRL> show instance verbose sby_1;

Instance 'sby_1' of database 'RONDG'

  Host Name: ron12csbynode2.example.com
  PFILE:
  Properties:
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.60)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Instance Status:
SUCCESS

So that needs to be changed as well; see the sketch below. It would make sense to systematically perform a relocation as part of building the Data Guard environment, on both primary and standby, to register all instances and update their properties. Not that it's a big issue: the switchover is most likely not going to fail, but you have to start the database manually if the connection information is wrong.
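
The fix mirrors the earlier one for sby_2; a sketch for sby_1, assuming the second network's listener on node 2 uses 192.168.102.60 and port 1522 (adjust to your environment):

DGMGRL> edit instance sby_1 set property StaticConnectIdentifier =
> '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.60)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=rondg_DGMGRL)(INSTANCE_NAME=sby_1)(SERVER=DEDICATED)))';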

Summary

It's been quite a journey! This article series explained how to add a second SCAN to the environment, including all the supporting infrastructure such as VIPs and listeners. You then saw how to duplicate a database for Data Guard, and finally how the broker configuration copes with switchovers and online relocations of the RAC One Node database.