Martins Blog

Trying to explain complex things in simple terms

RAC 12c enhancements: adding an additional SCAN-part 1

Posted by Martin Bach on April 22, 2014

Based on customer requests, Oracle has added the functionality to create a second SCAN, completely independent of the SCAN defined during cluster creation. Why would you want to use this feature? A few reasons that spring to mind are:

  • Consolidation: customers insist on using a different network
  • Separate network for Data Guard traffic

To demonstrate the concept I am going to show you in this blog post how I

  1. Add a new network resource
  2. Create new VIPs
  3. Add a new SCAN
  4. Add a new SCAN listener

It actually sounds more complex than it is, but I have a feeling I need to split this article into multiple parts as it would otherwise be far too long.

The lab setup

When you install RAC 11.2 or 12.1 you are prompted to specify a Single Client Access Name, or SCAN. This SCAN is usually defined in the corporate DNS server and resolves to three IP addresses, which allows for an easy way to implement client-side load balancing. The SCAN is explained in more detail in Pro Oracle Database 11g RAC on Linux for 11.2 and on OTN for 11.2 and 12.1. To spice the whole configuration up a little bit I decided to use RAC One Node on the clusters I am using for this demonstration.
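If you want to see the client-side load balancing for yourself, query the DNS server for the SCAN; a quick sketch (the SCAN name here is a hypothetical one for this lab):

# the lookup should return three A records, and the order of the
# addresses rotates between calls (round robin)
$ nslookup ron12cpri-scan.example.com
$ nslookup ron12cpri-scan.example.com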

I created two 12.1.0.1.2 clusters for this Data Guard test. Hosts ron12cprinode1 and ron12cprinode2 form the primary cluster; ron12csbynode1 and ron12csbynode2 form the standby cluster. The RAC One Node database is named RON:

[oracle@ron12cprinode1 ~]$ srvctl config database -db ron
Database unique name: ron
Database name: ron
Oracle home: /u01/app/oracle/product/12.1.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/ron/spfilepri.ora
Password file: +DATA/ron/orapwpri
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ron
Database instances:
Disk Groups: RECO
Mount point paths:
Services: ron12c
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: pri
Candidate servers: ron12cprinode1,ron12cprinode2
Database is administrator managed
[oracle@ron12cprinode1 ~]$

To make things even more interesting I defined my ORACLE_SID prefix on the primary to be “pri” and “sby” on the standby.

[oracle@ron12cprinode1 ~]$ ps -ef | grep smon
oracle    2553     1  0 Feb06 ?        00:00:09 asm_smon_+ASM1
oracle   15660 15578  0 05:05 pts/3    00:00:00 grep smon
oracle   28241     1  0 Feb07 ?        00:00:18 ora_smon_pri_1
[oracle@ron12cprinode1 ~]$

A quick check with gpnptool reveals the network usage before the addition of the second SCAN:

<gpnp:Network-Profile>
 <gpnp:HostNetwork id="gen" HostName="*">
  <gpnp:Network id="net1" IP="192.168.100.0" Adapter="eth0" Use="public"/>
  <gpnp:Network id="net2" IP="192.168.101.0" Adapter="eth1" Use="cluster_interconnect"/>
 </gpnp:HostNetwork>
</gpnp:Network-Profile>

There is the default network ("netnum 1") that is created on the network defined as "public" during the installation. I have another spare network port (eth2) reserved for the new network and Data Guard traffic. Currently network 1 is the only one available.

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:

As you can see, RAC 12c now supports IPv6. I have another network available that I want to dedicate to Data Guard traffic. For this purpose I added all nodes to DNS. I am a bit old-fashioned when it comes to DNS: I still use bind most of the time. Here is an excerpt of my reverse name resolution file:

; hosts - primary cluster
50	PTR	ron12cprinode1.dg.example.com.
51	PTR	ron12cprinode1-vip.dg.example.com.
52	PTR	ron12cprinode2.dg.example.com.
53	PTR	ron12cprinode2-vip.dg.example.com.
; Data Guard SCAN - primary cluster
54	PTR	ron12cpri-dgscan.dg.example.com.
55	PTR	ron12cpri-dgscan.dg.example.com.
56	PTR	ron12cpri-dgscan.dg.example.com.

; hosts - standby cluster
57	PTR	ron12csbynode1.dg.example.com.
58	PTR	ron12csbynode1-vip.dg.example.com.
59	PTR	ron12csbynode2.dg.example.com.
60	PTR	ron12csbynode2-vip.dg.example.com.
; Data Guard SCAN - standby cluster
61	PTR	ron12csby-dgscan.dg.example.com.
62	PTR	ron12csby-dgscan.dg.example.com.
63	PTR	ron12csby-dgscan.dg.example.com.

Data Guard traffic will use the *.dg.example.com domain, while primary database client traffic is routed through *.example.com.
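For completeness, the matching forward zone entries might look like the sketch below, with the IP addresses inferred from the PTR records above (the standby entries follow the same pattern):

; dg.example.com zone - primary cluster (sketch)
ron12cprinode1      IN A  192.168.102.50
ron12cprinode1-vip  IN A  192.168.102.51
ron12cprinode2      IN A  192.168.102.52
ron12cprinode2-vip  IN A  192.168.102.53
ron12cpri-dgscan    IN A  192.168.102.54
ron12cpri-dgscan    IN A  192.168.102.55
ron12cpri-dgscan    IN A  192.168.102.56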

Adding the new network

The first step to be performed is to make Clusterware aware of the second network. I am doing this on both clusters. Notice that the primary nodes are called *pri* whereas the standby nodes are called *sby*.

[root@ron12cprinode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

[root@ron12csbynode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

This worked; I now have two networks:

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:
Network 2 exists
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2, static
Subnet IPv6:
[root@ron12cprinode1 ~]#

In the next step I have to add VIPs for the new nodes on the *.dg.example.com subnet. The VIPs must be added for all cluster nodes, four in my case.

[oracle@ron12cprinode2 ~]$ srvctl add vip -h

Adds a VIP to the Oracle Clusterware.

Usage: srvctl add vip -node <node_name> -netnum <network_number> -address {<name>|<ip>}/<netmask>[/if1[|if2...]] [-skip] [-verbose]
    -node     <node_name>          Node name
    -address                       <vip_name|ip>/<netmask>[/if1[|if2...]] VIP address specification for node applications
    -netnum   <net_num>            Network number (default number is 1)
    -skip                          Skip reachability check of VIP address
    -verbose                       Verbose output
    -help                          Print usage
[oracle@ron12cprinode2 ~]$

So I did this for each node in both clusters:

[root@ron12cprinode1 ~]# srvctl add vip -node ron12cprinode1 -netnum 2 -address 192.168.102.51/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12cprinode2 ~]# srvctl add vip -node ron12cprinode2 -netnum 2 -address 192.168.102.53/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

[root@ron12csbynode1 ~]# srvctl add vip -node ron12csbynode1 -netnum 2 -address 192.168.102.58/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12csbynode2 ~]# srvctl add vip -node ron12csbynode2 -netnum 2 -address 192.168.102.60/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

And I need to start the VIPs. They have some funny names, as you can see in crsctl status resource (the names can't be defined; see the output of srvctl add vip -h above).

[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is not running
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode1_2
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode2_2
[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is running on node: ron12cprinode1
[root@ron12cprinode1 ~]#

Add the second SCAN

At this time you can add the second SCAN. The command syntax is shown here:

[oracle@ron12cprinode1 ~]$ srvctl add scan -h

Adds a SCAN VIP to the Oracle Clusterware.

Usage: srvctl add scan -scanname <scan_name> [-netnum <network_number>]
    -scanname <scan_name>          Domain name qualified SCAN name
    -netnum  <net_num>             Network number (default number is 1)
    -subnet                        <subnet>/<netmask>[/if1[|if2...]] NET address specification for network
    -help                          Print usage

Executed on my first cluster node, the command is easier to comprehend:

[root@ron12cprinode1 ~]# srvctl add scan -scanname ron12cpri-dgscan.dg.example.com -netnum 2

[root@ron12cprinode1 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node ron12cprinode2
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node ron12cprinode1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node ron12cprinode1

You need to create the SCAN on both clusters. On my primary cluster the SCAN has been created with the following configuration:

[root@ron12cprinode1 ~]# srvctl config scan -netnum 2
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 0 IPv4 VIP: 192.168.102.54
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 1 IPv4 VIP: 192.168.102.55
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 2 IPv4 VIP: 192.168.102.56
[root@ron12cprinode1 ~]#
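At this point it is worth cross-checking that DNS and Clusterware agree on the new SCAN; a quick sketch:

# both commands should list 192.168.102.54, .55 and .56
$ nslookup ron12cpri-dgscan.dg.example.com
$ srvctl config scan -netnum 2 | grep "IPv4 VIP"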

You can see the new VIPs in the output of ifconfig, just as you would with the primary SCAN:

eth2      Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.50  Bcast:192.168.102.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fefe:e2d5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:523 errors:0 dropped:0 overruns:0 frame:0
          TX packets:339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:118147 (115.3 KiB)  TX bytes:72869 (71.1 KiB)

eth2:1    Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.55  Bcast:192.168.102.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

There is nothing too surprising in the output; it behaves exactly like the public SCAN created during the installation.

End of part 1

This already seems like a lot of text to me so I think it’s time to pause here. The next parts will demonstrate the addition of the SCAN listeners, the new node listeners on the *.dg.example.com network and finally the duplication of the primary RAC One Node database for use as a standby database.

Posted in 12c Release 1, KVM

How important is a Disaster Recovery site for you?

Posted by Martin Bach on April 17, 2014

I regularly read threads on the oracle-l mailing list, and occasionally feel very tempted to reply to one. Just recently I saw one that I liked a lot. It is specifically about using an Oracle Database Appliance (ODA) as a Disaster Recovery (DR) solution for an Exadata system. The Exadata configuration was not specified, I assume it was a smaller (eighth rack/quarter rack) configuration.

There were lots of arguments pro and against that Exadata->ODA architecture, and that leads to a broader question: how important is DR for your organisation? This blog post is about my personal experience, and probably strongly influenced by where I live and work (Europe); yours might be different.

About the original discussion

A number of replies stated that using an ODA as a DR solution is technically possible and acceptable if you don’t really plan to use it (and you have emails where you pointed out the shortcomings of that architecture).

The use of non-Exadata hardware as a DR solution for Exadata is flawed in many ways, and here is why:

Exadata, especially since 11.2.3.3.0, will make very elegant choices when it comes to caching data in the Smart Flash Cache. You will see very decent response times. You will also notice that smart scans will benefit from intelligent caching. This works so well that I had to rewrite some of my demos.

Smart scans and Smart Flash Cache do not exist outside the Exadata platform. Outside of Exadata you get no smart IO either, and no direct access to data compressed with Hybrid Columnar Compression.

The counter argument was that you might not need these anyway because the features are not used in your Exadata production system. But even if you don't use HCC and smart scans on Exadata (I might ask you though: why not?), you will feel the lack of the Smart Flash Cache: your ~1 ms response times for many single-block IOs will most likely degrade to ~6-8 ms.

Others said that if you have HCC-compressed data it has to be decompressed before use (time and CPU intensive), and you might not have the space: with 10x compression, a 100 GB table becomes 1 TB. In the context of the ODA you might not even be able to decompress at all, as disk space is limited compared to Exadata (even without HCC).

The broader discussion

The original poster had a very specific question about using an ODA (or other non-Exadata system) as a DR solution for an Exadata primary.

Abstracting from the original question I immediately thought about using non-identical hardware for DR. A common case I found in my career follows this pattern: the production system can’t cope with the workload anymore, so you get new kit. But since getting new kit is expensive, the budget owners (not DBAs!) decide to reuse the old production servers for DR.

I have seen this numerous times and always made sure I have an email trail where I said quite openly that this is not a Good Idea ™ and was overruled. I am a very cautious person.

So what would happen if you had to invoke DR? I haven't seen this happen often (management often hesitates to take that decision), but let's assume you invoke DR on your old production servers. And guess what: the new hardware was so powerful that far less care was taken to ensure that code performs well-after all, the new hardware can deal with it. Except that the old hardware couldn't in the first place, and that was the reason it was phased out. Oooops. DR that is inoperable is not really a solution.

How important is DR for you?

This all boils down to the question “how important is DR for you”? I gathered that some users feel DR is just a tick in the box, and it isn’t really ever considered to be invoked.

In my opinion (and I formed that many years ago) DR is the only way to ensure business continuity for all but the smallest databases. When working with large data sets in the TB range it becomes quite unmanageable to fully restore that data and still meet the Recovery Time Objective (RTO). Modern technology gives us the opportunity to have an RTO of nearly zero, and critical applications surely need to meet this, in which case you can rule out a restore straight away.

But even if you had an hour to restore, it might be difficult to meet that target, depending on your backup strategy. Disk backups are a great help here, but if the disks holding your backups are gone you need to fall back to another restore method. Thirty minutes later it can become clear that at the current rate the 1 hour SLA can't be met.

So you need to invoke DR, and the DR solution-again in my opinion-should perform just like production.

Final note

I should once more point out that the opinions listed here are mine. During every customer engagement I advocated the use of Data Guard or equivalent replication technology. I don’t believe a database restore is a viable option due to the time constraints around database operations, and sleep better knowing I have a working DR solution for my important systems.

Posted in Exadata, Oracle

Duplicate from the standby instead from the primary in 12c

Posted by Martin Bach on April 2, 2014

This post is about an active database duplication for standby in 12c, performed in my lab environment. Although I first encountered the problem on 12c, there is a chance you run into a similar situation with earlier releases too.

I would normally use ASM for all my databases to make my life easier, but this time I had to be mindful of the available memory on the laptop, which at 8 GB is not plenty. So I went with a file system setup instead. After the initial preparations I was ready to launch the one-liner on the standby host:

RMAN> duplicate target database for standby from active database;

I started the duplication process on the standby; it worked away happily for a few moments, only to come to an abrupt halt with the error message below.

...
executing command: SET NEWNAME

executing command: SET NEWNAME

Starting backup at 01-APR-14
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00012 name=/u01/oradata/CDB1/valid/sysaux01.dbf
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/01/2014 20:34:42
ORA-17628: Oracle error 19505 returned by remote Oracle server
ORA-19505: failed to identify file ""
channel ORA_DISK_1 disabled, job failed on it will be run on another channel
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 04/01/2014 20:34:42
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/01/2014 20:34:42
ORA-17628: Oracle error 19505 returned by remote Oracle server
ORA-19505: failed to identify file ""

Huh? File 12 is part of a PDB named VALID. It was definitely available on the file system on the source, and I even had a backup on disk for it… I first thought it had to do with all the PDBs being closed on the source database, but that was not the case: after opening the PDBs read-write I still had the same error. Strange. I thought about debugging and tracing RMAN but then decided to first try the duplication from the primary. That was a good idea, because the primary actually gave me a more meaningful error message:

Starting restore at 19.03.2014 10:17:08
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service cdb1
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00004 to /u01/oradata/STDBY1/undotbs01.dbf
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:25
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service cdb1
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00005 to /u01/oradata/STDBY1/pdbseed/system01.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 03/19/2014 10:17:26
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
ORA-19849: error while reading backup piece from service cdb1
ORA-19504: failed to create file "/u01/oradata/STDBY1/pdbseed/system01.dbf"
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 1

So I had forgotten to create the needed directories on the standby. For consistency I had created a subdirectory per PDB on the primary, which I had not replicated on the standby host. You can see the layout here:

RMAN> report schema;

Report of database schema for database with db_unique_name CDB1

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    790      SYSTEM               ***     /u01/oradata/CDB1/system01.dbf
3    750      SYSAUX               ***     /u01/oradata/CDB1/sysaux01.dbf
4    90       UNDOTBS1             ***     /u01/oradata/CDB1/undotbs01.dbf
5    250      PDB$SEED:SYSTEM      ***     /u01/oradata/CDB1/pdbseed/system01.dbf
6    5        USERS                ***     /u01/oradata/CDB1/users01.dbf
7    590      PDB$SEED:SYSAUX      ***     /u01/oradata/CDB1/pdbseed/sysaux01.dbf
11   270      VALID:SYSTEM         ***     /u01/oradata/CDB1/valid/system01.dbf
12   610      VALID:SYSAUX         ***     /u01/oradata/CDB1/valid/sysaux01.dbf
13   260      RCAT:SYSTEM          ***     /u01/oradata/CDB1/rcat/system01.dbf
14   610      RCAT:SYSAUX          ***     /u01/oradata/CDB1/rcat/sysaux01.dbf
15   100      RCAT:RMAN_CATALOG    ***     /u01/oradata/CDB1/rcat/rman_catalog01.dbf
16   50       VALID:USERS          ***     /u01/oradata/CDB1/valid/users01.dbf
17   250      DGTEST:SYSTEM        ***     /u01/oradata/CDB1/dgtest/system01.dbf
18   590      DGTEST:SYSAUX        ***     /u01/oradata/CDB1/dgtest/sysaux01.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    60       TEMP                 32767       /u01/oradata/CDB1/temp01.dbf
2    20       PDB$SEED:TEMP        32767       /u01/oradata/CDB1/pdbseed/pdbseed_temp01.dbf
4    20       VALID:TEMP           32767       /u01/oradata/CDB1/valid/pdbseed_temp01.dbf
5    20       RCAT:TEMP            32767       /u01/oradata/CDB1/rcat/pdbseed_temp01.dbf
6    20       DGTEST:TEMP          32767       /u01/oradata/CDB1/dgtest/dgtest_temp01.dbf
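
The fix is simple: based on the layout above, the missing per-PDB directories can be created on the standby host in one go. A sketch (the hostname in the prompt is generic; the STDBY1 path is taken from the restore messages above):

[oracle@standby ~]$ mkdir -p /u01/oradata/STDBY1/{pdbseed,valid,rcat,dgtest}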

All right, after creating the directories I retried the duplication and it worked:

...
datafile 4 switched to datafile copy
input datafile copy RECID=5 STAMP=843771196 file name=/u01/oradata/STDBY1/undotbs01.dbf

datafile 5 switched to datafile copy
input datafile copy RECID=6 STAMP=843771198 file name=/u01/oradata/STDBY1/pdbseed/system01.dbf
datafile 6 switched to datafile copy
input datafile copy RECID=7 STAMP=843771198 file name=/u01/oradata/STDBY1/users01.dbf
datafile 7 switched to datafile copy
input datafile copy RECID=8 STAMP=843771199 file name=/u01/oradata/STDBY1/pdbseed/sysaux01.dbf
datafile 11 switched to datafile copy
input datafile copy RECID=9 STAMP=843771199 file name=/u01/oradata/STDBY1/valid/system01.dbf
datafile 12 switched to datafile copy
input datafile copy RECID=10 STAMP=843771199 file name=/u01/oradata/STDBY1/valid/sysaux01.dbf
datafile 13 switched to datafile copy
input datafile copy RECID=11 STAMP=843771199 file name=/u01/oradata/STDBY1/rcat/system01.dbf
datafile 14 switched to datafile copy
input datafile copy RECID=12 STAMP=843771200 file name=/u01/oradata/STDBY1/rcat/sysaux01.dbf
datafile 15 switched to datafile copy
input datafile copy RECID=13 STAMP=843771200 file name=/u01/oradata/STDBY1/rcat/rman_catalog01.dbf
datafile 16 switched to datafile copy
input datafile copy RECID=14 STAMP=843771200 file name=/u01/oradata/STDBY1/valid/users01.dbf
datafile 17 switched to datafile copy
input datafile copy RECID=15 STAMP=843771200 file name=/u01/oradata/STDBY1/dgtest/system01.dbf
datafile 18 switched to datafile copy
input datafile copy RECID=16 STAMP=843771200 file name=/u01/oradata/STDBY1/dgtest/sysaux01.dbf
Finished Duplicate Db at 19.03.2014 10:28:42

Summary

Although this is not a big problem, and probably known to everyone besides me, the error message on the standby shouldn't be of the scary type 'failed to identify file ""' followed by the process aborting. Something more meaningful would have been nice.

Posted in 12c Release 1, Linux, RMAN

Recovering a standby over the network in 12c

Posted by Martin Bach on March 17, 2014

Another one of the cool but underrated features in 12c is the possibility to recover a physical standby over the network with one line in RMAN.

Why do you need to perform this activity? Assume someone really clever created a segment with the "nologging" attribute and the database was not in force logging mode. This operation cannot be replicated by redo apply on the standby, and you are bound to have a problem. Or, in my case, I had the standby shut down in my lab environment (intentionally) and created a few PDBs on my primary. For some reason I lost an archived redo log. This would of course not happen in a production environment, but my lab VM is limited when it comes to space and I may have moved my backup to a USB disk that I didn't bring along.
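As an aside, you can check whether a primary is exposed to the nologging problem; a small sketch:

SQL> -- NO means nologging operations can bypass redo generation
SQL> select force_logging from v$database;
SQL> -- the usual cure, if the extra redo volume is acceptable:
SQL> alter database force logging;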

So, long story short: you either recreate the standby-which is a rather blunt-force approach to the problem-or recover the standby from the SCN where the problem originated. I was hoping the new 12c recover database from service command in RMAN could help. It did, but not as I expected at first.
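For reference, this is roughly what the command looks like; a sketch assuming the standby is mounted, managed recovery has been stopped, and cdb1 is a TNS alias pointing at the primary:

# run on the standby; optionally append "using compressed backupset"
RMAN> connect target /
RMAN> recover database from service cdb1;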

The problem with my lab

My environment is Oracle Linux 6.4 x86-64 on VirtualBox with 4 GB RAM. Oracle is 12.1.0.1.2. CDB1 is the primary, CDB2 is my standby database. Both are on the same VM (I only have 8 GB total memory). CDB2 complained about a missing archived redo log:

SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#     CON_ID
---------- ------------- -------------- ----------
         1            27             27          1

Read the rest of this entry »

Posted in 12c Release 1, Linux, RMAN

Interesting GNS anomaly in 12.1.0.1.2 RAC

Posted by Martin Bach on March 13, 2014

I encountered an interesting anomaly with my 12.1.0.1.2 RAC cluster based on the Grid Naming System. I have written about the setup here.

No need to get back to the post-instead let me give you some background if you are not familiar with the Grid Naming System in RAC. Most RAC installations use a Domain Name Server (DNS) to resolve names to IP addresses. In the case of GNS the name resolution is delegated from the main DNS server to a so-called subdomain. In my case my DNS server (bind version 9) is configured on aux.example.com (192.168.100.2), and it delegates name resolution for *.gns.example.com (my cluster) to the Grid Naming System. The GNS address in use is 192.168.100.37 and must be defined in the corporate DNS. The required IP addresses have to be supplied by a DHCP server in the same range as the public network. My public network is on 192.168.100.0/24, the private networks are on 192.168.101.0/24 and 192.168.102.0/24. Which component resolves what? Here is a summary:

  • The GNS VIP is resolved by DNS. It’s not in the subdomain according to the documentation. I am using gns-vip.example.com (192.168.100.37) for it.
  • The node public IP is resolved via GNS. Oracle uses $(hostname) to assign the name. The IP address is fixed.
  • The node VIPs are resolved via GNS; the assigned name is $(hostname)-vip and the addresses are assigned by DHCP.
  • The private interconnect interface is resolved by GNS, usually as $(hostname)-priv. Addresses are either static or assigned by DHCP.
  • The SCAN VIPs are resolved via GNS and assigned by DHCP. And this is exactly where my problem was.
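To put the delegation into context, this is roughly what it looks like on the corporate DNS side; a sketch of the relevant bind zone entries, based on the addresses listed above:

; in the example.com zone: delegate gns.example.com to the GNS VIP
gns       IN NS  gns-vip.example.com.
gns-vip   IN A   192.168.100.37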

The Problem

All of a sudden I had connection problems with my Tomcat server connecting to the database. The address I used in tnsnames.ora was rac12scan.gns.example.com and hadn't changed at all; it simply didn't point to a valid IP half of the time. When I pinged the SCAN I was surprised to see that it listed addresses that weren't in the output of "srvctl config scan".
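Comparing the two views side by side makes the mismatch obvious; a quick sketch:

# what Clusterware thinks the SCAN is ...
$ srvctl config scan
# ... versus what GNS actually hands out
$ nslookup rac12scan.gns.example.com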

And sure enough, an nslookup against the SCAN resulted in the output of 7 (!) addresses. There was obviously a discrepancy between what Oracle thought the SCAN was (3 IP addresses) and what GNS reported. I tried various cures, including restarting the DNS and DHCP servers (even moving the lease file on the DHCP server to a temporary location), and variations of restarting GNS and the whole cluster (crsctl stop cluster -all; crsctl start cluster -all). Nothing helped.

Troubleshooting

As you'd expect, the name resolution only works with GNS started. If you try to "nslookup rac12scan.gns.example.com" while GNS is down, you get the NXDOMAIN result code, which does not really help. The output of "srvctl status gns" with the -list and -detail options is useful in troubleshooting the problem. I also ran "cluvfy comp gns" but that came back clean. Nevertheless, the system insisted that the SCAN was made up of 7 IP addresses, and only 4 of them had SCAN VIPs. No surprise I couldn't connect half the time. Now the question is of course: why does the GNS daemon grab so many IP addresses? So I looked at the gnsd.log in $GRID_HOME/log/$(hostname -s)/gns to see what happens. Here is an excerpt from the log file:

2014-03-11 16:37:30.305: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01305:) #0 of 3: OCR error 4 (PROC-4: The cluster registry key to be operated on does not exist.) is not expected 32 - no retry will be done.
2014-03-11 16:37:30.305: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01307:) Pass #0: open of "SYSTEM.GNS.SUBDOMAIN.gns|dexample|dcom.rac12gns-scan1-vip.SRV" failed with error 4 Last: PROC-4: The cluster registry key to be operated on does not exist. (4) - unable to retry.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01305:) #0 of 3: OCR error 4 (PROC-4: The cluster registry key to be operated on does not exist.) is not expected 32 - no retry will be done.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01307:) Pass #0: open of "SYSTEM.GNS.SUBDOMAIN.gns|dexample|dcom.rac12gns-scan1-vip.TXT" failed with error 4 Last: PROC-4: The cluster registry key to be operated on does not exist. (4) - unable to retry.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12node1-vip" Type: A 192.168.100.44 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.308: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12node2-vip" Type: A 192.168.100.39 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.308: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.36 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.35 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.40 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.42 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.41 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.31 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan2-vip" Type: A 192.168.100.41 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan2-vip" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan3-vip" Type: A 192.168.100.40 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan3-vip" Type: A 192.168.100.38 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan1-vip" Type: A 192.168.100.31 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [   CLSNS][1009862208]Resolve::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.310: [   CLSNS][1009862208]Resolve::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #0::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #0::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #1::clsns_SetTraceLevel:trace level set to 1.

You can see the address assignment here, and it corresponds to the DHCP leases I saw in the DHCP server host's /var/log/messages file. Notice that rac12scan grabs 7 IP addresses. This is surprising: the documentation normally states just 3 IP addresses for use with the SCAN. As you can further see, the SCAN VIPs resolve only to a subset of these; 192.168.100.36, for example, does not have a corresponding SCAN VIP, and it is not the only one. Also, rac12gns-scan2-vip and rac12gns-scan3-vip appear twice. So if rac12scan now resolves to 192.168.100.36 you will get a TNS error in the form "Destination host does not exist". Annoying! But why does it do so? The messages before (PROC-4) gave me a clue.

The information about GNS VIPs seems to be stored in the OCR. An ocrdump confirmed the findings. The question remains: why does Oracle use more than 3 IP addresses for the SCAN? I can only speculate that the DHCP addresses obtained are added to the OCR, and the existing ones don't seem to be cleared out or updated correctly.
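If you want to verify this yourself, the OCR can be dumped to a readable text file; a sketch (run as root, the dump file name is arbitrary):

[root@rac12node1 ~]# ocrdump /tmp/ocr.dmp
[root@rac12node1 ~]# grep "SYSTEM.GNS" /tmp/ocr.dmp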

In an effort to solve the problem I dropped the SCAN listeners and the SCAN VIPs and recreated them. But this seemed to confuse the cluster even more, and the name resolution didn't work. I tried both the subdomain delegation (nslookup rac12scan.gns.example.com) and querying GNS directly (nslookup rac12scan.gns.example.com 192.168.100.37); neither resolved anything. A restart of Clusterware didn't change the situation, so I decided to reboot the cluster nodes hoping to bring everything back to normal (yes, I should know better, this isn't Windows). After waiting for the servers to come back up there was no change. Now the system was so confused that it didn't come up with a SCAN at all. Trying to restore the OCR failed, and I couldn't even run a rootcrs.pl -deconfig -force in preparation for a reinstallation of Clusterware. So I trashed the system. I don't believe there is a future for GNS in my lab after this experience.

Thanks to Marcin Przepiorowski (@pioro) who helped during the investigation.

References:

Grid Infrastructure Installation Guide 12c:

  • 1.1.3 Oracle Grid Infrastructure Network Checklist
  • 5.5.1 About Oracle Grid Infrastructure Name Resolution Options
  • 5.11 Grid Naming Service Standard Cluster Configuration Example

Posted in 12c Release 1, KVM, Linux

Runtime Load Balancing Advisory in RAC 12c-addendum

Posted by Martin Bach on February 19, 2014

A reader asked an interesting question yesterday with regard to the previous post on the subject: where did you get your service metrics from when you queried v$servicemetric-the PDB or CDB$ROOT?

I queried the PDB, but this morning repeated the test to make sure the results are consistent, and they are. This is definitely something you’d hope for: you should not have different results in the same v$-view depending on the container you execute your query in for a given CON_ID.

During testing I noticed something interesting though. I queried gv$servicemetric but did not limit the result to the service I wanted to test with (FCFSRV). Here is the query against gv$servicemetric while the system was idle:

select inst_id,begin_time,end_time,service_name,cpupercall,callspersec,goodness,delta,con_id
from gv$servicemetric
where con_id = 3 and service_name  in ('FCFSRV','demopdb')
order by service_name, inst_id;

   INST_ID BEGIN_TIME          END_TIME            SERVICE_NAME         CPUPERCALL CALLSPERSEC   GOODNESS      DELTA     CON_ID
---------- ------------------- ------------------- -------------------- ---------- ----------- ---------- ---------- ----------
         1 19.02.2014 10:07:16 19.02.2014 10:07:21 FCFSRV                        0           0        100        100          3
         1 19.02.2014 10:05:57 19.02.2014 10:06:56 FCFSRV                        0           0        100        100          3
         2 19.02.2014 10:05:55 19.02.2014 10:06:54 FCFSRV                        0           0        100        100          3
         2 19.02.2014 10:07:19 19.02.2014 10:07:24 FCFSRV                        0           0        100        100          3
         1 19.02.2014 10:07:16 19.02.2014 10:07:21 demopdb                       0           0          0          1          3
         1 19.02.2014 10:05:57 19.02.2014 10:06:56 demopdb                       0           0          0          1          3
         2 19.02.2014 10:07:19 19.02.2014 10:07:24 demopdb                       0           0          0          1          3
         2 19.02.2014 10:05:55 19.02.2014 10:06:54 demopdb                       0           0          0          1          3

8 rows selected.

Read the rest of this entry »

Posted in 12c Release 1, Linux, RAC

Runtime Load Balancing Advisory in RAC 12c

Posted by Martin Bach on February 18, 2014

This is a follow-up on yesterday’s post about services in the new 12c database architecture. After having worked out everything I needed to know about TAF and RAC 12c in CDBs I wanted to check how FCF works with PDBs today. While investigating I found out that the Runtime Load Balancing Advisory does not seem to work as expected in some cases. But I’m getting ahead of myself. First of all, here is my test case:

  • Oracle Linux 6.4 x86-64
  • Grid Infrastructure 12.1.0.1.2, i.e. January 2014 PSU applied
  • RDBMS 12.1.0.1.2, likewise patched with the January PSU
  • A CDB with just 1 PDB for this purpose, named DEMOPDB
  • Service FCFSRV is used

Read the rest of this entry »

Posted in 12c Release 1, Linux, RAC

RAC and Pluggable Databases

Posted by Martin Bach on February 17, 2014

In preparation for the OUGN Spring Seminar, and to finally fulfill at least a part of my promise from July, I was getting ready to research RAC, PDBs and services for my demos. It turned out to be a lot more interesting than I first assumed.

RAC and Multi-Tenancy

The first attempt to really look at how this works started with my two-node cluster, where I created a RAC database: RAC12C, administrator managed, with instances RAC12C1 and RAC12C2. The database is registered in Clusterware. Clusterware and RDBMS are patched to the January PSU, i.e. 12.1.0.1.2.

The second step was to create a PDB for testing-it's named DEMOPDB and available on both my instances by design. By the way-PDBs do not open automatically; it is your responsibility to open them when the database starts. I used a startup trigger for this in the consolidation book; have a look at it to see the example. There are other ways available, as I found out.
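For reference, a minimal sketch of such a trigger (the trigger name is made up; the version in the book is more elaborate):

CREATE OR REPLACE TRIGGER open_all_pdbs
AFTER STARTUP ON DATABASE
BEGIN
  -- open every PDB as soon as the CDB itself is open
  EXECUTE IMMEDIATE 'ALTER PLUGGABLE DATABASE ALL OPEN';
END;
/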

As soon as the PDB is opened (in all Oracle deployment types, not limited to a RAC instance), a new service with the same name as the PDB is automatically started. As it turned out, using that service is the most reliable way to connect to the PDB:

SQL> connect user/password@single-client-access-name/pdbname

This works really well; for single instance, substitute the SCAN with your hostname. Now if you would like to implement some more interesting features (TAF/FAN+FCF/Application Continuity) you could create an additional service. The srvctl syntax has changed: Oracle now uses long parameter names (-service instead of -s), which doesn't really respect the UNIX/GNU way of naming parameters (short parameter: single dash, long parameter name: double dash), but that's how it is.
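Creating such an additional service for the PDB might look like the following sketch; the service name and placement options are illustrative:

# -pdb ties the service to the PDB it should connect sessions to
$ srvctl add service -db RAC12C -service DEMOSRV -pdb DEMOPDB \
      -preferred RAC12C1 -available RAC12C2
$ srvctl start service -db RAC12C -service DEMOSRV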

Read the rest of this entry »

Posted in 12c Release 1, Linux, RAC

Public appearances first half of 2014

Posted by Martin Bach on February 11, 2014

I had the great fortune of having many of my abstracts accepted for upcoming conferences and other public appearances. The first ones for this year are these:

OUG Ireland

The first conference I'll be attending is in Dublin, for OUG Ireland 2014 on March 11. I'm going to help out with RAC Attack and also present on Technologies for Developing Highly Available Applications in RAC 12c at 14:50 for about 45 minutes. What can you expect? Here is the official abstract; I am looking forward to the presentation and the live demos.

Oracle offers a wealth of technologies to make your application more resilient to instance failure. This talk presents an overview of what is commonly considered in application development based on the Java programming language. You will learn about these technologies from a DBA’s point of view. Options will include TAF as a baseline followed by FCF and finally Application Continuity (incl. demos)

Exadata Workshop

At Enkitec we regularly organise Exadata workshops for those who are either interested in the technology or already using it but want to get more out of it. These workshops are scheduled quarterly, and I have already held two of them in London. The next one for Europe will take place on March 27 (Thursday), again in Oracle's London City Office. If you are interested in the Exadata platform, the workshop is an opportunity to discuss your deployment with other users or just to bounce ideas off other people, so by all means come along (it's free of charge)! The official link to this workshop is here:

http://www.enkitec.com/education/sessions/exadata_workshop_london_27-mar-2014

I have updated the material to include the latest Oracle 12c cell software features as well as support for the new RDBMS architecture-Container Databases-and will update delegates on how this works and also how Exadata supports Oracle’s consolidation platform. There are many cool new features in 12c worth exploring and knowing about.

OUGN Spring Seminar

Next up is the Norwegian User Group’s Spring Meeting on April 3-4 where I have managed to secure two slots:

  • Oracle 12c features that didn’t make the marketing top 10
  • Advanced RAC programming features

The agenda is online and can be found here: http://ougnvarseminar2014.sched.org/

There are far too many great speakers to just link to my talks; feel free to browse (and register :)

For me the meeting starts on Wednesday 2nd when I'm flying to Oslo. This is without a shadow of a doubt a very good conference, and I can only recommend it-exactly like the others on this page. It is very well organised and definitely worth attending. There are very few other conferences where you can mingle with the presenters and chat about all sorts of things in the same way. I have attended twice already and enjoyed it every time.


Enkitec E4

Enkitec's Extreme Exadata Expo is one of the conferences with the highest technical content I can imagine. I attended last year and was really impressed by the speakers and the content. The impact of "Big Data" is clearly visible in the agenda, and like last year you can find both Exadata and "Big Data" talks.

Enkitec Extreme Exadata Expo

I am quite chuffed I made it onto the agenda of the event! I'm going to present "Think Exa" together with my colleague Frits Hoogland.

In this presentation we are going to talk about setting your mindset on Exadata to overcome pre-ASM and pre-RAC habits. Unlearning things we have been taught (or have been doing) for many years takes time, and we are hoping to give the audience an overview of what can be achieved with the platform.

As happy as I am to have secured a speaker's slot, I am even more excited to attend the conference. Looking at the list of speakers who have already confirmed their attendance, it is going to be a blast. I am looking forward to seeing my colleagues from the United States, but also to catching up with new and old friends.

Looking forward to seeing you at one of these events!

Posted in Public Appearances

Data Guard transport lag in OEM 12c

Posted by Martin Bach on January 30, 2014

I have come across this phenomenon a couple of times now so I thought it was worth writing up.

Consider a scenario where you get an alert because your standby database has an apply lag. The alert is generated by OEM, and when you log in and check, it does indeed have an apply lag. Even worse, the apply lag increases with every refresh of the page! I tagged this as an 11.2 problem, but it's definitely not limited to that version.

Here is a screenshot of this misery:

 Lag in OEM

Now there are of course a number of possible causes:

  • There is a lag
  • You are not using Real Time Apply

The first one is easy to check: look at the redo generation rate on the primary database to see if it’s any different. Maybe you are currently loading lots of data? Maybe a batch job has been initiated that goes over a lot of data… the possibilities are nearly endless.
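A quick way to look at the current redo generation rate is v$sysmetric; a small sketch:

-- run on the primary; this metric is reported in bytes per second
select begin_time, end_time, round(value/1024/1024,1) as mb_per_sec
from   v$sysmetric
where  metric_name = 'Redo Generated Per Sec';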

Another, more subtle interpretation could be that you are not using Real Time Apply. How can you check? In the broker command line interface for example:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Databases:
    pri - Primary database
      Warning: ORA-16789: standby redo logs not configured

    sby - Physical standby database
      Warning: ORA-16789: standby redo logs not configured

Fast-Start Failover: DISABLED

Configuration Status:
WARNING

The warnings about missing standby redo logs show that you cannot possibly use Real Time Apply (it needs standby redo logs). The other place to check is the database itself:

SQL> select dest_id,status,database_mode,recovery_mode
  2  from v$archive_dest_status
  3  where status <> 'INACTIVE';

   DEST_ID STATUS    DATABASE_MODE   RECOVERY_MODE
---------- --------- --------------- -----------------------
         1 VALID     MOUNTED-STANDBY MANAGED
        32 VALID     UNKNOWN         IDLE

Did you notice dest_id 32? That's a bit of an unusual one, more on that later (you can only set log_archive_dest_x where x is between 1 and 31).

So indeed we have managed recovery active, but not using Real Time Apply. This is expressed in the database status:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   28 seconds
  Apply Lag:       28 seconds
  Real Time Query: OFF
  Instance(s):
    sby

A few moments later when you query the database again the lag has increased:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   3 minutes 22 seconds
  Apply Lag:       3 minutes 22 seconds
  Real Time Query: OFF
  Instance(s):
    sby

This is to be expected-the primary is still happily processing user requests. The cure is to add standby redo logs, as suggested in so many places and described in the Data Guard documentation.
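Adding them is straightforward; a sketch, assuming 50 MB online redo logs and the usual rule of one more standby redo log group per thread than online groups (group numbers and size are illustrative; the size must match the online redo logs):

SQL> alter database add standby logfile thread 1 group 5 size 52428800;
SQL> alter database add standby logfile thread 1 group 6 size 52428800;
SQL> alter database add standby logfile thread 1 group 7 size 52428800;
SQL> alter database add standby logfile thread 1 group 8 size 52428800;

After the successful addition of SRLs the lag should disappear. A restart of managed recovery using the broker will show something along these lines on the standby: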

2014-01-30 14:35:18.353000 +00:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (sby)
MRP0 started with pid=24, OS id=4854
MRP0: Background Managed Standby Recovery process started (sby)
2014-01-30 14:35:23.406000 +00:00
 started logmerger process
Managed Standby Recovery starting Real Time Apply
...
2014-01-30 14:37:12.595000 +00:00
Media Recovery Waiting for thread 1 sequence 20 (in transit)
2014-01-30 14:37:13.691000 +00:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 20 Reading mem 0
  Mem# 0: +DATA/sby/onlinelog/group_5.266.838218819

Two important bits of information are shown here: Managed Standby Recovery starting Real Time Apply and the fact that it is using the standby redo log. Sure enough, after the database is in sync with its primary and uses the log, the lag is gone:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    sby

And also in the OEM view:

Lag cleared in OEM

Slight Variation

I have also seen this problem in OEM where the transport lag was near zero and therefore hardly visible due to the scale of the graph. The apply lag nevertheless resulted from the primary doing work while the current log had not been shipped to the standby-obviously before the implementation of standby redo logs. You saw a spike building in the OEM view until the next log switch on the primary, when the apply lag dropped to zero for a brief moment before increasing again.

Summary

Real Time Apply is a very useful feature, especially when used together with the maximum availability protection mode. The real risk of not using standby redo logs-and implicitly not using Real Time Apply-is that you lose data, since the contents of the current online redo log on the primary have not been copied across. If you need to activate your standby you will be some transactions short of the primary; the larger the online redo log, the larger the potential gap.

Posted in 11g Release 2, Cloud Control

 