Martins Blog

Trying to explain complex things in simple terms

RAC One Node and Database Protection

Posted by Martin Bach on February 16, 2011

An email from fellow Oak Table Member James Morle about RAC One Node and failover got me thinking about the capabilities of the product.

I have written about RON (Rac One Node) in earlier posts, but haven’t really explored what happens with session failover during a database relocation.

Overview

So to clarify what happens in these two scenarios I have developed a simple test. Taking a RON database + a service I modified both to suit my test needs. Connected to the service I performed a database relocation to see what happens. Next I killed the instance (I wasn’t able to reboot the node) t o simulate what happens when the node crashes.

Setup

The setup used an existing database, “RON”. It also had a service defined already, but that needed tweaking. The database was defined as follows:

$ srvctl config database -d RON
Database unique name: RON
Database name: RON
Oracle home: /u01/app/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/RON/spfileRON.ora
Domain: example.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: RON
Database instances:
Disk Groups: DATA
Mount point paths:
Services: RON_APP.example.com
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: RON
Candidate servers: node1,node2
Database is administrator managed

The service was initially defined as follows:

srvctl config service -d RON
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

Some of these attributes require special attention, we have 3 categories to deal with: preferred instances for TAF, the TAF setup itself and the runtime load balancing (RLB) advisory.

Transparent Application Failover with “real” RAC only works if there are two preferred instances. As you can see the service has only one preferred instance. Not sure if that can be changed though…Let’s try:

srvctl modify service -s RON_APP -d RON -i RON_1,RON_2
PRKO-2007 : Invalid instance name: RON_2

I wasn’t surprised-RON is not a RAC database so it has only 1 active instance. When registering the RAC One Node database you don’t add instances as you would with a RAC database, instead you set the database type to RACONENODE (srvctl add database -d name -c RACONEONE … )

I recommend setting TAF properties on the service level-that way you don’t miss crucial parameters in your tnsnames.ora file. This is the preferred way of doing it at least since 11.2. Changing the service is straight forward:

srvctl modify service -d RON -s RON_APP.example.com \
> -P BASIC -e SESSION -m BASIC

This piece of code instructs the service to use a BASIC TAF policy, a failover type of SESSION and the BASIC failover method. These parameters were normally be configured in the CONNECT_DATA section of your TNSNames.ora file.

With these changes made, the service configuration has changed to the below:

$ srvctl config service -d RON -s RON_APP.example.com
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

I also wanted to change the defaults to a more suitable RLB configuration. Instead of a CLB goal of “LONG” I wanted to set it up for “SHORT”. And I needed emphasis on SERVICE_TIME as well (my intention was to run swingbench). The change and resulting service configuration are shown below:

$ srvctl modify service -d RON -s RON_APP.example.com \
> -j short -B service_time

$ srvctl config service -d RON -s RON_APP.example.com
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

My SCAN name was scan1.example.com, and I ensured that the local_listener was pointing to my none default port of 1821 and the remote_lister was using the EZConnect synatax (“scan1.example.com:1825). It is very important to set the local_listener parameter if you are not using the default port of 1521!

On my client system I defined a local TNS alias “RON” to connect  to the RON database. Note that it doesn’t use any TAF parameters.

C:\oracle\product\11.2.0\client_1\network\admin>tnsping ron

TNS Ping Utility for 32-bit Windows: Version 11.2.0.2.0 - Production on 15-FEB-2011 17:07:44

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0\client_1\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = scan1.example.com)(PORT = 1825)
)) (CONNECT_DATA = (SERVICE_NAME = LRON_APP.uk.db.com)))
OK (10 msec)

Database Relocation

Oracle’s promise is that you don’t lose your session during a database relocation. Let’s see if that is actually true. Using my setup I connected to the RON database (sorry for the broken formatting!):

C:\oracle\product\11.2.0\client_1\network\admin>sqlplus martin/test@ron

SQL*Plus: Release 11.2.0.2.0 Production on Tue Feb 15 16:50:08 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
------------------------------ ---------- ------------- ---
MARTIN                         BASIC      SESSION       NO

OK, so TAF is working. At this time I started the relocate command:

[RON_1]oracle@node1.example.com $ srvctl relocate database -d RON -n node2 -w 1 -v
Configuration updated to two instances
Instance RON_2 started

I then repeatedly required my SQL statement above while the instance was relocating. The output is shown below.

SQL> select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 1 MARTIN                         NONE       NONE          NO
 1 MARTIN                         BASIC      SESSION       NO
 2 MARTIN                         NONE       NONE          NO

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
 1 node1.example.com:RON_1
 2 node2.example.com:RON_2

SQL> select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 1 MARTIN                         NONE       NONE          NO
 1 MARTIN                         BASIC      SESSION       NO
 2 MARTIN                         NONE       NONE          NO

As you can see the configuration change to 2 instances is reflected in the output of my query to v$active_instances. You also see the number of sessions increasing, pay attention to the inst_id colum: a new session is created on the second instance.

In my other session I saw the relocation completing:

Services relocated
Waiting for 1 minutes for instance RON_1 to stop.....
Instance RON_1 stopped
Configuration updated to one instance

I required one more time to see what happened-would my session survive?

SQL> /
select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN'
*
ERROR at line 1:
ORA-25408: can not safely replay call

SQL> /

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 2 MARTIN                         BASIC      SESSION       YES

SQL>
SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
 2 node2.example.com:RON_2

Well it did survice. Note the ORA-25408 error. That’s expected, since I’m using the SQL*Plus client I don’t have the opportunity to trap the error and replay my OCI call. You should capture this SQLException in Java or your preferred development environment. I have provided an example in chapter 11 of Pro Oracle Database 11g RAC on Linux.

Node failure

I couldn’t see how a session would survive in the case of a node failure… I said this in my email to James:

> I cannot see how TAF or FCF work in case of a node failure. In my
> tests I did for the book TAF only worked if there was a service with
> at least 2 preferred instances. And FCF requires a FAN aware
> connection pool which is rare to find. RAC one however only has only 1
> active node (unless you relocate it).

But better test before jumping to conclusions!

I could only kill an instance rather than the server which would have been a better test. I assumed Clusterware would try to restart the failed instance on the same node a few times and then relocate the resource if the stat was not successful.

I knew the database was now running on node 2 so I killed the SMON process. That sure results in an instance crash.

oracle@node2.example.com $ ps -ef | grep RON | grep smon
oracle   14208     1  0 16:52 ?        00:00:00 ora_smon_RON_2
$ kill -9 14208

Just as you would expect, the session didn’s survive this (how could it? There is no active second instance!)

SQL> /
select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN'
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15649
Session ID: 183 Serial number: 3

And just as expected, Clusterware restarted the failed instance the second it detected the failure. The node’s alert log showed this:

System State dumped to trace file /u01/app/oracle/product/admin/RON/admin/diag/rdbms/RON/RON_2/trace/RON_2_diag_14174.trc
ORA-1092 : opitsk aborting process
2011-02-15 16:58:29.255000 +00:00
ORA-1092 : opitsk aborting process
License high water mark = 6
2011-02-15 16:58:32.569000 +00:00
Instance terminated by PMON, pid = 14164
USER (ospid: 16101): terminating the instance
Instance terminated by USER, pid = 16101
2011-02-15 16:58:35.286000 +00:00
Starting ORACLE instance (normal)
2011-02-15 16:58:36.477000 +00:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
2011-02-15 16:58:42.687000 +00:00
Private Interface ‘bond1:1’ configured from GPnP for use as a private interconnect.
[name=’bond1:1′,…
…, use=public/1]

Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0.2/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is enabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 – 64bit Production

Summary

That concludes my testing. RON works better than a “classic” active/passive cluster and allows sessions to stay connected when migrating the database to a different host. And it makes it easier to convert the database to a full RAC database (I have an example but it needs a bit of tidying up before posting). On the other hand, virtualisation technology has allowed us to do the same for quite some time now. Xen and OracleVM can relocate domUs, and vmotion is the commercial alternative. Whatever suits your needs.

6 Responses to “RAC One Node and Database Protection”

  1. […] This post was mentioned on Twitter by mathewbutler, Martin Bach. Martin Bach said: RAC One Node and Database Protection http://wp.me/pDczL-am […]

  2. ghassem said

    Thank you so much
    It’s excelent

  3. […] before 10.2.0.4 and unless being started with the -F option. Martin Bach writes in an article (RAC One Node and Database Protection) that it is more or less the responsibility of the calling environment to handle just this […]

  4. bdrouvot said

    In case of Rac one Node database crash and using a TAF service : If you wait the instance is up again you will be able to launch your sql without seeing the crash.

    The thing is : The sql has to be launched when the instance is providing the TAF service enven if its crashed before.

    So :

    connect
    launch the sql => ok
    db crash
    wait the db is up again
    re-launch the sql =>ok

    Bertrand

  5. […] For a good understanding of what RAC One Node is, you can have a look to Martin’s post. […]

  6. […] Martin Bach did a good study of Rac One Node capability and especially of what happens with session failover during a database relocation or during a node failure into this blog post. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: