Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘RAC’ Category

Creating a RAC 12.1 Data Guard Physical Standby environment (1)

Posted by Martin Bach on December 14, 2016

I have just realised that the number of posts about RAC 12c Release 1 on this blog is rather too small. And since I’m a great fan of RAC this has to change :) In this mini-series I am going to share my notes about creating a Data Guard setup on my 2 node 12.1.0.2.161018 RAC primary + identical 2 node RAC standby system in the lab.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

The lab Environment

My environment consists of the following entities:

  • Primary Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12pri1 and rac12pri2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Non-container database NCDBA is administrator managed and running on both nodes
  • Standby Site
    • Oracle Linux 7.2 – UEK3 x86_64
    • Two RAC nodes: rac12sec1 and rac12sec2
    • Grid Infrastructure 12.1.0.2.161018 in /u01/app/12.1.0.2/grid
    • Single RDBMS home 12.1.0.2.161018 in /u01/app/oracle/product/12.1.0.2/dbhome_1
    • Standby database NCDBB is administrator managed and running on both nodes

I won’t focus on the creation of the RAC systems; I may have covered some of it in earlier blog posts and of course in the RAC Book.

I have deliberately kept it simple. Although most systems in real life use a dedicated (set of) NICs for Data Guard traffic I decided against it-I didn’t want attention drawn away from the Big Picture. Similarly I am not touching on the second SCAN that Oracle allows us to create from 12.1 onwards. If you are interested in these topics kindly refer to my other blog posts.

Creation of the Primary Database

After both RAC systems are set up it’s time to start with the creation of the primary database. This is easy:

dbca -silent -createDatabase -templateName RACDB.dbc \
-gdbName NCDBA -sysPassword ... -systemPassword ... -storageType ASM \
-diskGroupName DATA -recoveryGroupName RECO -sampleSchema true \
-totalMemory 4096 -dbsnmpPassword ... -nodeinfo rac12pri1,rac12pri2

The template referenced in “-templateName” is my own – I always create templates to be license compliant. I covered how to create your custom database template on this blog as well.
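For reference, creating such a template from an existing (patched) database is a single dbca call as well. This is a minimal sketch only, with names from my lab standing in for yours-adjust the source database and template name:

dbca -silent -createTemplateFromDB -sourceDB NCDBA \
-templateName RACDB.dbc -sysDBAUserName sys -sysDBAPassword ...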

I won’t go into detail here about the naming of my databases in a Data Guard configuration. What I learned the hard way was not to use a DB_UNIQUE_NAME that reflects the role. Imagine everyone’s surprise when they connect to a database named STDBY operating in the primary role after a switchover… For lack of better ideas I went ahead and enumerated the databases: my primary database is NCDBA and the standby is NCDBB.

After the database is created, it is started automatically by DBCA.

[oracle@rac12pri1 ~]$ srvctl status database -db NCDBA
Instance NCDBA1 is running on node rac12pri1
Instance NCDBA2 is running on node rac12pri2
[oracle@rac12pri1 ~]$

However, the newly created database isn’t patched (this is a known issue documented on Mike Dietrich’s blog for example).

SQL> select name from v$database;

NAME
---------
NCDBA

SQL> select count(*) from dba_registry_sqlpatch;

  COUNT(*)
----------
         0

No way around it – time to call datapatch:

SQL> alter system set cluster_database=false scope=spfile sid='*';

System altered.

SQL> exit

...

[oracle@rac12pri1 ~]$ srvctl stop database -db NCDBA
[oracle@rac12pri1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 14 13:39:04 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             771755120 bytes
Database Buffers         2432696320 bytes
Redo Buffers               13844480 bytes
Database mounted.
Database opened.
SQL>  exit

...

[oracle@rac12pri1 ~]$ cd $ORACLE_HOME/OPatch/
[oracle@rac12pri1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Wed Dec 14 13:08:51 2016
Copyright (c) 2016, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_16313_2016_12_14_13_08_51/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016)):
  Installed in the binary registry only
Bundle series DBBP:
  ID 161018 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    24315824 (Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016))
    24340679 (DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679))

Installing patches...
Patch installation complete.  Total patches installed: 2

Validating logfiles...
Patch 24315824 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24315824/20676808/24315824_apply_NCDBA_2016Dec14_13_09_26.log (no errors)
Patch 24340679 apply: SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/24340679/20713212/24340679_apply_NCDBA_2016Dec14_13_09_30.log (no errors)
SQL Patching tool complete on Wed Dec 14 13:15:08 2016
[oracle@rac12pri1 OPatch]$
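One step my notes above don’t show: with datapatch done, the temporary cluster_database change needs to be reverted and the database handed back to Clusterware. A sketch of the commands involved:

SQL> alter system set cluster_database=true scope=spfile sid='*';
SQL> shutdown immediate
SQL> exit

[oracle@rac12pri1 ~]$ srvctl start database -db NCDBA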

This concludes part 1 – the database is now set up and running on the primary cluster. In the next part I’m going to describe how to prepare the primary and standby cluster for the Data Guard setup.

Posted in 12c Release 1, Data Guard, Linux, RAC | 1 Comment »

Trouble with multiple SCAN listeners in 12c Release 1

Posted by Martin Bach on May 14, 2016

UPDATE 28-JUL-2016: the problem – as stated by Anil in the comments section – seems to be fixed. I applied the Proactive Bundle Patch 23273686 and OJVM patch 23177536 to the Grid and RDBMS home. After a successful execution of datapatch the errors in the CRSD agent log I reported in this post went away, and I can also see the settings in the database alert.log.

Scroll to the bottom of the post for more detail.

For those not yet on that patch level-the original post

Prompted by comments made by readers about my posts describing how to add a second SCAN in 12c Release 1 and problems regarding the listener_networks parameter I thought it was time to revisit the situation. I’m still running Oracle Linux 7.1/UEK3 (although that should not matter) but upgraded to 12.1.0.2.5. This is the release that is reportedly showing the odd behaviour. I don’t recall my exact version when I wrote the original posts back in April 2014, but by looking at them I think it all worked ok at the time. Here is my current patch level after the upgrade to the troublesome PSU.

 SQL> select patch_id,description,status from DBA_REGISTRY_SQLPATCH;

  PATCH_ID DESCRIPTION                                                                      STATUS
---------- -------------------------------------------------------------------------------- ---------------
  21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)                       SUCCESS
  21359755 Database Patch Set Update : 12.1.0.2.5 (21359755)                                SUCCESS

This is not the latest patch level! After having seen a presentation by Ludovico Caldara (@ludodba) just this week I would think that the April 2016 patch, which was current at the time of writing, is the one you want to be on :) I have an upgrade to the April 2016 PSU planned but first wanted to test against 12.1.0.2.5 to see if I could reproduce the issue.

Problem Statement

The parameter listener_networks should be populated automatically by CRS when the database comes up-and indeed you can see evidence of that in the respective node’s crsd_oraagent_oracle.trc file. However, with 2 SCANs present, there is something not quite right. You can see this in the usual locations:

– SCAN listeners
– listener_networks parameter
– CRSD trace

The most obvious clue is that you cannot connect to the database any more using one of the SCANs. To avoid you having to flick back and forth between my posts, here is the network configuration again:

[oracle@ron12cprinode2 ~]$ srvctl config scan -netnum 1
SCAN name: ron12cpri-scan, Network: 1
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6: 
SCAN 0 IPv4 VIP: 192.168.100.65
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 1 IPv4 VIP: 192.168.100.66
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 2 IPv4 VIP: 192.168.100.64
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 

[oracle@ron12cprinode2 ~]$ srvctl config scan -netnum 2
SCAN name: ron12cpri-dg-scan, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2, static
Subnet IPv6: 
SCAN 0 IPv4 VIP: 192.168.102.66
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 1 IPv4 VIP: 192.168.102.64
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 2 IPv4 VIP: 192.168.102.65
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 

It’s essentially using the 192.168.100/24 network for the “public” traffic and 192.168.102/24 for Data Guard. I still use my RAC One Node database RON, which is currently active on node 2. All of my SCAN listeners should know about its services, RON_SRV and RON_DG_SRV. However, that’s not the case, as a quick check reveals:

[oracle@ron12cprinode1 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 19:56:14

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:39
Uptime                    0 days 0 hr. 34 min. 35 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.65)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ron12cpri" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@ron12cprinode1 ~]$ lsnrctl status DGSCANLSNR_SCAN2_NET2

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 19:56:23

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=DGSCANLSNR_SCAN2_NET2)))
STATUS of the LISTENER
------------------------
Alias                     DGSCANLSNR_SCAN2_NET2
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:42
Uptime                    0 days 0 hr. 34 min. 41 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/dgscanlsnr_scan2_net2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=DGSCANLSNR_SCAN2_NET2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.102.64)(PORT=1521)))
The listener supports no services
The command completed successfully
[oracle@ron12cprinode1 ~]$ 

Before generating the above output I specifically “reset” the listener_networks settings on both instances, and ensured that they were created dynamically. The reset itself is a one-liner per parameter (a sketch, assuming an spfile is in use):
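SQL> alter system reset listener_networks scope=spfile sid='*';

After the database restart I couldn’t make out ANY entry for listener_networks: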

SQL> select inst_id,name,value from gv$parameter where name = 'listener_networks';

   INST_ID NAME                           VALUE
---------- ------------------------------ ------------------------------
         1 listener_networks

SQL> select inst_id,name,value from gv$spparameter where name = 'listener_networks';

   INST_ID NAME                           VALUE
---------- ------------------------------ ------------------------------
         1 listener_networks

Nothing at all. I manage to reach the system using the RON_SRV service that’s known to the first (public) network’s SCAN (with deliberately bogus credentials: the ORA-01017 below proves the listener resolved the service and handed the connection to the database):

[oracle@ron12cprinode1 ~]$ sqlplus a/b@ron12cpri-scan/RON_SRV

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 14 20:18:02 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name: 
[oracle@ron12cprinode1 ~]$ 

The same test failed for connections against ron12cpri-dg-scan, stating that the listener didn’t know about the service. Checking the CRSD trace (on the node the instance runs!) I could see the reason:

2016-05-14 19:47:49.637611 : USRTHRD:2023044864: {1:58687:2893} Endp=ron12cpri-dg-scan:1521
2016-05-14 19:47:49.637638 : USRTHRD:2023044864: {1:58687:2893} Final Endp=(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521)), remoteEndp= ron12cpri-scan:1521','ron12cpri-scan:1521','ron12cpri-scan:1521
2016-05-14 19:47:49.637662 : USRTHRD:2023044864: {1:58687:2893} sqlStmt = ALTER SYSTEM SET REMOTE_LISTENER=' ron12cpri-scan:1521','ron12cpri-scan:1521','ron12cpri-scan:1521' SCOPE=MEMORY SID='RON_1' /* db agent *//* {1:58687:2893} */
2016-05-14 19:47:49.645739 : USRTHRD:2023044864: {1:58687:2893} sqlStmt = ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1' /* db agent *//* {1:58687:2893} */
2016-05-14 19:47:49.655035 : USRTHRD:2023044864: {1:58687:2893} ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00119: invalid specification for system parameter LISTENER_NETWORKS
ORA-00137: invalid LISTENER_NETWORKS specification with NAME=net2
ORA-00138: all addresses specified for attribute REMOTE_LISTENER are invalid
ORA-00132: syntax error or unresolved network name 'ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521'

2016-05-14 19:47:49.655191 : USRTHRD:2023044864: {1:58687:2893} DbAgent::DedicatedThread::run setRemoteListener Exception OCIException
2016-05-14 19:47:49.655207 : USRTHRD:2023044864: {1:58687:2893} ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00119: invalid specification for system parameter LISTENER_NETWORKS
ORA-00137: invalid LISTENER_NETWORKS specification with NAME=net2
ORA-00138: all addresses specified for attribute REMOTE_LISTENER are invalid
ORA-00132: syntax error or unresolved network name 'ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521ron12cpri-dg-scan:1521'

Looking at this output it appears that setting the remote_listener worked, although I thought we’d only set the host once and not thrice? This looks fishy. It appears to work though, as confirmed in v$parameter and the fact that I can connect against the system.

Interestingly setting listener_networks fails with an ORA-02097: parameter cannot be modified because specified value is invalid. This makes sense: there are white spaces missing in the alter system command, and even if the spaces were correct, the command would fail. Trying manually confirms that thought:

SQL> ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1';
ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00119: invalid specification for system parameter LISTENER_NETWORKS
ORA-00137: invalid LISTENER_NETWORKS specification with NAME=net2
ORA-00138: all addresses specified for attribute REMOTE_LISTENER are invalid
ORA-00132: syntax error or unresolved network name 'ron12cpri-dg-scan:1521
ron12cpri-dg-scan:1521 ron12cpri-dg-scan:1521'

OK – that’s all I need to know. When changing the command to look like what I thought it should look like in the first place (the remote listener specifies only 1 host:port) it works:

SQL> ALTER SYSTEM SET LISTENER_NETWORKS='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.102.63)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521))' SCOPE=MEMORY SID='RON_1';

System altered.

SQL> alter system register;

System altered.

What does that mean? Back to my SCAN listeners again:

[oracle@ron12cprinode1 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 20:07:19

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------   
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:39
Uptime                    0 days 0 hr. 45 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.65)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ron12cpri" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully

No negative change there, but would the DG_SCAN listener also pick it up?

[oracle@ron12cprinode1 ~]$ lsnrctl status DGSCANLSNR_SCAN2_NET2

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAY-2016 20:07:21

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=DGSCANLSNR_SCAN2_NET2)))
STATUS of the LISTENER
------------------------   
Alias                     DGSCANLSNR_SCAN2_NET2
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                14-MAY-2016 19:21:42
Uptime                    0 days 0 hr. 45 min. 39 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode1/dgscanlsnr_scan2_net2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=DGSCANLSNR_SCAN2_NET2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.102.64)(PORT=1521)))
Services Summary...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
The command completed successfully

Well it seems it did. Now the question is: can I connect?

[oracle@ron12cprinode1 ~]$ sqlplus system/secretPWD@ron12cpri-scan/RON_SRV

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 14 20:08:45 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-28002: the password will expire within 7 days


Last Successful login time: Sat May 14 2016 19:58:28 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
[oracle@ron12cprinode1 ~]$ sqlplus system/secretPWD@ron12cpri-dg-scan/RON_DG_SRV

SQL*Plus: Release 12.1.0.2.0 Production on Sat May 14 20:08:54 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-28002: the password will expire within 7 days


Last Successful login time: Sat May 14 2016 20:08:45 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 

Summary

Well I can! So that should solve the problem for the active instance; however, problems are bound to happen when the instance restarts. Since I don’t really have control over the instance name in RAC One Node (RON_1 can be started on node 1 and node 2) I can’t hard-code the value for listener_networks into the spfile. As an end result I’d lock myself out just like CRS did. This is likely a similar issue for multi-node RAC using policy managed databases.

I have repeated the test with the latest version of the stack (upgraded in place), and got the same result. Here are the version numbers:

[oracle@ron12cprinode2 ~]$ /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch/opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22502555;OCW Patch Set Update : 12.1.0.2.160419 (22502555)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.

[oracle@ron12cprinode2 ~]$ /u01/app/12.1.0.2/grid/OPatch/opatch lspatches
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22502555;OCW Patch Set Update : 12.1.0.2.160419 (22502555)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

OPatch succeeded.

SQL> select action_time, patch_id,description,status from DBA_REGISTRY_SQLPATCH order by action_time;

ACTION_TIME                                PATCH_ID DESCRIPTION                                                            STATUS
---------------------------------------- ---------- ---------------------------------------------------------------------- ---------------
29-OCT-15 12.46.55.763581                  21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)             SUCCESS
29-OCT-15 12.46.55.939750                  21359755 Database Patch Set Update : 12.1.0.2.5 (21359755)                      SUCCESS
14-MAY-16 21.32.15.211167                  21555660 Database PSU 12.1.0.2.5, Oracle JavaVM Component (Oct2015)             SUCCESS
14-MAY-16 21.32.15.233105                  22674709 Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)        SUCCESS
14-MAY-16 21.32.15.591460                  22291127 Database Patch Set Update : 12.1.0.2.160419 (22291127)                 SUCCESS

The same values for remote_listener and listener_networks as with 12.1.0.2.5 have been observed, and the error about setting listener_networks in the CRSD log was identical to the previous release. I guess that needs to be raised with Oracle …

Apparently this is now fixed with the July Proactive Patch

As stated in the introduction I applied Proactive Bundle Patch 23273686 and OJVM patch 23177536 to the Grid and RDBMS home. After a successful execution of datapatch the errors in the CRSD agent log I reported in this post went away, and I can also see the settings in the database alert.log:

Reconfiguration complete (total time 0.0 secs)
Instance started by oraagent
NOTE: ASMB registering with ASM instance as Flex client 0xffffffffffffffff (reg:3848556776) (new connection)
ORACLE_BASE from environment = /u01/app/oracle
NOTE: ASMB connected to ASM instance +ASM1 osid: 23292 (Flex mode; client id 0x10007)
NOTE: initiating MARK startup
Starting background process MARK
MARK started with pid=34, OS id=23296
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.61)(PORT=1521))'
   SCOPE=MEMORY SID='RON_1';
NOTE: MARK has subscribed
2016-07-28 15:05:58.435000 +01:00
ALTER SYSTEM SET remote_listener=' ron12cpri-scan:1521' SCOPE=MEMORY SID='RON_1';
ALTER SYSTEM SET listener_networks='(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)
  (HOST=192.168.102.61)(PORT=1523)))))','(( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521))' 
  SCOPE=MEMORY SID='RON_1';
ALTER DATABASE MOUNT /* db agent *//* {1:45086:1009} */
2016-07-28 15:05:59.470000 +01:00

I can now see that the settings are reflected in the database as well:

SQL> select inst_id, value from gv$spparameter where name = 'listener_networks';

   INST_ID VALUE
---------- ------------------------------
         1

SQL> select inst_id, value from gv$parameter where name = 'listener_networks';

   INST_ID
----------
VALUE
--------------------------------------------------------------------------------
         1
(( NAME=net2)(LOCAL_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1
02.61)(PORT=1523))))), (( NAME=net2)(REMOTE_LISTENER=ron12cpri-dg-scan:1521))

There is only 1 instance because this is a RAC One database. There is no value in the spfile for the listener_networks parameter, but it’s set in gv$parameter, indicating it has been set dynamically.
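If you want to double-check where a value comes from, v$parameter can help: its ISMODIFIED column should read SYSTEM_MOD for a value set dynamically via ALTER SYSTEM. A quick query (not part of my original notes):

SQL> select name, ismodified from v$parameter where name = 'listener_networks';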

As a result, even the listeners know about my database:

[oracle@ron12cprinode2 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 28-JUL-2016 15:20:59

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                28-JUL-2016 14:35:04
Uptime                    0 days 0 hr. 45 min. 55 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.65)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "ron12cpri" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully

What about the SCAN listener on the second network?

[oracle@ron12cprinode2 ~]$ lsnrctl status DGSCANLSNR_SCAN2_NET2

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 28-JUL-2016 15:21:10

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=DGSCANLSNR_SCAN2_NET2)))
STATUS of the LISTENER
------------------------
Alias                     DGSCANLSNR_SCAN2_NET2
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                28-JUL-2016 14:35:03
Uptime                    0 days 0 hr. 46 min. 7 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.1.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/ron12cprinode2/dgscanlsnr_scan2_net2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=DGSCANLSNR_SCAN2_NET2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.102.64)(PORT=1521)))
Services Summary...
Service "RON" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RONXDB" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_DG_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
Service "RON_SRV" has 1 instance(s).
  Instance "RON_1", status READY, has 1 handler(s) for this service...
The command completed successfully

So finally I can connect to the system referencing both SCANs, without having to do anything on my end:

[oracle@ron12cprinode2 ~]$ sqlplus a/b@ron12cpri-scan/ron_srv

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 28 15:23:08 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name: 
[oracle@ron12cprinode2 ~]$ sqlplus a/b@ron12cpri-dg-scan/ron_srv

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 28 15:23:17 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name: 

This seems to prove that the problem is solved.

Posted in 12c Release 1, Linux, RAC | 2 Comments »

Fixing a problem with the ASM spfile preventing RAC 12c from starting

Posted by Martin Bach on July 31, 2015

This is a little note to myself on how to fix a corrupt spfile in clustered ASM. I hope you find it useful, too.

Let’s assume you made a change to the ASM (server) parameter file that causes an issue. You are most likely to notice this once CRS is restarted but parts of the stack fail to come up. If “crsctl check crs” mentions any component not started you can try to find out where in the bootstrap process you are stuck. Here is the output from my system.

[root@rac12pri1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE      rac12pri1                STARTING
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac12pri1                OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE rac12pri1                STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
[root@rac12pri1 ~]#

I noticed that lots of components are not started. If you are interested in the startup order and dependencies between processes you can find this documented in the Clusterware Administration and Deployment Guide – Chapter 1, Figure 1-2.

Another useful piece of information is the Clusterware alert.log. Unlike Oracle Clusterware version 11.2, where log information was in the $GRID_HOME, the 12c CRS logs moved to the ADR. A quick look at the alert.log showed:


2015-07-28 09:16:51.247 [OCSSD(11611)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 11611
2015-07-28 09:16:52.347 [OCSSD(11611)]CRS-1713: CSSD daemon is started in hub mode
2015-07-28 09:16:57.974 [OCSSD(11611)]CRS-1707: Lease acquisition for node rac12pri1 number 1 completed
2015-07-28 09:16:59.076 [OCSSD(11611)]CRS-1605: CSSD voting file is online: /dev/vdc1; details in /u01/app/oracle/diag/crs/rac12pri1/crs/trace/ocssd.trc.
2015-07-28 09:16:59.089 [OCSSD(11611)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2015-07-28 09:17:08.198 [OCSSD(11611)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac12pri1 .
2015-07-28 09:17:10.276 [OCTSSD(11694)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 11694
2015-07-28 09:17:11.261 [OCTSSD(11694)]CRS-2403: The Cluster Time Synchronization Service on host rac12pri1 is in observer mode.
2015-07-28 09:17:11.469 [OCTSSD(11694)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac12pri1.
2015-07-28 09:17:11.469 [OCTSSD(11694)]CRS-2401: The Cluster Time Synchronization Service started on host rac12pri1.
2015-07-28 09:17:43.016 [ORAROOTAGENT(11376)]CRS-5019: All OCR locations are on ASM disk groups [CHM], and none of these disk groups are
   mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/rac12pri1/crs/trace/ohasd_orarootagent_root.trc".
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri2, number 2, was shut down
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri3, number 3, was shut down
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri4, number 4, was shut down

In other words, CSSD has found the block device I use for the voting files, and concludes its initial work. However, the oracle ROOT agent (orarootagent) cannot proceed since none of the OCR locations on ASM can be opened. Checking the log file at that particular time I can see where the problem is:

2015-07-28 09:17:42.989946*:kgfo.c@2846: kgfoCheckMount dg=CHM ok=0
2015-07-28 09:17:42.990045 : USRTHRD:3741497088: {0:9:3} -- trace dump on error exit --

2015-07-28 09:17:42.990057 : USRTHRD:3741497088: {0:9:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:2850

2015-07-28 09:17:42.990067 : USRTHRD:3741497088: {0:9:3} ORA-15077: could not locate ASM instance serving a
  required diskgroup

2015-07-28 09:17:42.990077 : USRTHRD:3741497088: {0:9:3} Category: 7

2015-07-28 09:17:42.990115 : USRTHRD:3741497088: {0:9:3} DepInfo: 15077

2015-07-28 09:17:42.990382 : USRTHRD:3741497088: {0:9:3} -- trace dump end --

2015-07-28 09:17:42.990408 :CLSDYNAM:3741497088: [ora.storage]{0:9:3} [start] retcode = 7, kgfoCheckMount(CHM)
2015-07-28 09:17:42.990423 :CLSDYNAM:3741497088: [ora.storage]{0:9:3} [start] (null) category: 7, operation:
 kgfoAl06, loc: kgfokge, OS error: 15077,
 other: ORA-15077: could not locate ASM instance serving a required diskgroup

So there is not a single ASM instance that could serve the required diskgroup. Hmmm… So maybe I have to back out the change I just made. I have developed a habit of creating backups (pfiles) of spfiles prior to implementing changes. But even if there is no backup of the spfile I can still get the system back, and here are the steps I used. Just as with the database, I need to

  1. Create a temporary pfile on the file system
  2. Start ASM using this temporary pfile
  3. Create a backup of my (bad) spfile from the ASM disk group
  4. Extract all parameters
  5. Create a proper pfile that I use to start the cluster with
  6. Convert that to a spfile in ASM

Fixing the problem

The first step is to create a temporary pfile. Using the ASM instance’s alert.log I can scroll back to a point in time before the change I made to check which parameters are needed. The following are just an example; your settings are different!

...
Using parameter settings in server-side spfile +CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/vd*1"
  asm_diskgroups           = "DATA"
  asm_diskgroups           = "RECO"
  asm_power_limit          = 1
NOTE: remote asm mode is remote (mode 0x202; from cluster type)
Cluster communication is configured to use the following interface(s) for this instance
  169.254.106.70
  169.254.184.41
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
...

The new pfile, /tmp/init+ASM1.ora, has the following contents:

  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/vd*1"
  asm_diskgroups           = "DATA"
  asm_diskgroups           = "RECO"
  asm_power_limit          = 1

I can now start the first ASM instance:

[oracle@rac12pri1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 28 09:23:23 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile='/tmp/init+ASM1.ora'
ASM instance started

Total System Global Area 1140850688 bytes
Fixed Size                  2933400 bytes
Variable Size            1112751464 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL>

The alert.log also records the location of the spfile-you should back this up now (using asmcmd or any other tool). Using the backup, you should be able to reconstruct your spfile, but make sure to take the offending parameter out.
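Here is a sketch of such a backup using asmcmd, with the file name taken from my alert.log (yours will differ); asmcmd also has a dedicated spbackup command for exactly this purpose:

[oracle@rac12pri1 ~]$ asmcmd cp +CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125 /tmp/asmspfile.bak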

I decided to create the spfile as spfileASM.ora in ASM. I amended my temporary pfile with the settings from the recovered spfile and put it back into the cluster.


SQL> create spfile='+CHM/rac12pri/spfileASM.ora' from pfile='/tmp/init+ASM1.ora';

File created.

Why the name change? You cannot create files in ASM that have OMF names. Trying to create the spfile with the original name will cause an error:


SQL> create spfile='+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125' from pfile='/tmp/init+ASM1.ora';
create spfile='+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125' from pfile='/tmp/init+ASM1.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file
+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125
ORA-15177: cannot operate on system aliases

The really nice thing is that this is reflected in the Grid Plug And Play (GPNP) profile immediately. The ASM alert.log showed:

2015-07-28 09:25:01.323000 +01:00
NOTE: updated gpnp profile ASM SPFILE to
NOTE: header on disk 0 advanced to format #2 using fcn 0.0
2015-07-28 09:25:58.332000 +01:00
NOTE: updated gpnp profile ASM diskstring: /dev/vd*1
NOTE: updated gpnp profile ASM diskstring: /dev/vd*1
NOTE: updated gpnp profile ASM SPFILE to +CHM/rac12pri/spfileASM.ora

And the XML profile is updated too (reformatted for better readability)


[oracle@rac12pri1 ~]$ gpnptool get -o-

<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile"
  xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
  xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd"
  ProfileSequence="7" ClusterUId="886a0e42a...5d805357c76a"
  ClusterName="rac12pri" PALocation="">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
     <gpnp:Network id="net1" IP="192.168.100.0" Adapter="eth0" Use="public"/>
     <gpnp:Network id="net2" IP="192.168.101.0" Adapter="eth1" Use="cluster_interconnect"/>
     <gpnp:Network id="net3" IP="192.168.102.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/vd*1" SPFile="+CHM/rac12pri/spfileASM.ora" Mode="remote"/>
  <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">...</ds:Signature>
</gpnp:GPnP-Profile>

This should be it-the correct values have been restored, the spfile is back on shared storage, and I should be able to start with this combination. The restart itself is the usual procedure as root; a sketch, using my Grid home (yours may differ):
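[root@rac12pri1 ~]# /u01/app/12.1.0.2/grid/bin/crsctl stop crs
[root@rac12pri1 ~]# /u01/app/12.1.0.2/grid/bin/crsctl start crs

After having issued the stop/start commands to CRS it was indeed all well: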

[root@rac12pri1 ~]# crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online   

[root@rac12pri1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac12pri1                STABLE
ora.CHM.dg
               ONLINE  ONLINE       rac12pri1                STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac12pri1                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac12pri1                STABLE
ora.RECO.dg
               ONLINE  ONLINE       rac12pri1                STABLE
ora.net1.network
               ONLINE  ONLINE       rac12pri1                STABLE
ora.ons
               ONLINE  ONLINE       rac12pri1                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac12pri1                169.254.1.137 192.16
                                                             8.101.10 192.168.102
                                                             .10,STABLE
ora.asm
      1        ONLINE  ONLINE       rac12pri1                STABLE
      2        ONLINE  OFFLINE                               STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.cdb.db
      1        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
      4        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac12pri1                Open,STABLE
ora.ncdb.db
      1        ONLINE  ONLINE       rac12pri1                Open,STABLE
      2        ONLINE  OFFLINE                               STABLE
      3        ONLINE  OFFLINE                               STABLE
      4        ONLINE  OFFLINE                               STABLE
ora.ncdb.fotest.svc
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       rac12pri1                STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.rac12pri1.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.rac12pri2.vip
      1        ONLINE  INTERMEDIATE rac12pri1                FAILED OVER,STABLE
ora.rac12pri3.vip
      1        ONLINE  INTERMEDIATE rac12pri1                FAILED OVER,STABLE
ora.rac12pri4.vip
      1        ONLINE  INTERMEDIATE rac12pri1                FAILED OVER,STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
--------------------------------------------------------------------------------

Time to start Clusterware on the other nodes and to report “We are back and running” :)


Posted in 12c Release 1, Automatic Storage Management, RAC | Leave a Comment »

Installing Oracle 12.1.0.2 RAC on Oracle Linux 7-part 2

Posted by Martin Bach on January 12, 2015

In the first part of the article series you could read how a kickstart file made the installation of Oracle Linux 7 a lot more bearable. In this part of the series it’s all about configuring the operating system. The installation of Grid Infrastructure and the Oracle database is for another set of posts.

There are quite some differences between Oracle Linux 6 and 7

To me the transition from Oracle Linux 6 to 7 feels like the step from Solaris 9 to 10 at the time. Personally I think that a lot has changed. Although, to be fair, it was announced quite some time ago that the network stack commands we know and love are deprecated and might go away… Even with Oracle Linux 6 there was a threat that network manager would be the only tool to modify your network settings (which thankfully was not the case). A lot of the efforts of the Linux community have now come to fruition, and it’s time to adjust to the future. Even when it’s painful (and it is, at least a bit).

Configuring the network

The warning has been out there quite a while but now it seems to be true-no more system-config-network-tui to configure the network! No more ifconfig! Oh dear-quite a bit of learning to be done. Luckily someone else has done all the legwork and documented the changes. A good example is this one:

https://dougvitale.wordpress.com/2011/12/21/deprecated-linux-networking-commands-and-their-replacements/

So first of all-don’t fear: although all network interfaces are configured using network manager now, you can still use a command line tool: nmtui. After trying it out I have to say I’m not really convinced about its usability. What appears better is the use of nmcli, the network manager command line tool. Its use is quite confusing, and it appears to me as if the whole network manager toolset was developed for laptop users, not servers. But I digress. I have a few interfaces in my RAC VM; the first was configured during the installation, eth[1-3] aren’t configured yet.

[root@localhost ~]# nmcli connection show
NAME         UUID                                  TYPE            DEVICE
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
[root@localhost ~]# nmcli device status
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  disconnected  --
eth2    ethernet  disconnected  --
eth3    ethernet  disconnected  --
lo      loopback  unmanaged     --
[root@localhost ~]#

At this point I have used eth0 as the management network (similar to the way Exadata does) and will use the other networks for the database. eth1 will act as the public network, eth2 and eth3 will be private.

Although the network interfaces can be named differently for device name persistence I stick with the old naming for now. I don’t want to run into trouble with the installer just yet. On physical hardware you are very likely to see very different network interface names; the kernel uses a naming scheme identifying where the cards are (on the main board, or in extension cards for example). I’ll write another post about that soon.

Using dnsmasq (on the host) I configure my hosts for these addresses:

[root@ol62 ~]# grep rac12pri /etc/hosts
192.168.100.107	rac12pri1.example.com		rac12pri1
192.168.100.108	rac12pri1-vip.example.com	rac12pri1-vip
192.168.100.109	rac12pri2.example.com		rac12pri2
192.168.100.110	rac12pri2-vip.example.com	rac12pri2-vip
192.168.100.111	rac12pri-scan.example.com	rac12pri-scan
192.168.100.112	rac12pri-scan.example.com	rac12pri-scan
192.168.100.113	rac12pri-scan.example.com	rac12pri-scan

Configuring the interface is actually not too hard once you get the hang of it. It took me a little while to get there though… It almost appears as if something that was simple and easy to use was made difficult to use.

[root@localhost ~]# nmcli con add con-name eth1 ifname eth1 type ethernet ip4 192.168.100.107/24 gw4 192.168.100.1
[root@localhost ~]# nmcli con add con-name eth2 ifname eth2 type ethernet ip4 192.168.101.107/24
[root@localhost ~]# nmcli con add con-name eth3 ifname eth3 type ethernet ip4 192.168.102.107/24 

[root@localhost ~]# nmcli con show
NAME         UUID                                  TYPE            DEVICE
eth2         ccc7f592-b563-4b9d-a36b-2b45809e4643  802-3-ethernet  eth2
eth1         ae897dee-42ff-4ccd-843b-7c97ba0d5315  802-3-ethernet  eth1
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
eth3         b6074c9a-dcc4-4487-9a8a-052e4c60bbca  802-3-ethernet  eth3

I can now verify the IP addresses using the “ip” tool (ifconfig was not installed, I haven’t yet checked if there was a compatibility package though)

[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:6e:6f:67 brd ff:ff:ff:ff:ff:ff
    inet 192.168.150.111/24 brd 192.168.150.255 scope global eth0
    inet6 fe80::5054:ff:fe6e:6f67/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:96:ad:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.107/24 brd 192.168.100.255 scope global eth1
    inet6 fe80::5054:ff:fe96:ad88/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:c1:cc:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.107/24 brd 192.168.101.255 scope global eth2
    inet6 fe80::5054:ff:fec1:cc8e/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:7e:59:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.102.107/24 brd 192.168.102.255 scope global eth3
    inet6 fe80::5054:ff:fe7e:5945/64 scope link
       valid_lft forever preferred_lft forever
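As an aside: if you miss ifconfig, the legacy tools are packaged in net-tools and are one install away (nothing here requires them):

[root@localhost ~]# yum install -y net-tools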

Now what’s left is setting the hostname-which is a simple call to hostnamectl --static set-hostname rac12pri1. nmcli gives you an interface to change the hostname as well. I repeated the steps for node 2; they are identical except for the network IP addresses of course.

So that concludes the network setup.

Managing linux daemons

If you are curious about which services start at which runlevel, then there’ll be another surprise:

[root@rac12pri2 ~]# chkconfig --list

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

      If you want to list systemd services use 'systemctl list-unit-files'.
      To see services enabled on particular target use
      'systemctl list-dependencies [target]'.

iprdump        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprinit        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprupdate      	0:off	1:off	2:on	3:on	4:on	5:on	6:off
netconsole     	0:off	1:off	2:off	3:off	4:off	5:off	6:off
network        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
pmcd           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmie           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmlogger       	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmmgr          	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmproxy        	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmwebd         	0:off	1:off	2:off	3:off	4:off	5:off	6:off
rhnsd          	0:off	1:off	2:on	3:on	4:on	5:on	6:off
[root@rac12pri2 ~]#

If you just got familiar with upstart then there is some bad news: upstart is now replaced with systemd… This might be the right time to read up on that if you aren’t familiar with it yet:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/chap-Managing_Services_with_systemd.html

Things are a little different with that, so here is an example of how to enable and start the NTP service. It has to be installed first if that hasn’t been the case. You also should add the -x flag in /etc/sysconfig/ntpd. First I would like to see if the service is available. You use systemctl for this-so instead of a chkconfig --list ntpd you call systemctl as shown:

[root@rac12pri ~]# systemctl list-units --type service --all | grep ntpd
ntpd.service                                                                              loaded inactive dead    Network Time Service
ntpdate.service                                                                           loaded inactive dead    Set time via NTP

I have to get used to the new syntax: previously you used “service <whatever> status” and then, if you needed, typed backspace a few times and changed status to start. The new syntax is closer to human language but less practical: systemctl status <service>. Changing status to start requires more typing.

The check proved that the service exists (i.e. the NTP package is installed), but it is not started. We can change this:

[root@rac12pri ~]# systemctl enable ntpd.service
[root@rac12pri ~]# systemctl start ntpd.service
[root@rac12pri ~]# systemctl status ntpd.service
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since Tue 2014-12-16 15:38:47 GMT; 1s ago
Process: 5179 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 5180 (ntpd)
CGroup: /system.slice/ntpd.service
└─5180 /usr/sbin/ntpd -u ntp:ntp -g -x

Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 8 eth1 fe80::5054:ff:fe96:ad88 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 9 eth2 fe80::5054:ff:fec1:cc8e UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 10 eth3 fe80::5054:ff:fe7e:5945 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 11 eth0 fe80::5054:ff:fe6e:6f67 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listening on routing socket on fd #28 for interface updates
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c016 06 restart
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c012 02 freq_set ntpd 0.000 PPM
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c011 01 freq_not_set
Dec 16 15:38:47 rac12pri systemd[1]: Started Network Time Service.
Dec 16 15:38:48 rac12pri ntpd[5180]: 0.0.0.0 c614 04 freq_mode
[root@rac12pri ~]#

The call to “systemctl enable” replaces the chkconfig invocation that marked ntpd for automatic start (chkconfig ntpd on). Starting the service does not produce any output, hence the need to check the status.
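For completeness: the -x flag mentioned earlier lives in /etc/sysconfig/ntpd, which the unit file reads and appends via $OPTIONS-you can see the result in the ExecStart and CGroup lines above. In my lab the file contains:

OPTIONS="-g -x"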

There is a slight caveat with the use of NTP: it is not the default timekeeping service. Another tool, named chronyd, is used instead.

This causes a problem after the next reboot: chronyd will be started, NTPd won’t be. The Red Hat documentation therefore has a section on how to switch:

[root@rac12pri ~]# systemctl stop chronyd
[root@rac12pri ~]# systemctl disable chronyd
[root@rac12pri ~]# systemctl status chronyd
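A quick sanity check to confirm which of the two services will come up after the next reboot:

# expect ntpd to report "enabled" and chronyd "disabled"
systemctl is-enabled ntpd chronyd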

Storage

Shared storage is provided by KVM. I am using my SSDs in the lab, from which I create a few “LUNs”. These must explicitly be made “shareable” to be accessible by more than one guest-see the sketch after the following list. Since 12.1.0.2.0 Oracle installs a database for the cluster health monitor by default. Currently I use the following setup for my lab 12.1.0.2 clusters:

  1. +CHM (external redundancy) – 1x 15GB
  2. +OCR (normal redundancy) – 3x 2 GB
  3. +DATA (external redundancy) – 1 x 15GB
  4. +RECO (external redundancy) – 1 x 10 GB
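As mentioned above the LUNs must be shareable; here is a sketch of how one of them could be created and attached to both guests (the image path, target device and domain names are assumptions):

# create the backing file for a 15GB "LUN"
qemu-img create -f raw /var/lib/libvirt/images/shared/data01.img 15G

# --mode shareable allows concurrent access from both guests
virsh attach-disk rac12pri1 /var/lib/libvirt/images/shared/data01.img vdc \
  --cache none --mode shareable --persistent
virsh attach-disk rac12pri2 /var/lib/libvirt/images/shared/data01.img vdc \
  --cache none --mode shareable --persistent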

If you use the guided installation of Grid Infrastructure the installer will prompt you for a single disk group only. This means that the CHM database as well as the OCR and voting files will be installed in that disk group. I prefer to separate them though, which is why I create a second disk group, OCR, after the installation has completed and move the voting files and OCR out of +CHM.
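Once the OCR disk group exists the move itself is quick; a sketch of the commands, run as root (the Grid home path is an assumption):

# mirror the OCR into +OCR, then drop the copy in +CHM
/u01/app/12.1.0.2/grid/bin/ocrconfig -add +OCR
/u01/app/12.1.0.2/grid/bin/ocrconfig -delete +CHM

# the voting files move in a single step
/u01/app/12.1.0.2/grid/bin/crsctl replace votedisk +OCR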

DATA and RECO are standard Exadata disk groups and I like to keep things consistent for myself.

I use fdisk to partition the future ASM disks with 1 partition spanning the whole LUN.
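Scripted, that can look like the following sketch-the device name is an assumption, so double-check it before writing a new partition table:

# n(ew) p(rimary) partition 1, accept the default first and last sector, w(rite)
printf "n\np\n1\n\n\nw\n" | fdisk /dev/vdc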

Other tasks

A lot of the other pre-installation tasks can actually be performed during the kickstart installation. I still like to use SELinux in permissive mode even though-according to Requirements for Installing Oracle Database 12.1 on RHEL6 or OL6 64-bit (x86-64) (Doc ID 1529864.1)-SELinux can be set to “enforcing”. The directive in the kickstart file is

selinux --permissive

You shouldn’t have to install additional packages-all packages to be installed should go into the %packages section of the file. Simply copy the package names from the official documentation and paste below the last package in the section. There is one exception to the rule: cvuqdisk must be installed from the Oracle installation media.

Settings for /etc/sysctl.conf and /etc/security/limits.conf can also be made in the kickstart file as shown in the first part of this series.

Storage to be made available to RAC must have its permissions set. Since there isn’t an ASMLib for Oracle Linux 7 to my knowledge, UDEV will have to be used; my udev configuration file, too, is in the first part.
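For illustration, a minimal rule along those lines-the device match, owner and group are assumptions, and a real rule should match on something stable such as the disk’s serial number:

# /etc/udev/rules.d/61-asmdevices.rules
KERNEL=="vd[c-f]1", OWNER="grid", GROUP="asmdba", MODE="0660"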

To make sure the user and group IDs for the oracle and grid accounts are the same on all nodes I create the accounts in the kickstart file as well. Passwords are deliberately not set-they may evolve and I can’t possibly remember them all :)
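In kickstart this can be as simple as the following sketch (the UIDs and GIDs reflect my lab convention, not a requirement):

group --name=oinstall --gid=54321
group --name=dba --gid=54322
group --name=asmdba --gid=54327
user --name=oracle --uid=54321 --gid=54321 --groups=dba,asmdba
user --name=grid --uid=54322 --gid=54321 --groups=asmdba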

User equivalence can be set up using a technique I have already described in an earlier blog post. Although the user equivalence setup can be deferred to when you install Grid Infrastructure I still perform it before to allow me to run the cluster verification tool with the -fixup option.

Posted in 12c Release 1, KVM, Linux, RAC | 3 Comments »

Adventures in RAC: gc buffer busy acquire and release

Posted by Martin Bach on December 16, 2014

It seems that I’m getting more and more drawn into the world of performance analysis, and since I sometimes tend to forget things I need to write them down. I almost enjoy the “getting there” more than ultimately solving the problem. You pick up quite a few things on the way.

This environment is Exadata 12.1.1.1.1/Oracle 12.1.0.2 but as with so many things the fact that the database is on Exadata shouldn’t matter.

So here is one of these posts, this time I’m writing up what I saw related to GC Buffer Busy Acquire.

gc buffer busy acquire?

Whenever I see a wait event I haven’t dealt with extensively in the past I try to provoke behaviour to study it more closely. But first you need to know the event’s meaning. One option is to check v$event_name:

SQL> select name, parameter1, parameter2, parameter3, wait_class
  2  from v$event_name where name = 'gc buffer busy acquire';

NAME                           PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
------------------------------ ---------- ---------- ---------- -------------------
gc buffer busy acquire         file#      block#     class#     Cluster

So when you see this event in ASH/traces/v$session etc you know it’s a cluster wait and potentially limiting your processing throughput. It also tells you the file#, block# and class# of the buffer which you can link to v$bh. This view allows you to find the data object ID given these input parameters.

Using https://orainternals.wordpress.com/2012/04/19/gc-buffer-busy-acquire-vs-release/ as a source I worked out that the event has to do with acquiring a buffer (=block) in RAC (gc = global cache) on the local instance. If the block you need is on the remote instance you wait for it to be released, and the wait event is gc buffer busy release.

Since Oracle will clone blocks in the buffer caches for consistent reads and use a shared lock on these for reading, I thought that waiting can only happen if someone requested a block in XCUR (exclusive current) mode. With that working hypothesis I went to work.

How to test

I started off writing a small Java class that creates a connection pool against my RAC database. I initially used the default service name in the connect descriptor, only to find out that dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE did not trace my sessions. In the end I created a true RAC service with CLB and RLB goals against both instances, and I ended up with traces in the diagnostic_dest.

After setting up the UCP connection pool the code will create a number of threads that each will pull a connection from the pool, do some work (*) and hand it back to the pool as good citizens should do.

(*) The “do some work” bit is this:

...
                        try {

                                PreparedStatement pstmt = conn.prepareStatement(
                                  "select /* bufferbusy001 */ id, to_char(sysdate,'hh24:mi:ss') d " +
                                  "from t1 where id = ? for update");

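                                // pick one of the six IDs that map to block 11981654 (see the dbms_rowid query below)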
                                int randomID = new Random().nextInt((1450770 - 1450765) + 1) + 1450765;
                                System.out.println("thread " + mThreadID + " now querying for ID " + randomID);
                                pstmt.setInt(1, randomID);

                                ResultSet rs = pstmt.executeQuery();

                                while (rs.next()) {
                                        System.out.println("Thread " + mThreadID + " reporting an id of "
                                        + rs.getInt("id") + ". Now it is " + rs.getString("d"));
                                }

                                rs.close();
                                pstmt.close();
                                conn.rollback();
                                conn.close();
                                conn = null;

                                Thread.sleep(2000);
                        } catch (Exception e) {
                                e.printStackTrace();
                        }
...

I think that’s how a Java developer would do it (with more error handling of course) but then I’m not a Java developer. It did work though! What I considered most important was to generate contention on a single block. Using dbms_rowid I could find out which IDs belong to (a random) block:

SQL> select * from (
  2    select id,DBMS_ROWID.ROWID_BLOCK_NUMBER (rowid, 'BIGFILE') as block
  3      from t1
  4 ) where block = 11981654;

        ID      BLOCK
---------- ----------
   1450765   11981654
   1450766   11981654
   1450767   11981654
   1450768   11981654
   1450769   11981654
   1450770   11981654

6 rows selected.

So if I manage to randomly select from the table where ID in the range …765 to …770 then I should be ok and just hit that particular block.

It turned out that the SQL statement completed so quickly I had to considerably ramp up the number of sessions in the pool to see anything. I went up from 10 to 500 before I could notice a change. Most of the statements are too quick to even be caught in ASH-Tanel’s ashtop script showed pretty much nothing except ON-CPU occasionally as well as the odd log file sync event. Snapper also reported sessions in idle state.

SQL> r
  1  select count(*), inst_id, status, sql_id, event, state
  2  from gv$session where module = 'BufferBusy'
  3* group by inst_id, status, sql_id, event, state

  COUNT(*)    INST_ID STATUS   SQL_ID        EVENT                          STATE
---------- ---------- -------- ------------- ------------------------------ -------------------
       251          1 INACTIVE               SQL*Net message from client    WAITING
       248          2 INACTIVE               SQL*Net message from client    WAITING

2 rows selected.

That’s what you see most of the time.

Let me trace this for you

So in order to get a meaningful idea about the occurrence (or absence) of the gc buffer busy acquire event I added a MODULE to my sessions so that I could later run trcsess to combine the traces. Here is the resulting raw trace, or rather an excerpt from it:

=====================
PARSING IN CURSOR #140650659166120 len=96 dep=0 uid=65 oct=3 lid=65 tim=4170152514049 hv=1500360262 ad='5b58a4a10' sqlid='6a5jfvpcqvbk6'
select /* bufferbusy001 */ id, to_char(sysdate,'hh24:mi:ss') d from t1 where id = :1  for update
END OF STMT
PARSE #140650659166120:c=0,e=5598,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2291732815,tim=4170152514046
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 12250 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152526533
WAIT #140650659166120: nam='buffer busy waits' ela= 1890 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152528475
WAIT #140650659166120: nam='enq: TX - row lock contention' ela= 188606 name|mode=1415053318 usn<<16 | slot=1179674 sequence=1485 obj#=20520 tim=4170152717199
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 1590 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152718839
WAIT #140650659166120: nam='enq: TX - row lock contention' ela= 36313 name|mode=1415053318 usn<<16 | slot=1245199 sequence=1894 obj#=20520 tim=4170152755340
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 1268 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152756655
WAIT #140650659166120: nam='buffer busy waits' ela= 668 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152757363
WAIT #140650659166120: nam='KJC: Wait for msg sends to complete' ela= 11 msg=26941469232 dest|rcvr=65539 mtype=8 obj#=20520 tim=4170152757492
EXEC #140650659166120:c=1999,e=243530,p=0,cr=9,cu=4,mis=0,r=0,dep=0,og=1,plh=2291732815,tim=4170152757651
WAIT #140650659166120: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152757709
FETCH #140650659166120:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=2291732815,tim=4170152757749
STAT #140650659166120 id=1 cnt=1 pid=0 pos=1 obj=0 op='FOR UPDATE  (cr=9 pr=0 pw=0 time=243443 us)'
STAT #140650659166120 id=2 cnt=2 pid=1 pos=1 obj=0 op='BUFFER SORT (cr=3 pr=0 pw=0 time=60 us)'
STAT #140650659166120 id=3 cnt=1 pid=2 pos=1 obj=48863 op='INDEX RANGE SCAN I_T1$SEC1 (cr=3 pr=0 pw=0 time=37 us cost=3 size=6 card=1)'
WAIT #140650659166120: nam='SQL*Net message from client' ela= 260 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152758109
CLOSE #140650659166120:c=0,e=5,dep=0,type=1,tim=4170152758141
XCTEND rlbk=1, rd_only=0, tim=4170152758170
WAIT #0: nam='gc buffer busy acquire' ela= 3764 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152761976
WAIT #0: nam='buffer busy waits' ela= 1084 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152763104
WAIT #0: nam='log file sync' ela= 246 buffer#=119491 sync scn=19690898 p3=0 obj#=20520 tim=4170152763502
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152763536

Result! There are gc buffer busy acquire events recorded. I can’t rule out TX-row lock contention since with all those threads and only 6 IDs to choose from there was going to be some locking on the same ID caused by the “for update” clause.

Now I am reasonably confident that I worked out at least one scenario causing a gc buffer busy acquire. You might also find the location of the blocks in the buffer cache interesting:

SYS:dbm011> select count(*), inst_id, block#, status
  2  from gv$bh where block# = 11981654
  3  group by inst_id, block#, status order by inst_id, status;

   COUNT(*)     INST_ID      BLOCK# STATUS
----------- ----------- ----------- ----------
          9           1    11981654 cr
          1           1    11981654 xcur
          9           2    11981654 cr

There is a single buffer in XCUR mode on instance 1, plus 9 CR copies of that block in the buffer cache of each instance.

Making it worse

Now I didn’t want to stop there, I was interested in what would happen under CPU load. During my career I noticed cluster waits appear primarily when you are CPU-bound (all other things being equal). This could be the infamous middle-tier-connection-pool-mismanagement or an execution plan going wrong with hundreds of users performing nested loop joins when they should hash-join large data sets… This is usually the point where OEM users ask the DBAs to do something against that “sea of grey” in the performance pages.

As with every cluster technology an overloaded CPU does not help. Well-I guess that’s true for all computing. To increase the CPU load I created 10 dd sessions to read from /dev/zero and write to /dev/null. Sounds silly but one of these hogs 1 CPU core 100%. With 10 out of 12 cores 100% occupied that way on node 1 I relaunched my test. The hypothesis that CPU overload has an effect was proven right by suddenly finding ASH samples of my session.
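In case you want to reproduce this, the load generation was nothing more sophisticated than the following sketch-lab use only:

# each dd hogs one core copying zeros into the bit bucket
for i in $(seq 1 10); do
    dd if=/dev/zero of=/dev/null &
done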

SQL> @ash/ashtop sql_id,session_state,event "sql_id='6a5jfvpcqvbk6'" sysdate-5/1440 sysdate

    Total
  Seconds     AAS %This   SQL_ID        SESSION EVENT
--------- ------- ------- ------------- ------- ----------------------------------------
      373     1.2   79% | 6a5jfvpcqvbk6 WAITING enq: TX - row lock contention
       54      .2   11% | 6a5jfvpcqvbk6 WAITING gc buffer busy release
       20      .1    4% | 6a5jfvpcqvbk6 ON CPU
       11      .0    2% | 6a5jfvpcqvbk6 WAITING gc buffer busy acquire
       11      .0    2% | 6a5jfvpcqvbk6 WAITING gc current block busy
        1      .0    0% | 6a5jfvpcqvbk6 WAITING gc current block 2-way

6 rows selected.

Using ASQLMON I can even see where time is spent:

SQL> @scripts/ash/asqlmon 6a5jfvpcqvbk6 % sysdate-1 sysdate

    SECONDS Activity Visual       Line ID Parent ASQLMON_OPERATION                   SESSION EVENT                                         AVG_P3 OBJ_ALIAS_ ASQLMON_PR
----------- -------- ------------ ------- ------ ----------------------------------- ------- ---------------------------------------- ----------- ---------- ----------
         38    2.4 % |          |       0        SELECT STATEMENT                    ON CPU                                                883065
          1     .1 % |          |       1      0  FOR UPDATE                         WAITING gc current block 2-way                      33554433  [SEL$1]
         25    1.6 % |          |       1      0                                     ON CPU                                            5369727.36  [SEL$1]
         17    1.1 % |          |       1      0                                     WAITING buffer busy waits                                  1  [SEL$1]
        109    6.8 % |#         |       1      0                                     WAITING gc buffer busy release                             1  [SEL$1]
         31    1.9 % |          |       1      0                                     WAITING gc buffer busy acquire                             1  [SEL$1]
         27    1.7 % |          |       1      0                                     WAITING gc current block busy                       33554433  [SEL$1]
        768   48.0 % |#####     |       1      0                                     WAITING enq: TX - row lock contention            6685.143229  [SEL$1]
          3     .2 % |          |       2      1   BUFFER SORT                       ON CPU                                                     0
          2     .1 % |          |       3      2    INDEX RANGE SCAN [I_T1$SEC1]     ON CPU                                                     0 T1@SEL$1 [ [A:]
                                                                                                                                                  SEL$1]     "ID"=:1


          2     .1 % |          |       0        SELECT STATEMENT                    ON CPU                                            16777216.5
          2     .1 % |          |       1      0  FOR UPDATE                         WAITING gc current block busy                       33554433  [SEL$1]
         24    1.5 % |          |       1      0                                     WAITING write complete waits                               0  [SEL$1]
          9     .6 % |          |       1      0                                     WAITING gc buffer busy acquire                             1  [SEL$1]
         30    1.9 % |          |       1      0                                     WAITING gc buffer busy release                             1  [SEL$1]
          9     .6 % |          |       1      0                                     WAITING buffer busy waits                                  1  [SEL$1]
          7     .4 % |          |       1      0                                     ON CPU                                           158.8571429  [SEL$1]
        496   31.0 % |###       |       1      0                                     WAITING enq: TX - row lock contention            6396.395161  [SEL$1]
                   % |          |       2      1   BUFFER SORT
                   % |          |       3      2    INDEX RANGE SCAN [I_T1$SEC1]                                                                  T1@SEL$1 [ [A:]
                                                                                                                                                  SEL$1]     "ID"=:1

Further Reading

I’m sure there is a wealth of resources available out there, in my case Riyaj’s blog helped me a lot. He even tagged posts with gc buffer busy: https://orainternals.wordpress.com/tag/gc-buffer-busy

Have a look at the Oaktable World 2014 agenda and watch Tanel Poder’s session attentively. You’d be surprised how many scripts he has made publicly available to troubleshoot performance. Like snapper? It’s only the tip of the iceberg. And if you can, you should really attend his advanced troubleshooting seminar.

Posted in 12c Release 1, RAC | 2 Comments »

A first look at RAC 12c (part I)

Posted by Martin Bach on August 4, 2014

I have recently upgraded my RAC 12.1.0.1.3 system to RAC 12.1.0.2 including the RDBMS installation. Currently I am updating my skills with information relevant to what I would normally have called 12c Release 2 (which would also answer the question: when is 12c Release 2 coming out?). Then I realised I hadn’t posted a first look at RAC 12c yet-so here it comes.

There are a few things that aren’t specifically mentioned in the new features guide that caught my eye. First of all, RAC 12 does a few really cool things. Have a look at the srvctl command output:

[oracle@rac12node1 ~]$ srvctl
Usage: srvctl <command> <object> [<options>]
    commands: enable|disable|export|import|start|stop|relocate|status|add|remove|modify|getenv|setenv|unsetenv|
       config|convert|update|upgrade|downgrade|predict
    objects: database|instance|service|nodeapps|vip|network|asm|diskgroup|listener|srvpool|server|scan|scan_listener|
        oc4j|home|filesystem|gns|cvu|havip|exportfs|rhpserver|rhpclient|mgmtdb|mgmtlsnr|volume|mountfs
For detailed help on each command and object and its options use:
  srvctl <command> -help [-compatible] or
  srvctl <command> <object> -help [-compatible]
[oracle@rac12node1 ~]$

Quite a few more than with 11.2.0.3:

[oracle@rac112node1 ~]$ srvctl
Usage: srvctl <command> <object> [<options>]
 commands: enable|disable|start|stop|relocate|status|add|remove|modify|getenv|setenv|unsetenv|config|convert|upgrade
 objects: database|instance|service|nodeapps|vip|network|asm|diskgroup|listener|srvpool|server|scan|scan_listener|oc4j|home|filesystem|gns|cvu
For detailed help on each command and object and its options use:
 srvctl <command> -h or
 srvctl <command> <object> -h

I will detail the meaning of some of these later in this post or another one to follow.

Evaluation and Prediction

When you are working with policy-managed databases RAC 12c already gave you a “what if” option in the form of the -eval flag. If for example you wanted to grow your server pool from 2 to 3 nodes:

[oracle@rac12node1 ~]$ srvctl modify srvpool -serverpool pool1 -max 3 -eval -verbose
Database two will be started on node rac12node3
Server rac12node3 will be moved from pool Free to pool pool1
[oracle@rac12node1 ~]$

Now you will be able to predict a resource failure as well:

[oracle@rac12node1 ~]$ srvctl predict -h

The SRVCTL predict command evaluates the consequences of resource failure.

Usage: srvctl predict database -db <database_name> [-verbose]
Usage: srvctl predict service -db <database_name> -service <service_name> [-verbose]
Usage: srvctl predict asm [-node <node_name>] [-verbose]
Usage: srvctl predict diskgroup -diskgroup <diskgroup_name> [-verbose]
Usage: srvctl predict filesystem -device <volume_device> [-verbose]
Usage: srvctl predict vip -vip <vip_name> [-verbose]
Usage: srvctl predict network [-netnum <network_number>] [-verbose]
Usage: srvctl predict listener -listener <listener_name> [-verbose]
Usage: srvctl predict scan -scannumber <scan_ordinal_number> [-netnum <network_number>] [-verbose]
Usage: srvctl predict scan_listener -scannumber <scan_ordinal_number> [-netnum <network_number>] [-verbose]
Usage: srvctl predict oc4j [-verbose]

So what would happen if a disk group failed?

[oracle@rac12node1 ~]$ srvctl predict diskgroup -diskgroup DATA -verbose
Resource ora.DATA.dg will be stopped
Resource ora.DATA.ORAHOMEVOL.advm will be stopped
[oracle@rac12node1 ~]$

What it doesn’t seem to do at this stage is assess further cascading problems. If +DATA went down, it would pretty much drag the whole cluster down with it, too.

Status

Interestingly you can see a lot more detail with 12.1.0.2 than previously. Here is an example of a policy-managed RAC One Node database:

[oracle@rac12node1 ~]$ srvctl config database -d RONNCDB
Database unique name: RONNCDB
Database name: RONNCDB
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile: +DATA/RONNCDB/PARAMETERFILE/spfile.319.854718651
Password file: +DATA/RONNCDB/PASSWORD/pwdronncdb.290.854718263
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ronpool1
Disk Groups: RECO,DATA
Mount point paths:
Services: NCDB
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: RONNCDB
Candidate servers:
OSDBA group: dba
OSOPER group:
Database instances:
Database is policy managed

Did you spot the OSDBA and OSOPER group mappings in the output? Since 12.1.0.1, DBCA by default creates the password file and server parameter file in ASM.

You can get a lot more status information in 12.1.0.2 than previously, especially when compared to 11.2:

[oracle@rac12node1 ~]$ srvctl status -h

The SRVCTL status command displays the current state of the object.

Usage: srvctl status database {-db <db_unique_name> [-serverpool <serverpool_name>] | -serverpool <serverpool_name> | -thisversion | -thishome} [-force] [-verbose]
Usage: srvctl status instance -db <db_unique_name> {-node <node_name> | -instance <inst_name_list>} [-force] [-verbose]
Usage: srvctl status service {-db <db_unique_name> [-service  "<service_name_list>"] | -serverpool <serverpool_name> [-db <db_unique_name>]} [-force] [-verbose]
Usage: srvctl status nodeapps [-node <node_name>]
Usage: srvctl status vip {-node <node_name> | -vip <vip_name>} [-verbose]
Usage: srvctl status listener [-listener <lsnr_name>] [-node <node_name>] [-verbose]
Usage: srvctl status asm [-proxy] [-node <node_name>] [-detail] [-verbose]
Usage: srvctl status scan [[-netnum <network_number>] [-scannumber <scan_ordinal_number>] | -all] [-verbose]
Usage: srvctl status scan_listener [[-netnum <network_number>] [-scannumber <scan_ordinal_number>] | -all] [-verbose]
Usage: srvctl status srvpool [-serverpool <pool_name>] [-detail]
Usage: srvctl status server -servers "<server_list>" [-detail]
Usage: srvctl status oc4j [-node <node_name>] [-verbose]
Usage: srvctl status rhpserver
Usage: srvctl status rhpclient
Usage: srvctl status home -oraclehome <oracle_home> -statefile <state_file> -node <node_name>
Usage: srvctl status filesystem [-device <volume_device>] [-verbose]
Usage: srvctl status volume [-device <volume_device>] [-volume <volume_name>] [-diskgroup <group_name>] [-node <node_list> | -all]
Usage: srvctl status diskgroup -diskgroup <dg_name> [-node "<node_list>"] [-detail] [-verbose]
Usage: srvctl status cvu [-node <node_name>]
Usage: srvctl status gns [-node <node_name>] [-verbose]
Usage: srvctl status mgmtdb [-verbose]
Usage: srvctl status mgmtlsnr [-verbose]
Usage: srvctl status exportfs [-name <expfs_name> |-id <havip id>]
Usage: srvctl status havip [-id <id>]
Usage: srvctl status mountfs -name <mountfs_name>
For detailed help on each command and object and its options use:
  srvctl <command> <object> -help [-compatible]

RAC 12.1.0.2 adds a few nifty little flags to srvctl status database: -thisversion and -thishome. These work really well where you have multiple versions of Oracle on the same machine (think consolidation):

[oracle@rac12node1 ~]$ srvctl status database -thisversion
Database unique name: RONNCDB
Instance RONNCDB_1 is running on node rac12node4
Online relocation: INACTIVE

Database unique name: TWO
Instance TWO_1 is running on node rac12node1
Instance TWO_2 is running on node rac12node2

Verbose!

Some commands are actually more verbose when you specify the -verbose flag:

[oracle@rac12node1 ~]$ srvctl status database -d RONNCDB -verbose
Instance RONNCDB_1 is running on node rac12node4 with online services NCDB. Instance status: Open.
Online relocation: INACTIVE
[oracle@rac12node1 ~]$ srvctl status database -d RONNCDB
Instance RONNCDB_1 is running on node rac12node4
Online relocation: INACTIVE
[oracle@rac12node1 ~]$

But that’s not new in 12.1.0.2 I believe.

Interesting changes for database logging

The database itself will also tell you more about memory allocation:

**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
 Per process system memlock (soft) limit = 64K
Thu Jul 31 13:34:58 2014
 Expected per process system memlock (soft) limit to lock
 SHARED GLOBAL AREA (SGA) into memory: 1538M
Thu Jul 31 13:34:58 2014
 Available system pagesizes:
  4K, 2048K
 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
        4K       Configured               3          393219        NONE
     2048K                0             769               0        NONE

RECOMMENDATION:
 1. For optimal performance, configure system with expected number
 of pages for every supported system pagesize prior to the next
 instance restart operation.
 2. Increase per process memlock (soft) limit to at least 1538MB
 to lock 100% of SHARED GLOBAL AREA (SGA) pages into physical memory

As you can see I am not using large pages here at all-that’s for demonstration purposes only. I don’t see any reason not to use large pages on a 64bit system these days. I’m curious to see whether the AIX port supports all the AIX page sizes here.

End of part I

This has already turned into a longer post than I expected it to be when I started writing. I think I’ll continue the series in a couple of weeks when I find the time.

Posted in 12c Release 1, RAC | Leave a Comment »

Upgrading clustered Grid Infrastructure to 12.1.0.2 from 12.1.0.1.3

Posted by Martin Bach on July 29, 2014

Oracle 12.1.0.2 is out, after lots of announcements the product has finally been released. I had just extended my 12.1.0.1.3 cluster to 3 nodes and was about to apply the July PSU when I saw the news. So why not try and upgrade to the brand new thing?

What struck me at first was the list of new features… Oracle’s patching strategy has really changed over time. I remember the days when Oracle didn’t usually add features to point releases. Have a look at the new 12.1.0.2 features-some of them would possibly qualify for a 12c Release 2…

In summary the upgrade process is actually remarkably simple, and hasn’t changed much since earlier versions of the software. Here are the steps in chronological order.

./runInstaller

I don’t know how often I have typed ./ruinInstaller instead of ./runInstaller, but here you go. This is the first wizard screen after the splash screen has disappeared.

GI 12.1.0.2-001

Naturally I went for the upgrade of my cluster. Before launching the installer though I made sure that everything was in working order by means of cluvfy. On to the next screen:

GI 002

I always install English only. Troubleshooting Oracle in a different language (especially one I don’t speak or understand) is really hard, so I avoid it in the first place.

Over to the screen that follows and oops-my SYSDG disk group (containing OCR and voting files) is too small. Bugger. In the end I added 3 new 10GB LUNs and dropped the old ones. But it took me a couple of hours to do so. Worse: it wasn’t even needed, but proved to be a good learning exercise. The requirement to have that much free space is most likely caused by the management repository and related infrastructure.

GI 003 error

Back on this screen everything is in order; the screenshot was taken just prior to moving on to the next one. Note the button to skip the updates on unreachable nodes. I am not sure I would want to do that though.

GI 003

I haven’t got OEM agents on the servers (yet) so I’m skipping the registration for now. You can always do that later.

GI 004

This screen is familiar; I am keeping my choices from the initial installation. Grid Infrastructure is owned by the oracle account despite the ASMDBA and ASMADMIN groups, by the way.

GI 005

On the screen below you define where on the file system you want to install Grid Infrastructure. Remember that for clustered deployments the ORACLE_HOME cannot be in the path of the ORACLE_BASE. For this to work you have to jump to the command line, create the directory on all servers and grant ownership to the GI owner account (oracle in this case, could be grid as well)-see the sketch below the screenshot.

GI 006
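A sketch of the commands just mentioned, assuming my path and oracle:oinstall ownership (the group name is an assumption):

# run as root on every cluster node
mkdir -p /u01/app/12.1.0.2/grid
chown oracle:oinstall /u01/app/12.1.0.2/grid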

Since I like to be in control I don’t allow Oracle to run the root scripts. I didn’t in 12.1.0.1 either:

GI 007

In that screen you notice the familiar checking of requirements.

GI 008

In my case there were only a few new ones shown here. This is a lab server so I don’t plan on using swap, but the kernel parameter “panic_on_oops” is new. I also hadn’t set reverse path filtering, which I corrected before continuing-see the sketch below. Interestingly the installer points out the change to the asm_diskstring and its implications.
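For reference, the fixes end up in /etc/sysctl.conf along these lines-a sketch; the loose rp_filter setting on the private interconnect NICs follows Oracle’s recommendation, and the interface names are assumptions:

# fail fast on a kernel oops instead of limping on
kernel.panic_on_oops = 1

# loose reverse path filtering on the interconnect interfaces
net.ipv4.conf.eth1.rp_filter = 2
net.ipv4.conf.eth2.rp_filter = 2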

One thing I haven’t recorded here (because I am using Oracle Linux 6.5 with UEK3) is the requirement for using a 2.6.39 kernel – that sounds like UEK2 to me.

Update: my system is Oracle Linux 6.5, not Red Hat. See Sigrid’s comments below: for Red Hat Linux there doesn’t seem to be a similar requirement to use UEK 2, which matches the documentation (Installation guide for Grid Infrastructure/Linux).

Another interesting case was that the kernel.core_pattern setting wasn’t identical on all nodes. It turned out that 2 nodes had the package abrt installed, and the other two didn’t. Once the packages were installed on all nodes, the warning went away.

GI 009

Unfortunately I didn’t take a screenshot of the summary, in case you were wondering where that is. I went straight into the installation phase:

GI 010

At the end of which you are prompted to run the upgrade scripts. Remember to run them in screen and pay attention to the order you run them in.

GI 011

The output from the last node is shown here:

[root@rac12node3 ~]# /u01/app/12.1.0.2/grid/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/12.1.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.1.0.2/grid/crs/install/crsconfig_params
2014/07/26 16:15:51 CLSRSC-4015: Performing install or upgrade action for Oracle
Trace File Analyzer (TFA) Collector.

2014/07/26 16:19:58 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2014/07/26 16:20:02 CLSRSC-464: Starting retrieval of the cluster configuration data

2014/07/26 16:20:51 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2014/07/26 16:20:51 CLSRSC-363: User ignored prerequisites during installation

ASM configuration upgraded in local node successfully.

2014/07/26 16:21:16 CLSRSC-466: Starting shutdown of the current Oracle Grid Infrastructure stack

2014/07/26 16:22:51 CLSRSC-467: Shutdown of the current Oracle Grid Infrastructure stack has successfully
completed.

OLR initialization - successful
2014/07/26 16:26:53 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.conf'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2014/07/26 16:34:34 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2014/07/26 16:35:55 CLSRSC-478: Setting Oracle Clusterware active version on the last node to be upgraded

2014/07/26 16:35:55 CLSRSC-482: Running command: '/u01/app/12.1.0.2/grid/bin/crsctl set crs activeversion'

Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
The CSS was successfully upgraded.
Started to upgrade Oracle ASM.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Successfully upgraded the Oracle Clusterware.
Oracle Clusterware operating version was successfully set to 12.1.0.2.0
2014/07/26 16:38:51 CLSRSC-479: Successfully set Oracle Clusterware active version

2014/07/26 16:39:13 CLSRSC-476: Finishing upgrade of resource types

2014/07/26 16:39:26 CLSRSC-482: Running command: 'upgrade model  -s 12.1.0.1.0 -d 12.1.0.2.0 -p last'

2014/07/26 16:39:26 CLSRSC-477: Successfully completed upgrade of resource types

2014/07/26 16:40:17 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Did you notice that TFA has been added? Trace File Analyzer is another of these cool things to play with; it was available with 11.2.0.4 and as an add-on for 12.1.0.1.

Result!

Back to OUI to complete the upgrade, after which cluvfy performs a final check and I’m done. Proof that it worked:

[oracle@rac12node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 26 17:13:02 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE	12.1.0.2.0	Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

SQL>

In another post I’ll detail the upgrade for my databases. I am particularly interested in the unplug/plug way of migrating…

Posted in 12c Release 1, KVM, Linux, RAC | Tagged: , | 6 Comments »

Runtime Load Balancing Advisory in RAC 12c-addendum

Posted by Martin Bach on February 19, 2014

A reader asked an interesting question yesterday with regard to the previous post on the subject: where did you get your service metrics from when you queried v$servicemetric-from the PDB or from CDB$ROOT?

I queried the PDB, but this morning I repeated the test to make sure the results are consistent, and they are. This is definitely something you’d hope for: for a given CON_ID you should not get different results from the same v$-view depending on the container you execute your query in.

During testing I noticed something interesting though. I queried gv$servicemetric but did not limit the result to the service I wanted to test with (FCFSRV). Here is the query against gv$servicemetric while the system was idle.

select inst_id,begin_time,end_time,service_name,cpupercall,callspersec,goodness,delta,con_id
from gv$servicemetric
where con_id = 3 and service_name  in ('FCFSRV','demopdb')
order by service_name, inst_id;

   INST_ID BEGIN_TIME          END_TIME            SERVICE_NAME         CPUPERCALL CALLSPERSEC   GOODNESS      DELTA     CON_ID
---------- ------------------- ------------------- -------------------- ---------- ----------- ---------- ---------- ----------
         1 19.02.2014 10:07:16 19.02.2014 10:07:21 FCFSRV                        0           0        100        100          3
         1 19.02.2014 10:05:57 19.02.2014 10:06:56 FCFSRV                        0           0        100        100          3
         2 19.02.2014 10:05:55 19.02.2014 10:06:54 FCFSRV                        0           0        100        100          3
         2 19.02.2014 10:07:19 19.02.2014 10:07:24 FCFSRV                        0           0        100        100          3
         1 19.02.2014 10:07:16 19.02.2014 10:07:21 demopdb                       0           0          0          1          3
         1 19.02.2014 10:05:57 19.02.2014 10:06:56 demopdb                       0           0          0          1          3
         2 19.02.2014 10:07:19 19.02.2014 10:07:24 demopdb                       0           0          0          1          3
         2 19.02.2014 10:05:55 19.02.2014 10:06:54 demopdb                       0           0          0          1          3

8 rows selected.

Read the rest of this entry »

Posted in 12c Release 1, Linux, RAC | Leave a Comment »

Runtime Load Balancing Advisory in RAC 12c

Posted by Martin Bach on February 18, 2014

This is a follow-up on yesterday’s post about services in the new 12c database architecture. After having worked out everything I needed to know about TAF and RAC 12c in CDBs I wanted to check how FCF works with PDBs today. While investigating I found out that the Runtime Load Balancing Advisory does not seem to work as expected in some cases. But I’m getting ahead of myself. First of all, here is my test case:

  • Oracle Linux 6.4 x86-64
  • Grid Infrastructure 12.1.0.1.2, i.e. January 2014 PSU applied
  • RDBMS 12.1.0.1.2, likewise patched with the January PSU
  • A CDB with just 1 PDB for this purpose, named DEMOPDB
  • Service FCFSRV is used

Read the rest of this entry »

Posted in 12c Release 1, Linux, RAC | 2 Comments »

RAC and Pluggable Databases

Posted by Martin Bach on February 17, 2014

In preparation for the OUGN Spring Seminar, and to finally fulfill at least a part of my promise from July, I was getting ready to research RAC, PDBs and services for my demos. It turned out to be a lot more interesting than I had first assumed.

RAC and Multi-Tenancy

So the first attempt to really look at how this works started with my 2 node cluster, where I created a RAC database: RAC12C, administrator managed with instances RAC12C1 and RAC12C2. The database is registered in Clusterware. Clusterware and RDBMS are patched to the January PSU, i.e. 12.1.0.1.2.

The second step was to create a PDB for testing-it’s named DEMOPDB and available on both my instances by design. By the way-PDBs do not start automatically; it is your responsibility to start them when the database starts. I used a startup trigger for this in the Consolidation Book-have a look at it to see the example. There are other ways available, as I found out.

As soon as the PDB is opened (in all Oracle deployment types, not limited to a RAC instance), a new service with the same name as the PDB is automatically started. As it turned out, using that service is the most reliable way to connect to the PDB:

SQL> connect user/password@single-client-access-name/pdbname

This works really well. For single instance, substitute the SCAN with your hostname. Now if you would like to implement some of the more interesting features (TAF/FAN+FCF/Application Continuity) you could create an additional service. The srvctl syntax has changed: Oracle now uses long parameter names (-service instead of -s), which doesn’t really respect the UNIX/GNU way of naming parameters (short parameter: single dash, long parameter name: double dash), but that’s how it is-see the sketch below.
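As an example of the long syntax, creating and starting such an additional service against the PDB could look like this sketch (the database and PDB names are from this post; the service name is made up):

srvctl add service -db RAC12C -service DEMOSRV -pdb DEMOPDB \
  -preferred RAC12C1 -available RAC12C2
srvctl start service -db RAC12C -service DEMOSRV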

Read the rest of this entry »

Posted in 12c Release 1, Linux, RAC | Tagged: , , , | 1 Comment »