Category Archives: 12c Release 1

Initialising PL/SQL associative arrays in 18c and later

I can never remember how to initialise PL/SQL associative arrays and thought I’d write a short post about it. This is primarily based on an article on Oracle’s Ask Tom site, plus a little extra detail from Steven Feuerstein. Associative arrays were previously known as index-by tables, by the way.

Associative arrays before 18c

Prior to Oracle 18c, you had to initialise an associative array in a slightly cumbersome way, like so:

    TYPE array_t IS
    v_the_array   array_t;
    v_index       PLS_INTEGER;
    v_the_array(1) := 'one';
    v_the_array(2) := 'two';
    v_the_array(3) := 'three';
    v_the_array(9) := 'nine';
    v_index := v_the_array.first;
    WHILE ( v_index IS NOT NULL ) LOOP
        dbms_output.put_line('v_the_array(' || v_index || '): ' || v_the_array(v_index));
        v_index :=;

There are slight variations to the theme as explained in example 5-3 of the Oracle Database PL/SQL language reference you might find interesting. You end up having to repeat yourself a lot, as you can see immediately below the BEGIN keyword.

Oracle 18c simplifies this task

With 18c the syntax reminds me a little of Perl, as in you define the type first; in the next step you add a variable to your anonymous code block, which you initialise in the same step:

    TYPE array_t IS

    -- this is new for 18c, see Steven Feuerstein's article here:
    v_the_array   array_t := array_t(
        1 => 'one', 
        2 => 'two', 
        3 => 'three',    -- note gap here ;)
        9 => 'nine');

    v_index       PLS_INTEGER;
    v_index := v_the_array.first;

    WHILE ( v_index IS NOT NULL ) LOOP
        dbms_output.put_line('v_the_array(' || v_index || '): ' || v_the_array(v_index));
        v_index :=;


This way you can define the array in a much nicer looking way and with less code. I also find it more readable.

You are of course not limited to using PLS_INTEGER for indexing the array, you can index differently:

    TYPE capitals_t IS
    v_capitals   capitals_t := capitals_t(
        'France' => 'Paris', 
        'Belgium' => 'Brussels', 
        'Austria' => 'Vienna');

    v_index      VARCHAR2(100);
    v_index := v_capitals.first;
    WHILE ( v_index IS NOT NULL ) LOOP
        dbms_output.put_line('The capital of ' || v_index || ' is ' 
                             || v_capitals(v_index));

        v_index :=;


Once I wrapped my head around this it all made sense, and I think I’ll use this syntax from now on.

Dead Connection Detection (DCD) and the Oracle database

Dead Connection Detection is a useful feature of the Oracle database: it allows for the cleanup of “dead” sessions so they don’t linger around consuming memory and other system resources. The idea is simple: if the database detects that a client process is no longer connected to its server process, it cleans up. This can happen in many ways, in most cases this kind of problem is triggered by an end user.

A dead connection shouldn’t be confused with idle connections: an idle connection still maintains the network link between client and server process, except that there is no activity. Idle connections aren’t maintained/controlled via DCD, there are other tools in the database handling such cases.

As a by product, DCD can also help with overly eager firewalls forcibly removing seemingly idle network connections. I found the following posts and the references therein very useful:

With Oracle 12c Oracle changed the way DCD works by no longer relying on its network layer but rather pushing the functionality into the TCP stack on platforms that support it. This change in behaviour is also explained in an Oracle white paper from 2014.

For the first part of this post I set sqlnet.expire_time to 1 as per a MOS note I found, your value is probably different. The parameter is documented in the Net*8 reference, please use it to work out what the best value is for you. As others have pointed out, this parameter has to go into the RDBMS home, more specifically $ORACLE_HOME/network/admin/sqlnet.ora. I am using Oracle 19.4 on Oracle Linux 7.7 for this blog post. I have seen the same behaviour in as well in my lab.


In addition to the proven methods of checking whether TCP_KEEPALIVE is enabled for a given session I wanted to show another one. Using the ss(8) utility it is possible to show socket options. I also tried lsof but on my system I couldn’t get it to print the options:

SQL> select spid from v$process where addr = (select paddr from v$session where username = 'MARTIN');


SQL> exit 


[root@server2 ~]# lsof -nP -p 13656 -T f
lsof: unsupported TCP/TPI info selection: f
lsof 4.87

Although the man-page for lsof reads:

       -T [t]   controls the reporting of some TCP/TPI information,  also  reported  by  net‐
                stat(1),  following  the network addresses.  In normal output the information
                appears in parentheses, each item except TCP or TPI state name identified  by
                a keyword, followed by `=', separated from others by a single space:

                     <TCP or TPI state name>
                     QR=<read queue length>
                     QS=<send queue length>
                     SO=<socket options and values>
                     SS=<socket states>
                     TF=<TCP flags and values>
                     WR=<window read length>
                     WW=<window write length>

                Not all values are reported for all UNIX dialects.  Items values (when avail‐
                able) are reported after the item name and '='.

                When the field output mode is in effect  (See  OUTPUT  FOR  OTHER  PROGRAMS.)
                each item appears as a field with a `T' leading character.

                -T with no following key characters disables TCP/TPI information reporting.

                -T with following characters selects the reporting of specific TCP/TPI infor‐

                     f    selects reporting of socket options,
                          states and values, and TCP flags and

So let’s try something else: ss(8) – another utility to investigate sockets

Revealing socket options

I have used ss(8) in the past when I didn’t have netstat available, which is more and more common now that netstat is deprecated and its designated successor is ss :)

As far as I know you can’t limit ss to show information just for a PID, I use grep to limit the output. The output is in fact very wide, which is why this might not look pretty on the blog depending on whether the renderer decides to wrap output or not.

[root@server2 ~]# ss -nop | egrep 'NetidState|13656'
NetidState Recv-Q Send-Q                                    Local Address:Port                                      Peer Address:Port                                                                                                           
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,4.412ms,0)        
[root@server2 ~]# 

I used the following options:

  • -n for “Do not try to resolve service names”
  • -o for “Show timer information” and finally
  • -p to “Show process using socket”.

The main option here is -o. As per the man page:

       -o, --options
              Show timer information. For tcp protocol, the output format is:


                     the name of the timer, there are five kind of timer names:

                     on: means one of these timers: tcp retrans timer, tcp early retrans timer and tail loss probe timer
                     keepalive: tcp keep alive timer
                     timewait: timewait stage timer
                     persist: zero window probe timer
                     unknown: none of the above timers

                     how long time the timer will expire

                     how many times the retran occurs

With a little shell loop I can show how that timer is decrementing:

[root@server2 ~]# for i in $(seq 1 5); do ss -nop | grep 13656; sleep 1 ; done
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,20sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,19sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,18sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,17sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,16sec,0)          
[root@server2 ~]# 


Using the ss utility it is possible to check if a keepalive timer is implemented as a means to support DCD with 12.1 and later releases. Invoking ss(8) hasn’t caused any problems on my system, but as with every such tool you need to ensure it’s safe to use before attempting to look at an important system.

sqlldr, direct path loads and concurrency in 12.1 and earlier

I have recently come across an interesting issue related to concurrent data loading into the Oracle database using sqlldr’s direct path mode. Although I investigated the situation on, I found that the same holds true in 19.4 as well when using the defaults. I reconstructed the case, although it is simplified a little to bring the point home.

The environment I used to put this post together is Oracle Restart 19.4.0 on Oracle Linux 7.6.

Test overview

For this test I am running concurrent sqlldr sessions to demonstrate the case. I am conscious of that fact that I could have used external tables, but then I wouldn’t have been able to write this post :)

Assume there’s a table named t2:

SQL> create table t2 as select * from dba_objects where 1 = 0;

Table created.

I am planning on populating the table with data in CSV format. Using sqlcl it is dead easy to create an input file, simply specify the /*csv*/ hint in your query and spool the output to a file: job done. The resulting input file is named t2.dat.

Next up is the control file. I am a creature of habit and tend to use a control file although I could have tried the same process using express mode.

Using the documentation I figured the following control file should do. Since it’s easy enough to forget passing direct=true to the command line I added it to the options clause. When using sqlcl to create a CSV file it adds the column headings in line 1, and they won’t load so I’m skipping these records. Here is the full control file:

options (direct=true, skip=1)
load data 
infile 't2.dat'
into table t2
fields csv without embedded 
date format 'dd-mon-yy'
trailing nullcols
CREATED date,                                                                        
LAST_DDL_TIME date,                                                                  

With the prerequisites at hand I’m ready to perform some testing.

Test #1: concurrent sessions using defaults

First of all, what happens when starting 5 concurrent sessions? Will they complete? Let’s try this in bash:

for i in $(seq 1 5) ; do
  echo starting iteration $i
  sqlldr /@loader control=t2.ctl log=session_${i}.log &

This ran to completion without any visible errors, but I prefer to run sanity checking anyway. The first thing to do is to see if all records have been loaded. The input file contains 73148 data records by the way.

$ grep 'Rows successfully loaded' session*log
session_1.log:  73148 Rows successfully loaded.
session_2.log:  73148 Rows successfully loaded.
session_3.log:  73148 Rows successfully loaded.
session_4.log:  73148 Rows successfully loaded.
session_5.log:  73148 Rows successfully loaded.

So this looks ok, what about the run times?

$ grep '^Run' *.log
session_1.log:Run began on Mon Jul 22 21:32:31 2019
session_1.log:Run ended on Mon Jul 22 21:32:44 2019
session_2.log:Run began on Mon Jul 22 21:32:31 2019
session_2.log:Run ended on Mon Jul 22 21:32:40 2019
session_3.log:Run began on Mon Jul 22 21:32:31 2019
session_3.log:Run ended on Mon Jul 22 21:32:46 2019
session_4.log:Run began on Mon Jul 22 21:32:31 2019
session_4.log:Run ended on Mon Jul 22 21:32:48 2019
session_5.log:Run began on Mon Jul 22 21:32:31 2019
session_5.log:Run ended on Mon Jul 22 21:32:42 2019

All in all these loads completed in a reasonably short time. Some took longer to finish than others though. That made me curious and I logged into the database to see what was going on. I never consciously saw this event:

SQL> select sid, event, state from v$session where program like 'sqlldr%';

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        44 kpodplck wait before retrying ORA-54                             WAITING
        52 kpodplck wait before retrying ORA-54                             WAITING
        54 kpodplck wait before retrying ORA-54                             WAITING
       290 Data file init write                                             WAITING
       291 kpodplck wait before retrying ORA-54                             WAITING


Using the output from my above query I could see that one session was active, all the others queued up behind it. Interestingly there is no “enq: TM – contention” which you’d see with direct path inserts in sqlplus for example:

-- in session 1
SQL> insert /*+ append */ into t2 select * from dba_objects;

73157 rows created.

-- notice there is no commit! 

-- Session 2 waits ...
SQL> insert /*+ append */ into t2 select * from dba_objects;

-- the DBA sees it all:
SQL> select sid, event, state from v$session where username = 'MARTIN'

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        51 enq: TM - contention                                             WAITING
       270 SQL*Net message from client                                      WAITING

What’s even more interesting is this: ddl_lock_timeout which I thought might have helped, has no effect on sqlldr operations:

SQL> show parameter ddl_lock_timeout

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
ddl_lock_timeout                     integer     0

The parameter isn’t set at all, yet I could see “kpodplck wait before retrying ORA-54” which is strange. ORA-54 is of course the dreaded “resource busy and acquire with nowait specified” error. I managed to trace kpodp back to the direct load path functionality.

Without the ability to control waiting via a database parameter I am aware of, there must be some other timeout. And sure enough, there is:

$ time sqlldr /@loader control=t2.ctl

SQL*Loader: Release - Production on Mon Jul 22 21:46:59 2019

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

real    0m32.038s
user    0m0.011s
sys     0m0.011s

It appears as if sqlldr times out waiting for a locked resource after round about 30 seconds, regardless of ddl_lock_timeout.

And this is the end of the story if you aren’t on 12.2 or later. In the next post I’ll show you how you can work around this problem with more current releases.

RBAL (ospid: nnn): terminating the instance due to error 27625 after patching Oracle Restart

I have come across an odd behaviour trying to patch an Oracle Restart environment to January 2019. Based on a twitter conversation this isn’t necessarily limited to my patch combination, there might be others as well. I have used opatchauto to apply patch 28813884 to both RDBMS and GRID homes plus its corresponding OJVM (Java) patch. Before diving into details, this is the environment I have been working with:

  • Oracle Restart with an earlier PSU/OJVM combo applied
  • Separation of duties with oracle as the RDBMS owner, and grid owning the GRID infrastructure installation. This is a key point!
  • Patches to be applied
    • Patch 28813884 (GI PSU)
    • Patch 28790654 (corresponding OJVM patch)
  • OPatch version as installed in the GRID|RDBMS home
    • opatchauto:
    • opatch:

Following the instructions in the readme file I upgraded OPatch to the required version. The file I downloaded was named I double-checked the readme, and to me this is the correct file. After upgrading OPatch in the RDBMS and GRID homes, I started patching.

After this process completed, I wanted to start the database in UPGRADE mode as required by the OJVM patch. This needs to be done via sqlplus since srvctl does not support an “upgrade” option in “start database”.

And that’s where it hit me: whenever I tried to open the database in UPGRADE mode, it threw an error:

 SQL*Plus: Release Production on Tue Jul 16 12:15:34 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.
 SQL> startup upgrade
 ORACLE instance started.
 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 ORA-03113: end-of-file on communication channel
 Process ID: 22354
 Session ID: 14 Serial number: 41958

Looking at the alert log I can see that RBAL terminated the instance:

 2019-07-16 12:15:43.404000 -04:00
 NOTE: ASMB mounting group 1 (DATA)
 WARNING: cellinit.ora is missing. RBAL is terminating the instance.
 RBAL (ospid: 22332): terminating the instance due to error 27625
 System state dump requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_diag_22306_20190716121543.trc
 Dumping diagnostic data in directory=[cdmp_20190716121543], requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 Instance terminated by RBAL, pid = 22332
 2019-07-16 12:15:49.164000 -04:00

But since this is Oracle Restart, Clusterware will simply restart the database. Unless of course you’ve been changing the default behaviour. And funny enough, this works (see further down in the article as to why). Although I’d appreciate this in most cases, the automatic restart isn’t appropriate in my situation: when started by Clusterware, the database is not in upgrade mode:

 SQL> select status from v$instance;


Which is a problem for me. A MOS search about error 27625 didn’t reveal anything too useful, and it took me quite a while to realise the problem has to do with permissions. An Internet search finally gave me the right answer, a fellow blogger has pointed it out a little while ago …

So what exactly is the problem? The RDBMS “oracle” binary needs a specific set of permissions and ownership/group membership for Oracle Restart with separation of duties enabled. This is what it looked like before applying the patch:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:15 /u01/app/oracle/product/

The file permissions are ‘6751’ with oracle owning the file but it belongs to the asmadmin group. Remember, I am using a separate user for Grid Infrastructure with its own specific operating system groups. After running opatch, this changed to:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle oinstall 324518992 Jul 16 12:29 /u01/app/oracle/product/

Permissions remained the same, however the group changed from asmadmin to oinstall. Not quite what I had in mind, and it reproducibly causes instance crashes. There is a simple solution: make sure permissions are set correctly! Using “Database Creation on 11.2/12.1/12.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)” got me on the right track.

 $ ls -l $ORACLE_HOME/bin/oracle
 -rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:29 /u01/app/oracle/product/

 $ sqlplus / as sysdba

 SQL*Plus: Release Production on Tue Jul 16 12:34:45 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.

 SQL> startup upgrade
 ORACLE instance started.

 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 Database mounted.
 Database opened.

 SQL> select status from v$instance;



With the database in upgrade mode (I believe the “migrate” is a left over from the 9i days) I am reassured that running datapatch (the OJVM part actually) works as advertised.

By the way Clusterware corrects the group permissions when you issue a “srvctl start database -db …” command as documented in “Starting the database using srvctl changes the group setting for oracle binary (Doc ID 1508027.1)”. Which didn’t solve my problem as I can’t start the database in upgrade mode using srvctl.

Hope this helps you one day!

Creating a RAC 12.1 Data Guard Physical Standby environment (3b)

Huh, what is this I hear you ask? Part 3b? Oracle 12.1? Well, there’s a bit of a story to this post. Back in December 2016 I started to write a series of blog posts (part 1 | part 2 | part 3 | part 4) about how I created a standby database on RAC 12.1. For some reason I forgot to post this part. Up until now the step where I am creating the broker configuration was missing. Thanks to a friend I discovered my mistake. I decided to post this article to complete the series. There’s a pretty big logical gap that needed filling :)

I also deliberately decided against making changes to my notes, so this is written in the same style as the articles preceding this one. Also bear in mind that this is Oracle 12.1!

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small-scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Here is the original blog post

In a last-minute decision I decided to split part 3 into half: first the database duplication, followed by the Data Guard configuration. It really got a bit too long (even for my standards), and would have put readers off the article. I appreciate your patience though…

What’s the situation?

At the end of part 3 I had a physical standby database ready for use, including its registration in the OCR. it was looking promising, but there was more work to do.

At the moment, the standby database is out of sync with the primary, a fact that needs to be rectified by creating a Data Guard configuration. I prefer to do this on the command line. If you are so inclined you can of course push the mouse around and do it with OEM.

Step 1: Data Guard Broker pre-requisites

Before creating the configuration you should change the location of the broker configuration files to shared storage. This needs to be done on each side – primary and every standby – before you can start the broker.

SQL> alter system set dg_broker_config_file1='+DATA/NCDBA/dr1NCDBA.dat' scope=both sid='*';

System altered.

SQL> alter system set dg_broker_config_file2='+DATA/NCDBA/dr2NCDBA.dat' scope=both sid='*';

System altered.

SQL> show parameter dg_broker

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1               string      +DATA/NCDBA/dr1NCDBA.dat
dg_broker_config_file2               string      +DATA/NCDBA/dr2NCDBA.dat
dg_broker_start                      boolean     FALSE

SQL> alter system set dg_broker_start=true scope=both sid='*';

System altered.


I had to repeat the step on NCDBB, of course with adapted paths and file names.

Step 2: Creating a Data Guard Broker configuration

Once that’s done, you create the configuration using dgmgrl, the Data Guard Broker Manager Line Mode tool. I named my configuration RACTEST and added both databases. Here is an example of my session:

[oracle@rac12pri1 ~]$ dgmgrl
DGMGRL for Linux: Version - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdba
Connected as SYSDBA.
DGMGRL> create configuration ractest as primary database is 'NCDBA' connect identifier is 'NCDBA';
Configuration "ractest" created with primary database "NCDBA"
DGMGRL> add database 'NCDBB' as connect identifier is 'NCDBB';
Database "NCDBB" added
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:

As a first step I connect as SYSDBA before creating the configuration. You can usually type help command to remind you of the exact syntax. After the configuration is created you can start adding (standby) databases. In my case there is only 1 – physical standby – database, but you can think of environments where you have more than one standby environment although that is less common. When specifying the connect identifier you use valid net*8 names found in the default tnsnames.ora (the one in the RDBMS home).

At this point it is important to set standby_file_management to auto. Otherwise managed recovery can abort and you will have to do extra work managing data files that can easily be avoided. The broker syntax to do so involves changing database properties (type show database verbose 'NCDBA' to show all of them). Many Data Guard broker properties are not easily mapped back to the corresponding initialisation parameter. In the case of standby_file_management it is not hard, and the documentation helps with the rest.

Eventually I enable the configuration and check the status. I have found that it can take a couple of minutes at times to get the SUCCESS message.

DGMGRL> edit database 'NCDBA' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> edit database 'NCDBB' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> enable configuration
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

There are other things you must consider in accordance with your company’s (data protection/availability/business continuity/…) standards and other documentation – again my example is rather basic, but deliberately so. It’s my lab playground after all.

For the sake of completeness, here is the current configuration of NCDBA and NCDBB:

DGMGRL> show database 'NCDBA'

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON

Database Status:

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 223.00 KByte/s
  Real Time Query:    OFF
    NCDBB2 (apply instance)

Database Status:

DGMGRL> exit

Please note that the broker detected the fact that my systems are cluster databases and lists both instances for each database. It also shows that NCDBB2 is the apply instance. I was positively surprised about the broker’s ability to detect both instances automatically. I seem to remember cases in earlier releases where said instance detection occasionally required a little nudging to get it right.

This example is deliberately kept as minimalist as possible: it works. Most likely it won’t satisfy your requirements. Review your Data Guard (and any other relevant) standards and change the configuration accordingly.


… back to present time :) This post completes the series, all my notes are now transcribed to the blog. I hope you find this useful.

back to part 3 | on to part 4

A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to It comes fully equipped with the matching cellos to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen with any other system, and it does so out of the box. Granted, some of that requires an investment into extra cost options but the value one gets in form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Insights-tales from the Oaktable” (Apress) a long time ago, I started to realise the inherent beauty of having insights to code. This code can be your code if you instrumented it properly, or the Oracle codepath externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting code (or maybe don’t even know about the possibility?) and therefore complicate troubleshooting unnecessarily. The latter is not so much an issue on many platforms where you don’t have an equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it’s a real wasted opportunity as others have pointed out before me.

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still get to some conclusions by using the Wait Interface and session counters. In most scenarios I am involved in the first step is to use Tanel Poder’s session snapper script which gives me insights to both.

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious if new events have been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents them from being listed verbally in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names such as db file scattered read and db file sequential readhave been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.

Wait events in

I like to start comparisons with a benchmark, and seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O

14 rows selected.

So there are 1152 events total in, keep that number in mind.

Wait events in

In my opinion 12.1 is a major step ahead, and I said it many times: I haven’t seen so many fundamental changes to the Oracle database for a long time. For various reasons though 12.1 hasn’t seen the traction in the field it probably deserved. Note how the diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O

14 rows selected.

A quick calculation reveals that 12.1 features 498 additional events, a lot more than Not too bad in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked this before when blogging about the new Data Guard Wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.

Wait events in

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O

14 rows selected.

There are indeed some new events, most of them can be found in the Other wait class. Again, this is quite unfortunate as it prevents the performance architect from identifying unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day to investigate the exact difference between and Most likely after E4 this year.

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script to group counters by class. Most of them fall in just one, but there are others where more than one class is applicable. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in

Before investigating 12.1 and 12.2 I’ll look at first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an database this reveals the following numbers:

----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.

Session counters in

There was a major increase in the number of counters between and Consider the numbers:

---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC

14 rows selected.

That nearly doubles the number of counters available. Note that quite a few of the new counters fall into the DEBUG section. Looking a bit more closely you can see they seem to be used by the In Memory (IM) Option:

  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  9      regexp_substr(name,'\w+ \w+')
 11      2 DESC

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …

Session counters in

Finally, here is the list of statistics in

---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC

Another 626 additional counters, that’s almost the number of counters available in total on the system! Running my previous query again it seems that IM-related statistics dominate, but there are lots of others of interest.

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.


Oracle has been one of the best database engines around, and with instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into those stats relevant for Exadata to a later point.

Happy troubleshooting!

Little things worth knowing: Executing RDA on RAC

Result! I have finally been able to gather a complete RDA (Oracle Remote Diagnostic Agent) output on my 2 node RAC system. After consulting the relevant documentation on MOS-which is spread over at least 42 Doc IDs-I found them not to be very helpful to the degree that some of what I read is actually wrong or contradicting. I put together a short note, primarily to myself, to remind me of the process. I hope you find it useful, too.

The RDA version I used for this post is 8.14.x from MOS March 4th 2017. My RAC nodes are based on Oracle Linux 7.3/UEK 4.

Starting the data collection

RDA is one of the standard tools I use and I have previously blogged about it. RDA started off as a simple-to-use tool. After having used it for some years I started to run into issues with the interactive configuration which simply took too long to complete. As soon as I learned about them I fell in love with profiles. RDA profiles only prompt you for relevant information about the product you want to collect data for.

Running RDA on a RAC system is similar to single instance, and you automatically use profiles-a nice touch. It appears to me that RAC data collection is triggered via SSH on the remote nodes with the results being transferred to the local node.

I believe there are two methods of data gathering on RAC: one for Grid Infrastructure, and another one for the RDBMS layer. Depending on your settings of ORACLE_HOME and ORACLE_SID different parts of the stack are analysed. In this post I was primarily interested in running RDA for the RDBMS part of my cluster.

I started off by downloading the current RDA version to the system I want to run it on. This is a 2-node RAC, with nodes named rac12pri1 and rac12pri2. Both are based on Oracle Linux 7.3 with the most current UEK4 at the time of writing. My RDBMS homes are patched with the January 2017 proactive bundle patch + OJVM.

The first caveat looms right at the beginning of the entire process: I didn’t find it stated explicitly in the documentation on MOS, but it seemed that you need to deploy RDA on all nodes in the cluster, and in the exact same location, before you start the data collection. I use /home/oracle/rda for that purpose. Make sure you have plenty of space in the location you chose, sometimes /home doesn’t provide enough for larger, more active systems.

During my first unsuccessful attempts I didn’t deploy RDA on all nodes before starting the data collection, only to see it not gather any information from the remote node. This is somewhat confusing, because the output (when collecting data) states this:

NOD002: Installing RDA software

NOD002 is short for the second node in my cluster.

Also, when configuring the data gathering process, you will see a prompt like this one:

Enter the absolute path of the directory where RDA will be installed on the remote nodes.

To the casual observer like me these messages suggest that RDA is actually installed on the remote nodes as part of the data collection process-but it is only partially done. Comparing directory sizes between remote and local node revealed /home/oracle/rda to be greater than 200M locally, while on the remote nodes it was only 37MB in size. Something seems to be missing…

Once deployed, you can change to the RDA directory and prepare for the data collection. You actually execute RDA on the first node, the rest of the work is done programatically. The MOS note seems to be correct this time, here is an example of my configuration session:

[oracle@rac12pri1 rda]$ ./ -vXRda start CLOUD -pRac_Assessment
Creating collection "output" ...
        - Trying to identify the domain ... (can take time)
 Inside DFT scenario, define the input and profile targets
 Inside DFT scenario, check Oracle home or Middleware presence
        - RDA:DCbegin ...
RDA.BEGIN: Initializes the Data Collection
Enter the Oracle home to be used for data analysis
Press Return to accept the default (/u01/app/oracle/product/

        - RDA:DCconfig ...
RDA.CONFIG: Collects Key Configuration Information
        - RDA:DCocm ...
RDA.OCM: Set up the Configuration Manager Interface
        - RDA:DCstatus ...
RDA.STATUS: Produces the Remote Data Collection Reports
        - RDA:DCload ...
RDA.LOAD: Produces the External Collection Reports
        - RDA:DCfilter ...
RDA.FILTER: Controls Report Content Filtering
        - RDA:DCend ...
RDA.END: Finalizes the Data Collection
In CRS start scenario, getting nodes using /u01/app/
Requesting common cluster information
Enter the absolute path of the directory where RDA will be installed on the
remote nodes.
Press Return to accept the default (/home/oracle/rda)

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)

In CRS start scenario, getting local node information
In CRS start scenario, pinging node rac12pri2
Requesting information for node rac12pri1
Enter the Oracle home to be analyzed on the node rac12pri1
Press Return to accept the default (/u01/app/oracle/product/

Enter the Oracle SID to be analyzed on the node rac12pri1
Press Return to accept the default (NCDBA1)

Requesting information for node rac12pri2
Enter the Oracle home to be analyzed on the node rac12pri2
Press Return to accept the default (/u01/app/oracle/product/

Enter the Oracle SID to be analyzed on the node rac12pri2

        - RDA:DCremote ...
RAC Setup Summary
. NOD001  rac12pri1/NCDBA1
. NOD002  rac12pri2/NCDBA2
2 nodes found

So RDA understands the RAC scenario, and it gathers data for each node that is part of the cluster as reported by the olsnodes tool. There was nothing really surprising about the prompts, except that I haven’t found a way to analyse more than one database, or ASM and a database together.

Be careful to enter the correct ORACLE_SID for your (remote) RAC nodes. RDA picks up your ORACLE_HOME and ORACLE_SID if they are set.

Optionally verify the correct number of nodes was detected

I am only mentioning this for the sake of completeness, the RAC Setup Summary you saw in the previous step was more than adequate for me. If you really want to find out about the number of nodes you are about to collect information, be careful: MOS Doc ID 359395.1 is wrong – the command ./ -vC RDA.REMTOE list will not only list the nodes, it will also start the data collection. Use this one instead, which I found in MOS Doc ID 1682909.1:

[oracle@rac12pri1 rda]$ ./ -XRemote list
Defined nodes:
  NOD001  rac12pri1  NCDBA1
  NOD002  rac12pri2  NCDBA2
[oracle@rac12pri1 rda]$

It merely lists the nodes, without actually starting to do any work.

Initiate data collection

When you are happy with the output and configuration, start collecting data. Here is an example of my session:

[oracle@rac12pri1 rda]$ ./ -v -e TRC/TRACE=1
Collecting diagnostic data ...
RDA Data Collection Started 04-Mar-2017 16:10:15
Processing RDA.BEGIN module ...
 Inside BEGIN module, testing the RDA engine code build
 Inside BEGIN module, testing the report directory
 Inside BEGIN module, testing the module targets
 Inside BEGIN module, launching parallel executions
Processing RDA.CONFIG module ...
 Inside CONFIG module, listing Oracle homes
 Inside CONFIG module, getting Oracle home inventory (can take time)
Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package
Processing RDA.END module ...
 Inside END module, gathering system information
 Inside END module, getting CPU information (linux)
 Inside END module, getting memory information (linux)
 Inside END module, producing the file catalog
 Inside END module, producing target overview
 Inside END module, waiting for parallel execution completion
 Inside END module, producing setting overview
RDA Data Collection Ended 04-Mar-2017 16:17:44
Generating the reports ...
        - collect/RDA_CONFIG_homes.txt ...
        - collect/RDA_CONFIG_oh_inv.txt ...
        - collect/RDA_END_system.txt ...
        - collect/RDA_END_files.txt ...
        - collect/RDA_END_target.txt ...
        - collect/RDA_END_report.txt ...
        - Report index ...
Packaging collection results ...

[...skipping a lot of text...]

  You can review the reports by transferring the /home/oracle/rda/output
  directory structure to a location where you have web-browser access.
Then, point your browser at this file to display the reports:

[...skipping a lot of text...]

It is crucially important to see these lines:

Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package

In my first attempts, when I didn’t deploy RDA on all nodes myself, the lines “NOD002: Running RDA command” and “NOD002: Transfering report package” weren’t shown. Unsurprisingly no data was gathered on the remote nodes.

Viewing the output

At the end of the data collection you should see a *.tar.gz file per node. In my 2 node cluster setup, there are two:

[oracle@rac12pri1 rda]$ ls output/remote/
RDA_nod001_output.tar.gz  RDA_nod002_output.tar.gz

You can view these after extracting to a temporary location in your browser, start with the file named RDA__start.htm which presents the various parts of the report.

Further reading: Testing user equivalence prior to gathering data

While trying to troubleshoot my remote data gathering problems (I suspected a problem with SSH at first) I noticed that RDA offers test modules as well (see MOS Doc ID: 330760.1). What’s missing from the Doc ID is an example on how to invoke the SSH test module, or rather the command in the RAC specific instructions seems not to work. But it isn’t too hard to figure out the proper call to the RDA executable. The following worked for me:

[oracle@rac12pri1 rda]$ ./ -T ssh
Processing remote operation test module ...

Command Availability:
  rcp          (Not found)
  remsh        (Not found)
  rsh          (Not found)
  scp          /bin/scp
  ssh          /bin/ssh
  ssh-add      /bin/ssh-add
  ssh-agent    /bin/ssh-agent

Related files:

Check if an authentication agent has been started
Agent identities:
  1024 some:colums:separated:by:colons:indicating:the:ID
  /home/oracle/.ssh/id_rsa (RSA)

Driver Availability:
    da Available
  jsch Available
   ssh Available
   rsh -

Settings modified by the remote operation initialization:
  REMOTE.T_SCP_OPTIONS=-BCpq -o ConnectTimeout=30
  REMOTE.T_SSH_OPTIONS=-Cnq -o ConnectTimeout=30

Test a remote command
Enter the remote host name
Press Return to accept the default (localhost)
> rac12pri2

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)

Check remote command execution with best driver
Exit code: 0

Check remote command execution using DA
Exit code: 0

Check remote command execution using JSCH
Exit code: 0

Check remote command execution using SSH
Exit code: 0

Check remote command execution using SSH (no timeout)
Exit code: 0
[oracle@rac12pri1 rda]$

Looking at the above output let me to believe that there isn’t a SSH-related problem with my cluster.