Category Archives: 12c Release 1

Dead Connection Detection (DCD) and the Oracle database

Dead Connection Detection is a useful feature of the Oracle database: it allows for the cleanup of “dead” sessions so they don’t linger around consuming memory and other system resources. The idea is simple: if the database detects that a client process is no longer connected to its server process, it cleans up. This can happen in many ways, in most cases this kind of problem is triggered by an end user.

A dead connection shouldn’t be confused with idle connections: an idle connection still maintains the network link between client and server process, except that there is no activity. Idle connections aren’t maintained/controlled via DCD, there are other tools in the database handling such cases.

As a by product, DCD can also help with overly eager firewalls forcibly removing seemingly idle network connections. I found the following posts and the references therein very useful:

With Oracle 12c Oracle changed the way DCD works by no longer relying on its network layer but rather pushing the functionality into the TCP stack on platforms that support it. This change in behaviour is also explained in an Oracle white paper from 2014.

For the first part of this post I set sqlnet.expire_time to 1 as per a MOS note I found, your value is probably different. The parameter is documented in the Net*8 reference, please use it to work out what the best value is for you. As others have pointed out, this parameter has to go into the RDBMS home, more specifically $ORACLE_HOME/network/admin/sqlnet.ora. I am using Oracle 19.4 on Oracle Linux 7.7 for this blog post. I have seen the same behaviour in 12.1.0.2 as well in my lab.

Background

In addition to the proven methods of checking whether TCP_KEEPALIVE is enabled for a given session I wanted to show another one. Using the ss(8) utility it is possible to show socket options. I also tried lsof but on my system I couldn’t get it to print the options:

SQL> select spid from v$process where addr = (select paddr from v$session where username = 'MARTIN');

SPID
------------------------
13656

SQL> exit 

...

[root@server2 ~]# lsof -nP -p 13656 -T f
lsof: unsupported TCP/TPI info selection: f
lsof 4.87

Although the man-page for lsof reads:

       -T [t]   controls the reporting of some TCP/TPI information,  also  reported  by  net‐
                stat(1),  following  the network addresses.  In normal output the information
                appears in parentheses, each item except TCP or TPI state name identified  by
                a keyword, followed by `=', separated from others by a single space:

                     <TCP or TPI state name>
                     QR=<read queue length>
                     QS=<send queue length>
                     SO=<socket options and values>
                     SS=<socket states>
                     TF=<TCP flags and values>
                     WR=<window read length>
                     WW=<window write length>

                Not all values are reported for all UNIX dialects.  Items values (when avail‐
                able) are reported after the item name and '='.

                When the field output mode is in effect  (See  OUTPUT  FOR  OTHER  PROGRAMS.)
                each item appears as a field with a `T' leading character.

                -T with no following key characters disables TCP/TPI information reporting.

                -T with following characters selects the reporting of specific TCP/TPI infor‐
                mation:

                     f    selects reporting of socket options,
                          states and values, and TCP flags and
                          values.

So let’s try something else: ss(8) – another utility to investigate sockets

Revealing socket options

I have used ss(8) in the past when I didn’t have netstat available, which is more and more common now that netstat is deprecated and its designated successor is ss :)

As far as I know you can’t limit ss to show information just for a PID, I use grep to limit the output. The output is in fact very wide, which is why this might not look pretty on the blog depending on whether the renderer decides to wrap output or not.

[root@server2 ~]# ss -nop | egrep 'NetidState|13656'
NetidState Recv-Q Send-Q                                    Local Address:Port                                      Peer Address:Port                                                                                                           
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,4.412ms,0)        
[root@server2 ~]# 

I used the following options:

  • -n for “Do not try to resolve service names”
  • -o for “Show timer information” and finally
  • -p to “Show process using socket”.

The main option here is -o. As per the man page:

       -o, --options
              Show timer information. For tcp protocol, the output format is:

              timer:(<timer_name>,<expire_time>,<retrans>)

              <timer_name>
                     the name of the timer, there are five kind of timer names:

                     on: means one of these timers: tcp retrans timer, tcp early retrans timer and tail loss probe timer
                     keepalive: tcp keep alive timer
                     timewait: timewait stage timer
                     persist: zero window probe timer
                     unknown: none of the above timers

              <expire_time>
                     how long time the timer will expire

              <retrans>
                     how many times the retran occurs

With a little shell loop I can show how that timer is decrementing:

[root@server2 ~]# for i in $(seq 1 5); do ss -nop | grep 13656; sleep 1 ; done
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,20sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,19sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,18sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,17sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,16sec,0)          
[root@server2 ~]# 

Summary

Using the ss utility it is possible to check if a keepalive timer is implemented as a means to support DCD with 12.1 and later releases. Invoking ss(8) hasn’t caused any problems on my system, but as with every such tool you need to ensure it’s safe to use before attempting to look at an important system.

sqlldr, direct path loads and concurrency in 12.1 and earlier

I have recently come across an interesting issue related to concurrent data loading into the Oracle database using sqlldr’s direct path mode. Although I investigated the situation on 12.1.0.2, I found that the same holds true in 19.4 as well when using the defaults. I reconstructed the case, although it is simplified a little to bring the point home.

The environment I used to put this post together is Oracle Restart 19.4.0 on Oracle Linux 7.6.

Test overview

For this test I am running concurrent sqlldr sessions to demonstrate the case. I am conscious of that fact that I could have used external tables, but then I wouldn’t have been able to write this post :)

Assume there’s a table named t2:

SQL> create table t2 as select * from dba_objects where 1 = 0;

Table created.

I am planning on populating the table with data in CSV format. Using sqlcl it is dead easy to create an input file, simply specify the /*csv*/ hint in your query and spool the output to a file: job done. The resulting input file is named t2.dat.

Next up is the control file. I am a creature of habit and tend to use a control file although I could have tried the same process using express mode.

Using the documentation I figured the following control file should do. Since it’s easy enough to forget passing direct=true to the command line I added it to the options clause. When using sqlcl to create a CSV file it adds the column headings in line 1, and they won’t load so I’m skipping these records. Here is the full control file:

options (direct=true, skip=1)
load data 
infile 't2.dat'
into table t2
append
fields csv without embedded 
date format 'dd-mon-yy'
trailing nullcols
(
OWNER,                                                                          
OBJECT_NAME,                                                                    
SUBOBJECT_NAME,                                                                 
OBJECT_ID,                                                                      
DATA_OBJECT_ID,                                                                 
OBJECT_TYPE,                                                                    
CREATED date,                                                                        
LAST_DDL_TIME date,                                                                  
TIMESTAMP,                                                                      
STATUS,                                                                         
TEMPORARY,                                                                      
GENERATED,                                                                      
SECONDARY,                                                                      
NAMESPACE,                                                                      
EDITION_NAME,                                                                   
SHARING,                                                                        
EDITIONABLE,                                                                    
ORACLE_MAINTAINED,                                                              
APPLICATION,                                                                    
DEFAULT_COLLATION,                                                              
DUPLICATED,                                                                     
SHARDED,                                                                        
CREATED_APPID,                                                                  
CREATED_VSNID,                                                                  
MODIFIED_APPID,                                                                 
MODIFIED_VSNID
)

With the prerequisites at hand I’m ready to perform some testing.

Test #1: concurrent sessions using defaults

First of all, what happens when starting 5 concurrent sessions? Will they complete? Let’s try this in bash:

for i in $(seq 1 5) ; do
  echo starting iteration $i
  sqlldr /@loader control=t2.ctl log=session_${i}.log &
done

This ran to completion without any visible errors, but I prefer to run sanity checking anyway. The first thing to do is to see if all records have been loaded. The input file contains 73148 data records by the way.

$ grep 'Rows successfully loaded' session*log
session_1.log:  73148 Rows successfully loaded.
session_2.log:  73148 Rows successfully loaded.
session_3.log:  73148 Rows successfully loaded.
session_4.log:  73148 Rows successfully loaded.
session_5.log:  73148 Rows successfully loaded.

So this looks ok, what about the run times?

$ grep '^Run' *.log
session_1.log:Run began on Mon Jul 22 21:32:31 2019
session_1.log:Run ended on Mon Jul 22 21:32:44 2019
session_2.log:Run began on Mon Jul 22 21:32:31 2019
session_2.log:Run ended on Mon Jul 22 21:32:40 2019
session_3.log:Run began on Mon Jul 22 21:32:31 2019
session_3.log:Run ended on Mon Jul 22 21:32:46 2019
session_4.log:Run began on Mon Jul 22 21:32:31 2019
session_4.log:Run ended on Mon Jul 22 21:32:48 2019
session_5.log:Run began on Mon Jul 22 21:32:31 2019
session_5.log:Run ended on Mon Jul 22 21:32:42 2019

All in all these loads completed in a reasonably short time. Some took longer to finish than others though. That made me curious and I logged into the database to see what was going on. I never consciously saw this event:

SQL> select sid, event, state from v$session where program like 'sqlldr%';

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        44 kpodplck wait before retrying ORA-54                             WAITING
        52 kpodplck wait before retrying ORA-54                             WAITING
        54 kpodplck wait before retrying ORA-54                             WAITING
       290 Data file init write                                             WAITING
       291 kpodplck wait before retrying ORA-54                             WAITING

SQL> 

Using the output from my above query I could see that one session was active, all the others queued up behind it. Interestingly there is no “enq: TM – contention” which you’d see with direct path inserts in sqlplus for example:

-- in session 1
SQL> insert /*+ append */ into t2 select * from dba_objects;

73157 rows created.

-- notice there is no commit! 

-- Session 2 waits ...
SQL> insert /*+ append */ into t2 select * from dba_objects;

-- the DBA sees it all:
SQL> select sid, event, state from v$session where username = 'MARTIN'

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        51 enq: TM - contention                                             WAITING
       270 SQL*Net message from client                                      WAITING

What’s even more interesting is this: ddl_lock_timeout which I thought might have helped, has no effect on sqlldr operations:

SQL> show parameter ddl_lock_timeout

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
ddl_lock_timeout                     integer     0
SQL> 

The parameter isn’t set at all, yet I could see “kpodplck wait before retrying ORA-54” which is strange. ORA-54 is of course the dreaded “resource busy and acquire with nowait specified” error. I managed to trace kpodp back to the direct load path functionality.

Without the ability to control waiting via a database parameter I am aware of, there must be some other timeout. And sure enough, there is:

$ time sqlldr /@loader control=t2.ctl

SQL*Loader: Release 19.0.0.0.0 - Production on Mon Jul 22 21:46:59 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

real    0m32.038s
user    0m0.011s
sys     0m0.011s

It appears as if sqlldr times out waiting for a locked resource after round about 30 seconds, regardless of ddl_lock_timeout.

And this is the end of the story if you aren’t on 12.2 or later. In the next post I’ll show you how you can work around this problem with more current releases.

RBAL (ospid: nnn): terminating the instance due to error 27625 after patching Oracle Restart

I have come across an odd behaviour trying to patch an Oracle Restart environment to 12.1.0.2 January 2019. Based on a twitter conversation this isn’t necessarily limited to my patch combination, there might be others as well. I have used opatchauto to apply patch 28813884 to both RDBMS and GRID homes plus its corresponding OJVM (Java) patch. Before diving into details, this is the environment I have been working with:

  • Oracle Restart 12.1.0.2 with an earlier PSU/OJVM combo applied
  • Separation of duties with oracle as the RDBMS owner, and grid owning the GRID infrastructure installation. This is a key point!
  • Patches to be applied
    • Patch 28813884 (GI PSU)
    • Patch 28790654 (corresponding OJVM patch)
  • OPatch version as installed in the GRID|RDBMS home
    • opatchauto: 13.9.4.2.0
    • opatch: 12.2.0.1.17

Following the instructions in the readme file I upgraded OPatch to the required version. The file I downloaded was named p6880880_121010_Linux-x86-64.zip. I double-checked the readme, and to me this is the correct file. After upgrading OPatch in the RDBMS and GRID homes, I started patching.

After this process completed, I wanted to start the database in UPGRADE mode as required by the OJVM patch. This needs to be done via sqlplus since srvctl does not support an “upgrade” option in “start database”.

And that’s where it hit me: whenever I tried to open the database in UPGRADE mode, it threw an error:

 SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:15:34 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.
 
 SQL> startup upgrade
 
 ORACLE instance started.
 
 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 ORA-03113: end-of-file on communication channel
 Process ID: 22354
 Session ID: 14 Serial number: 41958

Looking at the alert log I can see that RBAL terminated the instance:

ALTER DATABASE   MOUNT
 ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.113)(PORT=1521))' SCOPE=MEMORY SID='ORCL';
 2019-07-16 12:15:43.404000 -04:00
 NOTE: ASMB mounting group 1 (DATA)
 WARNING: cellinit.ora is missing. RBAL is terminating the instance.
 RBAL (ospid: 22332): terminating the instance due to error 27625
 System state dump requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_diag_22306_20190716121543.trc
 Dumping diagnostic data in directory=[cdmp_20190716121543], requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 Instance terminated by RBAL, pid = 22332
 2019-07-16 12:15:49.164000 -04:00

But since this is Oracle Restart, Clusterware will simply restart the database. Unless of course you’ve been changing the default behaviour. And funny enough, this works (see further down in the article as to why). Although I’d appreciate this in most cases, the automatic restart isn’t appropriate in my situation: when started by Clusterware, the database is not in upgrade mode:

 SQL> select status from v$instance;

 STATUS
 ------------
 OPEN

Which is a problem for me. A MOS search about error 27625 didn’t reveal anything too useful, and it took me quite a while to realise the problem has to do with permissions. An Internet search finally gave me the right answer, a fellow blogger has pointed it out a little while ago …

So what exactly is the problem? The RDBMS “oracle” binary needs a specific set of permissions and ownership/group membership for Oracle Restart with separation of duties enabled. This is what it looked like before applying the patch:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:15 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

The file permissions are ‘6751’ with oracle owning the file but it belongs to the asmadmin group. Remember, I am using a separate user for Grid Infrastructure with its own specific operating system groups. After running opatch, this changed to:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle oinstall 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

Permissions remained the same, however the group changed from asmadmin to oinstall. Not quite what I had in mind, and it reproducibly causes instance crashes. There is a simple solution: make sure permissions are set correctly! Using “Database Creation on 11.2/12.1/12.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)” got me on the right track.

 $ ls -l $ORACLE_HOME/bin/oracle
 -rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

 $ sqlplus / as sysdba

 SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:34:45 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.

 SQL> startup upgrade
 ORACLE instance started.

 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 Database mounted.
 Database opened.

 SQL> select status from v$instance;

 STATUS
 ------------
 OPEN MIGRATE

 SQL> 

With the database in upgrade mode (I believe the “migrate” is a left over from the 9i days) I am reassured that running datapatch (the OJVM part actually) works as advertised.

By the way Clusterware corrects the group permissions when you issue a “srvctl start database -db …” command as documented in “Starting the database using srvctl changes the group setting for oracle binary (Doc ID 1508027.1)”. Which didn’t solve my problem as I can’t start the database in upgrade mode using srvctl.

Hope this helps you one day!

Creating a RAC 12.1 Data Guard Physical Standby environment (3b)

Huh, what is this I hear you ask? Part 3b? Oracle 12.1? Well, there’s a bit of a story to this post. Back in December 2016 I started to write a series of blog posts (part 1 | part 2 | part 3 | part 4) about how I created a standby database on RAC 12.1. For some reason I forgot to post this part. Up until now the step where I am creating the broker configuration was missing. Thanks to a friend I discovered my mistake. I decided to post this article to complete the series. There’s a pretty big logical gap that needed filling :)

I also deliberately decided against making changes to my notes, so this is written in the same style as the articles preceding this one. Also bear in mind that this is Oracle 12.1!

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small-scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Here is the original blog post

In a last-minute decision I decided to split part 3 into half: first the database duplication, followed by the Data Guard configuration. It really got a bit too long (even for my standards), and would have put readers off the article. I appreciate your patience though…

What’s the situation?

At the end of part 3 I had a physical standby database ready for use, including its registration in the OCR. it was looking promising, but there was more work to do.

At the moment, the standby database is out of sync with the primary, a fact that needs to be rectified by creating a Data Guard configuration. I prefer to do this on the command line. If you are so inclined you can of course push the mouse around and do it with OEM.

Step 1: Data Guard Broker pre-requisites

Before creating the configuration you should change the location of the broker configuration files to shared storage. This needs to be done on each side – primary and every standby – before you can start the broker.

SQL> alter system set dg_broker_config_file1='+DATA/NCDBA/dr1NCDBA.dat' scope=both sid='*';

System altered.

SQL> alter system set dg_broker_config_file2='+DATA/NCDBA/dr2NCDBA.dat' scope=both sid='*';

System altered.

SQL> show parameter dg_broker

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dg_broker_config_file1               string      +DATA/NCDBA/dr1NCDBA.dat
dg_broker_config_file2               string      +DATA/NCDBA/dr2NCDBA.dat
dg_broker_start                      boolean     FALSE

SQL> alter system set dg_broker_start=true scope=both sid='*';

System altered.

SQL> 

I had to repeat the step on NCDBB, of course with adapted paths and file names.

Step 2: Creating a Data Guard Broker configuration

Once that’s done, you create the configuration using dgmgrl, the Data Guard Broker Manager Line Mode tool. I named my configuration RACTEST and added both databases. Here is an example of my session:

[oracle@rac12pri1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdba
Password:
Connected as SYSDBA.
DGMGRL> create configuration ractest as primary database is 'NCDBA' connect identifier is 'NCDBA';
Configuration "ractest" created with primary database "NCDBA"
DGMGRL> add database 'NCDBB' as connect identifier is 'NCDBB';
Database "NCDBB" added
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
DISABLED

As a first step I connect as SYSDBA before creating the configuration. You can usually type help command to remind you of the exact syntax. After the configuration is created you can start adding (standby) databases. In my case there is only 1 – physical standby – database, but you can think of environments where you have more than one standby environment although that is less common. When specifying the connect identifier you use valid net*8 names found in the default tnsnames.ora (the one in the RDBMS home).

At this point it is important to set standby_file_management to auto. Otherwise managed recovery can abort and you will have to do extra work managing data files that can easily be avoided. The broker syntax to do so involves changing database properties (type show database verbose 'NCDBA' to show all of them). Many Data Guard broker properties are not easily mapped back to the corresponding initialisation parameter. In the case of standby_file_management it is not hard, and the documentation helps with the rest.

Eventually I enable the configuration and check the status. I have found that it can take a couple of minutes at times to get the SUCCESS message.

DGMGRL> edit database 'NCDBA' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> edit database 'NCDBB' set property standbyfilemanagement=auto;
Property "standbyfilemanagement" updated
DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

There are other things you must consider in accordance with your company’s (data protection/availability/business continuity/…) standards and other documentation – again my example is rather basic, but deliberately so. It’s my lab playground after all.

For the sake of completeness, here is the current configuration of NCDBA and NCDBB:

DGMGRL> show database 'NCDBA'

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 223.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> exit

Please note that the broker detected the fact that my systems are cluster databases and lists both instances for each database. It also shows that NCDBB2 is the apply instance. I was positively surprised about the broker’s ability to detect both instances automatically. I seem to remember cases in earlier releases where said instance detection occasionally required a little nudging to get it right.

This example is deliberately kept as minimalist as possible: it works. Most likely it won’t satisfy your requirements. Review your Data Guard (and any other relevant) standards and change the configuration accordingly.

Summary

… back to present time :) This post completes the series, all my notes are now transcribed to the blog. I hope you find this useful.

back to part 3 | on to part 4

A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to 12.2.0.1. It comes fully equipped with the matching cellos to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen with any other system, and it does so out of the box. Granted, some of that requires an investment into extra cost options but the value one gets in form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Insights-tales from the Oaktable” (Apress) a long time ago, I started to realise the inherent beauty of having insights to code. This code can be your code if you instrumented it properly, or the Oracle codepath externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting code (or maybe don’t even know about the possibility?) and therefore complicate troubleshooting unnecessarily. The latter is not so much an issue on many platforms where you don’t have an equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it’s a real wasted opportunity as others have pointed out before me.

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still get to some conclusions by using the Wait Interface and session counters. In most scenarios I am involved in the first step is to use Tanel Poder’s session snapper script which gives me insights to both.

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious if new events have been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents them from being listed verbally in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names such as db file scattered read and db file sequential readhave been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.

Wait events in 11.2.0.3

I like to start comparisons with a benchmark, and 11.2.0.3 seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O
       1152

14 rows selected.

So there are 1152 events total in 11.2.0.3, keep that number in mind.

Wait events in 12.1.0.2

In my opinion 12.1 is a major step ahead, and I said it many times: I haven’t seen so many fundamental changes to the Oracle database for a long time. For various reasons though 12.1 hasn’t seen the traction in the field it probably deserved. Note how the diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O
       1650

14 rows selected.

A quick calculation reveals that 12.1 features 498 additional events, a lot more than 11.2.0.3. Not too bad in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked this before when blogging about the new Data Guard Wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.

Wait events in 12.2.0.1

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

  COUNT(*) WAIT_CLASS
---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O
      1811

14 rows selected.

There are indeed some new events, most of them can be found in the Other wait class. Again, this is quite unfortunate as it prevents the performance architect from identifying unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day to investigate the exact difference between 12.1.0.2 and 12.2.0.1. Most likely after E4 this year.

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script to group counters by class. Most of them fall in just one, but there are others where more than one class is applicable. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in 11.2.0.3

Before investigating 12.1 and 12.2 I’ll look at 11.2.0.3 first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an 11.2.0.3 database this reveals the following numbers:

   COUNT(*) CLASS_NAME
----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC
        638

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.

Session counters in 12.1.0.2

There was a major increase in the number of counters between 11.2.0.3 and 12.1.0.2. Consider the numbers:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC
      1178

14 rows selected.

That nearly doubles the number of counters available. Note that quite a few of the new counters fall into the DEBUG section. Looking a bit more closely you can see they seem to be used by the In Memory (IM) Option:

SQL>  SELECT
  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  8  GROUP BY
  9      regexp_substr(name,'\w+ \w+')
 10  ORDER BY
 11      2 DESC
 12  FETCH FIRST 5 ROWS ONLY;

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …

Session counters in 12.2.0.1

Finally, here is the list of statistics in 12.2.0.1:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC
      1804

Another 626 additional counters, that’s almost the number of counters available in total on the 11.2.0.3 system! Running my previous query again it seems that IM-related statistics dominate, but there are lots of others of interest.

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.

Summary

Oracle has been one of the best database engines around, and with 12.2.0.1 instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into those stats relevant for Exadata to a later point.

Happy troubleshooting!

Little things worth knowing: Executing RDA on RAC

Result! I have finally been able to gather a complete RDA (Oracle Remote Diagnostic Agent) output on my 2 node RAC system. After consulting the relevant documentation on MOS-which is spread over at least 42 Doc IDs-I found them not to be very helpful to the degree that some of what I read is actually wrong or contradicting. I put together a short note, primarily to myself, to remind me of the process. I hope you find it useful, too.

The RDA version I used for this post is 8.14.x from MOS March 4th 2017. My RAC nodes are based on Oracle Linux 7.3/UEK 4.

Starting the data collection

RDA is one of the standard tools I use and I have previously blogged about it. RDA started off as a simple-to-use tool. After having used it for some years I started to run into issues with the interactive configuration which simply took too long to complete. As soon as I learned about them I fell in love with profiles. RDA profiles only prompt you for relevant information about the product you want to collect data for.

Running RDA on a RAC system is similar to single instance, and you automatically use profiles-a nice touch. It appears to me that RAC data collection is triggered via SSH on the remote nodes with the results being transferred to the local node.

I believe there are two methods of data gathering on RAC: one for Grid Infrastructure, and another one for the RDBMS layer. Depending on your settings of ORACLE_HOME and ORACLE_SID different parts of the stack are analysed. In this post I was primarily interested in running RDA for the RDBMS part of my cluster.

I started off by downloading the current RDA version to the system I want to run it on. This is a 2-node RAC, with nodes named rac12pri1 and rac12pri2. Both are based on Oracle Linux 7.3 with the most current UEK4 at the time of writing. My RDBMS homes are 12.1.0.2 patched with the January 2017 proactive bundle patch + OJVM.

The first caveat looms right at the beginning of the entire process: I didn’t find it stated explicitly in the documentation on MOS, but it seemed that you need to deploy RDA on all nodes in the cluster, and in the exact same location, before you start the data collection. I use /home/oracle/rda for that purpose. Make sure you have plenty of space in the location you chose, sometimes /home doesn’t provide enough for larger, more active systems.

During my first unsuccessful attempts I didn’t deploy RDA on all nodes before starting the data collection, only to see it not gather any information from the remote node. This is somewhat confusing, because the output (when collecting data) states this:

NOD002: Installing RDA software

NOD002 is short for the second node in my cluster.

Also, when configuring the data gathering process, you will see a prompt like this one:

Enter the absolute path of the directory where RDA will be installed on the remote nodes.

To the casual observer like me these messages suggest that RDA is actually installed on the remote nodes as part of the data collection process-but it is only partially done. Comparing directory sizes between remote and local node revealed /home/oracle/rda to be greater than 200M locally, while on the remote nodes it was only 37MB in size. Something seems to be missing…

Once deployed, you can change to the RDA directory and prepare for the data collection. You actually execute RDA on the first node, the rest of the work is done programatically. The MOS note seems to be correct this time, here is an example of my configuration session:

[oracle@rac12pri1 rda]$ ./rda.pl -vXRda start CLOUD -pRac_Assessment
Creating collection "output" ...
        - Trying to identify the domain ... (can take time)
 Inside DFT scenario, define the input and profile targets
 Inside DFT scenario, check Oracle home or Middleware presence
        - RDA:DCbegin ...
------------------------------------------------------------------------------
RDA.BEGIN: Initializes the Data Collection
------------------------------------------------------------------------------
Enter the Oracle home to be used for data analysis
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

        - RDA:DCconfig ...
------------------------------------------------------------------------------
RDA.CONFIG: Collects Key Configuration Information
------------------------------------------------------------------------------
        - RDA:DCocm ...
------------------------------------------------------------------------------
RDA.OCM: Set up the Configuration Manager Interface
------------------------------------------------------------------------------
        - RDA:DCstatus ...
------------------------------------------------------------------------------
RDA.STATUS: Produces the Remote Data Collection Reports
------------------------------------------------------------------------------
        - RDA:DCload ...
------------------------------------------------------------------------------
RDA.LOAD: Produces the External Collection Reports
------------------------------------------------------------------------------
        - RDA:DCfilter ...
------------------------------------------------------------------------------
RDA.FILTER: Controls Report Content Filtering
------------------------------------------------------------------------------
        - RDA:DCend ...
------------------------------------------------------------------------------
RDA.END: Finalizes the Data Collection
------------------------------------------------------------------------------
In CRS start scenario, getting nodes using /u01/app/12.1.0.2/grid/bin/olsnodes
2>/dev/null
------------------------------------------------------------------------------
Requesting common cluster information
------------------------------------------------------------------------------
Enter the absolute path of the directory where RDA will be installed on the
remote nodes.
Press Return to accept the default (/home/oracle/rda)
>

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)
>

In CRS start scenario, getting local node information
In CRS start scenario, pinging node rac12pri2
------------------------------------------------------------------------------
Requesting information for node rac12pri1
------------------------------------------------------------------------------
Enter the Oracle home to be analyzed on the node rac12pri1
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

Enter the Oracle SID to be analyzed on the node rac12pri1
Press Return to accept the default (NCDBA1)
>

------------------------------------------------------------------------------
Requesting information for node rac12pri2
------------------------------------------------------------------------------
Enter the Oracle home to be analyzed on the node rac12pri2
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

Enter the Oracle SID to be analyzed on the node rac12pri2
> NCDBA2

        - RDA:DCremote ...
------------------------------------------------------------------------------
RAC Setup Summary
------------------------------------------------------------------------------
Nodes:
. NOD001  rac12pri1/NCDBA1
. NOD002  rac12pri2/NCDBA2
2 nodes found

So RDA understands the RAC scenario, and it gathers data for each node that is part of the cluster as reported by the olsnodes tool. There was nothing really surprising about the prompts, except that I haven’t found a way to analyse more than one database, or ASM and a database together.

Be careful to enter the correct ORACLE_SID for your (remote) RAC nodes. RDA picks up your ORACLE_HOME and ORACLE_SID if they are set.

Optionally verify the correct number of nodes was detected

I am only mentioning this for the sake of completeness, the RAC Setup Summary you saw in the previous step was more than adequate for me. If you really want to find out about the number of nodes you are about to collect information, be careful: MOS Doc ID 359395.1 is wrong – the command ./rda.pl -vC RDA.REMTOE list will not only list the nodes, it will also start the data collection. Use this one instead, which I found in MOS Doc ID 1682909.1:

[oracle@rac12pri1 rda]$ ./rda.pl -XRemote list
Defined nodes:
  NOD001  rac12pri1  NCDBA1
  NOD002  rac12pri2  NCDBA2
[oracle@rac12pri1 rda]$

It merely lists the nodes, without actually starting to do any work.

Initiate data collection

When you are happy with the output and configuration, start collecting data. Here is an example of my session:

[oracle@rac12pri1 rda]$ ./rda.pl -v -e TRC/TRACE=1
Collecting diagnostic data ...
------------------------------------------------------------------------------
RDA Data Collection Started 04-Mar-2017 16:10:15
------------------------------------------------------------------------------
Processing RDA.BEGIN module ...
 Inside BEGIN module, testing the RDA engine code build
 Inside BEGIN module, testing the report directory
 Inside BEGIN module, testing the module targets
 Inside BEGIN module, launching parallel executions
Processing RDA.CONFIG module ...
 Inside CONFIG module, listing Oracle homes
 Inside CONFIG module, getting Oracle home inventory (can take time)
Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package
Processing RDA.END module ...
 Inside END module, gathering system information
 Inside END module, getting CPU information (linux)
 Inside END module, getting memory information (linux)
 Inside END module, producing the file catalog
 Inside END module, producing target overview
 Inside END module, waiting for parallel execution completion
 Inside END module, producing setting overview
------------------------------------------------------------------------------
RDA Data Collection Ended 04-Mar-2017 16:17:44
------------------------------------------------------------------------------
Generating the reports ...
        - collect/RDA_CONFIG_homes.txt ...
        - collect/RDA_CONFIG_oh_inv.txt ...
        - collect/RDA_END_system.txt ...
        - collect/RDA_END_files.txt ...
        - collect/RDA_END_target.txt ...
        - collect/RDA_END_report.txt ...
        - Report index ...
Packaging collection results ...

[...skipping a lot of text...]

  You can review the reports by transferring the /home/oracle/rda/output
  directory structure to a location where you have web-browser access.
Then, point your browser at this file to display the reports:
    RDA__start.htm

[...skipping a lot of text...]

It is crucially important to see these lines:

Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package

In my first attempts, when I didn’t deploy RDA on all nodes myself, the lines “NOD002: Running RDA command” and “NOD002: Transfering report package” weren’t shown. Unsurprisingly no data was gathered on the remote nodes.

Viewing the output

At the end of the data collection you should see a *.tar.gz file per node. In my 2 node cluster setup, there are two:

[oracle@rac12pri1 rda]$ ls output/remote/
RDA_nod001_output.tar.gz  RDA_nod002_output.tar.gz

You can view these after extracting to a temporary location in your browser, start with the file named RDA__start.htm which presents the various parts of the report.

Further reading: Testing user equivalence prior to gathering data

While trying to troubleshoot my remote data gathering problems (I suspected a problem with SSH at first) I noticed that RDA offers test modules as well (see MOS Doc ID: 330760.1). What’s missing from the Doc ID is an example on how to invoke the SSH test module, or rather the command in the RAC specific instructions seems not to work. But it isn’t too hard to figure out the proper call to the RDA executable. The following worked for me:

[oracle@rac12pri1 rda]$ ./rda.pl -T ssh
Processing remote operation test module ...

Command Availability:
  rcp          (Not found)
  remsh        (Not found)
  rsh          (Not found)
  scp          /bin/scp
  ssh          /bin/ssh
  ssh-add      /bin/ssh-add
  ssh-agent    /bin/ssh-agent

Related files:
  /etc/ssh/moduli
  /etc/ssh/ssh_config
  /etc/ssh/ssh_host_ecdsa_key
  /etc/ssh/ssh_host_ecdsa_key.pub
  /etc/ssh/ssh_host_ed25519_key
  /etc/ssh/ssh_host_ed25519_key.pub
  /etc/ssh/ssh_host_rsa_key
  /etc/ssh/ssh_host_rsa_key.pub
  /etc/ssh/sshd_config
  /home/oracle/.ssh/authorized_keys
  /home/oracle/.ssh/config
  /home/oracle/.ssh/config.backup
  /home/oracle/.ssh/id_rsa
  /home/oracle/.ssh/id_rsa.pub
  /home/oracle/.ssh/known_hosts

Check if an authentication agent has been started
  SSH_AGENT_PID=1
  SSH_AUTH_SOCK=1
Agent identities:
  1024 some:colums:separated:by:colons:indicating:the:ID
  /home/oracle/.ssh/id_rsa (RSA)

Driver Availability:
    da Available
  jsch Available
   ssh Available
   rsh -

Settings modified by the remote operation initialization:
  REMOTE.F_SCP_COMMAND=/bin/scp
  REMOTE.F_SSH_COMMAND=/bin/ssh
  REMOTE.T_SCP_OPTIONS=-BCpq -o ConnectTimeout=30
  REMOTE.T_SSH_OPTIONS=-Cnq -o ConnectTimeout=30

------------------------------------------------------------------------------
Test a remote command
------------------------------------------------------------------------------
Enter the remote host name
Press Return to accept the default (localhost)
> rac12pri2

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)
>

Check remote command execution with best driver
Exit code: 0

Check remote command execution using DA
Exit code: 0

Check remote command execution using JSCH
Exit code: 0

Check remote command execution using SSH
Exit code: 0

Check remote command execution using SSH (no timeout)
Exit code: 0
[oracle@rac12pri1 rda]$

Looking at the above output let me to believe that there isn’t a SSH-related problem with my cluster.

New Events for Data Guard and Synchronous Redo Transport in 12c (2)

After the baseline has been established in the first part of this series it’s time to measure the effect of the network in this part. The second test will introduce an interesting feature: Using Linux’s own Traffic Shaper/Quality of Services module I will add a delay of 100ms to the Data Guard network interface card (NIC) to slow things down a little.

WARNING: this is of course a lab or VM-only situation. I can’t exert control over wire quality in my (own) switches, hence some software magic is needed on my virtual ones. This post is intended to be merely for educational purposes, not for use at work.

I am continuing to use the 2 node RAC 12.1.0.2.170117 primary database on Oracle Linux 7 with UEK 4 and an identical RAC to host my standby database.

All database traffic still uses eth0 – my public network (192.168.100.0/24). In order not to affect my SSH session with the planned network changes I created a new out-of-band management interface (eth2) on the Data Guard standby database servers. I love KVM and virtio drivers for the NICs: a few commands in virsh later and every RAC node has a new NIC, and of course without rebooting. Adding some nmcli magic and I can configure all interfaces in 1 line!

Here’s what it looks like from connection manager (the system is Oracle Linux 7.3)

[root@rac12sec1 ~]# nmcli c show
NAME  UUID                                  TYPE            DEVICE 
eth0  3e8915f1-10ce-4da7-99b8-d2a6947f9ff7  802-3-ethernet  eth0   
eth1  0ebe1279-40e1-4faf-a0be-49588d347fcf  802-3-ethernet  eth1   
mgmt  51054729-19ae-49e7-ba18-c6b41475ec1b  802-3-ethernet  eth2   

My eth0 device maps to the public networks, e.g. all database traffic including Data Guard. The private interconnect uses eth1 with eth2 as the management interface. I have deliberately kept this simple, usually I’d request a dedicated set of interfaces to route Data Guard traffic (I blogged about that earlier in the context of RAC) here:

Please read past part I of the Adding an additional SCAN-series to get the full story.

Change protection mode

Following the scenario I outlined in part I I first have to change the protection mode to Maximum Availability. Again, please see part I of this series for an explanation about Protection Levels, my choice of protection level, and why “Maximum Availability” might or might not work for you.

Since I’m using the Data Guard Broker, I need to head over and change the mode in dgmgrl.

DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 40 seconds ago)

As with many other maintenance tasks, you use the edit command to change the configuration.

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no standby databases would 
remain to support protection mode

Failed.

I seem to have gotten a bit ahead of myself – changing to the more serious protection modes requires changing the transport mode from async to sync. Let’s do that, and then change the protection mode:

DGMGRL> edit database 'NCDBA' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'NCDBB' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxAvailability
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 19 seconds ago)

DGMGRL> 

That looks better now, the first half of the job is done. The second half of this article details how to make the network look a lot worse.

Net Emulator

Using the Linux netem kernel module I can simulate many complex network situations – I found https://wiki.linuxfoundation.org/networking/netem to be a good reference.

I should probably say at this point that no one would use a link that has 100 ms delay between sites for (sync) redo transport. That simply doesn’t make sense. I used 100ms to have a large enough factor so that I can actually be sure that a deviation of performance figures is actually a result of my fiddling with the network latency. Large numbers make spotting outliers easier.

How do you mess with the network? That is simple when you are on Linux! On my standby cluster I use the tc command, part of the iproute package. This is the “normal” state:

[root@rac12sec1 ~]# tc qdisc
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec1 ~]# 

I am merely using these tools, if you want to understand them, have a look at the Linux Advanced Routing & Traffic Control HOWTO chapter 9, and refer to the link to the Linux Foundation wiki I mentioned earlier. Then take it from there.

The syntax of tc qdisc is complex at first sight:

[root@rac12sec1 ~]# tc qdisc help
Usage: tc qdisc [ add | del | replace | change | show ] dev STRING
       [ handle QHANDLE ] [ root | ingress | parent CLASSID ]
       [ estimator INTERVAL TIME_CONSTANT ]
       [ stab [ help | STAB_OPTIONS] ]
       [ [ QDISC_KIND ] [ help | OPTIONS ] ]

       tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
STAB_OPTIONS := ... try tc qdisc add stab help

The above allows us to introduce NetEM to the mix. On my Linux 7.3/UEK4 system it offers the following options:

[root@rac12sec1 ~]# tc qdisc add netem help
Usage: ... netem [ limit PACKETS ] 
                 [ delay TIME [ JITTER [CORRELATION]]]
                 [ distribution {uniform|normal|pareto|paretonormal} ]
                 [ corrupt PERCENT [CORRELATION]] 
                 [ duplicate PERCENT [CORRELATION]]
                 [ loss random PERCENT [CORRELATION]]
                 [ loss state P13 [P31 [P32 [P23 P14]]]
                 [ loss gemodel PERCENT [R [1-H [1-K]]]
                 [ ecn ]
                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]
                 [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]

That’s a wealth of options. I can simulate somewhat unreliable WAN links, even with variations in delay! Refer to the Linux Foundation wiki for more details on how to simulate networks more realistically.

Introducing Delay

I prefer to keep it simple for this post, so I’ll stick with the most basic setting for network delay. I am introducing this to the Data Guard network on both of my standby nodes while connected to the management interface eth2:

[root@rac12sec1 ~]# tc qdisc add dev eth0 root netem delay 100ms

[root@rac12sec1 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc show dev eth0 
qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms

The effect can be seen immediately by pinging one of the standby nodes:

[oracle@rac12pri1 ~]$ ping rac12sec1
PING rac12sec1.example.com (192.168.100.20) 56(84) bytes of data.
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=1 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=2 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=3 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=4 ttl=64 time=100 ms
^C
--- rac12sec1.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 100.368/100.415/100.478/0.229 ms
[oracle@rac12pri1 ~]$ 

So 100ms delay it is!

Starting the Benchmark

With the scene set, it’s time to start the benchmark. The command to start it resembles the one I used for part I, except for the name of the results file.

[oracle@loadgen1 bin]$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxavail_netem.xml -rt 00:10
Author  :	 Dominic Giles
Version :	 2.5.0.971

Results will be written to results_20_maxavail_netem.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS

1:08:56 PM      20      5188    94          

What effect does this have on the database? Let’s check the Data Guard broker first:

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          10 seconds (computed 0 seconds ago)
  Average Apply Rate: 207.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 211.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

So it looks like there isn’t any lag visible with the benchmark underway, however it took a little while for it to start as visible in the first example. Here is the output from Tanel Poder’s ASHTOP script again (using Tanel Poder’s ashtop again – which requires you to have the Diagnostic Pack license as it access v$active_session_history):

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 08:10','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 08:15','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
      298     1.0   24% | oracle@rac12pri1 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      298     1.0   24% | oracle@rac12pri1 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      276      .9   23% | oracle@rac12pri2 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      274      .9   23% | oracle@rac12pri2 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
        8      .0    1% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 08:10:16 2017-02-17 08:14:59                 1
        7      .0    1% | oracle@rac12pri2 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:43 2017-02-17 08:14:51                 1
        5      .0    0% | oracle@rac12pri1 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:23 2017-02-17 08:13:48                 1
        5      .0    0% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 08:10:00 2017-02-17 08:13:24                 1
        5      .0    0% | oracle@rac12pri2 (NSV1)                                                                                   ON CPU  2017-02-17 08:12:43 2017-02-17 08:14:44                 1
        4      .0    0% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 08:11:16 2017-02-17 08:13:35                 1
        3      .0    0% | oracle@rac12pri2 (CJQ0)                                                                                   ON CPU  2017-02-17 08:12:39 2017-02-17 08:12:51                 2
        3      .0    0% | oracle@rac12pri2 (RSM0)                                                                                   ON CPU  2017-02-17 08:11:44 2017-02-17 08:14:44                 3
        2      .0    0% | oracle@rac12pri1 (LMON)                          control file sequential read             System I/O      WAITING 2017-02-17 08:11:51 2017-02-17 08:14:01                 1
        2      .0    0% | oracle@rac12pri1 (LMS1)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:11:14 2017-02-17 08:11:44                 1
        2      .0    0% | oracle@rac12pri2 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:12:14 2017-02-17 08:14:24                 1
        2      .0    0% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 08:12:59 2017-02-17 08:14:42                 1
        1      .0    0% | oracle@rac12pri1 (CKPT)                          control file sequential read             System I/O      WAITING 2017-02-17 08:13:46 2017-02-17 08:13:46                 1
        1      .0    0% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:10:28 2017-02-17 08:10:28                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          Redo Transport Close                     Other           WAITING 2017-02-17 08:12:22 2017-02-17 08:12:22                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          log file sequential read                 System I/O      WAITING 2017-02-17 08:14:16 2017-02-17 08:14:16                 1

20 rows selected.

As you can see the top waits are in wait class “other”: “SYNC Remote Write” and “Redo Transport MISC”. I have pulled an AWR diff report that shows an interesting picture. The first snapshot set represents the full network performance, while the second one shows the effect of the 100ms network delay. Only relevant information is shown.

WORKLOAD REPOSITORY RAC Compare Period Report for

Database Summary
~~~~~~~~~~~~~~~~
                            Database                   Snapshot Ids     Number of Instances    Number of Hosts    Report Total (minutes)
               ----------------------------------   -----------------   -----------------   -----------------   ---------------------
Snapshot Set            Id Name      RAC Block Si      Begin      End   In Repor    Total   In Repor    Total      DB time Elapsed ti
------------   ----------- --------- --- --------   -------- --------   -------- --------   -------- --------   ---------- ----------
First (1st)      481582958 NCDBA     YES     8192        303      304          2        2          2        2         22.1       10.2
Second (2nd)     481582958 NCDBA     YES     8192        301      302          2        2          2        2        141.9       10.6
Database Instances Included In Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                                             Elapsed              Instance Avg Activ
Set       DB Id Inst Instance  Release         Host       Startup         Begin Snap Time End Snap Time   Time (min) DB time (m Up Time (h  Sessions Platform
--- ----------- ---- --------- --------------- ---------- --------------- --------------- --------------- ---------- ---------- ---------- --------- ---------------
1st   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0        2.0        5.2       0.2 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0       20.1        5.2       2.0 Linux x86 64-bi

2nd   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:06 17-Feb-17 08:17       10.5      134.0        4.3      12.8 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:07 17-Feb-17 08:17       10.5        7.9        4.3       0.8 Linux x86 64-bi

...

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                  2.2                 13.3      514.3                      0.0                  0.2    1,600.0
                      CPU time:                  1.5                  0.5      -66.0                      0.0                  0.0        0.0
           Background CPU time:                  0.5                  0.2      -56.9                      0.0                  0.0        0.0
             Redo size (bytes):            983,488.1            322,026.1      -67.3                  4,164.1              4,192.4        0.7
         Logical read (blocks):             30,773.7              8,299.3      -73.0                    130.3                108.0      -17.1
                 Block changes:              5,665.8              1,886.0      -66.7                     24.0                 24.6        2.3
        Physical read (blocks):                 10.5                 18.3       74.4                      0.0                  0.2      500.0
       Physical write (blocks):                654.9                196.8      -69.9                      2.8                  2.6       -7.6
              Read IO requests:                 10.1                 12.9       27.3                      0.0                  0.2      325.0
             Write IO requests:                503.8                110.4      -78.1                      2.1                  1.4      -32.4
                  Read IO (MB):                  0.1                  0.1       75.0                      0.0                  0.0        0.0
                 Write IO (MB):                  5.1                  1.5      -69.9                      0.0                  0.0        0.0
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                515.8                188.2      -63.5                      2.2                  2.5       12.4
                  Parses (SQL):                 24.9                 17.7      -28.8                      0.1                  0.2      109.1
             Hard parses (SQL):                  0.1                  0.2       45.5                      0.0                  0.0        0.0
            SQL Work Area (MB):                  0.1                  0.5      253.8                      0.0                  0.0      253.8
                        Logons:                  1.3                  3.9      191.7                      0.0                  0.1      400.0
                Executes (SQL):              2,587.7                853.0      -67.0                     11.0                 11.1        1.4
                  Transactions:                236.2                 76.8      -67.5

...

Top Timed Events   First DB/Inst: NCDBA/NCDBA1 Snaps: 303-304 (Elapsed time: 612.266 sec  DB time: 1326.09 sec),  Second DB/Inst: NCDBA/NCDBA1 Snaps: 301-302 (Elapsed time: 638.691 sec  DB time: 8514.42 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes
-> Summary for all instances

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 CPU time                                             N/A        919.5           N/A        69.3    log file sync                 Commit              49,010      7,976.7         162.8        93.7
 log file sync                 Commit             144,284        270.4           1.9        20.4    Redo Transport MISC           Other               11,392      1,192.0         104.6        14.0
 log file parallel write       System I/O         141,581        156.6           1.1        11.8    SYNC Remote Write             Other               11,392      1,179.1         103.5        13.8
 gc current block 2-way        Cluster             98,483         71.9           0.7         5.4    CPU time                                             N/A        328.7           N/A         3.9
 gc cr block 2-way             Cluster             71,469         47.9           0.7         3.6    gcs log flush sync            Other                4,480         56.0          12.5         0.7
 target log write size         Other               22,284         26.5           1.2         2.0    gc current block 2-way        Cluster             62,843         43.3           0.7         0.5
 gc current grant busy         Cluster             22,561         16.8           0.7         1.3    gc current block busy         Cluster                284         39.3         138.4         0.5
 control file sequential read  System I/O           8,497          6.0           0.7         0.5    log file switch (checkpoint i Configuration           75         37.1         495.1         0.4
 db file sequential read       User I/O             5,862          5.4           0.9         0.4    rdbms ipc reply               Other                   32         27.1         845.4         0.3
 log file sequential read      System I/O          14,895          4.8           0.3         0.4    gc cr block 2-way             Cluster             39,015         26.0           0.7         0.3
-log file switch (checkpoint i Configuration          212          3.6          17.2         0.3   -log file parallel write       System I/O          11,267         20.3           1.8         0.2
-gcs log flush sync            Other                  461          0.4           0.9         0.0   -gc current grant busy         Cluster             12,427          8.2           0.7         0.1
-gc current block busy         Cluster                126          0.3           2.7         0.0   -control file sequential read  System I/O           7,326          5.0           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -db file sequential read       User I/O             6,615          4.7           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -log file sequential read      System I/O             174          0.4           2.4         0.0
                          --------------------------------------------------------------------------------------------------------------------

This doesn’t look too good, does it? The throughput is down substantially, which is visible in Swingbench itself: in the first example I managed to achieve 255 TPS while I’m averaging 90-95 in the synch + delay example. The numbers from the AWR excerpt shown above confirm this, and allows you to work out where the problem is.

Here is a print screen taken from OEM (that’s a different benchmark execution), showing the same situation. Make sure you are licensed appropriately to use this view.

oemexpress_with_latency

Summary Part II

So yes, there is a very noticeable affect of the network delay, and the AWR report shows the relevant wait events in 12c indicating where time is spent.

In my opinion it’s a bit unfortunate to have the new waits in the “Other” wait class, but that’s how it is.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf