Category Archives: Linux

12.2 new features: a first encounter with the new Cluster Health Advisor

One of the many new manuals that showed up in Oracle Database 12.2 is the Oracle Autonomous Health Framework User’s Guide. This book contains a lot of information, some of which we already know from previous releases, and some new stuff as well. One of the new chapters caught my eye in particular:

“Proactively Detecting and Diagnosing Performance Issues for Oracle RAC”

Said chapter 5 introduces a new feature, called the Cluster Health Advisor (CHA), that should help you understand the performance of your cluster better. Really? Time to start the VMs to create a test.

My system is based on Oracle Linux 7.3 with UEK4. I created a 2 node 12.2.0.1.0 RAC; there weren’t any patches out at the time of writing. In other words, this feature might change a little. My database, named CDB, features 2 PDBs: swingbench1 and swingbench2 (you can already guess where this is heading). It appears that my cluster is a “flex cluster” operating in standard mode. You noticed during configuration that you defined the nodes as “HUB nodes”, didn’t you? I have screenshots from my RAC installation that I’ll put into a blog post when I find the time. In the meantime head over to oracle-base.com where you can find a RAC 12.2 installation walkthrough for Oracle Linux on VirtualBox.

The plan for this post is to deliberately overload the system to provoke the Health Advisor into giving me recommendations about the cluster and/or the database.

The official documentation is found here: https://docs.oracle.com/database/122/ATNMS/monitoring-rac-with-cha.htm

How is the Cluster Health Advisor implemented?

As per the documentation, there are Cluster Health Advisor daemon processes on each node, and sure enough there are:

[oracle@rac122pri1 ~]$ crsctl stat res -t 
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.MGMT.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.OCR.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.RECO.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.chad
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.net1.network
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.ons
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.proxy_advm
               OFFLINE OFFLINE      rac122pri1               STABLE
               OFFLINE OFFLINE      rac122pri2               STABLE
...

The resource in question is “ora.chad”, and its profile defines dependencies to the GIMR or Grid Infrastructure Management Repository:

[oracle@rac122pri1 ~]$ crsctl stat res ora.chad -p | grep "DEPENDENCIES\b"
START_DEPENDENCIES=hard(global:ora.mgmtdb) pullup(global:ora.mgmtdb)
STOP_DEPENDENCIES=hard(global:intermediate:ora.mgmtdb)
[oracle@rac122pri1 ~]$ 

The documentation further states that the Cluster Health Advisor (CHA) daemon automatically monitors the cluster nodes. Databases registered in Clusterware, however, must be registered explicitly with the monitoring daemon.

The main tool to interact with CHA is chactl (there’s a surprise). It can take quite a few options:

[oracle@rac122pri1 bin]$ chactl -h
CLSCH-3659 : invalid command line syntax
Usage:
chactl monitor cluster [-model <model_name> [-force]]
chactl monitor database -db <db_unique_name>
                        [-model <model_name> [-force]]
chactl unmonitor database -db <db_unique_name>
chactl status [cluster|database [-db <db_unique_name>]] [-verbose]
chactl config [cluster|database -db <db_unique_name>]
chactl calibrate {cluster|database -db <db_unique_name>}
                 -model <model_name> [-force]
                 [-timeranges 'start=<timestamp>,end=<timestamp>,...']
                 [-kpiset 'name=<kpi_name> min=<kpi_val> max=<kpi_val>,...' ]
    WHERE:
        -interval <interval> : interval is in hrs
        -timeranges 'start=<timestamp>,end=<timestamp>,...' :
             Timestamp must be in format 'YYYY-MM-DD HH24:MI:SS'
    KPI for db:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        DBTIMEPERCALL - Database time per user call - usec/call
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
    KPI for cluster:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
chactl query diagnosis [-cluster|-db <db_unique_name>]
                       [-start <timestamp> -end <timestamp>]
                       [-htmlfile <html_file_name>]
chactl query model [-name <model_name> [-verbose]]
chactl query repository
chactl query calibration {-cluster|-db <db_unique_name>}
        [-timeranges 'start=<timestamp>,end=<timestamp>,...']
        [-kpiset 'name=<kpi_name> min=<kpi_val> max=<kpi_val>,...' ]
        [-interval <interval>]
    WHERE:
        -interval <interval> : interval is in hrs
        -timeranges 'start=<timestamp>,end=<timestamp>,...' :
             Timestamp must be in format 'YYYY-MM-DD HH24:MI:SS'
    KPI for db:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        DBTIMEPERCALL - Database time per user call - usec/call
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
    KPI for cluster:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
chactl remove model -name <model_name>
chactl rename model -from <model_name> -to <model_name>
chactl import model -name <model_name> -file <input_file> [-force]
chactl export model -name <model_name> -file <output_file>
chactl set maxretention -time <max_time>
chactl resize repository -entities <total_entities>
   [-force | -eval]
[oracle@rac122pri1 bin]$

The output of the “help” command has been reformatted for readability.

And yes, the documentation is right: at first only the cluster is monitored, not the databases.

[oracle@rac122pri1 ~]$ chactl status
monitoring nodes rac122pri2, rac122pri1
not monitoring databases
[oracle@rac122pri1 ~]$ chactl status -verbose
monitoring nodes rac122pri2, rac122pri1 using model DEFAULT_CLUSTER
not monitoring databases
[oracle@rac122pri1 ~]$

Monitoring my database

In the next step I wanted to add my database to the monitoring configuration. That is easy:

[oracle@rac122pri1 ~]$ chactl monitor database -db CDB
[oracle@rac122pri1 ~]$ chactl status -verbose
monitoring nodes rac122pri2, rac122pri1 using model DEFAULT_CLUSTER
monitoring database cdb, instances CDB2, CDB1 using model DEFAULT_DB

Creating load and causing trouble

With the monitoring in place it was time to create some trouble. I started a swingbench run that deliberately overloaded the CPUs on my cluster. To add to the fun I rebooted a node during the swingbench execution. Although that didn’t stop processing (I used the connection pool method to establish sessions), it caused all of my sessions to pile up on the surviving node. That node held its head up, but performance took a nosedive.
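For reference, such a run can be kicked off from the command line with swingbench’s charbench tool. The following is only a sketch, not the exact invocation used here; the config file, connect string and reporting columns are assumptions:

# sketch only - adjust config file, connect string and runtime to your setup
./charbench -c ../configs/SOE_Server_Side_V2.xml \
            -cs //rac122pri-scan/swingbench1 \
            -uc 75 -rt 0:30 \
            -v users,tpm,tps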

Any recommendations?

The Cluster Health Advisor can be interrogated using the “query diagnosis” command. After causing sufficient trouble I asked for a first assessment. Interestingly enough I hit the repository right after I rebooted node 2. I hadn’t checked at the time, but node 2 ran the GIMR – hence this output:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:50" -end "2017-03-15 05:55:50"
CLSCH-2005 : Oracle Cluster Health Analysis Service (OCHAD) failed due to a Grid
Infrastructure Management Repository error.
Unable to start the Universal Connection Pool:
oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource:
java.sql.SQLRecoverableException: Listener refused the connection with the following
error:

ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

After a little while the failed resources were up again, and I could get some output:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:50" -end "2017-03-15 05:55:50"
2017-03-15 05:54:40.0  Database cdb  DB Recovery Read IO Performance (CDB1) [detected]
2017-03-15 05:55:00.0  Database cdb  DB Recovery Read IO Performance (CDB1) [cleared]

Problem: DB Recovery Read IO Performance
Description: CHA detected that recovery reads are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that a database instance left the cluster
unexpectedly and instance recovery had a larger impact on performance than expected
because more database blocks than expected had to be recovered. The data blocks to which
recovery was applied were read from disk or from the global cache.
Action: Decrease the target for the mean time to recover ( MTTR target ) to increase the
incremental checkpointing rate.
[oracle@rac122pri1 bin]$

Spot on!

SQL> show parameter mttr

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_mttr_target               integer     0
SQL> 

This is typical for DBCA-created databases based on the General_Purpose.dbc template. Good catch!
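Acting on the recommendation would be a one-liner. A minimal sketch; the 60-second target below is just an example value, to be validated against your own recovery requirements:

sqlplus / as sysdba <<'EOF'
-- example value only: pick a target that matches your recovery SLAs
alter system set fast_start_mttr_target = 60 scope=both sid='*';
EOF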

Now the system was still struggling under load:


SQL> select count(*), a.inst_id, b.name from gv$session a, v$pdbs b
  2  where a.con_id = b.con_id and a.username = 'SOE' group by a.inst_id, b.name;

  COUNT(*)    INST_ID NAME
---------- ---------- ------------------------------
        75          1 SWINGBENCH1

The pool was set to 75 users, which should have been split between both nodes. With the second node down, however, there was only one node left to run on.

The operating system didn’t look too happy either:

[root@rac122pri1 trace]# top -b  -n 1 | head -n 15
top - 06:05:08 up 21:14,  2 users,  load average: 15.47, 13.66, 8.69
Tasks: 1403 total,  10 running, 1393 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.0 us,  2.2 sy,  0.0 ni, 92.5 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16167836 total,  5510468 free,  4714544 used,  5942824 buff/cache
KiB Swap:   782332 total,   782332 free,        0 used.  7338672 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10339 root      20   0  158952   5516   3608 R  16.7  0.0   0:00.03 top
10341 root      20   0   28604   5428   3728 R  11.1  0.0   0:00.02 perl
20798 oracle    20   0 2545588 812828 805112 S  11.1  5.0   0:40.34 oracle_20798_cd
20824 oracle    20   0 2543552 817728 809924 S  11.1  5.1   0:40.01 oracle_20824_cd
20828 oracle    20   0 2545592 810800 803068 S  11.1  5.0   0:39.88 oracle_20828_cd
24365 oracle    20   0 2543524 588452 581008 S  11.1  3.6   0:22.72 oracle_24365_cd
24457 oracle    20   0 2543748 596728 588752 S  11.1  3.7   0:22.56 oracle_24457_cd
28569 oracle    20   0 2543524 552440 545016 S  11.1  3.4   0:20.18 oracle_28569_cd
[root@rac122pri1 trace]#

Although I guess that is not too bad for a VM with only 4 cores! Inevitably though there was trouble, even if it appears to be over by now:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:00" -end "2017-03-15 06:20:00"
2017-03-15 05:54:40.0  Database cdb  DB Recovery Read IO Performance (CDB1) [detected]
2017-03-15 05:55:00.0  Database cdb  DB Recovery Read IO Performance (CDB1) [cleared]
2017-03-15 05:58:55.0  Database cdb  Long Waits for Global Cache Message Requests (CDB1) [detected]
2017-03-15 05:58:55.0  Database cdb  Long Redo Log Sync Wait (CDB1) [detected]
2017-03-15 06:04:25.0  Database cdb  Long Waits for Global Cache Message Requests (CDB1) [cleared]
2017-03-15 06:04:25.0  Database cdb  Long Redo Log Sync Wait (CDB1) [cleared]

Problem: DB Recovery Read IO Performance
Description: CHA detected that recovery reads are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that a database instance left the cluster
unexpectedly and instance recovery had a larger impact on performance than expected because
more database blocks than expected had to be recovered. The data blocks to which recovery
was applied were read from disk or from the global cache.
Action: Decrease the target for the mean time to recover ( MTTR target ) to increase the
incremental checkpointing rate.

Problem: Long Waits for Global Cache Message Requests
Description: CHA detected that global cache messages are taking a long time.
Cause: The Cluster Health Advisor (CHA) detected that global cache messages have not
completed and are blocking database processes. It is likely that Global Cache Server
Processes (LMS) on another node are unresponsive or are waiting for CPU or Memory.
Action: Check whether CHA reports any issues related to severe performance degradation
caused by CPU or memory starvation on other nodes in the cluster and follow the suggestions
to resolve those problems.

Problem: Long Redo Log Sync Wait
Description: CHA detected that commits are blocked for several seconds.
Cause: The Cluster Health Advisor (CHA) detected that commits are blocked for several
seconds because cluster commit propagation messages are very slow. It is likely that a
Global Cache Server Process (LMS) on another node is unresponsive or is waiting for CPU
or Memory.
Action: Check whether CHA reports any issues related to severe performance degradation
on other nodes in the cluster. Resolve the problems found on other nodes and instances
of this database in the cluster.
[oracle@rac122pri1 bin]$

I really like the text representation because I like the command line. For those who prefer a more aesthetically pleasing representation of the data, you can also create an HTML version of the findings by specifying the -htmlfile flag.
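Based on the help output shown earlier, that would look something like this (the output file name is arbitrary):

chactl query diagnosis -db CDB \
  -start "2017-03-15 05:20:00" -end "2017-03-15 06:20:00" \
  -htmlfile /tmp/cha_cdb_diagnosis.html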

What else?

Going over the list of options again I found a few more that look interesting. For instance, what’s the footprint of this information in the repository? Query it:

[oracle@rac122pri1 bin]$ chactl query repository

specified max retention time(hrs): 72
available retention time(hrs)    : 307
available number of entities     : 17
allocated number of entities     : 4
total repository size(gb)        : 15.00
allocated repository size(gb)    : 1.29
[oracle@rac122pri1 bin]$ 
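The help output also suggests that retention and repository size can be adjusted. I haven’t needed to on this little cluster, but the commands would look something like this; the values are examples only:

# keep 168 hours (7 days) of data and evaluate what a larger entity count would require
chactl set maxretention -time 168
chactl resize repository -entities 8 -eval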

And what are these models I see when querying the status? DEFAULT_DB and DEFAULT_CLUSTER exist:

[oracle@rac122pri1 bin]$ chactl query model
Models: DEFAULT_CLUSTER, DEFAULT_DB

[oracle@rac122pri1 bin]$ chactl query model -name DEFAULT_DB
Model: DEFAULT_DB
Target Type: DATABASE
Version: 12.2.0.1_0
OS Calibrated on: 
Calibration Target Name: 
Calibration Date: 
Calibration Time Ranges: 
Calibration KPIs: 
[oracle@rac122pri1 bin]$ 

The documentation elaborates: it is possible to calibrate the tool to better reflect your environment.

This is actually quite useful, although I doubt many clusters will deviate from the standard model. Anyway, if you want to calibrate your cluster to a specific workload, you start off by querying CHA to check whether it has sufficient data. In my case I could dream up a scenario called “swingbench run”.

The cluster as such comes first:

[oracle@rac122pri1 bin]$ chactl query calibration -cluster -timeranges \
>'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'

Cluster name : rac122pri
Start time : 2017-03-15 05:20:00
End time : 2017-03-15 06:30:00
Total Samples : 1647
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
1.19      0.45      3.71      0.00      86.69   

<25       <50       <75       <100      >=100
99.64%    0.18%     0.12%     0.06%     0.00%    

2) Disk write (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
3.14      0.42      4.67      0.00      61.67   

<50       <100      <150      <200      >=200
99.94%    0.06%     0.00%     0.00%     0.00%    

3) Disk throughput (ASM) (IO/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
339.59    100.00    512.97    0.00      9700.00 

<5000     <10000    <15000    <20000    >=20000
99.88%    0.12%     0.00%     0.00%     0.00%    

4) CPU utilization (total) (%)

MEAN      MEDIAN    STDDEV    MIN       MAX     
37.89     16.80     33.90     4.30      99.90   

<20       <40       <60       <80       >=80
55.68%    7.35%     3.76%     4.74%     28.48%   

[oracle@rac122pri1 bin]$ 

As per the command’s help message you can limit the output to specific KPIs. Similarly, you can check whether you have enough data for the database.

[oracle@rac122pri1 bin]$ chactl query calibration -db CDB -timeranges \
> 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'
Database name : cdb
Start time : 2017-03-15 05:20:00
End time : 2017-03-15 06:30:00
Total Samples : 1632
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
1.16      0.46      3.61      0.00      86.69   

<25       <50       <75       <100      >=100
99.69%    0.12%     0.12%     0.06%     0.00%    

2) Disk write (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
3.17      0.43      4.69      0.00      61.67   

<50       <100      <150      <200      >=200
99.94%    0.06%     0.00%     0.00%     0.00%    

3) Disk throughput (ASM) (IO/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
341.61    100.00    514.31    0.00      9700.00 

<5000     <10000    <15000    <20000    >=20000
99.88%    0.12%     0.00%     0.00%     0.00%    

4) CPU utilization (total) (%)

MEAN      MEDIAN    STDDEV    MIN       MAX     
37.94     16.75     33.99     4.30      99.90   

<20       <40       <60       <80       >=80
55.82%    7.17%     3.62%     4.66%     28.74%   

5) Database time (per user call) (usec/call)

MEAN      MEDIAN    STDDEV    MIN       MAX     
108878.33  6712.80   2778378.28  0.00      93077792.00

<10000000  <20000000  <30000000  <40000000  <50000000  <60000000  <70000000  >=70000000
99.88%    0.00%     0.00%     0.00%     0.00%     0.00%     0.06%     0.06%    

[oracle@rac122pri1 bin]$ 

Be careful though, sometimes you don’t have enough data! I saw a message like this when setting the window to 30 minutes:

[oracle@rac122pri1 ~]$ chactl query calibration -cluster \
> -timeranges 'start=2017-03-15 05:50:00,end=2017-03-15 06:20:00'

Cluster name : rac122pri
Start time : 2017-03-15 05:50:00
End time : 2017-03-15 06:20:00
Total Samples : 687
Percentage of filtered data : 100%
The number of data samples may not be sufficient for calibration.
...
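Assuming a time window that does have enough samples, building and activating a custom model should look something like this, following the syntax from the help output. This is a sketch I haven’t executed; the model name “swingbench” is made up:

chactl calibrate database -db CDB -model swingbench \
  -timeranges 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'
chactl monitor database -db CDB -model swingbench -force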

So you can either leave it at the default models, or calibrate your own along those lines for a more accurate picture of your workload. Interesting new features await!

DOAG Exaday 2017: a preview

I am quite excited to have been accepted to speak at the upcoming DOAG Exaday taking place June 20 in Frankfurt. It is the third time I have been lucky enough to present at the event, having spoken both in Frankfurt and in Hamburg in previous years.

As you can probably imagine by looking at this weblog, I am a great supporter of the Oracle Engineered Systems community. My interest is not limited to Germany: I still remember the many great meetings in London in the context of UKOUG’s Exadays. I am also very lucky to work for @enkitec where we have been sharing real-world experience at E4 for years and will do so again in June this year.

Unfortunately, interest in the UKOUG’s Systems Event has dwindled a bit recently, which led me to focus on the Exadata event in Germany this year. I am going to speak – in German – about how you can make your migrations to Exadata easier. This is a talk I have delivered a few times before, but I leaped at the opportunity to update the material to release 12.2.

I am very glad to see my colleague @fritshoogland on the agenda, along with many other great speakers too numerous to mention. If you are an engineer working on Oracle Exadata and its cousins, this is definitely the event to attend.

A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to 12.2.0.1. It comes fully equipped with the matching cell software to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen with any other system, and it does so out of the box. Granted, some of that requires an investment in extra-cost options, but the value one gets in the form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Oracle Insights: Tales of the Oak Table” (Apress) a long time ago, I started to realise the inherent beauty of having insight into code. This code can be your own code, if you instrumented it properly, or the Oracle code path externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting code (or maybe don’t even know about the possibility?) and therefore complicate troubleshooting unnecessarily. The latter is not so much an issue on platforms where you don’t have an equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it’s a real wasted opportunity, as others have pointed out before me.

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still come to some conclusions by using the Wait Interface and session counters. In most scenarios I am involved in, the first step is to use Tanel Poder’s session snapper script, which gives me insight into both.

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious whether new events have been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents me from listing them individually in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names such as db file scattered read and db file sequential read have been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.
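Listing the affected events is a simple matter of comparing the two columns; a minimal example that should work on any 12.2 database:

sqlplus -s / as sysdba <<'EOF'
set pagesize 100 linesize 120
col name format a45
col display_name format a45
select name, display_name
  from v$event_name
 where name <> display_name
 order by name;
EOF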

Wait events in 11.2.0.3

I like to start comparisons with a benchmark, and 11.2.0.3 seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O
       1152

14 rows selected.

So there are 1152 events in total in 11.2.0.3; keep that number in mind.

Wait events in 12.1.0.2

In my opinion 12.1 is a major step ahead, and I have said it many times: I haven’t seen so many fundamental changes to the Oracle database in a long time. For various reasons, though, 12.1 hasn’t seen the traction in the field it probably deserved. Note how diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O
       1650

14 rows selected.

A quick calculation reveals that 12.1 features 498 additional events, a lot more than 11.2.0.3. Not too bad in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked on this before when blogging about the new Data Guard wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.

Wait events in 12.2.0.1

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

  COUNT(*) WAIT_CLASS
---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O
      1811

14 rows selected.

There are indeed some new events; most of them can be found in the Other wait class. Again, this is quite unfortunate, as it makes it harder for the performance architect to identify unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day investigating the exact difference between 12.1.0.2 and 12.2.0.1. Most likely after E4 this year.
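For the curious, a database link makes that comparison straightforward; a sketch, assuming a link named db121 pointing at a 12.1.0.2 database:

sqlplus -s / as sysdba <<'EOF'
-- events present in 12.2.0.1 but not in 12.1.0.2; the db link name is an assumption
select name, wait_class
  from v$event_name
minus
select name, wait_class
  from v$event_name@db121
 order by 2, 1;
EOF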

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events, I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script to group counters by class. Most of them fall into just one class, but for others more than one class is applicable. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in 11.2.0.3

Before investigating 12.1 and 12.2 I’ll look at 11.2.0.3 first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an 11.2.0.3 database this reveals the following numbers:

   COUNT(*) CLASS_NAME
----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC
        638

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.

Session counters in 12.1.0.2

There was a major increase in the number of counters between 11.2.0.3 and 12.1.0.2. Consider the numbers:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC
      1178

14 rows selected.

That nearly doubles the number of counters available. Note that quite a few of the new counters fall into the DEBUG class. Looking a bit more closely you can see they seem to be used by the In-Memory (IM) option:

SQL>  SELECT
  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  8  GROUP BY
  9      regexp_substr(name,'\w+ \w+')
 10  ORDER BY
 11      2 DESC
 12  FETCH FIRST 5 ROWS ONLY;

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …

Session counters in 12.2.0.1

Finally, here is the list of statistics in 12.2.0.1:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC
      1804

Another 626 additional counters; that’s almost the total number of counters available on the 11.2.0.3 system! Running my previous query again, it seems that IM-related statistics dominate, but there are lots of others of interest.

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.

Summary

Oracle has been one of the best database engines around, and with 12.2.0.1 its instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into the stats relevant for Exadata to a later point.

Happy troubleshooting!

Using virt-install/KVM and kickstart to create lots of identical VMs quickly

In preparation for an upcoming talk about Oracle 12.2 I am planning on getting my feet wet with the Oracle 12.2 Sharding option. It seems to be a very interesting subject to get into, but like all new subjects it can be intimidating at first due to its complexity. So why not use instructions someone has already written down?

Oracle has a nice series of articles about Sharding. I found the Sharding landing page in the Maximum Availability Architecture section on OTN. For the purpose of this blog post I’m intending to follow the “Oracle Sharded Database Deployment with Active Data Guard using CREATE SHARD method for On-Premises – Cookbook”, available from the aforementioned landing page.

About those hardware requirements

Well, that’s what I had initially planned. What I hadn’t expected was the number of VMs to be created. On page 7 of the aforementioned PDF, the author explains the prerequisites. Copying verbatim from the document:

You must acquire brand new 7 VMs which do not have any pre-existing Oracle database or listeners running on them.

Well, that’s 7 VMs. Not an awful lot of fun if you have to install all of them manually. Thankfully I don’t have to! In my lab environments I run KVM plus libvirt to make my life easier. There are many useful aspects of this combination, and one of them is named virt-install. The tool allows you to create a VM plus storage, networking etc. entirely on the command line. But that would still require the administrator to navigate the anaconda screens, something I’d like to avoid. For quite some time administrators have been deploying systems using automation instead. On Red Hat Linux and clones that mechanism is called “Kickstart”. Kickstart – documented in the Red Hat Installation Guide, chapter 23 – is an immensely powerful way of deploying systems in a consistent, and always correct (or wrong :) way. In this article I’d like to share how you could run virt-install to install VMs with minimum effort.

Say hello to virt-install

virt-install is beautifully documented, making it easy to create VMs. The syntax, however, takes a little while to understand. I have previously written about virt-install but those posts are now showing their age, and a new example seems in order.

Before presenting the command to create the VM let’s think about the machine’s capabilities first.

  • VM to host a single instance Oracle database and some auxiliary software
  • Should probably have at least two vcpus
  • Eight GB of DRAM should be the minimum
  • I like to separate the root file system from the application data, e.g. I want two volume groups: rootvg and oraclevg

Let’s translate that into a virt-install command; this works on virt-install 1.4.0, which is the current version on my Oracle Linux 7.3 host.

# virt-install --name shard0 --memory 8192 --memorybacking hugepages=yes \
--vcpus 2 --metadata description=shard0_vm --location /path/to/OL7.3.iso \
--initrd-inject=/tmp/shard0.ks --os-variant=rhel7.3 \
--extra-args="ks=file:/shard0.ks console=tty0 console=ttyS0,115200n8"  \
--disk size=12,pool=capacitypool --disk size=50,pool=latencypool \
--network network=public --nographics

This worked for me: the virt-install command (sourcing the kickstart file) creates and configures a fresh VM with all the required packages for Oracle 12c. I couldn’t reference the 12.2 preinstall RPM because it’s not part of the Oracle Linux 7.3 ISO. That’s OK for me: I can wait for the 7.4 ISO, where this might be part of the standard distribution. As a workaround you could install the 12.2 preinstall RPM once the new VM is up; the RPM can be found in the OL7 “latest” yum channel.

virt-install what?

The virt-install command might require an explanation or two. All of these parameters are explained in the man page; the ones that stick out a bit are covered here.

  • name – This is the name the VM gets. You can manage it using virsh, the libvirt command line interface, or its GUI cousin, virt-manager.
  • memorybacking – Using this option I instruct the VM to use large pages, in the hope that it simplifies the hypervisor’s job of memory management. Obviously this requires large pages to be set up, with enough of them available.
  • location – As explained in the docs, you can make an ISO available to virt-install that serves as the installation source. Doing so requires you to run virt-install as root.
  • initrd-inject and extra-args – Using these together in combination with the location flag allows me to pass a kickstart file through to the VM. In the kickstart file I configured partitioning, networking, firewall/selinux settings as well as packages. I can even have a %post section in which I can run arbitrary shell commands. The other parameters are taken from the Red Hat documentation and allow me to install the VM in console mode. This requires the “text” setting in the kickstart file, and the nographics flag.
  • disk – Using the disk directive I request 2 qcow2 virtual disk images to be created. I place the O/S on the capacity pool (backed by hard disk – a lot of capacity but dog slow) and the Oracle-related data on a virtual disk backed by the SSDs in the machine.
  • network – In a similar way I am instructing the VM to connect to the network I defined beforehand, named public.
  • nographics – If you want to make sure you don’t need X11 at all to install the VM, you can use nographics. You have to provide a “text” directive in your kickstart file and most likely set the console to something sensible; see above for an example.

I have said it before and I say it again: libvirt is a fantastic tool, and so easy to use. If you spend a little bit of time preparing your environment, it gets even simpler. Here are a couple of examples:

Before creating VMs on my host I created a couple of storage pools, namely the capacity and latency pools referenced above. That’s how I tell libvirt where to put the storage: I simply ask for a volume of x GB in a given pool and have it assigned to the VM. The order matters: the 12 GB disk will become /dev/vda (KVM uses virtio drivers), the second one /dev/vdb. The partitioning scheme is defined in the kickstart file.
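Defining such a pool is a one-off task. A sketch for a simple directory-backed pool follows; the target path is specific to my host and merely an example:

# virsh pool-define-as capacitypool dir --target /srv/vmstore/capacity
# virsh pool-build capacitypool
# virsh pool-start capacitypool
# virsh pool-autostart capacitypool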

Likewise, I can create a network that all my VMs can connect to. On the back-end, libvirt created iptables rules, and a network bridge:

# virsh net-info public
Name:        public
UUID:        some long identifier
Active:      yes
Persistent:  yes
Autostart:   yes
Bridge:      virbr1

VMs that are part of this network – including the host (x.x.x.1) – can communicate with each other.

I hope I could get you a little interested in libvirt. Have a look at the documentation, it is very good and explains these topics in more detail.

If you haven’t used virt-install previously this might look uninspiring. However using the commands above I don’t need to set up the installation source via NFS, SMB or HTTP. Neither do I need to provide the kickstart file via a web server, keeping setup requirements very low.

What about the kickstart file? Where do I get one?

I had the same question when I first started with the tool. The documentation is quite good, but a working example would be nice. It turns out you already have one: it’s created by default in /root/anaconda-ks.cfg and, just like an Oracle response file, contains a recording of the options you entered in the GUI. Have a look at it and adjust it to your needs.
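To give you an idea of what one could look like, here is a trimmed-down sketch matching the layout discussed above (two disks, rootvg and oraclevg). Passwords, sizes and the package list are placeholders you will want to change:

# shard0.ks - minimal sketch, adjust before use
text
lang en_US.UTF-8
keyboard us
timezone UTC --utc
rootpw --plaintext changeme
network --bootproto=dhcp --activate
firewall --disabled
selinux --permissive
bootloader --location=mbr
zerombr
clearpart --all --initlabel
# first disk (/dev/vda): root volume group
part /boot --fstype=xfs --size=1024 --ondisk=vda
part pv.01 --size=1 --grow --ondisk=vda
volgroup rootvg pv.01
logvol /    --vgname=rootvg --name=rootlv --fstype=xfs --size=1 --grow
logvol swap --vgname=rootvg --name=swaplv --size=2048
# second disk (/dev/vdb): Oracle volume group
part pv.02 --size=1 --grow --ondisk=vdb
volgroup oraclevg pv.02
logvol /u01 --vgname=oraclevg --name=oraclelv --fstype=xfs --size=1 --grow
reboot

%packages
@core
# the 12.2 preinstall RPM is not on the 7.3 ISO; add it from the OL7 "latest" channel afterwards
%end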

Little things worth knowing: Executing RDA on RAC

Result! I have finally been able to gather a complete RDA (Oracle Remote Diagnostic Agent) output on my 2 node RAC system. After consulting the relevant documentation on MOS – which is spread over at least 42 Doc IDs – I found it not to be very helpful, to the degree that some of what I read was actually wrong or contradictory. I put together a short note, primarily to myself, to remind me of the process. I hope you find it useful, too.

The RDA version I used for this post is 8.14.x, downloaded from MOS on March 4th 2017. My RAC nodes are based on Oracle Linux 7.3/UEK 4.

Starting the data collection

RDA is one of the standard tools I use, and I have previously blogged about it. RDA started off as a simple-to-use tool. After having used it for some years I started to run into issues with the interactive configuration, which simply took too long to complete. As soon as I learned about them I fell in love with profiles: RDA profiles only prompt you for information relevant to the product you want to collect data for.

Running RDA on a RAC system is similar to single instance, and you automatically use profiles – a nice touch. It appears to me that RAC data collection is triggered via SSH on the remote nodes, with the results being transferred to the local node.

I believe there are two methods of data gathering on RAC: one for Grid Infrastructure, and another one for the RDBMS layer. Depending on your settings of ORACLE_HOME and ORACLE_SID different parts of the stack are analysed. In this post I was primarily interested in running RDA for the RDBMS part of my cluster.

I started off by downloading the current RDA version to the system I want to run it on. This is a 2-node RAC, with nodes named rac12pri1 and rac12pri2. Both are based on Oracle Linux 7.3 with the most current UEK4 at the time of writing. My RDBMS homes are 12.1.0.2 patched with the January 2017 proactive bundle patch + OJVM.

The first caveat looms right at the beginning of the entire process: I didn’t find it stated explicitly in the documentation on MOS, but it seems that you need to deploy RDA on all nodes in the cluster, in the exact same location, before you start the data collection. I use /home/oracle/rda for that purpose. Make sure you have plenty of space in the location you choose; sometimes /home doesn’t provide enough for larger, more active systems.
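Getting RDA onto the second node can be as simple as copying the extracted directory across (this assumes passwordless SSH for the oracle account, which a RAC installation has anyway):

scp -rpq /home/oracle/rda rac12pri2:/home/oracle/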

During my first unsuccessful attempts I didn’t deploy RDA on all nodes before starting the data collection, only to see it not gather any information from the remote node. This is somewhat confusing, because the output (when collecting data) states this:

NOD002: Installing RDA software

NOD002 is short for the second node in my cluster.

Also, when configuring the data gathering process, you will see a prompt like this one:

Enter the absolute path of the directory where RDA will be installed on the remote nodes.

To a casual observer like me these messages suggest that RDA is actually installed on the remote nodes as part of the data collection process – but that is only partially true. Comparing directory sizes between the remote and local node revealed /home/oracle/rda to be greater than 200 MB locally, while on the remote node it was only 37 MB in size. Something seems to be missing…

Once deployed, you can change to the RDA directory and prepare for the data collection. You only execute RDA on the first node; the rest of the work is done programmatically. The MOS note seems to be correct this time; here is an example of my configuration session:

[oracle@rac12pri1 rda]$ ./rda.pl -vXRda start CLOUD -pRac_Assessment
Creating collection "output" ...
        - Trying to identify the domain ... (can take time)
 Inside DFT scenario, define the input and profile targets
 Inside DFT scenario, check Oracle home or Middleware presence
        - RDA:DCbegin ...
------------------------------------------------------------------------------
RDA.BEGIN: Initializes the Data Collection
------------------------------------------------------------------------------
Enter the Oracle home to be used for data analysis
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

        - RDA:DCconfig ...
------------------------------------------------------------------------------
RDA.CONFIG: Collects Key Configuration Information
------------------------------------------------------------------------------
        - RDA:DCocm ...
------------------------------------------------------------------------------
RDA.OCM: Set up the Configuration Manager Interface
------------------------------------------------------------------------------
        - RDA:DCstatus ...
------------------------------------------------------------------------------
RDA.STATUS: Produces the Remote Data Collection Reports
------------------------------------------------------------------------------
        - RDA:DCload ...
------------------------------------------------------------------------------
RDA.LOAD: Produces the External Collection Reports
------------------------------------------------------------------------------
        - RDA:DCfilter ...
------------------------------------------------------------------------------
RDA.FILTER: Controls Report Content Filtering
------------------------------------------------------------------------------
        - RDA:DCend ...
------------------------------------------------------------------------------
RDA.END: Finalizes the Data Collection
------------------------------------------------------------------------------
In CRS start scenario, getting nodes using /u01/app/12.1.0.2/grid/bin/olsnodes
2>/dev/null
------------------------------------------------------------------------------
Requesting common cluster information
------------------------------------------------------------------------------
Enter the absolute path of the directory where RDA will be installed on the
remote nodes.
Press Return to accept the default (/home/oracle/rda)
>

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)
>

In CRS start scenario, getting local node information
In CRS start scenario, pinging node rac12pri2
------------------------------------------------------------------------------
Requesting information for node rac12pri1
------------------------------------------------------------------------------
Enter the Oracle home to be analyzed on the node rac12pri1
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

Enter the Oracle SID to be analyzed on the node rac12pri1
Press Return to accept the default (NCDBA1)
>

------------------------------------------------------------------------------
Requesting information for node rac12pri2
------------------------------------------------------------------------------
Enter the Oracle home to be analyzed on the node rac12pri2
Press Return to accept the default (/u01/app/oracle/product/12.1.0.2/dbhome_1)
>

Enter the Oracle SID to be analyzed on the node rac12pri2
> NCDBA2

        - RDA:DCremote ...
------------------------------------------------------------------------------
RAC Setup Summary
------------------------------------------------------------------------------
Nodes:
. NOD001  rac12pri1/NCDBA1
. NOD002  rac12pri2/NCDBA2
2 nodes found

So RDA understands the RAC scenario, and it gathers data for each node that is part of the cluster as reported by the olsnodes tool. There was nothing really surprising about the prompts, except that I haven’t found a way to analyse more than one database, or ASM and a database together.

Be careful to enter the correct ORACLE_SID for your (remote) RAC nodes. RDA picks up your ORACLE_HOME and ORACLE_SID if they are set.

Optionally verify the correct number of nodes was detected

I am only mentioning this for the sake of completeness; the RAC Setup Summary you saw in the previous step was more than adequate for me. If you really want to find out which nodes you are about to collect information from, be careful: MOS Doc ID 359395.1 is wrong – the command ./rda.pl -vC RDA.REMOTE list will not only list the nodes, it will also start the data collection. Use this one instead, which I found in MOS Doc ID 1682909.1:

[oracle@rac12pri1 rda]$ ./rda.pl -XRemote list
Defined nodes:
  NOD001  rac12pri1  NCDBA1
  NOD002  rac12pri2  NCDBA2
[oracle@rac12pri1 rda]$

It merely lists the nodes, without actually starting to do any work.

Initiate data collection

When you are happy with the output and configuration, start collecting data. Here is an example of my session:

[oracle@rac12pri1 rda]$ ./rda.pl -v -e TRC/TRACE=1
Collecting diagnostic data ...
------------------------------------------------------------------------------
RDA Data Collection Started 04-Mar-2017 16:10:15
------------------------------------------------------------------------------
Processing RDA.BEGIN module ...
 Inside BEGIN module, testing the RDA engine code build
 Inside BEGIN module, testing the report directory
 Inside BEGIN module, testing the module targets
 Inside BEGIN module, launching parallel executions
Processing RDA.CONFIG module ...
 Inside CONFIG module, listing Oracle homes
 Inside CONFIG module, getting Oracle home inventory (can take time)
Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package
Processing RDA.END module ...
 Inside END module, gathering system information
 Inside END module, getting CPU information (linux)
 Inside END module, getting memory information (linux)
 Inside END module, producing the file catalog
 Inside END module, producing target overview
 Inside END module, waiting for parallel execution completion
 Inside END module, producing setting overview
------------------------------------------------------------------------------
RDA Data Collection Ended 04-Mar-2017 16:17:44
------------------------------------------------------------------------------
Generating the reports ...
        - collect/RDA_CONFIG_homes.txt ...
        - collect/RDA_CONFIG_oh_inv.txt ...
        - collect/RDA_END_system.txt ...
        - collect/RDA_END_files.txt ...
        - collect/RDA_END_target.txt ...
        - collect/RDA_END_report.txt ...
        - Report index ...
Packaging collection results ...

[...skipping a lot of text...]

  You can review the reports by transferring the /home/oracle/rda/output
  directory structure to a location where you have web-browser access.
Then, point your browser at this file to display the reports:
    RDA__start.htm

[...skipping a lot of text...]

It is crucially important to see these lines:

Processing RDA.REMOTE module ...
NOD001: Detecting storage type
NOD002: Detecting storage type
NOD001: Running RDA command
NOD002: Installing RDA software
NOD002: Running RDA command
NOD002: Transfering report package
NOD001: Transfering report package

In my first attempts, when I didn’t deploy RDA on all nodes myself, the lines “NOD002: Running RDA command” and “NOD002: Transfering report package” weren’t shown. Unsurprisingly no data was gathered on the remote nodes.

Viewing the output

At the end of the data collection you should see a *.tar.gz file per node. In my 2 node cluster setup, there are two:

[oracle@rac12pri1 rda]$ ls output/remote/
RDA_nod001_output.tar.gz  RDA_nod002_output.tar.gz

You can view these in your browser after extracting them to a temporary location; start with the file named RDA__start.htm, which presents the various parts of the report.
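In other words, something along these lines; the target directory is arbitrary and the exact location of the start page inside the archive may differ slightly:

mkdir -p /tmp/rda_nod001
tar -xzf output/remote/RDA_nod001_output.tar.gz -C /tmp/rda_nod001
# then copy /tmp/rda_nod001 to a machine with a browser and open RDA__start.htm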

Further reading: Testing user equivalence prior to gathering data

While trying to troubleshoot my remote data gathering problems (I suspected a problem with SSH at first) I noticed that RDA offers test modules as well (see MOS Doc ID 330760.1). What’s missing from the Doc ID is an example of how to invoke the SSH test module; or rather, the command in the RAC-specific instructions seems not to work. But it isn’t too hard to figure out the proper call to the RDA executable. The following worked for me:

[oracle@rac12pri1 rda]$ ./rda.pl -T ssh
Processing remote operation test module ...

Command Availability:
  rcp          (Not found)
  remsh        (Not found)
  rsh          (Not found)
  scp          /bin/scp
  ssh          /bin/ssh
  ssh-add      /bin/ssh-add
  ssh-agent    /bin/ssh-agent

Related files:
  /etc/ssh/moduli
  /etc/ssh/ssh_config
  /etc/ssh/ssh_host_ecdsa_key
  /etc/ssh/ssh_host_ecdsa_key.pub
  /etc/ssh/ssh_host_ed25519_key
  /etc/ssh/ssh_host_ed25519_key.pub
  /etc/ssh/ssh_host_rsa_key
  /etc/ssh/ssh_host_rsa_key.pub
  /etc/ssh/sshd_config
  /home/oracle/.ssh/authorized_keys
  /home/oracle/.ssh/config
  /home/oracle/.ssh/config.backup
  /home/oracle/.ssh/id_rsa
  /home/oracle/.ssh/id_rsa.pub
  /home/oracle/.ssh/known_hosts

Check if an authentication agent has been started
  SSH_AGENT_PID=1
  SSH_AUTH_SOCK=1
Agent identities:
  1024 some:columns:separated:by:colons:indicating:the:ID
  /home/oracle/.ssh/id_rsa (RSA)

Driver Availability:
    da Available
  jsch Available
   ssh Available
   rsh -

Settings modified by the remote operation initialization:
  REMOTE.F_SCP_COMMAND=/bin/scp
  REMOTE.F_SSH_COMMAND=/bin/ssh
  REMOTE.T_SCP_OPTIONS=-BCpq -o ConnectTimeout=30
  REMOTE.T_SSH_OPTIONS=-Cnq -o ConnectTimeout=30

------------------------------------------------------------------------------
Test a remote command
------------------------------------------------------------------------------
Enter the remote host name
Press Return to accept the default (localhost)
> rac12pri2

Do you want RDA to use an alternative login to execute remote requests (Y/N)?
Press Return to accept the default (N)
>

Check remote command execution with best driver
Exit code: 0

Check remote command execution using DA
Exit code: 0

Check remote command execution using JSCH
Exit code: 0

Check remote command execution using SSH
Exit code: 0

Check remote command execution using SSH (no timeout)
Exit code: 0
[oracle@rac12pri1 rda]$

Looking at the above output led me to believe that there isn’t an SSH-related problem with my cluster.

New Events for Data Guard and Synchronous Redo Transport in 12c (2)

After the baseline was established in the first part of this series, it’s time to measure the effect of the network. The second test introduces an interesting feature: using Linux’s own traffic shaping/Quality of Service (QoS) functionality, I will add a delay of 100 ms to the Data Guard network interface card (NIC) to slow things down a little.

WARNING: this is of course a lab or VM-only situation. I can’t exert control over wire quality in my (own) switches, hence some software magic is needed on my virtual ones. This post is intended to be merely for educational purposes, not for use at work.

I am continuing to use the 2 node RAC 12.1.0.2.170117 primary database on Oracle Linux 7 with UEK 4 and an identical RAC to host my standby database.

All database traffic still uses eth0 – my public network (192.168.100.0/24). In order not to affect my SSH session with the planned network changes, I created a new out-of-band management interface (eth2) on the Data Guard standby database servers. I love KVM and the virtio drivers for the NICs: a few virsh commands later, every RAC node has a new NIC, and of course without rebooting. Add some nmcli magic and each interface can be configured in a single line!
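For reference, the commands involved look roughly like the following. This is a sketch only: the libvirt network name, connection name and IP address are made up for illustration:

# on the KVM host: hot-add a third virtio NIC to the running VM
virsh attach-interface rac12sec1 network mgmt --model virtio --live --config

# inside the guest: configure eth2 in a single nmcli call
nmcli con add type ethernet con-name mgmt ifname eth2 ip4 192.168.102.20/24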

Here’s what it looks like from NetworkManager’s connection manager (the system is Oracle Linux 7.3):

[root@rac12sec1 ~]# nmcli c show
NAME  UUID                                  TYPE            DEVICE 
eth0  3e8915f1-10ce-4da7-99b8-d2a6947f9ff7  802-3-ethernet  eth0   
eth1  0ebe1279-40e1-4faf-a0be-49588d347fcf  802-3-ethernet  eth1   
mgmt  51054729-19ae-49e7-ba18-c6b41475ec1b  802-3-ethernet  eth2   

My eth0 device maps to the public network, i.e. all database traffic including Data Guard. The private interconnect uses eth1, with eth2 as the management interface. I have deliberately kept this simple; usually I’d request a dedicated set of interfaces to route Data Guard traffic (I blogged about that earlier in the context of RAC).

Please read part I of the “Adding an additional SCAN” series (and the posts that follow it) to get the full story.

Change protection mode

Following the scenario I outlined in part I, I first have to change the protection mode to Maximum Availability. Again, please see part I of this series for an explanation of protection levels, my choice of protection level, and why “Maximum Availability” might or might not work for you.

Since I’m using the Data Guard Broker, I need to head over and change the mode in dgmgrl.

DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 40 seconds ago)

As with many other maintenance tasks, you use the edit command to change the configuration.

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no standby databases would 
remain to support protection mode

Failed.

I seem to have gotten a bit ahead of myself – changing to the more serious protection modes requires changing the transport mode from async to sync. Let’s do that, and then change the protection mode:

DGMGRL> edit database 'NCDBA' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'NCDBB' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxAvailability
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 19 seconds ago)

DGMGRL> 

That looks better now; the first half of the job is done. The second half of this article details how to make the network look a lot worse.

Net Emulator

Using the Linux netem kernel module I can simulate many complex network situations – I found https://wiki.linuxfoundation.org/networking/netem to be a good reference.

I should probably say at this point that no one would use a link with a 100 ms delay between sites for (sync) redo transport. That simply doesn’t make sense. I used 100 ms to have a factor large enough to be sure that any deviation in the performance figures is actually a result of my fiddling with the network latency. Large numbers make spotting outliers easier.

How do you mess with the network? That is simple when you are on Linux! On my standby cluster I use the tc command, part of the iproute package. This is the “normal” state:

[root@rac12sec1 ~]# tc qdisc
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec1 ~]# 

I am merely using these tools; if you want to understand them, have a look at chapter 9 of the Linux Advanced Routing & Traffic Control HOWTO and the Linux Foundation wiki I mentioned earlier, and take it from there.

The syntax of tc qdisc is complex at first sight:

[root@rac12sec1 ~]# tc qdisc help
Usage: tc qdisc [ add | del | replace | change | show ] dev STRING
       [ handle QHANDLE ] [ root | ingress | parent CLASSID ]
       [ estimator INTERVAL TIME_CONSTANT ]
       [ stab [ help | STAB_OPTIONS] ]
       [ [ QDISC_KIND ] [ help | OPTIONS ] ]

       tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
STAB_OPTIONS := ... try tc qdisc add stab help

The above allows us to introduce NetEM to the mix. On my Oracle Linux 7.3/UEK4 system it offers the following options:

[root@rac12sec1 ~]# tc qdisc add netem help
Usage: ... netem [ limit PACKETS ] 
                 [ delay TIME [ JITTER [CORRELATION]]]
                 [ distribution {uniform|normal|pareto|paretonormal} ]
                 [ corrupt PERCENT [CORRELATION]] 
                 [ duplicate PERCENT [CORRELATION]]
                 [ loss random PERCENT [CORRELATION]]
                 [ loss state P13 [P31 [P32 [P23 P14]]]
                 [ loss gemodel PERCENT [R [1-H [1-K]]]
                 [ ecn ]
                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]
                 [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]

That’s a wealth of options. I can simulate somewhat unreliable WAN links, even with variations in delay! Refer to the Linux Foundation wiki for more details on how to simulate networks more realistically.
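
Just to illustrate what such a “somewhat unreliable WAN link” could look like (I am not using this in this post), here is a sketch combining a jittered delay with a little random packet loss; the numbers are made up:

# purely illustrative: ~50ms delay with 10ms jitter (normal distribution)
# plus 0.1% random packet loss on the public interface
tc qdisc add dev eth0 root netem delay 50ms 10ms distribution normal loss 0.1%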

Introducing Delay

I prefer to keep it simple for this post, so I’ll stick with the most basic setting for network delay. I am introducing the delay on the Data Guard network (eth0) on both of my standby nodes, while connected via the management interface eth2:

[root@rac12sec1 ~]# tc qdisc add dev eth0 root netem delay 100ms

[root@rac12sec1 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc show dev eth0 
qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms

The effect can be seen immediately by pinging one of the standby nodes:

[oracle@rac12pri1 ~]$ ping rac12sec1
PING rac12sec1.example.com (192.168.100.20) 56(84) bytes of data.
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=1 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=2 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=3 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=4 ttl=64 time=100 ms
^C
--- rac12sec1.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 100.368/100.415/100.478/0.229 ms
[oracle@rac12pri1 ~]$ 

So 100ms delay it is!
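
Before moving on, a note for later: once testing is finished the delay should be removed again, which is done by deleting the queueing discipline from the interface (on both standby nodes):

# remove the netem qdisc from eth0, restoring the default pfifo_fast qdisc
tc qdisc del dev eth0 root netem
tc qdisc show dev eth0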

Starting the Benchmark

With the scene set, it’s time to start the benchmark. The command to start it resembles the one I used for part I, except for the name of the results file.

[oracle@loadgen1 bin]$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxavail_netem.xml -rt 00:10
Author  :	 Dominic Giles
Version :	 2.5.0.971

Results will be written to results_20_maxavail_netem.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS

1:08:56 PM      20      5188    94          

What effect does this have on the database? Let’s check the Data Guard broker first:

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          10 seconds (computed 0 seconds ago)
  Average Apply Rate: 207.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 211.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

So it looks like there isn’t any lag visible with the benchmark underway; however, it took a little while for the run to get going, as visible in the first example. Here is the output from Tanel Poder’s ashtop script again (which requires you to have the Diagnostic Pack license as it accesses v$active_session_history):

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 08:10','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 08:15','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
      298     1.0   24% | oracle@rac12pri1 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      298     1.0   24% | oracle@rac12pri1 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      276      .9   23% | oracle@rac12pri2 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      274      .9   23% | oracle@rac12pri2 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
        8      .0    1% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 08:10:16 2017-02-17 08:14:59                 1
        7      .0    1% | oracle@rac12pri2 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:43 2017-02-17 08:14:51                 1
        5      .0    0% | oracle@rac12pri1 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:23 2017-02-17 08:13:48                 1
        5      .0    0% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 08:10:00 2017-02-17 08:13:24                 1
        5      .0    0% | oracle@rac12pri2 (NSV1)                                                                                   ON CPU  2017-02-17 08:12:43 2017-02-17 08:14:44                 1
        4      .0    0% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 08:11:16 2017-02-17 08:13:35                 1
        3      .0    0% | oracle@rac12pri2 (CJQ0)                                                                                   ON CPU  2017-02-17 08:12:39 2017-02-17 08:12:51                 2
        3      .0    0% | oracle@rac12pri2 (RSM0)                                                                                   ON CPU  2017-02-17 08:11:44 2017-02-17 08:14:44                 3
        2      .0    0% | oracle@rac12pri1 (LMON)                          control file sequential read             System I/O      WAITING 2017-02-17 08:11:51 2017-02-17 08:14:01                 1
        2      .0    0% | oracle@rac12pri1 (LMS1)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:11:14 2017-02-17 08:11:44                 1
        2      .0    0% | oracle@rac12pri2 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:12:14 2017-02-17 08:14:24                 1
        2      .0    0% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 08:12:59 2017-02-17 08:14:42                 1
        1      .0    0% | oracle@rac12pri1 (CKPT)                          control file sequential read             System I/O      WAITING 2017-02-17 08:13:46 2017-02-17 08:13:46                 1
        1      .0    0% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:10:28 2017-02-17 08:10:28                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          Redo Transport Close                     Other           WAITING 2017-02-17 08:12:22 2017-02-17 08:12:22                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          log file sequential read                 System I/O      WAITING 2017-02-17 08:14:16 2017-02-17 08:14:16                 1

20 rows selected.

As you can see, the top waits are in wait class “Other”: “SYNC Remote Write” and “Redo Transport MISC”. I have pulled an AWR diff report that shows an interesting picture. The first snapshot set represents the full network performance, while the second one shows the effect of the 100ms network delay. Only relevant information is shown.
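
(In case you want to pull a similar comparison for your own system: the RAC compare period report is created by a standard script shipped in the RDBMS home; it prompts for the two snapshot pairs and a report name, and Diagnostic Pack licensing applies.)

# run on one of the cluster nodes as the database owner
sqlplus / as sysdba @$ORACLE_HOME/rdbms/admin/awrgdrpt.sql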

WORKLOAD REPOSITORY RAC Compare Period Report for

Database Summary
~~~~~~~~~~~~~~~~
                            Database                   Snapshot Ids     Number of Instances    Number of Hosts    Report Total (minutes)
               ----------------------------------   -----------------   -----------------   -----------------   ---------------------
Snapshot Set            Id Name      RAC Block Si      Begin      End   In Repor    Total   In Repor    Total      DB time Elapsed ti
------------   ----------- --------- --- --------   -------- --------   -------- --------   -------- --------   ---------- ----------
First (1st)      481582958 NCDBA     YES     8192        303      304          2        2          2        2         22.1       10.2
Second (2nd)     481582958 NCDBA     YES     8192        301      302          2        2          2        2        141.9       10.6
Database Instances Included In Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                                             Elapsed              Instance Avg Activ
Set       DB Id Inst Instance  Release         Host       Startup         Begin Snap Time End Snap Time   Time (min) DB time (m Up Time (h  Sessions Platform
--- ----------- ---- --------- --------------- ---------- --------------- --------------- --------------- ---------- ---------- ---------- --------- ---------------
1st   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0        2.0        5.2       0.2 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0       20.1        5.2       2.0 Linux x86 64-bi

2nd   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:06 17-Feb-17 08:17       10.5      134.0        4.3      12.8 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:07 17-Feb-17 08:17       10.5        7.9        4.3       0.8 Linux x86 64-bi

...

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                  2.2                 13.3      514.3                      0.0                  0.2    1,600.0
                      CPU time:                  1.5                  0.5      -66.0                      0.0                  0.0        0.0
           Background CPU time:                  0.5                  0.2      -56.9                      0.0                  0.0        0.0
             Redo size (bytes):            983,488.1            322,026.1      -67.3                  4,164.1              4,192.4        0.7
         Logical read (blocks):             30,773.7              8,299.3      -73.0                    130.3                108.0      -17.1
                 Block changes:              5,665.8              1,886.0      -66.7                     24.0                 24.6        2.3
        Physical read (blocks):                 10.5                 18.3       74.4                      0.0                  0.2      500.0
       Physical write (blocks):                654.9                196.8      -69.9                      2.8                  2.6       -7.6
              Read IO requests:                 10.1                 12.9       27.3                      0.0                  0.2      325.0
             Write IO requests:                503.8                110.4      -78.1                      2.1                  1.4      -32.4
                  Read IO (MB):                  0.1                  0.1       75.0                      0.0                  0.0        0.0
                 Write IO (MB):                  5.1                  1.5      -69.9                      0.0                  0.0        0.0
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                515.8                188.2      -63.5                      2.2                  2.5       12.4
                  Parses (SQL):                 24.9                 17.7      -28.8                      0.1                  0.2      109.1
             Hard parses (SQL):                  0.1                  0.2       45.5                      0.0                  0.0        0.0
            SQL Work Area (MB):                  0.1                  0.5      253.8                      0.0                  0.0      253.8
                        Logons:                  1.3                  3.9      191.7                      0.0                  0.1      400.0
                Executes (SQL):              2,587.7                853.0      -67.0                     11.0                 11.1        1.4
                  Transactions:                236.2                 76.8      -67.5

...

Top Timed Events   First DB/Inst: NCDBA/NCDBA1 Snaps: 303-304 (Elapsed time: 612.266 sec  DB time: 1326.09 sec),  Second DB/Inst: NCDBA/NCDBA1 Snaps: 301-302 (Elapsed time: 638.691 sec  DB time: 8514.42 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes
-> Summary for all instances

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 CPU time                                             N/A        919.5           N/A        69.3    log file sync                 Commit              49,010      7,976.7         162.8        93.7
 log file sync                 Commit             144,284        270.4           1.9        20.4    Redo Transport MISC           Other               11,392      1,192.0         104.6        14.0
 log file parallel write       System I/O         141,581        156.6           1.1        11.8    SYNC Remote Write             Other               11,392      1,179.1         103.5        13.8
 gc current block 2-way        Cluster             98,483         71.9           0.7         5.4    CPU time                                             N/A        328.7           N/A         3.9
 gc cr block 2-way             Cluster             71,469         47.9           0.7         3.6    gcs log flush sync            Other                4,480         56.0          12.5         0.7
 target log write size         Other               22,284         26.5           1.2         2.0    gc current block 2-way        Cluster             62,843         43.3           0.7         0.5
 gc current grant busy         Cluster             22,561         16.8           0.7         1.3    gc current block busy         Cluster                284         39.3         138.4         0.5
 control file sequential read  System I/O           8,497          6.0           0.7         0.5    log file switch (checkpoint i Configuration           75         37.1         495.1         0.4
 db file sequential read       User I/O             5,862          5.4           0.9         0.4    rdbms ipc reply               Other                   32         27.1         845.4         0.3
 log file sequential read      System I/O          14,895          4.8           0.3         0.4    gc cr block 2-way             Cluster             39,015         26.0           0.7         0.3
-log file switch (checkpoint i Configuration          212          3.6          17.2         0.3   -log file parallel write       System I/O          11,267         20.3           1.8         0.2
-gcs log flush sync            Other                  461          0.4           0.9         0.0   -gc current grant busy         Cluster             12,427          8.2           0.7         0.1
-gc current block busy         Cluster                126          0.3           2.7         0.0   -control file sequential read  System I/O           7,326          5.0           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -db file sequential read       User I/O             6,615          4.7           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -log file sequential read      System I/O             174          0.4           2.4         0.0
                          --------------------------------------------------------------------------------------------------------------------

This doesn’t look too good, does it? The throughput is down substantially, which is visible in Swingbench itself: in the first example I managed to achieve 255 TPS while I’m averaging 90-95 in the sync + delay example. The numbers from the AWR excerpt shown above confirm this and allow you to work out where the problem is.

Here is a print screen taken from OEM (that’s a different benchmark execution), showing the same situation. Make sure you are licensed appropriately to use this view.

[Screenshot: oemexpress_with_latency]

Summary Part II

So yes, there is a very noticeable effect of the network delay, and the AWR report shows the relevant wait events in 12c indicating where time is spent.

In my opinion it’s a bit unfortunate to have the new waits in the “Other” wait class, but that’s how it is.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf

New Events for Data Guard and Synchronous Redo Transport in 12c (1)

I may have said it before, but I consider presenting and teaching a great way to expand one’s knowledge: first of all, it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.

Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, most likely depending on the network latency between the systems, but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better than to try it out! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.

Since their initial setup I have patched both clusters to 12.1.0.2.170117 for Clusterware and RDBMS.

I am planning to share my findings in two posts: the first one (the one you are reading) presents the baseline, while the second one shows how performance can be affected by poor network conditions. I guess the second one will be a bit longer …

Background

Data Guard allows the administrator to define the databases in a Data Guard configuration to operate in one of three possible modes: the first one is Maximum Performance, the second Maximum Availability, and finally Maximum Protection. All three modes are covered in the official documentation set and on MOS. Without saying it very scientifically:

  • Maximum Performance uses asynchronous log shipping and is the default. The performance of the standby database (or rather lack thereof) should not impact the performance of the primary. At a cost: it is not guaranteed that redo is actually written to the standby when the commit command returns.
  • Maximum Protection allows you to enforce a Zero Data Loss (ZDL) strategy but also at a cost: The primary will shut down if it can’t transmit redo information to the standby. That is how you prevent data loss!
  • Maximum Availability is the middle ground and I’ll focus on that protection mode here.

Make sure to read the documentation and understand the implications of each protection mode when assessing the most suitable mode for your environment!
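
A quick way to see which mode a database is configured for, and which protection level is actually in force, is to query v$database; a minimal sketch:

sqlplus -s / as sysdba <<'EOF'
select protection_mode, protection_level from v$database;
EOF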

Background: New Events

Before going into details I should probably point out that there are some new Data Guard events. Hurray! More events and more meaningful names, great. Just one caveat: have a look at the output from my 12c database:

select name, wait_class from v$event_name
where name in (
'ARCH Remote Write','ASYNC Remote Write','Redo Transport Attach',
'Redo Transport Close','Redo Transport Detach','Redo Transport Open',
'Redo Transport Ping','Redo Transport Slave Shutdown','Redo Transport Slave Startup',
'Redo Writer Remote Sync Complete', 'Redo Writer Remote Sync Notify',
'Remote SYNC Ping','SYNC Remote Write');

NAME                                               WAIT_CLASS
-------------------------------------------------- ----------------------------------------
ASYNC Remote Write                                 Other
SYNC Remote Write                                  Other
ARCH Remote Write                                  Other
Redo Transport Attach                              Other
Redo Transport Detach                              Other
Redo Transport Open                                Other
Redo Transport Close                               Other
Redo Transport Ping                                Other
Remote SYNC Ping                                   Other
Redo Transport Slave Startup                       Other
Redo Transport Slave Shutdown                      Other
Redo Writer Remote Sync Notify                     Other
Redo Writer Remote Sync Complete                   Other

13 rows selected.

Compare this to the pre-12c events (taken from an 11.2.0.3 database since they are mostly gone in 12c):

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                                               WAIT_CLASS
-------------------------------------------------- --------------------
ARCH wait on ATTACH                                Network
LNS wait on ATTACH                                 Network
LNS wait on SENDREQ                                Network
LNS wait on DETACH                                 Network
ARCH wait on SENDREQ                               Network
ARCH wait on DETACH                                Network
LNS wait on LGWR                                   Network
LGWR wait on LNS                                   Network
LGWR-LNS wait on channel                           Other

9 rows selected.

For the sake of completeness here are those events that remained in 12c:

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                           WAIT_CLASS
------------------------------ ------------------------------
LNS wait on LGWR               Network
LGWR wait on LNS               Network
LGWR-LNS wait on channel       Other

Bugger. The new events are all in the “Other” wait class. My initial attempts at filtering information from v$event_histogram based on events in the “Network” class had little success. I guess friends of the OEM performance pages might also find that interesting. If you wonder where I pulled these events from: I used a white paper named “Best Practices for Synchronous Redo Transport” as a reference.
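
What does work is filtering v$event_histogram on the event names themselves rather than on the wait class; a minimal sketch using two of the new events from the list above, run on the primary:

sqlplus -s / as sysdba <<'EOF'
set linesize 120 pagesize 50
column event format a25
select event, wait_time_milli, wait_count
from   v$event_histogram
where  event in ('SYNC Remote Write', 'Redo Transport Ping')
order  by event, wait_time_milli;
EOF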

Testing: The Environment

As you read in the introduction, this environment is based on a two node RAC primary -> two node RAC standby configuration:

DGMGRL> show configuration verbose

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS
DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 177.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBA' LogXptMode
  LogXptMode = 'async'
DGMGRL> show database 'NCDBB' LogXptMode
  LogXptMode = 'async'

The configuration is currently set to “Maximum Performance”; the databases ship redo asynchronously.

All systems are patched to the January 2017 Proactive Bundle Patch:

[oracle@rac12pri1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
24917972;Database PSU 12.1.0.2.170117, Oracle JavaVM Component (JAN2017)
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)

OPatch succeeded.

Testing part I: The Baseline

I am using Swingbench’s Order Entry benchmark to generate a little bit of load. For the avoidance of doubt: I am not interested in a comparison between asynchronous and synchronous log shipping. I would like to see the effect caused by the difference in latency in the form of hugely increased waits. In my first test I am running with the defaults of my VMs (the NICs use virtio drivers) and no changes to SDU or send/receive buffers.

$ date
Fri Feb 17 13:58:58 UTC 2017
$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxperf.xml -rt 00:10
Author  :        Dominic Giles
Version :        2.5.0.971

Results will be written to results_20_maxperf.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

1:59:55 PM      20      3852    255

On the apply side this translates to the following figures (the average apply rate is misleading and should be ignored; it captures everything applied since MRP0 was started; see the sketch after the broker output for a more current figure):

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 713.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
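
As mentioned above, the broker’s average is taken over the whole life of the managed recovery process. If you want a more current figure you can ask the standby directly; a sketch using v$recovery_progress on the apply instance (NCDBB2 in my case):

# run on the standby's apply instance
sqlplus -s / as sysdba <<'EOF'
select item, units, sofar
from   v$recovery_progress
where  item in ('Active Apply Rate', 'Average Apply Rate');
EOF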

There is no transport lag, and no apply lag either. When checking for waits on background processes (using Tanel Poder’s ashtop – which requires you to have the Diagnostic Pack license as it accesses v$active_session_history) I noticed the following:

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 09:02','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 09:07','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
       53      .2   29% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:42 2017-02-17 09:06:53                 1
       23      .1   13% | oracle@rac12pri2 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:02:03 2017-02-17 09:03:26                 1
       19      .1   10% | oracle@rac12pri2 (DBW0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:39                 1
       12      .0    7% | oracle@rac12pri2 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:02:03 2017-02-17 09:03:27                 1
        9      .0    5% | oracle@rac12pri1 (LMS1)                                                                                   ON CPU  2017-02-17 09:04:37 2017-02-17 09:06:58                 1
        8      .0    4% | oracle@rac12pri2 (TT00)                                                                                   ON CPU  2017-02-17 09:02:15 2017-02-17 09:06:56                 1
        6      .0    3% | oracle@rac12pri1 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:06:44                 1
        6      .0    3% | oracle@rac12pri2 (LGWR)                                                                                   ON CPU  2017-02-17 09:04:05 2017-02-17 09:06:41                 1
        5      .0    3% | oracle@rac12pri2 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:03:27                 1
        5      .0    3% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:14                 1
        5      .0    3% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 09:02:24 2017-02-17 09:06:28                 1
        4      .0    2% | oracle@rac12pri1 (LMS0)                                                                                   ON CPU  2017-02-17 09:04:21 2017-02-17 09:06:05                 1
        3      .0    2% | oracle@rac12pri1 (LGWR)                          LGWR all worker groups                   Other           WAITING 2017-02-17 09:03:08 2017-02-17 09:06:13                 1
        2      .0    1% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:03:42                 1
        2      .0    1% | oracle@rac12pri2 (CKPT)                                                                                   ON CPU  2017-02-17 09:05:10 2017-02-17 09:05:13                 1
        2      .0    1% | oracle@rac12pri2 (TT00)                          log file sequential read                 System I/O      WAITING 2017-02-17 09:03:17 2017-02-17 09:03:46                 1
        1      .0    1% | oracle@rac12pri1 (CJQ0)                                                                                   ON CPU  2017-02-17 09:04:44 2017-02-17 09:04:44                 1
        1      .0    1% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 09:03:05 2017-02-17 09:03:05                 1
        1      .0    1% | oracle@rac12pri1 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:02:02                 1
        1      .0    1% | oracle@rac12pri1 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:05:47 2017-02-17 09:05:47                 1

The time difference between charbench and the database can be explained by time zones: my load generator is set to use UTC whereas the database is on EST; the result is a five-hour time difference. This is a minor oversight on my part and has nothing to do with the location of the machines. In fact, they are all virtualised.

Summary Part I

In this part I tried to lay the foundation for the next one, where I’m using my lab environment to simulate artificially high network latency. I have to admit again that my setup is somewhat over-simplified: using Data Guard with asynchronous log shipping and Maximum Performance mode is not something I’d like to support for anything more important than my toy databases in the lab.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf