12.2 new features: a first encounter with the new Cluster Health Advisor

One of the many new manuals that showed up in Oracle Database 12.2 is the Oracle Autonomous Health Framework User’s Guide. This book contains a lot of information, some of which we already know from previous releases, and some new stuff as well. One of the new chapters caught my eye in particular:

“Proactively Detecting and Diagnosing Performance Issues for Oracle RAC”

Said chapter 5 introduces a new feature, called the Cluster Health Advisor (CHA), that should help you better understand the performance of your cluster. Really? Time to start the VMs and create a test.

My system is based on Oracle Linux 7.3 with UEK4. I created a two-node 12.2.0.1.0 RAC; there weren’t any patches available at the time of writing, so this feature might still change a little. My database, named CDB, features 2 PDBs: swingbench1 and swingbench2 (you can already guess where this is heading). It appears that my cluster is a “flex cluster” operating in standard mode. You noticed during configuration that you defined the nodes as “HUB nodes”, didn’t you? I have screenshots from my RAC installation that I’ll put into a blog post when I find the time. In the meantime head over to oracle-base.com where you can find a RAC 12.2 installation walkthrough for Oracle Linux on VirtualBox.

The plan for this post is to overload the system deliberately to provoke the health advisor to give me recommendations about the cluster and/or the database.

The official documentation is found here: https://docs.oracle.com/database/122/ATNMS/monitoring-rac-with-cha.htm

How is the Cluster Health Advisor implemented?

As per the documentation, there are Cluster Health Advisor daemon processes on each node, and sure enough there are:

[oracle@rac122pri1 ~]$ crsctl stat res -t 
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.MGMT.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.OCR.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.RECO.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.chad
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.net1.network
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.ons
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.proxy_advm
               OFFLINE OFFLINE      rac122pri1               STABLE
               OFFLINE OFFLINE      rac122pri2               STABLE
...

The resource in question is “ora.chad”, and its profile defines dependencies to the GIMR or Grid Infrastructure Management Repository:

[oracle@rac122pri1 ~]$ crsctl stat res ora.chad -p | grep "DEPENDENCIES\b"
START_DEPENDENCIES=hard(global:ora.mgmtdb) pullup(global:ora.mgmtdb)
STOP_DEPENDENCIES=hard(global:intermediate:ora.mgmtdb)
[oracle@rac122pri1 ~]$ 

The documentation further states that the Cluster Health Advisor daemon monitors cluster nodes automatically. Databases registered in Clusterware, however, must be explicitly registered with the monitoring daemon.

The main tool to interact with CHA is chactl (there’s a surprise). It can take quite a few options:

[oracle@rac122pri1 bin]$ chactl -h
CLSCH-3659 : invalid command line syntax
Usage:
chactl monitor cluster [-model <model_name> [-force]]
chactl monitor database -db <db_unique_name>
                        [-model <model_name> [-force]]
chactl unmonitor database -db <db_unique_name>
chactl status [cluster|database [-db <db_unique_name>]] [-verbose]
chactl config [cluster|database -db <db_unique_name>]
chactl calibrate {cluster|database -db <db_unique_name>}
                 -model <model_name> [-force]
                 [-timeranges 'start=<timestamp>,end=<timestamp>,...']
                 [-kpiset 'name=<kpi_name> min=<kpi_min> max=<kpi_max>,...' ]
    WHERE:
        -interval <interval> : interval is in hrs
        -timeranges 'start=<timestamp>,end=<timestamp>,...' :
             Timestamp must be in format 'YYYY-MM-DD HH24:MI:SS'
    KPI for db:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        DBTIMEPERCALL - Database time per user call - usec/call
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
    KPI for cluster:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
chactl query diagnosis [-cluster|-db <db_unique_name>]
                       [-start <timestamp> -end <timestamp>]
                       [-htmlfile <file_name>]
chactl query model [-name <model_name> [-verbose]]
chactl query repository
chactl query calibration {-cluster|-db <db_unique_name>}
        [-timeranges 'start=<timestamp>,end=<timestamp>,...']
        [-kpiset 'name=<kpi_name> min=<kpi_min> max=<kpi_max>,...' ]
        [-interval <interval>]
    WHERE:
        -interval <interval> : interval is in hrs
        -timeranges 'start=<timestamp>,end=<timestamp>,...' :
             Timestamp must be in format 'YYYY-MM-DD HH24:MI:SS'
    KPI for db:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        DBTIMEPERCALL - Database time per user call - usec/call
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
    KPI for cluster:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
chactl remove model -name <model_name>
chactl rename model -from <model_name> -to <model_name>
chactl import model -name <model_name> -file <file_name> [-force]
chactl export model -name <model_name> -file <file_name>
chactl set maxretention -time <max_retention_time>
chactl resize repository -entities <total number of hosts and database instances>
   [-force | -eval]
[oracle@rac122pri1 bin]$

The output of the “help” command has been reformatted for readability.

And yes, the documentation is right: at first only the cluster is monitored.

[oracle@rac122pri1 ~]$ chactl status
monitoring nodes rac122pri2, rac122pri1
not monitoring databases
[oracle@rac122pri1 ~]$ chactl status -verbose
monitoring nodes rac122pri2, rac122pri1 using model DEFAULT_CLUSTER
not monitoring databases
[oracle@rac122pri1 ~]$

Monitoring my database

In the next step I wanted to add my database to the monitoring configuration. That is easy:

[oracle@rac122pri1 ~]$ chactl monitor database -db CDB
[oracle@rac122pri1 ~]$ chactl status -verbose
monitoring nodes rac122pri2, rac122pri1 using model DEFAULT_CLUSTER
monitoring database cdb, instances CDB2, CDB1 using model DEFAULT_DB

Creating load and causing trouble

With the monitoring in place it was time to create some trouble. I started a swingbench run deliberately overloading the CPUs on my cluster. To add to the fun I rebooted a node during the swingbench execution. Although that didn’t stop processing (I used the connection pool method to establish sessions), it caused all of my sessions to pile up on the surviving node. That node held its head up, but performance took a nosedive.

Any recommendations?

The Cluster Health Advisor can be interrogated using the “query diagnosis” command. After causing sufficient trouble I asked for a first assessment. Interestingly enough I hit the repository right after I rebooted node 2. I hadn’t checked at the time, but node 2 ran the GIMR – hence this output:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:50" -end "2017-03-15 05:55:50"
CLSCH-2005 : Oracle Cluster Health Analysis Service (OCHAD) failed due to a Grid
Infrastructure Management Repository error.
Unable to start the Universal Connection Pool:
oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource:
java.sql.SQLRecoverableException: Listener refused the connection with the following
error:

ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

After a little while the failed resources were up again, and I could get some output:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:50" -end "2017-03-15 05:55:50"
2017-03-15 05:54:40.0  Database cdb  DB Recovery Read IO Performance (CDB1) [detected]
2017-03-15 05:55:00.0  Database cdb  DB Recovery Read IO Performance (CDB1) [cleared]

Problem: DB Recovery Read IO Performance
Description: CHA detected that recovery reads are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that a database instance left the cluster
unexpectedly and instance recovery had a larger impact on performance than expected
because more database blocks than expected had to be recovered. The data blocks to which
recovery was applied were read from disk or from the global cache.
Action: Decrease the target for the mean time to recover ( MTTR target ) to increase the
incremental checkpointing rate.
[oracle@rac122pri1 bin]$

Spot on!

SQL> show parameter mttr

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_mttr_target               integer     0
SQL> 

This is typical for DBCA-created databases based on the General_Purpose.dbc template. Good catch!
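Acting on the recommendation means setting a non-zero MTTR target. A sketch of what that could look like (the value of 300 seconds is an arbitrary example, not a recommendation; choose one that matches your recovery requirements):

```sql
-- Illustrative only: a non-zero MTTR target (in seconds) enables
-- incremental checkpointing; 300 is an arbitrary example value.
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE = BOTH SID = '*';

-- Afterwards, compare the target with what the instance estimates:
SELECT target_mttr, estimated_mttr FROM v$instance_recovery;
```

Keep in mind that a lower MTTR target increases the incremental checkpoint write rate, so there is a trade-off with steady-state I/O.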

Now the system was still struggling under load:


SQL> select count(*), a.inst_id, b.name from gv$session a, v$pdbs b
  2  where a.con_id = b.con_id and a.username = 'SOE' group by a.inst_id, b.name;

  COUNT(*)    INST_ID NAME
---------- ---------- ------------------------------
        75          1 SWINGBENCH1

The pool was set to 75 users and should have been split between both nodes. With the second node down, however, there was only one node left to run on.

The operating system didn’t look too happy either:

[root@rac122pri1 trace]# top -b  -n 1 | head -n 15
top - 06:05:08 up 21:14,  2 users,  load average: 15.47, 13.66, 8.69
Tasks: 1403 total,  10 running, 1393 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.0 us,  2.2 sy,  0.0 ni, 92.5 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16167836 total,  5510468 free,  4714544 used,  5942824 buff/cache
KiB Swap:   782332 total,   782332 free,        0 used.  7338672 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10339 root      20   0  158952   5516   3608 R  16.7  0.0   0:00.03 top
10341 root      20   0   28604   5428   3728 R  11.1  0.0   0:00.02 perl
20798 oracle    20   0 2545588 812828 805112 S  11.1  5.0   0:40.34 oracle_20798_cd
20824 oracle    20   0 2543552 817728 809924 S  11.1  5.1   0:40.01 oracle_20824_cd
20828 oracle    20   0 2545592 810800 803068 S  11.1  5.0   0:39.88 oracle_20828_cd
24365 oracle    20   0 2543524 588452 581008 S  11.1  3.6   0:22.72 oracle_24365_cd
24457 oracle    20   0 2543748 596728 588752 S  11.1  3.7   0:22.56 oracle_24457_cd
28569 oracle    20   0 2543524 552440 545016 S  11.1  3.4   0:20.18 oracle_28569_cd
[root@rac122pri1 trace]#

Although I guess that is not too bad for a VM with only 4 cores! Inevitably though there was trouble, even if it appears to be over by now:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:00" -end "2017-03-15 06:20:00"
2017-03-15 05:54:40.0  Database cdb  DB Recovery Read IO Performance (CDB1) [detected]
2017-03-15 05:55:00.0  Database cdb  DB Recovery Read IO Performance (CDB1) [cleared]
2017-03-15 05:58:55.0  Database cdb  Long Waits for Global Cache Message Requests (CDB1) [detected]
2017-03-15 05:58:55.0  Database cdb  Long Redo Log Sync Wait (CDB1) [detected]
2017-03-15 06:04:25.0  Database cdb  Long Waits for Global Cache Message Requests (CDB1) [cleared]
2017-03-15 06:04:25.0  Database cdb  Long Redo Log Sync Wait (CDB1) [cleared]

Problem: DB Recovery Read IO Performance
Description: CHA detected that recovery reads are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that a database instance left the cluster
unexpectedly and instance recovery had a larger impact on performance than expected because
more database blocks than expected had to be recovered. The data blocks to which recovery
was applied were read from disk or from the global cache.
Action: Decrease the target for the mean time to recover ( MTTR target ) to increase the
incremental checkpointing rate.

Problem: Long Waits for Global Cache Message Requests
Description: CHA detected that global cache messages are taking a long time.
Cause: The Cluster Health Advisor (CHA) detected that global cache messages have not
completed and are blocking database processes. It is likely that Global Cache Server
Processes (LMS) on another node are unresponsive or are waiting for CPU or Memory.
Action: Check whether CHA reports any issues related to severe performance degradation
caused by CPU or memory starvation on other nodes in the cluster and follow the suggestions
to resolve those problems.

Problem: Long Redo Log Sync Wait
Description: CHA detected that commits are blocked for several seconds.
Cause: The Cluster Health Advisor (CHA) detected that commits are blocked for several
seconds because cluster commit propagation messages are very slow. It is likely that a
Global Cache Server Process (LMS) on another node is unresponsive or is waiting for CPU
or Memory.
Action: Check whether CHA reports any issues related to severe performance degradation
on other nodes in the cluster. Resolve the problems found on other nodes and instances
of this database in the cluster.
[oracle@rac122pri1 bin]$

I really like the text representation because I like the command line. If you prefer a more aesthetically pleasing representation of the data, you can also create an HTML version of the findings by specifying the -htmlfile flag.
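For example (a sketch reusing the time window from above; the output file name is made up):

```
chactl query diagnosis -db CDB \
    -start "2017-03-15 05:20:00" -end "2017-03-15 06:20:00" \
    -htmlfile /tmp/cha_cdb_diagnosis.html
```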

What else?

Going over the list of options again I found a few more that look interesting. For instance, what’s the footprint of this information in the repository? Query it:

[oracle@rac122pri1 bin]$ chactl query repository

specified max retention time(hrs): 72
available retention time(hrs)    : 307
available number of entities     : 17
allocated number of entities     : 4
total repository size(gb)        : 15.00
allocated repository size(gb)    : 1.29
[oracle@rac122pri1 bin]$ 

And what are these models I see when querying the status? DEFAULT_DB and DEFAULT_CLUSTER exist:

[oracle@rac122pri1 bin]$ chactl query model
Models: DEFAULT_CLUSTER, DEFAULT_DB

[oracle@rac122pri1 bin]$ chactl query model -name DEFAULT_DB
Model: DEFAULT_DB
Target Type: DATABASE
Version: 12.2.0.1_0
OS Calibrated on: 
Calibration Target Name: 
Calibration Date: 
Calibration Time Ranges: 
Calibration KPIs: 
[oracle@rac122pri1 bin]$ 

The documentation elaborates: it is possible to calibrate the tool to better reflect your environment.

This is actually quite useful, although I doubt many clusters will deviate much from the standard model. Anyway, if you want to calibrate against a specific workload, you start off by querying CHA to check whether it has sufficient data for the period in question. In my case I could dream up a scenario called “swingbench run”.
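The timestamps chactl expects must be formatted as ‘YYYY-MM-DD HH24:MI:SS’. A small helper of my own (a sketch, assuming GNU date as shipped with Oracle Linux 7) can build the -timeranges argument, here for the last hour:

```shell
#!/bin/bash
# Build a chactl -timeranges argument covering the last hour.
# Timestamp format must be 'YYYY-MM-DD HH24:MI:SS' as chactl requires.
# Assumes GNU date (the default on Oracle Linux 7).
end=$(date '+%Y-%m-%d %H:%M:%S')
start=$(date -d '1 hour ago' '+%Y-%m-%d %H:%M:%S')
timeranges="start=${start},end=${end}"
echo "${timeranges}"

# The variable can then be passed straight to chactl, e.g.:
# chactl query calibration -cluster -timeranges "${timeranges}"
```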

The cluster as such comes first:

[oracle@rac122pri1 bin]$ chactl query calibration -cluster -timeranges \
> 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'

Cluster name : rac122pri
Start time : 2017-03-15 05:20:00
End time : 2017-03-15 06:30:00
Total Samples : 1647
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
1.19      0.45      3.71      0.00      86.69   

<25       <50       <75       <100      >=100   
99.64%    0.18%     0.12%     0.06%     0.00%    

2) Disk write (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
3.14      0.42      4.67      0.00      61.67   

<50       <100      <150      <200      >=200   
99.94%    0.06%     0.00%     0.00%     0.00%    

3) Disk throughput (ASM) (IO/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
339.59    100.00    512.97    0.00      9700.00 

<5000     <10000    <15000    <20000    >=20000 
99.88%    0.12%     0.00%     0.00%     0.00%    

4) CPU utilization (total) (%)

MEAN      MEDIAN    STDDEV    MIN       MAX     
37.89     16.80     33.90     4.30      99.90   

<20       <40       <60       <80       >=80    
55.68%    7.35%     3.76%     4.74%     28.48%   

[oracle@rac122pri1 bin]$ 

As per the command’s help message you can limit the output to specific KPIs. Similarly you can check whether you have enough data for the database:

[oracle@rac122pri1 bin]$ chactl query calibration -db CDB -timeranges \
> 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'
Database name : cdb
Start time : 2017-03-15 05:20:00
End time : 2017-03-15 06:30:00
Total Samples : 1632
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
1.16      0.46      3.61      0.00      86.69   

<25       <50       <75       <100      >=100   
99.69%    0.12%     0.12%     0.06%     0.00%    

2) Disk write (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
3.17      0.43      4.69      0.00      61.67   

<50       <100      <150      <200      >=200   
99.94%    0.06%     0.00%     0.00%     0.00%    

3) Disk throughput (ASM) (IO/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
341.61    100.00    514.31    0.00      9700.00 

<5000     <10000    <15000    <20000    >=20000 
99.88%    0.12%     0.00%     0.00%     0.00%    

4) CPU utilization (total) (%)

MEAN      MEDIAN    STDDEV    MIN       MAX     
37.94     16.75     33.99     4.30      99.90   

<20       <40       <60       <80       >=80    
55.82%    7.17%     3.62%     4.66%     28.74%   

5) Database time (per user call) (usec/call)

MEAN      MEDIAN    STDDEV    MIN       MAX     
108878.33  6712.80   2778378.28  0.00      93077792.00

<10000000  <20000000  <30000000  <40000000  <50000000  <60000000  <70000000  >=70000000
99.88%    0.00%     0.00%     0.00%     0.00%     0.00%     0.06%     0.06%    

[oracle@rac122pri1 bin]$ 

Be careful though, sometimes you don’t have enough data! I saw a message like this when setting the window to 30 minutes:

[oracle@rac122pri1 ~]$ chactl query calibration -cluster \
> -timeranges 'start=2017-03-15 05:50:00,end=2017-03-15 06:20:00'

Cluster name : rac122pri
Start time : 2017-03-15 05:50:00
End time : 2017-03-15 06:20:00
Total Samples : 687
Percentage of filtered data : 100%
The number of data samples may not be sufficient for calibration.
...

It looks like you can leave it at that, or create a new, more accurate model with finer granularity. Interesting new features await!
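For completeness, and untested on my side: once enough data is available, creating and activating such a model should look roughly like this, following the chactl usage shown earlier (the model name “swingbench_run” is my invention):

```
chactl calibrate cluster -model swingbench_run \
    -timeranges 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'
chactl monitor cluster -model swingbench_run
```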

A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to 12.2.0.1. It comes fully equipped with the matching cell software to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen in any other system, and it does so out of the box. Granted, some of that requires an investment in extra-cost options, but the value one gets in the form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Oracle Insights: Tales of the Oak Table” (Apress) a long time ago, I started to realise the inherent beauty of having insight into code. This code can be your own code, if you instrumented it properly, or the Oracle code path externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting code (or maybe don’t even know about the possibility?) and therefore complicate troubleshooting unnecessarily. That is not so much an issue on platforms where you don’t have an equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it’s a real wasted opportunity, as others have pointed out before me.

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still reach some conclusions by using the Wait Interface and session counters. In most scenarios I am involved in, the first step is to use Tanel Poder’s session snapper script, which gives me insight into both.

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious whether new events had been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents me from listing them individually in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names, such as db file scattered read and db file sequential read, have been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.
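To see the clarified names for yourself, a query along these lines should do:

```sql
-- Events whose externalised display name differs from the internal name.
SELECT name, display_name
FROM   v$event_name
WHERE  name <> display_name
ORDER  BY name;
```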

Wait events in 11.2.0.3

I like to start comparisons with a benchmark, and 11.2.0.3 seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O
       1152

14 rows selected.

So there are 1152 events in total in 11.2.0.3; keep that number in mind.

Wait events in 12.1.0.2

In my opinion 12.1 is a major step ahead, and I said it many times: I haven’t seen so many fundamental changes to the Oracle database for a long time. For various reasons though 12.1 hasn’t seen the traction in the field it probably deserved. Note how the diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O
       1650

14 rows selected.

A quick calculation reveals that 12.1 features 498 additional events compared to 11.2.0.3. Not too bad in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked on it before when blogging about the new Data Guard wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.

Wait events in 12.2.0.1

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

  COUNT(*) WAIT_CLASS
---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O
      1811

14 rows selected.

There are indeed some new events, most of them can be found in the Other wait class. Again, this is quite unfortunate as it prevents the performance architect from identifying unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day to investigate the exact difference between 12.1.0.2 and 12.2.0.1. Most likely after E4 this year.

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events, I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script to group counters by class. Most counters fall into just one class, but there are others where more than one class applies. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in 11.2.0.3

Before investigating 12.1 and 12.2 I’ll look at 11.2.0.3 first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an 11.2.0.3 database this reveals the following numbers:

   COUNT(*) CLASS_NAME
----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC
        638

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.

Session counters in 12.1.0.2

There was a major increase in the number of counters between 11.2.0.3 and 12.1.0.2. Consider the numbers:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC
      1178

14 rows selected.

That nearly doubles the number of counters available. Note that quite a few of the new counters fall into the DEBUG class. Looking a bit more closely you can see they appear to be used by the In-Memory (IM) Option:

SQL>  SELECT
  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  8  GROUP BY
  9      regexp_substr(name,'\w+ \w+')
 10  ORDER BY
 11      2 DESC
 12  FETCH FIRST 5 ROWS ONLY;

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …
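The figure of 198 comes from a simple count; a one-liner like this should reproduce it:

```sql
-- Count the session statistics belonging to Database In-Memory.
SELECT COUNT(*) FROM v$statname WHERE name LIKE 'IM %';
```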

Session counters in 12.2.0.1

Finally, here is the list of statistics in 12.2.0.1:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC
      1804

Another 626 counters; that is almost the total number available on the 11.2.0.3 system! Running my previous query again, IM-related statistics appear to dominate, but there are lots of others of interest.

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.

Summary

Oracle has been one of the best database engines around, and with 12.2.0.1 instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into those stats relevant for Exadata to a later point.

Happy troubleshooting!