Category Archives: Exadata

Hybrid Columnar Compression in 12.2 – nice new feature

Oracle 12.2 introduced an interesting optimisation for Hybrid Columnar Compression (HCC). Until 12.2 you had to use direct path inserts into HCC compressed segments for data to actually be compressed. If you didn’t use a direct path insert you would still succeed in inserting data into the segment; however, your newly inserted data was not HCC compressed. There is no error message or other warning telling you about this, which can lead to surprises for the unaware.

My friend and colleague Nick pointed out that the official HCC white paper states – somewhat hidden – that this requirement is no longer as strict in 12.2. I haven’t managed to find the exact document Nick referred to, but a quick search using my favourite retrieval engine unearthed the updated version for 18c.

The new feature is easy to miss: you read on page 3 that data must be loaded using data warehouse bulk loading (direct path) techniques. Examples of said technologies follow before the author continues with a discussion about querying HCC data. For me that was the end of the subject… Turns out it wasn’t: the nugget Nick unearthed was on page 4, in the last paragraph before the next section on “HCC Warehouse (Query) Compression”. Quoting literally from said white paper:

Starting with Oracle Database 12.2, HCC automatically compresses new data from SQL INSERT … SELECT statements, without the APPEND hint and array inserts from programmatic interfaces such as PL/SQL and the Oracle Call Interface (OCI).

Aha! Once you know what to look for you find the same information in the 12.2 new features guide, too. Sometimes it’s hard to see the wood for all those trees.

So there is no more need for the append hint for insert-select in 12.2? This would be great news, and should alleviate some issues people have from unknowingly using HCC without direct path inserts. And it’s good for concurrency! Although I do like white papers, there are times when the white paper has to be checked against the lab to see if it is actually true.

I did exactly this for this blog post.

12.2 Test Case

I am using the SOE.ORDERS table for that purpose, as it holds a fair bit of data. To see whether the new algorithm works I decided to create a new empty table ORDERS_HCC with the same table structure as SOE.ORDERS. In the next step I issue an insert-select statement. If the white paper is correct it’ll compress the data using Query High.

SQL (12.2)> show user
USER is "SOE"
SQL (12.2)> select banner from v$version where rownum < 2;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL (12.2)> create table orders_hcc
  2  column store compress for query high
  3  as select * from orders where 1 = 0;

Table created.

SQL (12.2)> insert into orders_hcc
  2  select * from orders where rownum < 1000001;

1000000 rows created.

SQL (12.2)> commit;

Commit complete.

SQL (12.2)>

Note that I specifically omitted the /*+ append */ hint in the insert statement. Also note that the preceding CTAS statement didn’t select any rows from the source. In Oracle releases up to and including 12.1, data in ORDERS_HCC would not be compressed at the end of this little code snippet.

But how can you prove the white paper is right? Thankfully Oracle is well instrumented, and even more thankfully that instrumentation is not limited to performance. The Oracle-provided dbms_compression package features a procedure called get_compression_type(), which allows you to pass it a ROWID (plus owner and table name) and it’ll tell you the block’s compression algorithm.

Remember that you can change the compression algorithm for a given segment many times over. A partition in a range partitioned table can start uncompressed while being actively used, and as data gets colder, compression levels can be increased. Technically speaking the “alter table … column store” command on its own does not change the way data is stored in the segment. Data currently stored will remain in whatever state it was before changing the compression attribute. Only newly inserted data will be compressed according to the changed segment attribute.

Back to the example: using DBMS_COMPRESSION I would like to find out if my table data is indeed compressed for Query High after my earlier insert command. Let’s see if it is, using the first 10 rows as a rough indication.

SQL (12.2)> select dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  2  from ORDERS_HCC where rownum < 11;

     CTYPE
----------
         4
         4
         4
         4
         4
         4
         4
         4
         4
         4

10 rows selected.

Well it would appear as if these are all compressed for Query High (QH). Looking at the package definition I can see that a compression type of 4 indicates Query High.
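To save yourself a trip to the package definition, the return codes can be kept in a small lookup. The sketch below is a transcription of the constants as I read them from the 12.2 DBMS_COMPRESSION package specification – verify the names and values against your own release before relying on them:

```python
# Return codes of dbms_compression.get_compression_type() as transcribed
# from the 12.2 DBMS_COMPRESSION package specification. Treat this as a
# convenience mapping to be verified against your own release.
COMPRESSION_TYPES = {
    1:  "COMP_NOCOMPRESS",     # no compression
    2:  "COMP_ADVANCED",       # advanced row compression
    4:  "COMP_QUERY_HIGH",     # HCC query high -- what we expect here
    8:  "COMP_QUERY_LOW",      # HCC query low
    16: "COMP_ARCHIVE_HIGH",   # HCC archive high
    32: "COMP_ARCHIVE_LOW",    # HCC archive low
    64: "COMP_BLOCK",          # basic (block) compression
}

def describe(ctype):
    """Translate a get_compression_type() return code into its name."""
    return COMPRESSION_TYPES.get(ctype, f"unknown ({ctype})")

print(describe(4))  # COMP_QUERY_HIGH
```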

So far so good, but I merely checked 10 out of 1 million rows. I’m fairly sure the rest of the table is also HCC compressed for QH, but I want proof. To remove any doubt, I can query the whole table. I’m doing this so you don’t have to: the next query takes forever (certainly more than a minute) to execute and is CPU bound, so please don’t run it at work. If you really feel like running a query like this, keep it in the lab. You have been warned :)

SQL (12.2)> with comptypes as (
  2   select rownum rn, dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  3     from ORDERS_HCC
  4  )
  5      select count(*), ctype from comptypes group by ctype;

  COUNT(*)      CTYPE
---------- ----------
   1000000          4

I always start these types of queries in a screen(1) session to prevent network connection issues from interrupting my long-running task. After some time the query returns with the result you can see above: the entire table is compressed with Query High.

Summary

Array-inserts into HCC segments can compress data in Oracle 12.2 even if you don’t specify the append hint. The behaviour for conventional inserts did not change. I am going to post another piece of research containing some more data later this week or next.

Taming XML data in the v$cell% family of Exadata views

While researching Exadata 12.2 features that have been introduced with Oracle 12.2.1.1.0 I ran into an old problem of mine: querying the v$cell% views, new with Exadata 12.1.1.1.0 and Oracle RDBMS 12c Release 1, in a meaningful way. In case you haven’t seen them yet, these views expose cell metrics and other information in the database without having to connect to the cells and execute cellcli commands. They are very useful, and they form the basis of the latter part of the Exadata AWR reports.

On our 12.2 Exadata system (using cellos 12.2.1.1.0 and RDBMS 12.2.0.1.0) the following views are exposed to the curious researcher:

SQL> select table_name from dict 
  where table_name like 'V%CELL%' 
  order by 1;
                                                  
TABLE_NAME
-------------------------------------
V$CELL
V$CELL_CONFIG
V$CELL_CONFIG_INFO
V$CELL_DB
V$CELL_DB_HISTORY
V$CELL_DISK
V$CELL_DISK_HISTORY
V$CELL_GLOBAL
V$CELL_GLOBAL_HISTORY
V$CELL_IOREASON
V$CELL_IOREASON_NAME
V$CELL_METRIC_DESC
V$CELL_OFL_THREAD_HISTORY
V$CELL_OPEN_ALERTS
V$CELL_REQUEST_TOTALS
V$CELL_STATE
V$CELL_THREAD_HISTORY

17 rows selected.

There is a lot of useful info to be found in these, and most of the views are explained in Expert Oracle Exadata (2nd edition), chapter 11. I don’t always have a copy of the book at hand which is why I am putting this post together. To add some value, I am going to try and explain how I get to the results as well :)

Querying the views

For example if you are interested in your cell’s configuration, v$cell_config can be queried.

SQL> describe v$cell_config

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CELLNAME                                           VARCHAR2(1024)
 CELLHASH                                           NUMBER
 CONFTYPE                                           VARCHAR2(15)
 CONFVAL                                            CLOB
 CON_ID                                             NUMBER

SQL>

This view is a good representation of the way most of them are set up. There is a cell identifier (CELLNAME in this example), a configuration type (CONFTYPE), and the actual data, stored as XML in CONFVAL. Rather than an XMLTYPE, the XML is typically stored as a CLOB. The raw CLOB output is difficult to read, which is why I usually cast it to XMLTYPE; that formats much more nicely in SQLcl and SQL*Plus.
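Incidentally, the same readability trick works outside the database: if you fetch the raw CLOB through a driver, any generic XML pretty-printer does roughly what the XMLTYPE cast does in SQL*Plus. A minimal sketch in Python, using a hypothetical, shortened stand-in for a CONFVAL value:

```python
from xml.dom import minidom

# Stand-in for a raw v$cell_config.confval CLOB: one long string with
# no line breaks (hypothetical, shortened content).
confval = ('<?xml version="1.0"?><cli-output><version>1.0</version>'
           '<context cell="someCell"/></cli-output>')

# toprettyxml() re-indents the document, similar to the XMLTYPE cast.
pretty = minidom.parseString(confval).toprettyxml(indent="  ")
print(pretty)
```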

A small example of getting information from this view is shown here. I am interested in the IORM plan for a specific cell:

SQL> select xmltype(confval) from v$cell_config 
  where cellname = '192.168.10.9' 
    and conftype = 'IORM';

XMLTYPE(CONFVAL)
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="US-ASCII"?>
<cli-output>
  <version>1.0</version>
  <timestamp>1498120597819</timestamp>
  <context cell="someCell"/>
  <iormplan>
    <name>someCell_IORMPLAN</name>
    <catPlan/>
    <dbPlan/>
    <objective>basic</objective>
    <status>active</status>
  </iormplan>
</cli-output>

SQL>          

One alternative way to do so is to query the cell directly, while logged in:

[root@someCell ~]# cellcli -e list iormplan detail
         name:                   someCell_IORMPLAN
         catPlan:                
         dbPlan:                 
         objective:              basic
         status:                 active
[root@someCell ~]#

Except that cells are most often not accessible to DBAs.

Transforming XML

Just looking at the XML query output is good enough for small chunks of XML; when investigating a larger XML document, however, this quickly becomes impractical.

Back to the motivation for writing this post: the view I wanted to query for columnar cache statistics is v$cell_state. I used the Smart-Scan Deep Dive blog as a source; it features an example of how to extract a specific statistic from the XML data. I have extended the example to translate all statistics pertaining to the columnar cache into the more familiar format.

The XML in question is rather deeply nested. I have shortened the other output to give you an idea of the data I care about:

<cell_stats>
  <stats type="scheduler">...</stats>
  <stats type="network_directory">...</stats>
  <stats type="cache">...</stats>
  <stats type="MemCache">...</stats>
  <stats type="flashcache">
    <stats type="columnarcache">
      <stat name="columnar_cache_size">26673152</stat>
      <stat name="columnar_cache_keep_size">0</stat>
      <stat name="columnar_cache_keep_read">0</stat>
  ...

I need all the information in the ‘<stats type=”columnarcache”>’ element and everything below it. The most useful way I found to translate XML into what looks like a relational table is based on a rather complex function named XMLTABLE(). It is explained in the XML DB Developer’s Guide, chapter 4, “XQuery and Oracle XML DB”. Using that function I can translate XML into a relational format.
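Before wrestling with XMLTABLE syntax I find it helpful to prototype the selection outside the database. The sketch below uses Python’s ElementTree (which supports a subset of XPath) against the shortened XML above; the idea is the same one the XMLTABLE query implements – pick only the <stat> children of the columnarcache node:

```python
import xml.etree.ElementTree as ET

# The shortened v$cell_state XML from above, as a stand-in for the real
# statistics_value CLOB.
doc = """
<cell_stats>
  <stats type="flashcache">
    <stats type="columnarcache">
      <stat name="columnar_cache_size">26673152</stat>
      <stat name="columnar_cache_keep_size">0</stat>
      <stat name="columnar_cache_keep_read">0</stat>
    </stats>
  </stats>
</cell_stats>
"""

root = ET.fromstring(doc)

# Same selection as the XMLTABLE XPath: only the <stat> children of the
# stats element whose type attribute is "columnarcache". ElementTree
# paths are relative to the root element, hence the leading "./".
rows = [(s.get("name"), int(s.text))
        for s in root.findall("./stats/stats[@type='columnarcache']/stat")]

for name, value in rows:
    print(f"{name:35s} {value:>12}")
```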

It is probably easiest to show you the query first and then walk you through it. According to Oracle’s blog post I can find the information about the columnar cache in v$cell_state. I don’t have to worry about the different statistics_type values because the filtering will be done on the XML itself, using what is called an XPath expression. I would like to check a single cell only, to demonstrate that it is possible to combine the output of XMLTABLE with other views/tables and query them using “regular” SQL:

SQL> select b.*
  from v$cell_state a,
  xmltable (
      '/cell_stats/stats/stats[@type="columnarcache"]/stat'
      passing xmltype(a.statistics_value)
      columns
          name   varchar2(50) path '@name',
          value  number       path '.'
  ) b
  where a.cell_name = '192.168.10.9'
  and b.value <> 0
 order by b.value;

Unsurprisingly I need to pass the XML data to XMLTABLE in order to process it further. The XML is found in v$cell_state.statistics_value; I access the view and hand the relevant column to the XMLTABLE function’s “passing” clause. An XMLTYPE is expected here, which is why I have to convert the CLOB first.

The XMLTABLE function should only process data that belongs to the nodes below ‘<stats type=”columnarcache”>’. An XPath expression helps me define the scope. This is actually where I spent most of the time, because writing correct XPath expressions doesn’t come naturally to me. Have a look at Wikipedia if XPath is completely alien to you.

Getting the XPath right can be challenging if you don’t do it every day. I learned from Marco Gralike, who did something very similar to my investigation, that it is possible to return the pruned XML tree structure as an XMLTYPE. I wasn’t quite sure, for example, which part of the XML document I wanted to use for the XMLTABLE’s columns clause, and went through a couple of iterations to find “my” subset of the data. Here is one way to do this, according to Marco’s blog post:

SQL> select b.cc
  from v$cell_state a,
  xmltable (
      '/cell_stats/stats/stats[@type="columnarcache"]'
      passing xmltype(a.statistics_value)
      columns cc XMLTYPE PATH '*'
  ) b
  where a.cell_name = '192.168.10.9';

You will get XML back (the cc column returned by XMLTABLE is declared as XMLTYPE in the columns clause), and can then use that as the basis for further analysis.

Once the XML returned contains all the data you need, you can continue with writing the XMLTABLE’s columns clause to extract information from the XML document. That is where I ran into the next problem: I had syntax issues trying to reference the “self” node (the dot you see in the path of the value column in the full example), which is why I experimented a bit. It turned out I had to specify the column name, data type, and path before referencing the element.

The tricky bit when working with XML data, at least for me as a non-XML-developer, is finding the correct XPath. I found it useful to start at the node containing the information I want and then work my way up the tree until I reach the root element. Playing around a bit more I managed to shorten my XPath to just this:

'//stats[@type="columnarcache"]/stat'

I still prefer the “complete” path expression because it gives me a clue where in the document tree the information is stored.
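That the two expressions select the same nodes is easy to sanity-check outside the database, for instance with Python’s ElementTree (whose XPath dialect writes the descendant axis as “.//” rather than “//”):

```python
import xml.etree.ElementTree as ET

# Minimal stand-in document, same shape as the v$cell_state XML above.
doc = ("<cell_stats><stats type='flashcache'>"
       "<stats type='columnarcache'>"
       "<stat name='columnar_cache_size'>26673152</stat>"
       "</stats></stats></cell_stats>")

root = ET.fromstring(doc)

# Full path from the document root ...
full = root.findall("./stats/stats[@type='columnarcache']/stat")
# ... and the shortened descendant-axis form ("//" in standard XPath,
# ".//" in ElementTree's dialect).
short = root.findall(".//stats[@type='columnarcache']/stat")

print([s.get("name") for s in full] == [s.get("name") for s in short])  # True
```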

Wrapping Up

If you want the complete picture for your entire Exadata cluster, simply add the cell name to the query:

SQL> select a.cell_name,b.name, b.value
from v$cell_state a,
xmltable (
    '/cell_stats/stats/stats[@type="columnarcache"]/stat'
    passing xmltype(a.statistics_value)
    columns 
        name   varchar2(50) path '@name',
        value  number       path '.'
) b
where b.value <> 0
order by a.cell_name,b.value;

Please have a look at Marco’s post where he explains a more complex scenario of nesting XMLTABLE functions, depending on what information you are after.

Hope this helps!

DOAG Exaday 2017: a preview

I am quite excited to have been accepted to speak at the upcoming DOAG Exaday taking place June 20 in Frankfurt. It is the third time I have been lucky enough to present at the event, having spoken in both Frankfurt and Hamburg in previous years.

As you can probably imagine by looking at this weblog, I am a great supporter of the Oracle Engineered Systems community. My interest is not limited to Germany: I still remember the many great meetings in London in the context of UKOUG’s Exadays. I am also very lucky to work for @enkitec where we have been sharing real-world experience at E4 for years and will do so again in June this year.

Unfortunately interest in the UKOUG’s Systems Event has dwindled a bit recently, which led me to focus on the Exadata event in Germany this year. I am going to speak – in German – about how you can make your migrations to Exadata easier. This is a talk I have delivered a few times before, but I leaped at the opportunity to update the material for release 12.2.

I am very glad to see my colleague @fritshoogland and many other great speakers too numerous to mention at the event. If you are an engineer working on Oracle Exadata and its cousins this is definitely the event to attend.

A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to 12.2.0.1. It comes fully equipped with the matching cellos to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen with any other system, and it does so out of the box. Granted, some of that requires an investment into extra cost options, but the value one gets in the form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Insights – tales from the Oaktable” (Apress) a long time ago, I started to realise the inherent beauty of having insight into code: your own code, if you instrumented it properly, or the Oracle code path externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting their code (or maybe don’t even know about the possibility?) and thereby complicate troubleshooting unnecessarily. That is not so much an issue on platforms where you don’t have an equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it’s a real wasted opportunity, as others have pointed out before me.

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still reach some conclusions by using the Wait Interface and session counters. In most scenarios I am involved in, the first step is to run Tanel Poder’s session snapper script, which gives me insight into both.

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious whether new events have been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents me from listing them all in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names, such as db file scattered read and db file sequential read, have been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.

Wait events in 11.2.0.3

I like to start comparisons with a benchmark, and 11.2.0.3 seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O
       1152

14 rows selected.

So there are 1152 events total in 11.2.0.3, keep that number in mind.

Wait events in 12.1.0.2

In my opinion 12.1 is a major step ahead, and I said it many times: I haven’t seen so many fundamental changes to the Oracle database for a long time. For various reasons though 12.1 hasn’t seen the traction in the field it probably deserved. Note how the diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O
       1650

14 rows selected.

A quick calculation reveals that 12.1 features 498 more events than 11.2.0.3. Not too bad in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked on this before when blogging about the new Data Guard wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.
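The per-class comparison is simple arithmetic. Here is a quick sketch with the counts transcribed from the two listings above; it confirms the 498 total and shows how dominant the Other class is among the newcomers:

```python
# Wait event counts per wait class, transcribed from the 11.2.0.3 and
# 12.1.0.2 listings above.
ev_11203 = {"Administrative": 55, "Application": 17, "Cluster": 50,
            "Commit": 2, "Concurrency": 33, "Configuration": 24,
            "Idle": 95, "Network": 35, "Other": 745, "Queueing": 9,
            "Scheduler": 8, "System I/O": 31, "User I/O": 48}
ev_12102 = {"Administrative": 57, "Application": 17, "Cluster": 64,
            "Commit": 4, "Concurrency": 38, "Configuration": 26,
            "Idle": 121, "Network": 28, "Other": 1186, "Queueing": 9,
            "Scheduler": 9, "System I/O": 35, "User I/O": 56}

# Per-class growth from 11.2.0.3 to 12.1.0.2.
delta = {c: ev_12102[c] - ev_11203[c] for c in ev_11203}

print("new events in 12.1.0.2:", sum(delta.values()))   # 498
print("of which in Other:", delta["Other"])             # 441
```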

Wait events in 12.2.0.1

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

  COUNT(*) WAIT_CLASS
---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O
      1811

14 rows selected.

There are indeed some new events, most of them can be found in the Other wait class. Again, this is quite unfortunate as it prevents the performance architect from identifying unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day to investigate the exact difference between 12.1.0.2 and 12.2.0.1. Most likely after E4 this year.

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events, I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script to group counters by class. Most counters fall into just one class, but for some, more than one class applies. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in 11.2.0.3

Before investigating 12.1 and 12.2 I’ll look at 11.2.0.3 first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an 11.2.0.3 database this reveals the following numbers:

   COUNT(*) CLASS_NAME
----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC
        638

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.
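The BITAND cascade in the SQL above is just a bitmask decode, and the same logic fits in a few lines of Python. The bit values mirror those used in the query; combined classes such as “CACHE RAC” are simply counters with more than one bit set:

```python
# v$statname.class is a bitmask: a counter can belong to several classes.
# This mirrors the BITAND cascade in the SQL above.
CLASS_BITS = {1: "USER", 2: "REDO", 4: "ENQ", 8: "CACHE",
              16: "OS", 32: "RAC", 64: "SQL", 128: "DEBUG"}

def decode_class(mask):
    """Return the class names encoded in a v$statname.class value."""
    return [name for bit, name in CLASS_BITS.items() if mask & bit]

print(decode_class(40))   # ['CACHE', 'RAC'] -> shows up as "CACHE RAC"
print(decode_class(72))   # ['CACHE', 'SQL'] -> shows up as "CACHE SQL"
```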

Session counters in 12.1.0.2

There was a major increase in the number of counters between 11.2.0.3 and 12.1.0.2. Consider the numbers:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC
      1178

14 rows selected.

That nearly doubles the number of counters available. Note that quite a few of the new counters fall into the DEBUG section. Looking a bit more closely you can see they seem to be used by the In Memory (IM) Option:

SQL>  SELECT
  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  8  GROUP BY
  9      regexp_substr(name,'\w+ \w+')
 10  ORDER BY
 11      2 DESC
 12  FETCH FIRST 5 ROWS ONLY;

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …
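The regexp_substr(name, '\w+ \w+') trick – grouping statistics by their first two words – translates directly to any regex engine. A sketch with a handful of illustrative statistic names in the style of the real IM% counters (not an actual v$statname dump):

```python
import re
from collections import Counter

# Illustrative DEBUG-class statistic names, not a real v$statname dump.
names = [
    "IM scan rows", "IM scan bytes", "IM scan CUs",
    "IM populate jobs", "IM populate bytes",
    "IM repopulate jobs",
    "spare statistic 1", "spare statistic 2",
]

# Same grouping key as regexp_substr(name, '\w+ \w+'): first two words.
groups = Counter(re.match(r"\w+ \w+", n).group() for n in names)

for short_name, count in groups.most_common():
    print(f"{short_name:20s} {count}")
```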

Session counters in 12.2.0.1

Finally, here is the list of statistics in 12.2.0.1:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC
      1804

Another 626 counters on top of 12.1.0.2 – that is almost as many as were available in total on the 11.2.0.3 system! Running my previous query again it seems that IM-related statistics dominate, but there are lots of others of interest.
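Same arithmetic as for the wait events, this time with the session counter classes transcribed from the 12.1.0.2 and 12.2.0.1 listings; it confirms the 626 figure and shows that the DEBUG class absorbs most of the growth:

```python
# Session counter totals per class, transcribed from the listings above.
ctr_12102 = {"CACHE": 151, "CACHE RAC": 53, "CACHE SQL": 15, "DEBUG": 565,
             "ENQ": 9, "OS": 16, "RAC": 35, "REDO": 68, "REDO RAC": 1,
             "SQL": 130, "SQL DEBUG": 2, "USER": 130, "USER RAC": 3}
ctr_12201 = {"CACHE": 174, "CACHE RAC": 73, "CACHE SQL": 15, "DEBUG": 1067,
             "ENQ": 9, "OS": 16, "RAC": 35, "REDO": 75, "REDO RAC": 1,
             "SQL": 190, "SQL DEBUG": 2, "USER": 144, "USER RAC": 3}

delta = {c: ctr_12201[c] - ctr_12102[c] for c in ctr_12102}

print("new counters in 12.2:", sum(delta.values()))   # 626
print("of which DEBUG:", delta["DEBUG"])              # 502
```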

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.

Summary

Oracle has been one of the best database engines around, and with 12.2.0.1 instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into those stats relevant for Exadata to a later point.

Happy troubleshooting!

GTT and Smart Scan – the importance of the append hint

While writing the previous two posts about GTTs and Smart Scan, I stumbled across an interesting observation. When I started putting my thoughts to (virtual) paper, I thought it was going to be a short post. Well, it wasn’t to be, again. Anyway, you might find this interesting.

Initial Situation

If you read the previous posts this code example I used to populate the GTT might look familiar:

insert /*+ append */ into gtt select * from t4 where rownum < 400000;
 
commit;

In situations like this where I’m moving data I developed a habit of using the append hint. I guess I’ve been working on HCC tables a bit too much and therefore didn’t even make a conscious decision to use that hint. It was the right thing to do, as you will see next.

The full test harness is shown here again for reference:

SQL> !cat test.sql
set lines 120 tab off trimspool on verify off timing on

-- this statement is using the /*+ append */ hint
insert /*+ append using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

SQL>

When executing the script and looking at the output I didn’t see anything out of the ordinary. When checking for the “depth” of the Smart Scan, I regularly consult the cell blocks processed by % layer statistics to see if I lose blocks between code layers. It wouldn’t appear so:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell IO uncompressed bytes                                             546,136,064
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
...
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               317
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    330,153,984
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1

66,667 blocks in each code layer. This looks OK to me: the total number of physical reads recorded for this statement is 66,669. I certainly don’t want to endorse tuning by ratio, but in this case it sort of makes sense: the offload efficiency is pretty high in my case:

SQL> @fsx.sql
Enter value for sql_text:
Enter value for sql_id: a4jrkbnsfgk1j

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
a4jrkbnsfgk1j      0 2363333961      1        .01      0 Yes         100.00 select /* using_append */ count(*), id from gtt where id in (        7
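For what it’s worth, the IO_SAVED_% figure can be approximated from the session statistics in the report above. I am assuming here that fsx.sql derives it roughly as (eligible bytes minus interconnect bytes) over eligible bytes; small differences to the reported 100.00 are expected, since fsx works off v$sql columns:

```python
# Statistics transcribed from the mystats report above.
eligible     = 546_136_064  # cell physical IO bytes eligible for predicate offload
interconnect = 112_504      # cell physical IO interconnect bytes

# Assumed formula: bytes saved by offload as a share of eligible bytes.
io_saved_pct = (eligible - interconnect) / eligible * 100
print(f"IO_SAVED_%: {io_saved_pct:.2f}")  # 99.98
```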

That should suffice for a recap of the previous posts.

No append hint – different outcome

Now imagine my surprise when I didn’t execute a script but rather typed the insert command in an interactive session. In the next example, created to reproduce the observation, I “forgot” to specify the append hint:

SQL> !cat test2.sql
-- not using the append hint
insert /* not_using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* not_using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

The difference between test.sql and test2.sql looks harmless enough – just a hint, or lack thereof. However, the result is interesting. Most executions of the above script resulted in Oracle reporting the following statistics:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    calls to get snapshot scn: kcmgss                                              462
STAT    calls to kcmgcs                                                                 28
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks processed by cache layer                                        67,026
STAT    cell blocks processed by data layer                                              1
STAT    cell blocks processed by txn layer                                               1
STAT    cell commit cache queries                                                   67,025
STAT    cell flash cache read hits                                                     633
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    545,000,688
STAT    cell physical IO interconnect bytes returned by smart scan             544,984,304
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           66,503
STAT    cleanouts only - consistent read gets                                       66,503
STAT    commit txn count during cleanout                                            66,503
...
STAT    consistent gets                                                            133,250
STAT    consistent gets direct                                                      66,504
STAT    consistent gets examination                                                 66,506
STAT    consistent gets examination (fastpath)                                      66,506
...
STAT    immediate (CR) block cleanout applications                                  66,503
...
STAT    physical read IO requests                                                    1,044
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               633
STAT    physical read total IO requests                                              1,044
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       748
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504
...

Huh? I was clearly surprised to see so many blocks not making it to the higher layers (txn and data) after having been read by the cache layer. Further executions of the select statement within the same session did not necessarily improve this significantly, although I noticed storage index savings coming into play.

Speaking more scientifically, the listing immediately above shows that out of 67,026 blocks read by the cache layer only 1 (!) was processed by the txn and data layers. As a direct result Smart Scan does not offer any real benefit: cell physical IO interconnect bytes returned by smart scan is actually slightly greater than cell physical IO bytes eligible for predicate offload.
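
The arithmetic behind that statement can be double-checked outside the database; a small Python sketch using the counters from the listing above (variable names are mine):

```python
# Counter values from the "no append hint" statistics report above.
cache_layer_blocks = 67_026   # cell blocks processed by cache layer
data_layer_blocks = 1         # cell blocks processed by data layer
eligible_bytes = 544_800_768  # cell physical IO bytes eligible for predicate offload
returned_bytes = 544_984_304  # ...interconnect bytes returned by smart scan

# Fraction of blocks that made it past the cache layer: next to nothing.
passthrough = data_layer_blocks / cache_layer_blocks
print(f"blocks surviving to the data layer: {passthrough:.6%}")

# Offload savings: negative, i.e. Smart Scan returned more bytes than were eligible.
saved_pct = (eligible_bytes - returned_bytes) / eligible_bytes * 100
print(f"IO saved: {saved_pct:.4f}%")
```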

Why? What?

There were a few red herrings in the statistic counters I saw:

  • Cleanout statistics have been reported by mystats
    • commit txn count during cleanout
    • cleanouts only – consistent read gets
    • immediate (CR) block cleanout applications
    • etc.
  • There are consistent gets examination that might be peeks at undo information
  • Lots of queries to the commit cache (cell commit cache queries) – but for most of my tests I didn’t see corresponding hits. Only 1 out of maybe 20 tests produced some cell transactions found in commit cache, and even then their number was not significant. I will provide an example towards the end of this post.

Further investigation

So this leads me to suspect that the blocks created with the append hint are “clean”, i.e. suitable for a consistent read on the cells without further need for processing. It would appear so, judging by this block dump (dump the block from the tempfile, not the datafile!):

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aaeb9f seq: 0x01 flg: 0x0c tail: 0xeb9f0601
frmt: 0x02 chkval: 0xd11e type: 0x06=trans data
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x632c00  csc: 0x00.64aaeb9e  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.019.0012beca  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f6250501074
===============
tsiz: 0x1f88
hsiz: 0x1e
pbl: 0x7f6250501074
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x746
avsp=0x728
tosp=0x728
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b7d
0x14:pri[1]     offs=0x1772
0x16:pri[2]     offs=0x1367
0x18:pri[3]     offs=0xf5c
0x1a:pri[4]     offs=0xb51
0x1c:pri[5]     offs=0x746
block_row_dump:
tab 0, row 0, @0x1b7d
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
col  0: [ 2]  c1 50
col  1: [999]
 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...

This is in contrast to the blocks I dumped that have been created without the append hint. To make sure I’m comparing apples with apples these blocks have been created in a different session after having disconnected from the previous one:

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aab6bf seq: 0x00 flg: 0x0c tail: 0xb6bf0600
frmt: 0x02 chkval: 0x478e type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x629300  csc: 0x00.64aab6bf  itc: 2  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.01b.0012be5d  0x00c04427.d97f.27  ----    6  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f53b75e125c
===============
tsiz: 0x1fa0
hsiz: 0x1e
pbl: 0x7f53b75e125c
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x752
avsp=0x734
tosp=0x734
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0xf6c
0x14:pri[1]     offs=0x1379
0x16:pri[2]     offs=0x1786
0x18:pri[3]     offs=0x1b93
0x1a:pri[4]     offs=0x752
0x1c:pri[5]     offs=0xb5f
block_row_dump:
tab 0, row 0, @0xf6c
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
col  0: [ 4]  c3 0f 05 50
col  1: [999]
 31 34 30 34 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...

This supports the theory that the blocks created without the append hint carry some information that prevents them from being passed from the cache to the txn layer. I didn’t see active txn count during cleanout though, but commit txn count during cleanout and immediate (CR) block cleanout applications instead, so cleanout is definitely needed and performed.

I stated in the last bullet point that commit cache queries were not producing cache hits most of the time. I did manage to capture an execution of the select statement against the blocks created without the append hint where at least some commit cache queries produced hits:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks helped by commit cache                                           6,981
STAT    cell blocks processed by cache layer                                        66,969
STAT    cell blocks processed by data layer                                          6,982
STAT    cell blocks processed by txn layer                                           6,982
STAT    cell commit cache queries                                                   66,968
STAT    cell flash cache read hits                                                     576
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    487,801,976
STAT    cell physical IO interconnect bytes returned by smart scan             487,785,592
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                      6,981
STAT    cleanout - number of ktugct calls                                           59,522
STAT    cleanouts only - consistent read gets                                       59,522
STAT    commit txn count during cleanout                                            59,522
...
STAT    physical read IO requests                                                      987
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               576
STAT    physical read total IO requests                                                987
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       723
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504

I couldn’t reproduce this at will though, and it wasn’t for lack of trying.

Summary

I am not entirely sure what causes the observed behaviour, especially with regard to the commit cache and/or minscn optimisation, but it seems better to populate a GTT using the append hint if you want to make best use of Smart Scans against such segments. A very quick and dirty test against a heap table suggests that the same is true for that segment type as well.

Smart Scanning GTTs – what about Flash Cache?

Continuing the example of the previous blog post (is it possible to Smart Scan GTTs?), I am investigating if Smart Scans against GTTs can benefit from Flash Cache, or “optimised reads”. The easiest way to see this in your session is to calculate the change in session counters based on v$mystat or v$sesstat. Other tools might also provide information about the amount of data read from Flash Cache, but there are quite often extra licenses required …

Calculating the delta in session counters can be done in many ways, for example using Tanel’s session snapper. For clearly defined test cases where I can control beginning and end of the execution I prefer to use another great script for the purpose. My thanks go to Adrian Billington for providing the Oracle community with the most useful “mystats” package which I’m using here. You can – and should – get it from oracle-developer.net.
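
The principle behind such snapshot-based tools is straightforward: snapshot the counters, run the workload, diff the snapshots, and report only deltas above a threshold. A minimal Python sketch of the idea (illustrative counter names only – this is not mystats’ actual implementation, which is PL/SQL):

```python
def delta_report(before: dict, after: dict, threshold: int = 1) -> dict:
    """Counters whose delta is at least `threshold` (mimics mystats' t=1)."""
    return {name: after[name] - before.get(name, 0)
            for name in after
            if after[name] - before.get(name, 0) >= threshold}

# Illustrative counter snapshots taken around a workload.
before = {"cell scans": 10, "physical reads direct": 1_000, "user commits": 5}
after  = {"cell scans": 11, "physical reads direct": 67_667, "user commits": 5}

print(delta_report(before, after))
# {'cell scans': 1, 'physical reads direct': 66667}
```

Unchanged counters (user commits above) are filtered out, which is what keeps the mystats report short.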

Let’s check whether I can get flash cache read hits on GTT scans – these would help the performance of repeated queries against the segment. And since GTTs reside on TEMP, I had a few doubts whether flash cache read hits were possible at all.

What better than to test?

Testing helps remove ambiguity (and documenting the test result helps me remember things!), so here’s the code used:

set lines 120 tab off trimspool on verify off timing on

-- it will be crucially important to use the /*+ append */ hint 
-- as you will see in the next post about GTTs on Exadata
insert /*+ append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- hoping to trigger flash cache population here
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;

-- and measuring effect, if any

@mystats start s=s

set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off

@mystats stop t=1

Have a look at the code in mystats.sql to see what the options mean. Essentially I’m asking it to record session statistics only (start s=s) and, after the execution, to list only those counters that have changed (t=1, where t is for threshold). I hacked the script a little to order by statistic name rather than the default.

The observation

It appears I can get Flash Cache hits. The relevant statistic name is “cell flash cache read hits”, and the fact that physical IO was optimised can also be seen in “physical read requests optimized” and “physical read total bytes optimized” amongst others. Here is the relevant output from mystats:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
STAT    cell flash cache read hits                                                     315
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
STAT    cell scans                                                                       1
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               315
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    329,252,864
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...

I usually check the “cell blocks processed by … layer” statistics first to get an idea about the “depth” of the Smart Scan – losing blocks in the transaction and data layers wouldn’t be great. There’s been 1 Smart Scan against my segment (the GTT isn’t partitioned) recorded in “cell scans”, so that covers that aspect as well.

Out of the 523 “physical read IO requests”, 315 were optimised – a pretty good ratio of approximately 60% of the IO requests coming from Flash Cache. The other statistics represent the same ratio in bytes rather than I/O requests.
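
The ratios are easily verified; a quick Python sketch with the counter values from the report above:

```python
# Counter values from the statistics report above.
requests_total = 523            # physical read IO requests
requests_optimized = 315        # physical read requests optimized
bytes_total = 546_152_448       # physical read total bytes
bytes_optimized = 329_252_864   # physical read total bytes optimized

print(round(requests_optimized / requests_total * 100, 1))  # ~60.2% of requests
print(round(bytes_optimized / bytes_total * 100, 1))        # ~60.3% of bytes
```

Both views agree to within a fraction of a percent, so the Flash Cache benefit is consistent whether you count requests or bytes.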

Little things worth knowing: Can I Smart Scan a Global Temporary Table?

A question that has come up while preparing to deliver another Exadata Optimisation Class is whether you can offload queries against Global Temporary Tables (GTT) or not. There are subtle differences between heap tables and GTTs such as how statistics are handled and where they are stored that merit a little investigation.

I have used the lab environment on the X3-2 (Exadata 12.1.2.3.0, 12.1.0.2.160419 RDBMS) to test if I can Smart Scan GTTs. There are some enhancements in 12c regarding GTTs that are mentioned on the Optimiser blog here:

https://blogs.oracle.com/optimizer/entry/gtts_and_upgrading_to_oracle

The Setup

I have used a brand new, 12c non-CDB for these tests. The SGA_TARGET is deliberately set to something really small so as to provoke Smart Scans without having to resort to setting _serial_direct_read to “always” in my session.

First of all, I needed a GTT for this purpose. I have a few scripts that create large tables for testing, and T4 will serve as my basis. Here is my setup:

SQL> create global temporary table gtt on commit preserve rows 
  2  as select * from t4 where 1 = 0;

Table created.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+ parallel(4) */ into gtt
  2  select /*+ parallel(4) */ * from t4
  3    where rownum < 400000;

399999 rows created.

SQL> commit

Commit complete.

This might be the time to point out that a GTT has two types of statistics in 12c – shared and session. Again, please refer to the Optimiser blog post for more information about private session statistics. I first looked at dba_tables for num_rows etc. but found the column empty. Using a query from Tim Hall’s oracle-base.com article “Session-Private Statistics for Global Temporary Tables in Oracle Database 12c Release 1” I found some statistics in the end:

SQL> select table_name, object_type, num_rows, scope
  2  from user_tab_statistics where table_name = 'GTT';

TABLE_NAME                     OBJECT_TYPE    NUM_ROWS SCOPE
------------------------------ ------------ ---------- -------
GTT                            TABLE                   SHARED
GTT                            TABLE            399999 SESSION

Gathering session-level statistics is the default behaviour, controlled by a global optimiser preference:

SQL> SELECT DBMS_STATS.get_prefs('GLOBAL_TEMP_TABLE_STATS') FROM dual;

DBMS_STATS.GET_PREFS('GLOBAL_TEMP_TABLE_STATS')
--------------------------------------------------
SESSION

The Testcase

With the table set up it is now time to test the Smart Scan. In addition to a SQL trace I request an NSMTIO trace to understand Oracle’s direct path read decision better.

SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL> alter session set events 'trace[NSMTIO] disk=highest';

Session altered.

SQL> select count(*), id from gtt where id in (9244048,2529293) group by id;

  COUNT(*)         ID
---------- ----------
         1    2529293
         1    9244048

SQL> exit

Looking at the trace file I can see the following output for this particular statement:

=====================
PARSING IN CURSOR #140137295360240 len=70 dep=0 uid=78 oct=3 lid=78 tim=1228663420381 hv=2599214644 ad='1398f2ab0' sqlid='9p960vqdftrjn'
select count(*), id from gtt where id in (9244048,2529293) group by id
END OF STMT
PARSE #140137295360240:c=2000,e=1954,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2363333961,tim=1228663420380
EXEC #140137295360240:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2363333961,tim=1228663420505
WAIT #140137295360240: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=163899 tim=1228663420582
NSMTIO: kcbism: islarge 1 next 0 nblks 67197 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67197 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67197 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 3, objd: 6468608, objn: 163899
ckpt: 1, nblks: 67197, ntcache: 3, ntdist:3
Direct Path for pdb 0 tsn 3  objd 6468608 objn 163899
Direct Path 1 ckpt 1, nblks 67197 ntcache 3 ntdist 3
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 101
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 6468608, Object_Size = 67197 blocks
SqlId = 9p960vqdftrjn, plan_hash_value = 2363333961, Partition# = 0
WAIT #140137295360240: nam='reliable message' ela= 1053 channel context=5884629416 channel handle=5884574224 broadcast message=5885746648 obj#=163899 tim=1228663421984
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 122 name|mode=1380909062 2=65585 0=1 obj#=163899 tim=1228663422173
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 96 name|mode=1380909057 2=65585 0=2 obj#=163899 tim=1228663422323
WAIT #140137295360240: nam='cell single block physical read' ela= 308 cellhash#=2133459483 diskhash#=2964949887 bytes=8192 obj#=163899 tim=1228663423031
WAIT #140137295360240: nam='cell smart table scan' ela= 153 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663423651
WAIT #140137295360240: nam='cell smart table scan' ela= 152 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663424114
WAIT #140137295360240: nam='cell smart table scan' ela= 145 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663424571
WAIT #140137295360240: nam='gc cr grant 2-way' ela= 135 p1=201 p2=2273408 p3=7 obj#=163899 tim=1228663425857
WAIT #140137295360240: nam='cell single block physical read' ela= 257 cellhash#=379339958 diskhash#=2689692736 bytes=8192 obj#=163899 tim=1228663426158
WAIT #140137295360240: nam='cell single block physical read' ela= 224 cellhash#=2133459483 diskhash#=3132070477 bytes=8192 obj#=163899 tim=1228663426466
WAIT #140137295360240: nam='cell smart table scan' ela= 251 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663426863
WAIT #140137295360240: nam='cell smart table scan' ela= 268 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427213
WAIT #140137295360240: nam='cell smart table scan' ela= 225 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663427514
WAIT #140137295360240: nam='cell smart table scan' ela= 85 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663427720
WAIT #140137295360240: nam='cell smart table scan' ela= 13 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427755
...

There is a simple conclusion: yes, there is a Smart Scan. I can’t see a massive difference between the NSMTIO trace for a GTT and a heap table. Since direct path read decisions are statistics driven (check parameter _direct_read_decision_statistics_driven) and there weren’t shared statistics on the GTT, I assume the single block reads are related to reading object statistics. OBJ# 163899 is the object_id of the GTT.

What is interesting is the use of “enq: RO – fast object reuse” prior to the start of the Smart Scan – I wasn’t expecting that at all. Here’s the same trace combination for a heap table for comparison:

=====================
PARSING IN CURSOR #140120190960712 len=91 dep=0 uid=78 oct=3 lid=78 tim=1226766583470 hv=2817388421 ad='13483d3f0' sqlid='g8948rkmyvvw5'
select /* heaptest001 */ count(*), id from heapt1 where id in (6460386, 6460390) group by id
END OF STMT
PARSE #140120190960712:c=1000,e=1925,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3353166567,tim=1226766583470
EXEC #140120190960712:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3353166567,tim=1226766583599
WAIT #140120190960712: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=163896 tim=1226766583653
NSMTIO: kcbism: islarge 1 next 0 nblks 67024 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67024 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67024 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 9, objd: 163896, objn: 163896
ckpt: 1, nblks: 67024, ntcache: 482, ntdist:482
Direct Path for pdb 0 tsn 9  objd 163896 objn 163896
Direct Path 1 ckpt 1, nblks 67024 ntcache 482 ntdist 482
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 109
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 163896, Object_Size = 67024 blocks
SqlId = g8948rkmyvvw5, plan_hash_value = 3353166567, Partition# = 0
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 351 name|mode=1263468550 2=65585 0=2 obj#=163896 tim=1226766584359
WAIT #140120190960712: nam='reliable message' ela= 1372 channel context=5884629416 channel handle=5252488952 broadcast message=5885771824 obj#=163896 tim=1226766585825
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=65585 0=1 obj#=163896 tim=1226766585972
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 90 name|mode=1263468545 2=65585 0=2 obj#=163896 tim=1226766586140
WAIT #140120190960712: nam='cell smart table scan' ela= 201 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766587377
WAIT #140120190960712: nam='cell smart table scan' ela= 137 cellhash#=379339958 p2=0 p3=0 obj#=163896 tim=1226766589330
WAIT #140120190960712: nam='cell smart table scan' ela= 127 cellhash#=2133459483 p2=0 p3=0 obj#=163896 tim=1226766593585
WAIT #140120190960712: nam='cell smart table scan' ela= 215 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766602986

This is the more familiar picture: reliable message, enq: KO – fast object checkpoint followed by cell smart table scan.

The Verdict

Quite simple this time: yes, you can use Smart Scans on GTTs. Additionally there are a few caveats around the use of GTTs when migrating your database to 12.1, but you can read about them in the Optimiser blog post linked above.

JSON support in Exadata 12.1.2.1.0 and later

Some time ago Oracle announced that RDBMS 12.1.0.2 has built-in support for JSON processing. A little later it was also mentioned that you have support for JSON in the Exadata storage servers for offloading. This is probably a lot more exciting to users of JSON than it is to me, as I’m not a developer. However, whenever an announcement such as the one I’m referring to is made I would like to see for myself how much of it is implemented in software. Like I said, I’m not a developer, so apologies for a silly example: what I’m showing you here can probably be done differently and is not the best use of an Exadata. But all I really wanted to test is whether JSON support actually exists. I am using cellsrv 12.1.2.1.0 and RDBMS 12.1.0.2.2 for this test.

JSON

I have to say I struggled a little bit to understand the use case for JSON and therefore did what probably everyone does and consulted the official documentation and oracle-base.com for Tim’s views on JSON. Here’s a summary of links I found useful to get started:

The Test

Ok so that was enough to get me started. I needed data, and a table to store it in. It appeared to me that an Apache log could be a useful source of JSON records, so I converted my web server’s log file to JSON using libee0 on OpenSuSE (yes I know, but it’s a great virtualisation platform). The converted file was named httpd_access_log.json and had records such as these:

{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:05 +0100", "request": "HEAD /ol70 HTTP/1.1", "status": "404", "size": "", "f1": "", "useragent": "Python-urllib/2.7"}
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:25 +0100", "request": "GET / HTTP/1.1", "status": "403", "size": "989", "f1": "", "useragent": "Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0"}

Sorry for the wide output – it’s an Apache log…
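
Each line of the converted file is a self-contained JSON object. Before loading anything it is worth confirming the records parse cleanly; a quick Python sketch using the first record above (done outside the database, any JSON parser would do):

```python
import json

# One of the converted Apache log records shown above.
record = ('{"host": "192.168.100.1", "identity": "", "user": "", '
          '"date": "05/Feb/2015:12:13:05 +0100", '
          '"request": "HEAD /ol70 HTTP/1.1", "status": "404", '
          '"size": "", "f1": "", "useragent": "Python-urllib/2.7"}')

entry = json.loads(record)
print(entry["host"], entry["status"])  # 192.168.100.1 404
```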

I then created the table to store the data. JSON appears to be pretty unstructured, so this will do:

SQL> create table jsontest (id number,
  2   jdata clob,
  3   constraint jsontest check (jdata is json)
  4  )
  5  lob (jdata) store as securefile (
  6   enable storage in row
  7  );

SQL> create sequence s_jsontest;

Sequence created

If you look closely you’ll see that the JSON data is stored in an inline CLOB – that’s one of the prerequisites for offloading LOBs in 12c.

Loading JSON

Now I needed a way to get the data into the table. I think I could have used SQLLDR, but since I have rusty Perl scripting skills I gave DBD::Oracle on 12.1.0.2 a go. The following script inserts records slow-by-slow (row-by-row) into the table and is probably not the most efficient way to do this. But one of the reasons I blog is so that I don’t have to remember everything. If you ever wondered how to write a DBI/DBD::Oracle script, here’s a starting point. Note the emphasis on “starting point”: the script has been trimmed for readability and essential error checking is not shown. Whenever you work with data make sure that your error handling is top-notch!

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use DBD::Oracle;
use Getopt::Long;

# these will be set by GetOpt::Long
my $service;            # has to be in tnsnames.ora
my $username;
my $jsonfile;

GetOptions (
  "service=s"   => \$service,
  "user=s"      => \$username,
  "jsonfile=s"  => \$jsonfile
);
die "usage: load_json.pl --service <servicename> --jsonfile <filename> [--user username]\n" if (!defined ($service) || !defined ($jsonfile));

die "$jsonfile is not a file" unless ( -f $jsonfile );

print "connecting to service $service as user $username to load file $jsonfile\n";

# about to start...
my $dbh = DBI->connect ("dbi:Oracle:$service", "$username", "someCleverPasswordOrCatName")
  or die ("Cannot connect to service $service: DBI:errstr!");

print "connection to the database established, trying to load data...\n";

# prepare a cursor to loop over all entries in the file
my $sth = $dbh->prepare(q{
 insert into jsontest (id, jdata) values(s_jsontest.nextval, :json)
});

if (! open (JSON, '<', $jsonfile)) {
  print "cannot open $jsonfile: $!\n";
  $dbh->disconnect();
  die "Cannot continue\n";
}

while (<JSON>) {
  chomp;
  $sth->bind_param(":json", $_);
  $sth->execute();
}

$dbh->disconnect();

close JSON;

print "done\n";

This script reads the file and inserts all the data into the table. Again, essential error checking must be added; the script is far from complete. You also need to point the Perl environment variables at the Perl installation in $ORACLE_HOME for the script to find the DBI and DBD::Oracle drivers.

Offloading or not?

It turned out that the data I inserted was of course not enough to trigger a direct path read that could turn into a Smart Scan. A little inflation of the table was needed. Once that was done I started to get my feet wet with JSON queries:

SQL> select jdata from jsontest where rownum < 6;

JDATA
--------------------------------------------------------------------------------
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:26
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:

Interesting. Here are a few more examples with my data set. Again, refer to oracle-base.com and the official documentation set for more information about JSON and querying it in the database. It’s by no means an Exadata-only feature.

SQL> select count(*) from jsontest where json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
   2195968

SQL> select count(*) from jsontest where not json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
         0

And finally, here is evidence that you can offload JSON processing on Exadata – judging by the information in v$sqlfn_metadata it should be possible for at least some of the operations:

SQL> select name,offloadable from v$sqlfn_metadata where name like '%JSON%'
  2  order by offloadable,name;

NAME                                               OFF
-------------------------------------------------- ---
JSON_ARRAY                                         NO
JSON_ARRAYAGG                                      NO
JSON_EQUAL                                         NO
JSON_OBJECT                                        NO
JSON_OBJECTAGG                                     NO
JSON_QUERY                                         NO
JSON_SERIALIZE                                     NO
JSON_TEXTCONTAINS2                                 NO
JSON_VALUE                                         NO
JSON                                               YES
JSON                                               YES
JSON_EXISTS                                        YES
JSON_QUERY                                         YES
JSON_VALUE                                         YES

14 rows selected.

The two entries named “JSON” are most likely “is JSON” and “is not JSON”.

And now with real data volumes on a real system using JSON_EXISTS:

SQL> select /*+ monitor am_I_offloaded */ count(*)
  2  from jsontest where json_exists(jsontest.jdata, '$.host' false on error);                                                                                        

   COUNT(*)
-----------
    2195968

Elapsed: 00:00:04.96

SQL> select * from table(dbms_xplan.display_cursor);                                                                                                        

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID  6j73xcww7hmcw, child number 0
-------------------------------------
select /*+ monitor am_I_offloaded */ count(*) from jsontest where
json_exists(jsontest.jdata, '$.host' false on error)

Plan hash value: 568818393

---------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |       |       | 91078 (100)|          |
|   1 |  SORT AGGREGATE            |          |     1 |   610 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| JSONTEST | 21960 |    12M| 91078   (1)| 00:00:04 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)
       filter(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)

So the execution plan looks promising – I can see “table access storage full” and a storage() predicate. Looking at V$SQL I get:

SQL> select sql_id, child_number,
  2  case when io_cell_offload_eligible_bytes = 0 then 'NO' else 'YES' end offloaded,
  3  io_cell_offload_eligible_bytes/power(1024,2) offload_eligible_mb,
  4  io_interconnect_bytes/power(1024,2) interconnect_mb,
  5  io_cell_offload_returned_bytes/power(1024,2) returned_mb,
  6  io_cell_offload_returned_bytes/io_cell_offload_eligible_bytes*100 offload_pct
  7   from v$sql where sql_id = '6j73xcww7hmcw';                                                                                                                                                   

SQL_ID        CHILD_NUMBER OFF OFFLOAD_ELIGIBLE_MB INTERCONNECT_MB RETURNED_MB OFFLOAD_PCT
------------- ------------ --- ------------------- --------------- ----------- -----------
6j73xcww7hmcw            0 YES         2606.695313     1191.731941 1191.724129 45.71781455
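For clarity, here is the arithmetic behind the OFFLOAD_PCT column in the query above, as a plain Python sketch using the values V$SQL returned (the variable names are mine, purely illustrative):

```python
# Values copied from the V$SQL output above, converted to megabytes
eligible_mb = 2606.695313   # io_cell_offload_eligible_bytes
returned_mb = 1191.724129   # io_cell_offload_returned_bytes

returned_pct = returned_mb / eligible_mb * 100   # what crossed the interconnect
saved_pct = 100 - returned_pct                   # what the cells filtered out

print(f"returned: {returned_pct:.1f}%, saved by offloading: {saved_pct:.1f}%")
```

In other words, a little over half of the eligible data never had to travel to the database servers.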

And to avoid any doubt, I have the SQL Trace as well:

PARSING IN CURSOR #140370400430072 len=120 dep=0 uid=65 oct=3 lid=65 tim=1784977054418 hv=1750582781 ad='5bfcebed8' sqlid='bfwd4t5n5gjgx'
select /*+ monitor am_I_offloaded */ count(*)   from jsontest where json_exists(jsontest.jdata, '$.host' false on error)
END OF STMT
PARSE #140370400430072:c=103984,e=272239,p=909,cr=968,cu=0,mis=1,r=0,dep=0,og=1,plh=568818393,tim=1784977054417
EXEC #140370400430072:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568818393,tim=1784977054587
WAIT #140370400430072: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=96305 tim=1784977054666
WAIT #140370400430072: nam='reliable message' ela= 826 channel context=27059892880 channel handle=27196561216 broadcast message=26855409216 obj#=96305 tim=1784977055727
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 159 name|mode=1263468550 2=65629 0=1 obj#=96305 tim=1784977055942
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 229 name|mode=1263468545 2=65629 0=2 obj#=96305 tim=1784977056265
WAIT #140370400430072: nam='cell smart table scan' ela= 196 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977057370
WAIT #140370400430072: nam='cell smart table scan' ela= 171 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977057884
WAIT #140370400430072: nam='cell smart table scan' ela= 188 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977058461
WAIT #140370400430072: nam='cell smart table scan' ela= 321 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977061623
WAIT #140370400430072: nam='cell smart table scan' ela= 224 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977062053
WAIT #140370400430072: nam='cell smart table scan' ela= 254 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977062487
WAIT #140370400430072: nam='cell smart table scan' ela= 7 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977062969
WAIT #140370400430072: nam='cell smart table scan' ela= 25 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977063016
WAIT #140370400430072: nam='cell smart table scan' ela= 81 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977063115
WAIT #140370400430072: nam='cell smart table scan' ela= 1134 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977065442
WAIT #140370400430072: nam='cell smart table scan' ela= 6 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977065883
WAIT #140370400430072: nam='cell smart table scan' ela= 14 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977065917
WAIT #140370400430072: nam='cell smart table scan' ela= 105 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066037
WAIT #140370400430072: nam='cell smart table scan' ela= 12 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066207
WAIT #140370400430072: nam='cell smart table scan' ela= 6605 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977072866
WAIT #140370400430072: nam='cell smart table scan' ela= 27 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977073877
WAIT #140370400430072: nam='cell smart table scan' ela= 29 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977074903
WAIT #140370400430072: nam='cell smart table scan' ela= 907 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977077783
WAIT #140370400430072: nam='cell smart table scan' ela= 28 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977078753
WAIT #140370400430072: nam='cell smart table scan' ela= 24 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977080860
WAIT #140370400430072: nam='cell smart table scan' ela= 1077 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977082935
...

Summary

So yes, it would appear that JSON processing is indeed offloaded.

IO Resource Manager for Pluggable Databases in Exadata

Another interesting topic, going over and above the CDB Resource Manager plans I described earlier this week, is the implementation of IORM plans for Pluggable Databases. Pluggable Databases are an interesting object of study, and I like to research things. When 12c first came out there was no support for offloading; you need to be on cell software 12.1.x.x.x for full 12c support on Exadata. One aspect I was particularly interested in was PDBs and IORM. Limiting CPU on a per-PDB basis is already a cool thing, but in a heavily consolidated environment it also makes sense to limit I/O based on a consumer group. Another one of the perks of working on Exadata :)

From what I found out, PDBs are treated as entities within the database, and they probably fall into the category of intra-database IORM. I have previously written about how DBRM plans filter down to the cells and become intra-database resource plans. The same seems to be happening here, too.

IORM metric definitions

When looking at IORM I’m old-fashioned and like to rely on the command line. More GUI-oriented people should consider using OEM 12c to get similar data in pretty pictures. But since OEM taps into cellcli under the covers, it is only fair to try to understand the underlying technology.

The cells provide a lot of performance information in the metriccurrent and metrichistory views. You are shown metrics based on a name and objectType. For IORM the following objectTypes are of interest in general:

[celladmin@enkcel04 ~]$ cellcli -e "list metricdefinition attributes objectType" | grep IORM | sort | uniq
	 IORM_CATEGORY
	 IORM_CONSUMER_GROUP
	 IORM_DATABASE
	 IORM_PLUGGABLE_DATABASE

This was executed on Exadata 12.1.2.1.1; if you try it on an 11.2.3.x Exadata release you won’t see the IORM_PLUGGABLE_DATABASE objectType. The question is: which metrics are gathered in 12.1.2.1.1?

CELLCLI> list metricdefinition attributes name,description where objectType = 'IORM_PLUGGABLE_DATABASE'
	 PDB_FC_BY_ALLOCATED	 "Number of megabytes allocated in flash cache for this pluggable database"
	 PDB_FC_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash cache"
	 PDB_FC_IO_RQ       	 "Number of IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SEC   	 "Number of IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash cache per second"
	 PDB_FD_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash disks"
	 PDB_FD_IO_LOAD     	 "Average I/O load from this pluggable database for flash disks"
	 PDB_FD_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_TM       	 "The cumulative latency of reading blocks by this pluggable database from flash disks"
	 PDB_FD_IO_TM_RQ    	 "The rate which is the average latency of reading or writing blocks per request by this pluggable database from flash disks"
	 PDB_FD_IO_UTIL     	 "Percentage of flash resources utilized by this pluggable database"
	 PDB_FD_IO_WT_LG    	 "IORM wait time for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_LG_RQ 	 "Average IORM wait time per request for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM    	 "IORM wait time for small IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM_RQ 	 "Average IORM wait time per request for small IO requests issued to flash disks by this pluggable database"
	 PDB_IO_BY_SEC      	 "Number of megabytes of I/O per second for this pluggable database to hard disks"
	 PDB_IO_LOAD        	 "Average I/O load from this pluggable database for hard disks"
	 PDB_IO_RQ_LG       	 "Number of large IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_LG_SEC   	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_RQ_SM       	 "Number of small IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_SM_SEC   	 "Number of small IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_TM_LG       	 "The cumulative latency of reading or writing large blocks by this pluggable database from hard disks"
	 PDB_IO_TM_LG_RQ    	 "The rate which is the average latency of reading or writing large blocks per request by this pluggable database from hard disks"
	 PDB_IO_TM_SM       	 "The cumulative latency of reading or writing small blocks by this pluggable database from hard disks"
	 PDB_IO_TM_SM_RQ    	 "The rate which is the average latency of reading or writing small blocks per request by this pluggable database from hard disks"
	 PDB_IO_UTIL_LG     	 "Percentage of disk resources utilized by large requests from this pluggable database"
	 PDB_IO_UTIL_SM     	 "Percentage of disk resources utilized by small requests from this pluggable database"
	 PDB_IO_WT_LG       	 "IORM wait time for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_LG_RQ    	 "Average IORM wait time per request for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM       	 "IORM wait time for small IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM_RQ    	 "Average IORM wait time per request for small IO requests issued to hard disks by this pluggable database"

IORM metrics for PDBs

You still can’t “join” metricdefinition to metriccurrent, but it is possible to use the objectType in metriccurrent, too. The way Oracle externalises information about PDBs is shown here:

CellCLI> list metriccurrent where objectType = 'IORM_PLUGGABLE_DATABASE' and metricObjectName like 'MBACHMT.*'
	 PDB_FC_BY_ALLOCATED	 MBACHMT.CDB$ROOT   	 1,843 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.PDB$SEED   	 41.500 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,772 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,374 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,280 MB
...

PDB_FC_BY_ALLOCATED translates to “Number of megabytes allocated in flash cache for this pluggable database” as per the definitions above. The metricObjectName is therefore made up of the database name (I continue using MBACHMT as my CDB) and the container name: CDB$ROOT stands for the root, PDB$SEED for the seed database, followed by the various PDBs you define. In my example the “user PDBs” are named swingbench{0,1,2}.
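This naming scheme is trivial to take apart when post-processing cellcli output, for instance in a monitoring script. A small Python sketch (the helper function is mine, purely illustrative – not a cellcli feature):

```python
def split_metric_object(name):
    """Split an IORM_PLUGGABLE_DATABASE metricObjectName such as
    'MBACHMT.SWINGBENCH0' into its (database, container) parts."""
    database, _, container = name.partition(".")
    return database, container

print(split_metric_object("MBACHMT.CDB$ROOT"))      # ('MBACHMT', 'CDB$ROOT')
print(split_metric_object("MBACHMT.SWINGBENCH0"))   # ('MBACHMT', 'SWINGBENCH0')
```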

Stressing it

With the basics covered it is time to run some stress testing. I have created a 6 GB table named IOTEST in all my swingbench* PDBs and will use the same script to issue 80 sessions against that table in each PDB. My CDB resource manager plan is still the same, repeated here for your convenience:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;
 
 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);
 
 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

If I execute the scripts concurrently (80 sessions connecting against each PDB), prefixing my scheduler with the time command, I get the following results:

  • swingbench0 – 143s
  • swingbench1 – 223s
  • swingbench2 – 288s
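
To put those runtimes next to the plan, here is the arithmetic behind the shares – a plain Python sketch of how I read the directives, not an Oracle API. Under full contention each PDB is entitled to its shares divided by the total shares, additionally capped by its utilization_limit:

```python
# Directives from ENKITC_CDB_PLAN above
shares = {"swingbench0": 5, "swingbench1": 3, "swingbench2": 1}
limits = {"swingbench0": 100, "swingbench1": 50, "swingbench2": 30}

total_shares = sum(shares.values())   # 9 shares in total

for pdb, s in shares.items():
    entitlement = s / total_shares * 100
    # utilization_limit is an absolute cap that applies even when
    # the other PDBs are idle
    print(f"{pdb}: {entitlement:.1f}% entitlement, utilization_limit {limits[pdb]}%")
```

The observed runtimes follow the same ordering as the shares, although not in strict proportion – the utilization limits and caching effects play a part as well.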

Interestingly, there is no event that would show I/O throttling in 12.1.0.2.2:

SYS:MBACHMT2> select count(*), con_id, event from v$session where username = 'SOE' group by con_id, event;

   COUNT(*)      CON_ID EVENT
----------- ----------- ----------------------------------------------------------------
         80           4 cell smart table scan
         80           5 cell smart table scan
         80           3 cell smart table scan

So all of them are Smart-Scanning. The fact that some of the sessions are throttled is not visible from the wait interface, or at least I haven’t seen a way to externalise I/O throttling. But it does happen. Using one of my favourite tools, metric_iorm.pl (available from MOS), I noticed the following:

Database: MBACHMT
Utilization:     Small=0%    Large=18%
Flash Cache:     IOPS=13087
Disk Throughput: MBPS=361
Small I/O's:     IOPS=4.0    Avg qtime=0.0ms
Large I/O's:     IOPS=346    Avg qtime=1026ms
	Consumer Group: SWINGBENCH1.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=4%
	Flash Cache:     IOPS=3482
	Disk Throughput: MBPS=91
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=87.7    Avg qtime=1336ms
	Consumer Group: SWINGBENCH0.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=13%
	Flash Cache:     IOPS=8886
	Disk Throughput: MBPS=254
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=244    Avg qtime=906ms
	Consumer Group: CDB$ROOT._ORACLE_LOWPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.8
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT.ORA$AUTOTASK
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=1.4
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.4    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT._ORACLE_BACKGROUND_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.7
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=3.6    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: SWINGBENCH2.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=717
	Disk Throughput: MBPS=15
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=14.5    Avg qtime=1152ms

These are statistics from a single cell – this X2-2 has 3 of them. I have also gathered some of the raw stats here in case you are interested, again from a single cell:

CellCLI> list metriccurrent where name like 'PDB.*' and metricObjectName like 'MBACHMT.SW.*' and metricValue not like '0.*';
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,779 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,467 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,461 MB
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH0	 4,807,060 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH1	 4,835,038 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH2	 4,833,804 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH0	 12,789 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH1	 9,721 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH2	 5,182 IO/sec
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH0	 9,724 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH1	 6,093 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH2	 6,298 IO requests
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH0	 51.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH1	 30.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH2	 23.9 IO/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH0	 30.3
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH1	 30.4
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH2	 28.1
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH0	 69,803,464 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH1	 45,061,357 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH2	 40,433,099 us
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH0	 67 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH1	 54 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH2	 29 %
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH0	 830,669 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH1	 717,211 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH2	 221,666 ms
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH0	 380 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH1	 305 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH2	 151 MB/sec
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH0	 2.4
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH1	 2.6
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH2	 2.3
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH0	 105,784 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH1	 88,549 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH2	 61,617 IO requests
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH0	 365 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH1	 292 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH2	 145 IO/sec
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH0	 3,822,888,945 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH1	 3,355,167,650 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH2	 2,004,747,904 us
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH0	 27 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH1	 20 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH2	 9 %
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH0	 108,668,272 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH1	 105,099,717 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH2	 132,192,319 ms
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH0	 1,655 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH1	 2,979 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH2	 4,921 ms/request

Some of the numbers don’t seem to make sense at first, for example PDB_FD_IO_RQ_LG, where the values for all three PDBs are very similar. This is actually a feature (really!): some metrics are cumulative, and some are instantaneous:

CELLCLI> list metricdefinition where name = 'PDB_FD_IO_RQ_LG' detail
	 name:              	 PDB_FD_IO_RQ_LG
	 description:       	 "Number of large IO requests issued by this pluggable database to flash disks"
	 metricType:        	 Cumulative
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 "IO requests"

So this is a cumulative metric. Others, like PDB_IO_RQ_LG_SEC, measure the current rate “as is”:

CELLCLI> list metricdefinition where name = 'PDB_IO_RQ_LG_SEC' detail
	 name:              	 PDB_IO_RQ_LG_SEC
	 description:       	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 metricType:        	 Rate
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 IO/sec
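
That explains why the cumulative request counters look so similar across the PDBs: they keep growing over the lifetime of the workload. To get a meaningful per-second figure out of a Cumulative metric you have to sample it twice and divide the delta by the sampling interval – essentially what the corresponding Rate metrics do for you. A quick sketch (the helper function is hypothetical, just to show the arithmetic):

```python
def rate_per_second(first_sample, second_sample, interval_seconds):
    """Approximate the Rate equivalent of a Cumulative metric
    (e.g. PDB_FD_IO_RQ_LG vs PDB_FD_IO_RQ_LG_SEC) from two samples.
    Illustrative helper for post-processing cellcli output."""
    return (second_sample - first_sample) / interval_seconds

# two readings of a cumulative large-IO counter, taken 60 seconds apart
print(rate_per_second(4_797_336, 4_800_336, 60))   # 50.0 (IO/sec)
```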

Have fun!

Testing 12c CDB Resource Plans and a little bit about OEM Express

Inspired by Sue Lee’s presentation at Enkitec’s E4 conference I decided to re-run my test suite to work out how you can use Database and I/O Resource Manager in Oracle 12.1.0.2.2 to help consolidate databases in the new Multi-Tenant architecture. I should point out briefly that the use of Multi-Tenant as demonstrated in this article requires you to have a license. The tuning tools shown also require you to be appropriately licensed…

Setup

I have created a Container Database (CDB) named MBACHMT (MBACH – Multi-Tenant), in which I want to run Swingbench to experiment with I/O Resource Manager. The same Pluggable Databases (PDBs) are also subject to testing with the new CDB (Database) Resource Manager. In order to simplify the task I’ll just create a single PDB for now, install Swingbench’s Order Entry schema, and clone the PDB twice. The CDB is created using dbca on an Exadata X2-2 quarter rack. Once the CDB was in place I could create the first PDB. I don’t think the steps need a lot of explanation, so here they go without much commentary. First the PDB must be created based on the SEED database, then I create a tablespace and a user account to host the actual data.

SYS:MBACHMT1> create pluggable database swingbench0 admin user admin identified by secret roles=(DBA);

Pluggable database created.

SYS:MBACHMT1> alter pluggable database swingbench0 open instance=all;

Pluggable database altered.

SYS:MBACHMT1> alter session set container = swingbench0;

Session altered.

SYS:MBACHMT1> create tablespace soe datafile size 5g;

Tablespace created.

SYS:MBACHMT1> create user soe identified by soe default tablespace soe;

User created.

SYS:MBACHMT1> grant connect to soe;

Grant succeeded.

SYS:MBACHMT1> grant execute on dbms_lock to soe;

Grant succeeded.

One thing requires an explanation, and that’s the “alter session set container = swingbench0” command. As you can see, the SQL prompt is made up of username – colon – container name as set at connect time. The prompt does not change when you use the SYS account to switch context from CDB$ROOT to a PDB, hence you continue to see the MBACHMT1 prefix even though I am now executing commands at the PDB level.

The next step is to create a basic Order Entry schema. I wrote about this before, and so has Dominic Giles on his blog. For your reference, here is the command I used:

oewizard -scale 1 -dbap supersecret -u soe -p soe -cl -cs //enkscan2/swingbench0 -ts SOE -create

SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.949

Running in Lights Out Mode using config file : oewizard.xml

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.005
Data Generation Time                   0:00:21.380
DDL Creation Time                      0:00:52.221
Total Run Time                         0:01:13.609
Rows Inserted per sec                      566,237
Data Generated (MB) per sec                   46.1
Actual Rows Generated                   13,009,500


Post Creation Validation Report
===============================
The schema appears to have been created successfully.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA','INVENTORIES','PRODUCT_INFORMATION',
'PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX','PROD_CATEGORY_IX','INVENTORY_PK',
'INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK','ORD_SALES_REP_IX','ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX',
'ORD_WAREHOUSE_IX','ORDER_ITEMS_PK','ITEM_ORDER_IX','ITEM_PRODUCT_IX','WAREHOUSES_PK','WHS_LOCATION_IX',
'CUSTOMERS_PK','CUST_EMAIL_IX','CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX','ADDRESS_PK','ADDRESS_CUST_IX',
'CARD_DETAILS_PK','CARDDETAILS_CUST_IX'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code : 'ORDERENTRY'
Schema Created

With the schema in place I need a PDB resource plan, or in other words I need to enable a resource manager plan at the PDB level. A PDB resource plan is almost identical to a non-CDB DBRM plan, with a few restrictions mentioned in the Admin Guide chapter 44, just below figure 44-4. The one that affects me is the lack of multi-level resource plans, which is why I’m going to use a simple plan based on the RATIO mgmt_mth of dbms_resource_manager.create_plan. Not having multi-level resource plans at your disposal might actually be a blessing: it prevents the creation of beautifully architected but incredibly complex plans that are difficult to understand, for me at least.

To keep it simple, my PDBs have just one purpose: executing Swingbench. As such there won’t be an additional application user; all I care about is the SOE account. I want it to be eligible for the lion’s share of CPU, so here’s the plan. Make sure that your execution context is the new PDB (swingbench0) – you can verify this by selecting sys_context('userenv','con_name') from dual.

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 -- create a new resource consumer group to which we will later on add plan directives
 -- a consumer group is a logical construct grouping sessions to a similar/identical workload
 dbms_resource_manager.create_consumer_group('SWINGBENCH_GROUP', 'for swingbench processing');

 dbms_resource_manager.validate_pending_area();
 dbms_resource_manager.submit_pending_area();
end;
/

begin
 dbms_resource_manager.create_pending_area();

 -- when logging in as oracle user "SOE", map this session to the SWINGBENCH_GROUP
 dbms_resource_manager.set_consumer_group_mapping(
		dbms_resource_manager.oracle_user, 'SOE', 'SWINGBENCH_GROUP');
 dbms_resource_manager.submit_pending_area();
end;
/

begin
 -- must allow the SOE user to switch from OTHERS_GROUP to SWINGBENCH_GROUP. Forgetting this step
 -- is a common reason for DBRM not to work as expected
 dbms_resource_manager_privs.grant_switch_consumer_group('SOE','SWINGBENCH_GROUP', true);
end;
/
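Before moving on to the plan itself you can verify that the mapping and the switch privilege were recorded. This is just a sanity check against the usual dictionary views, not part of the setup:

```sql
-- confirm the SOE mapping was recorded
select attribute, value, consumer_group
  from dba_rsrc_group_mappings
 where value = 'SOE';

-- confirm SOE may switch into SWINGBENCH_GROUP
select grantee, granted_group, grant_option
  from dba_rsrc_consumer_group_privs
 where grantee = 'SOE';
```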

BEGIN
 dbms_resource_manager.clear_pending_area();
 dbms_resource_manager.create_pending_area();
 
 -- now create the plan in the first step. Note the mgmt_mth, which essentially requires you
 -- to think in terms of CPU shares, not percentages. It also enforces the requirement not to
 -- use multi-level plans
 -- thanks to @fritshoogland for making this obvious to me
 dbms_resource_manager.create_plan(
 	plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
        mgmt_mth => 'RATIO',
 	comment => 'sample DBRM plan for swingbench'
 );

 -- now define what the plan is about. Give the SYS_GROUP 3 shares
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  comment => 'sys_group is level 1',
  group_or_subplan => 'SYS_GROUP',
  mgmt_p1 => 3);

 -- the SWINGBENCH user gets 7 shares
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  group_or_subplan => 'SWINGBENCH_GROUP',
  comment => 'us before anyone else',
  mgmt_p1 => 7
 );

 -- finally anyone not in a previous consumer group will be mapped to the
 -- OTHER_GROUPS and get 1 share. 
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  group_or_subplan => 'OTHER_GROUPS',
  comment => 'all the rest',
  mgmt_p1 => 1
 );
 
 dbms_resource_manager.validate_pending_area();
 dbms_resource_manager.submit_pending_area();
end;
/

If you didn’t get any errors you can enable the plan in the PDB using the familiar “alter system set resource_manager_plan = ENKITEC_SWINGBENCH_PDB_PLAN;” command.
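Spelled out, enabling and then verifying the plan could look like this; the v$rsrc_plan query is just one way of confirming which plan is active:

```sql
-- inside the swingbench0 PDB
alter system set resource_manager_plan = 'ENKITEC_SWINGBENCH_PDB_PLAN';

-- confirm the plan is now active in this container
select name, is_top_plan from v$rsrc_plan;
```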

The PDB is now ready for cloning, which requires it to be open read only. Once swingbench0 is in the correct open mode, clone it using “create pluggable database swingbench1 from swingbench0;” and “create pluggable database swingbench2 from swingbench0;”. Nice: no “RMAN> duplicate database to swingbench1 from active database” and all that, just a one-liner per clone. Once the cloning is done, open the PDBs.
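Written out in full, the cloning sequence might look like this. This is a sketch assuming Oracle Managed Files; without OMF you would need a file_name_convert clause on the create pluggable database statements:

```sql
-- in CDB$ROOT: the clone source must be open read only
alter pluggable database swingbench0 close immediate;
alter pluggable database swingbench0 open read only;

-- clone twice - a one-liner each time
create pluggable database swingbench1 from swingbench0;
create pluggable database swingbench2 from swingbench0;

-- return the source to read write and open the clones
alter pluggable database swingbench0 close immediate;
alter pluggable database swingbench0 open;
alter pluggable database swingbench1 open;
alter pluggable database swingbench2 open;
```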

The CDB plan

With the PDBs all registered I am now able to define a CDB resource plan in CDB$ROOT. Again, check using “select sys_context(‘userenv’,’con_name’) from dual” that you are in the root, not a PDB. Here is the plan:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);

 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

SQL> alter system set RESOURCE_MANAGER_PLAN = 'FORCE:ENKITC_CDB_PLAN' scope=both sid='*';

System altered.

With this CDB plan I defined CPU shares and utilisation limits. There are other directive attributes worth investigating; refer to the documentation for those. Please take note that I capped the maximum utilisation of swingbench1 and swingbench2 at 50 and 30 percent respectively. This will become interesting later.
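You can cross-check the directives in the data dictionary; something along these lines should reflect the shares and limits just defined:

```sql
-- in CDB$ROOT: review the CDB plan directives
select plan, pluggable_database, shares, utilization_limit
  from dba_cdb_rsrc_plan_directives
 where plan = 'ENKITC_CDB_PLAN'
 order by shares desc;
```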

Testing

No workload is 100% CPU-bound, and I tried a few iterations before coming up with a suitable model to view the CDB Resource Plan in action. In the end all I needed was CPU-burning nonsense, and I have found one way of burning CPU: calculating millions of square roots. I have written a small “launcher” script that can execute a SQL script against a (pluggable) database x-many times. By exceeding the CPU capacity of the system I should be able to see the effect. So I launched 20 sessions of my CPU-burning for-loop against each of the PDBs in instance two and connected to CDB$ROOT to see the effect:

SYS:MBACHMT2> select count(*), con_id, inst_id, event from gv$session where username = 'SOE' group by con_id, inst_id, event;

   COUNT(*)      CON_ID     INST_ID EVENT
----------- ----------- ----------- ----------------------------------------------------------------
         20           3           2 resmgr:cpu quantum
         20           4           2 resmgr:cpu quantum
         20           5           2 resmgr:cpu quantum

And yes, that looks like it ;)
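My actual script isn’t reproduced in the post, but a minimal version of the square-root loop described above could be as simple as this (the iteration count is arbitrary; tune it to your hardware):

```sql
-- burn.sql: keep one CPU busy calculating square roots
declare
  v_result number := 0;
begin
  for i in 1 .. 1e8 loop
    v_result := v_result + sqrt(i);
  end loop;
end;
/
```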

The timings for the executions were:

  • 232 seconds for swingbench0
  • 318 seconds for swingbench1
  • 509 seconds for swingbench2

A little bit of OEM Express

OEM Express was running at the time and it allows you to see the utilisation of your PDBs:

CDB Resource Plan with all 3 PDBs active


You can see the three PDBs working away. On the CPU bars to the right you can see the number of running sessions (green) and those waiting (beige). You can also see the entitlement as per the shares (black vertical bar in “CPU Resource Limits”) and the utilisation limit (red vertical bar). The output almost perfectly matches the configuration.