
GTT and Smart Scan – the importance of the append hint

Posted by Martin Bach on December 5, 2016

While writing the previous two posts about GTTs and Smart Scan, I stumbled across an interesting observation. When I started putting my thoughts to (virtual) paper, I thought it was going to be a short post. Well, it wasn’t to be, again. Anyway, you might find this interesting.

Initial Situation

If you have read the previous posts, the code example I used to populate the GTT might look familiar:

insert /*+ append */ into gtt select * from t4 where rownum < 400000;
 
commit;

In situations like this, where I'm moving data around, I have developed the habit of using the append hint. I guess I've been working on HCC tables a bit too much and therefore didn't even make a conscious decision to use that hint. It was the right thing to do, as you will see next.

The full test harness is shown here again for reference:

SQL> !cat test.sql
set lines 120 tab off trimspool on verify off timing on

-- this statement is using the /*+ append */ hint
insert /*+ append using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

SQL>

When executing the script and looking at the output I didn't see anything out of the ordinary. When checking the "depth" of the Smart Scan, I regularly consult the cell blocks processed by % layer statistics to see if I lose blocks between code layers. If you don't have mystats at hand, the counters can also be pulled straight from v$mystat for your own session; a minimal sketch (this shows cumulative values since logon, mystats does the before/after delta for you):
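select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name like 'cell blocks processed by%layer'
 order by sn.name;

And no, it wouldn't appear that I lose blocks between layers: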

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell IO uncompressed bytes                                             546,136,064
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
...
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               317
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    330,153,984
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1

66,667 blocks in each code layer. This looks OK to me: the total number of physical reads recorded for this statement is 66,669. I certainly don't want to endorse tuning by ratio, but in this case it sort-of makes sense: the offload efficiency is pretty high in my case:

SQL> @fsx.sql
Enter value for sql_text:
Enter value for sql_id: a4jrkbnsfgk1j

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
a4jrkbnsfgk1j      0 2363333961      1        .01      0 Yes         100.00 select /* using_append */ count(*), id from gtt where id in (        7

That should suffice for a recap of the previous posts.

No append hint – different outcome

Now imagine my surprise when I didn’t execute a script but rather typed the insert command in an interactive session. In the next example, created to reproduce the observation, I “forgot” to specify the append hint:

SQL> !cat test2.sql
-- not using the append hint
insert /* not_using_append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- this is the actual test
@mystats start s=s
set echo on
select /* not_using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off
@mystats stop t=1

The difference between test.sql and test2.sql looks harmless enough: just a hint, or the lack thereof. However, the result is interesting. Most executions of the above script resulted in Oracle reporting the following statistics:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    calls to get snapshot scn: kcmgss                                              462
STAT    calls to kcmgcs                                                                 28
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks processed by cache layer                                        67,026
STAT    cell blocks processed by data layer                                              1
STAT    cell blocks processed by txn layer                                               1
STAT    cell commit cache queries                                                   67,025
STAT    cell flash cache read hits                                                     633
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    545,000,688
STAT    cell physical IO interconnect bytes returned by smart scan             544,984,304
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           66,503
STAT    cleanouts only - consistent read gets                                       66,503
STAT    commit txn count during cleanout                                            66,503
...
STAT    consistent gets                                                            133,250
STAT    consistent gets direct                                                      66,504
STAT    consistent gets examination                                                 66,506
STAT    consistent gets examination (fastpath)                                      66,506
...
STAT    immediate (CR) block cleanout applications                                  66,503
...
STAT    physical read IO requests                                                    1,044
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               633
STAT    physical read total IO requests                                              1,044
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       748
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504
...

Huh? I was clearly surprised to see so many blocks not making it to the higher layers (txn and data) after having been read by the cache layer. Further executions of the select statement within the same session do not necessarily improve this significantly, although I noticed savings by storage indexes coming into play.

Speaking more scientifically, the listing immediately above shows that out of the 67,026 blocks read by the cache layer only 1 (!) was processed by the txn and data layers. As a direct result, Smart Scan does not offer any real benefit: cell physical IO interconnect bytes returned by smart scan is actually ≥ cell physical IO bytes eligible for predicate offload.

Why? What?

There were a few red herrings in the statistic counters I saw:

  • Cleanout statistics have been reported by mystats
    • commit txn count during cleanout
    • cleanouts only - consistent read gets
    • immediate (CR) block cleanout applications
    • etc.
  • There are consistent gets examination counters that might be peeks at undo information
  • Lots of queries to the commit cache (cell commit cache queries), but for most of my tests I didn't see replies. Only 1 out of maybe 20 tests produced some cell transactions found in commit cache, and even then their number was not significant. I will provide an example towards the end of this post.

Further investigation

So this leads me to suspect that the blocks created with the append hint are "clean", i.e. suitable for a consistent read on the cells without further need for processing. Since the GTT lives in TEMP you have to dump the block from the tempfile, not the datafile; this is roughly the command I used, with the file and block number from my system:
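alter system dump tempfile 1 block min 2305025 block max 2305025;

It would appear my suspicion is correct, judging by the resulting block dump: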

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aaeb9f seq: 0x01 flg: 0x0c tail: 0xeb9f0601
frmt: 0x02 chkval: 0xd11e type: 0x06=trans data
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x632c00  csc: 0x00.64aaeb9e  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.019.0012beca  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f6250501074
===============
tsiz: 0x1f88
hsiz: 0x1e
pbl: 0x7f6250501074
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x746
avsp=0x728
tosp=0x728
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b7d
0x14:pri[1]     offs=0x1772
0x16:pri[2]     offs=0x1367
0x18:pri[3]     offs=0xf5c
0x1a:pri[4]     offs=0xb51
0x1c:pri[5]     offs=0x746
block_row_dump:
tab 0, row 0, @0x1b7d
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
col  0: [ 2]  c1 50
col  1: [999]
 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...
tl: 1035 fb: --H-FL-- lb: 0x0  cc: 6
...

This is in contrast to the blocks I dumped that were created without the append hint. To make sure I'm comparing apples with apples, these blocks were created in a different session after disconnecting from the previous one:

Start dump data blocks tsn: 3 file#:1 minblk 2305025 maxblk 2305025
...
Block dump from disk:
buffer tsn: 3 rdba: 0x00632c01 (1/2305025)
scn: 0x0.64aab6bf seq: 0x00 flg: 0x0c tail: 0xb6bf0600
frmt: 0x02 chkval: 0x478e type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
...
Block header dump:  0x00632c01
 Object id on Block? Y
 seg/obj: 0x629300  csc: 0x00.64aab6bf  itc: 2  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.01b.0012be5d  0x00c04427.d97f.27  ----    6  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x00632c01
data_block_dump,data header at 0x7f53b75e125c
===============
tsiz: 0x1fa0
hsiz: 0x1e
pbl: 0x7f53b75e125c
     76543210
flag=--------
ntab=1
nrow=6
frre=-1
fsbo=0x1e
fseo=0x752
avsp=0x734
tosp=0x734
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0xf6c
0x14:pri[1]     offs=0x1379
0x16:pri[2]     offs=0x1786
0x18:pri[3]     offs=0x1b93
0x1a:pri[4]     offs=0x752
0x1c:pri[5]     offs=0xb5f
block_row_dump:
tab 0, row 0, @0xf6c
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
col  0: [ 4]  c3 0f 05 50
col  1: [999]
 31 34 30 34 37 39 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...
tl: 1037 fb: --H-FL-- lb: 0x1  cc: 6
...

This supports the theory that the block created without the append hint features some information that prevents it from being passed from the cache to the txn layer. I didn't see active txn count during cleanout though, but commit txn count during cleanout and immediate (CR) block cleanout applications instead. So a cleanout is definitely needed, and performed.

I stated in the third bullet point that commit cache queries were not producing cache hits most of the time. I managed to get an execution of the select statement against the blocks created without the append hint where at least some commit cache queries produced hits:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             544,800,768
STAT    cell blocks helped by commit cache                                           6,981
STAT    cell blocks processed by cache layer                                        66,969
STAT    cell blocks processed by data layer                                          6,982
STAT    cell blocks processed by txn layer                                           6,982
STAT    cell commit cache queries                                                   66,968
STAT    cell flash cache read hits                                                     576
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  544,800,768
STAT    cell physical IO interconnect bytes                                    487,801,976
STAT    cell physical IO interconnect bytes returned by smart scan             487,785,592
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                      6,981
STAT    cleanout - number of ktugct calls                                           59,522
STAT    cleanouts only - consistent read gets                                       59,522
STAT    commit txn count during cleanout                                            59,522
...
STAT    physical read IO requests                                                      987
STAT    physical read bytes                                                    544,817,152
STAT    physical read requests optimized                                               576
STAT    physical read total IO requests                                                987
STAT    physical read total bytes                                              544,817,152
STAT    physical read total bytes optimized                                    329,859,072
STAT    physical read total multi block requests                                       723
STAT    physical reads                                                              66,506
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,504

I couldn’t reproduce this at will though, and it wasn’t for lack of trying.

Summary

I am not entirely sure what causes the observed behaviour, especially with regard to the commit cache and/or the minscn optimisation, but it evidently seems better to populate a GTT with the append hint if you want to make the best use of Smart Scans against such segments. A very quick and dirty test against a heap table suggests that this is true for that segment type as well.
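For the record, the quick and dirty heap table test followed the same pattern as the GTT scripts; a sketch, reusing T4 as the row source (the table name is mine):

create table heapt2 as select * from t4 where 1 = 0;

-- populate without the hint, query, and check the "cell blocks processed by ..."
-- counters with mystats; then truncate, repopulate with insert /*+ append */
-- and compare the two runs
insert into heapt2 select * from t4 where rownum < 400000;
commit;

select count(*), id from heapt2 where id in (&min_id, &max_id) group by id;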

Posted in 12c Release 1, Exadata

Smart Scanning GTTs – what about Flash Cache?

Posted by Martin Bach on November 29, 2016

Continuing the example of the previous blog post (is it possible to Smart Scan GTTs?), I am investigating whether Smart Scans against GTTs can benefit from Flash Cache, or "optimised reads". The easiest way to see this in your session is to calculate the change in session counters based on v$mystat or v$sesstat. Other tools might also provide information about the amount of data read from Flash Cache, but quite often extra licenses are required …

Calculating the delta in session counters can be done in many ways, for example using Tanel’s session snapper. For clearly defined test cases where I can control beginning and end of the execution I prefer to use another great script for the purpose. My thanks go to Adrian Billington for providing the Oracle community with the most useful “mystats” package which I’m using here. You can – and should – get it from oracle-developer.net.

Let's check whether I can get flash cache read hits on GTT scans, which would help the performance of repeated queries against the segment. And since GTTs reside on TEMP, I had a few doubts whether flash cache read hits were possible at all.

What better way to find out than to test?

Testing helps remove ambiguity (and documenting the test result helps me remember things!), so here's the code used:

set lines 120 tab off trimspool on verify off timing on

-- it will be crucially important to use the /*+ append */ hint 
-- as you will see in the next post about GTTs on Exadata
insert /*+ append */ into gtt select * from t4 where rownum < 400000;

commit;

-- not testing if I can get Smart Scans, I know there are :)
alter session set "_serial_direct_read" = always;

-- make sure the query can be ever so slightly more complex and always return rows
col min_id new_v min_id
col max_id new_v max_id

select min(id) min_id from gtt;
select max(id) max_id from gtt;

-- hoping to trigger flash cache population here
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;

-- and measuring effect, if any

@mystats start s=s

set echo on
select /* using_append */ count(*), id from gtt where id in (&min_id, &max_id) group by id;
set echo off

@mystats stop t=1

Have a look at the code in mystats.sql to see what the options mean. Essentially I'm asking it to record session statistics only (start s=s) and, after the execution, to list only those counters that have changed (t=1, where t stands for threshold). I hacked the script a little to order by statistic name rather than the default.

The observation

It appears I can get Flash Cache hits. The relevant statistic name is “cell flash cache read hits”, and the fact that physical IO was optimised can also be seen in “physical read requests optimized” and “physical read total bytes optimized” amongst others. Here is the relevant output from mystats:

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
...
STAT    cell blocks processed by cache layer                                        66,667
STAT    cell blocks processed by data layer                                         66,667
STAT    cell blocks processed by txn layer                                          66,667
STAT    cell flash cache read hits                                                     315
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  546,136,064
STAT    cell physical IO interconnect bytes                                        112,504
STAT    cell physical IO interconnect bytes returned by smart scan                  96,120
STAT    cell scans                                                                       1
...
STAT    physical read IO requests                                                      523
STAT    physical read bytes                                                    546,152,448
STAT    physical read requests optimized                                               315
STAT    physical read total IO requests                                                523
STAT    physical read total bytes                                              546,152,448
STAT    physical read total bytes optimized                                    329,252,864
STAT    physical read total multi block requests                                       521
STAT    physical reads                                                              66,669
STAT    physical reads cache                                                             2
STAT    physical reads direct                                                       66,667
...

I usually check the cell blocks processed by % layer statistics first to get an idea about the "depth" of the Smart Scan: losing blocks in the transaction and data layers wouldn't be great. There's been 1 Smart Scan against my segment (the GTT isn't partitioned) recorded in "cell scans", so that covers that aspect as well.

Out of the 523 "physical read IO requests", 315 were optimised, so that's a pretty good ratio of approximately 60% of the IO requests coming from Flash Cache. The other statistics represent the same ratio in bytes rather than I/O requests.
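If you want the percentage without a calculator, the same arithmetic can be done in SQL; a sketch against v$mystat (cumulative values, not the per-test delta that mystats reports):

select round(100 *
       max(decode(sn.name, 'physical read requests optimized', ms.value)) /
       max(decode(sn.name, 'physical read IO requests', ms.value)), 1) pct_optimized
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name in ('physical read IO requests', 'physical read requests optimized');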

Posted in 12c Release 1, Exadata

Little things worth knowing: Can I Smart Scan a Global Temporary Table?

Posted by Martin Bach on November 25, 2016

A question that has come up while preparing to deliver another Exadata Optimisation Class is whether you can offload queries against Global Temporary Tables (GTT) or not. There are subtle differences between heap tables and GTTs such as how statistics are handled and where they are stored that merit a little investigation.

I have used the lab environment on the X3-2 (Exadata 12.1.2.3.0, 12.1.0.2.160419 RDBMS) to test if I can Smart Scan GTTs. There are some enhancements in 12c regarding GTTs that are mentioned on the Optimiser blog here:

https://blogs.oracle.com/optimizer/entry/gtts_and_upgrading_to_oracle

The Setup

I have used a brand new, 12c non-CDB for these tests. The SGA_TARGET is deliberately set to something really small so as to provoke Smart Scans without having to resort to setting _serial_direct_read to “always” in my session.

First of all, I needed a GTT for this purpose. I have a few scripts that create large tables for testing, and T4 will serve as my basis. Here is my setup:

SQL> create global temporary table gtt on commit preserve rows 
  2  as select * from t4 where 1 = 0;

Table created.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+ parallel(4) */ into gtt
  2  select /*+ parallel(4) */ * from t4
  3    where rownum < 400000;

399999 rows created.

SQL> commit;

Commit complete.

This might be the time to point out that a GTT has two types of statistics in 12c: shared and session. Again, please refer to the Optimiser blog post for more information about private session statistics. I first looked at dba_tables for num_rows etc. but found the column empty. Using a query from Tim's "Session-Private Statistics for Global Temporary Tables in Oracle Database 12c Release 1" article on oracle-base.com I detected some statistics in the end:

SQL> select table_name, object_type, num_rows, scope
  2  from user_tab_statistics where table_name = 'GTT';

TABLE_NAME                     OBJECT_TYPE    NUM_ROWS SCOPE
------------------------------ ------------ ---------- -------
GTT                            TABLE                   SHARED
GTT                            TABLE            399999 SESSION

This is down to a global optimiser preference: session-level statistics are gathered by default.

SQL> SELECT DBMS_STATS.get_prefs('GLOBAL_TEMP_TABLE_STATS') FROM dual;

DBMS_STATS.GET_PREFS('GLOBAL_TEMP_TABLE_STATS')
--------------------------------------------------
SESSION
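The preference can be changed if you prefer the pre-12c behaviour of shared statistics; GLOBAL_TEMP_TABLE_STATS is a documented DBMS_STATS preference (a quick sketch, run by a suitably privileged user):

-- revert to shared GTT statistics; SESSION is the 12c default
exec dbms_stats.set_global_prefs('GLOBAL_TEMP_TABLE_STATS', 'SHARED')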

The Testcase

With the table set up it is now time to test the Smart Scan. In addition to a SQL trace I requested an NSMTIO trace to understand Oracle's direct path read decision better.

SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL> alter session set events 'trace[NSMTIO] disk=highest';

Session altered.

SQL> select count(*), id from gtt where id in (9244048,2529293) group by id;

  COUNT(*)         ID
---------- ----------
         1    2529293
         1    9244048

SQL> exit

Looking at the trace file I can see the following output for this particular statement:

=====================
PARSING IN CURSOR #140137295360240 len=70 dep=0 uid=78 oct=3 lid=78 tim=1228663420381 hv=2599214644 ad='1398f2ab0' sqlid='9p960vqdftrjn'
select count(*), id from gtt where id in (9244048,2529293) group by id
END OF STMT
PARSE #140137295360240:c=2000,e=1954,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2363333961,tim=1228663420380
EXEC #140137295360240:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2363333961,tim=1228663420505
WAIT #140137295360240: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=163899 tim=1228663420582
NSMTIO: kcbism: islarge 1 next 0 nblks 67197 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67197 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67197 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 3, objd: 6468608, objn: 163899
ckpt: 1, nblks: 67197, ntcache: 3, ntdist:3
Direct Path for pdb 0 tsn 3  objd 6468608 objn 163899
Direct Path 1 ckpt 1, nblks 67197 ntcache 3 ntdist 3
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 101
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 6468608, Object_Size = 67197 blocks
SqlId = 9p960vqdftrjn, plan_hash_value = 2363333961, Partition# = 0
WAIT #140137295360240: nam='reliable message' ela= 1053 channel context=5884629416 channel handle=5884574224 broadcast message=5885746648 obj#=163899 tim=1228663421984
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 122 name|mode=1380909062 2=65585 0=1 obj#=163899 tim=1228663422173
WAIT #140137295360240: nam='enq: RO - fast object reuse' ela= 96 name|mode=1380909057 2=65585 0=2 obj#=163899 tim=1228663422323
WAIT #140137295360240: nam='cell single block physical read' ela= 308 cellhash#=2133459483 diskhash#=2964949887 bytes=8192 obj#=163899 tim=1228663423031
WAIT #140137295360240: nam='cell smart table scan' ela= 153 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663423651
WAIT #140137295360240: nam='cell smart table scan' ela= 152 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663424114
WAIT #140137295360240: nam='cell smart table scan' ela= 145 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663424571
WAIT #140137295360240: nam='gc cr grant 2-way' ela= 135 p1=201 p2=2273408 p3=7 obj#=163899 tim=1228663425857
WAIT #140137295360240: nam='cell single block physical read' ela= 257 cellhash#=379339958 diskhash#=2689692736 bytes=8192 obj#=163899 tim=1228663426158
WAIT #140137295360240: nam='cell single block physical read' ela= 224 cellhash#=2133459483 diskhash#=3132070477 bytes=8192 obj#=163899 tim=1228663426466
WAIT #140137295360240: nam='cell smart table scan' ela= 251 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663426863
WAIT #140137295360240: nam='cell smart table scan' ela= 268 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427213
WAIT #140137295360240: nam='cell smart table scan' ela= 225 cellhash#=3176594409 p2=0 p3=0 obj#=163899 tim=1228663427514
WAIT #140137295360240: nam='cell smart table scan' ela= 85 cellhash#=2133459483 p2=0 p3=0 obj#=163899 tim=1228663427720
WAIT #140137295360240: nam='cell smart table scan' ela= 13 cellhash#=379339958 p2=0 p3=0 obj#=163899 tim=1228663427755
...

There is a simple conclusion: yes, there is a Smart Scan. I can't see a massive difference between the NSMTIO trace for a GTT and the one for a heap table. Since direct path read decisions are statistics-driven (check the parameter _direct_read_decision_statistics_driven) and there were no shared statistics on the GTT, I assume the single block reads are related to reading object statistics. OBJ# 163899 is the object_id of the GTT.
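In case you want to check that parameter on your own system: hidden parameters aren't visible in v$parameter, but as SYS you can peek at the underlying x$ views. The usual disclaimer about underscore parameters applies:

select a.ksppinm, b.ksppstvl
  from x$ksppi a, x$ksppcv b
 where a.indx = b.indx
   and a.ksppinm = '_direct_read_decision_statistics_driven';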

What is interesting is the use of "enq: RO - fast object reuse" prior to the start of the Smart Scan. This surprised me a little; I wasn't expecting it. Here's the same trace combination for a heap table for comparison:

=====================
PARSING IN CURSOR #140120190960712 len=91 dep=0 uid=78 oct=3 lid=78 tim=1226766583470 hv=2817388421 ad='13483d3f0' sqlid='g8948rkmyvvw5'
select /* heaptest001 */ count(*), id from heapt1 where id in (6460386, 6460390) group by id
END OF STMT
PARSE #140120190960712:c=1000,e=1925,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3353166567,tim=1226766583470
EXEC #140120190960712:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3353166567,tim=1226766583599
WAIT #140120190960712: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=163896 tim=1226766583653
NSMTIO: kcbism: islarge 1 next 0 nblks 67024 type 2, bpid 3, kcbisdbfc 0 kcbnhl 32768 kcbstt 6686 keep_nb 0 kcbnbh 317347 kcbnwp 4
NSMTIO: kcbimd: nblks 67024 kcbstt 6686 kcbpnb 33430 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 67024 vlot 500 pnb 334303 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 9, objd: 163896, objn: 163896
ckpt: 1, nblks: 67024, ntcache: 482, ntdist:482
Direct Path for pdb 0 tsn 9  objd 163896 objn 163896
Direct Path 1 ckpt 1, nblks 67024 ntcache 482 ntdist 482
Direct Path mndb 0 tdiob 3978 txiob 0 tciob 109
Direct path diomrc 128 dios 2 kcbisdbfc 0
NSMTIO: Additional Info: VLOT=1671515
Object# = 163896, Object_Size = 67024 blocks
SqlId = g8948rkmyvvw5, plan_hash_value = 3353166567, Partition# = 0
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 351 name|mode=1263468550 2=65585 0=2 obj#=163896 tim=1226766584359
WAIT #140120190960712: nam='reliable message' ela= 1372 channel context=5884629416 channel handle=5252488952 broadcast message=5885771824 obj#=163896 tim=1226766585825
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=65585 0=1 obj#=163896 tim=1226766585972
WAIT #140120190960712: nam='enq: KO - fast object checkpoint' ela= 90 name|mode=1263468545 2=65585 0=2 obj#=163896 tim=1226766586140
WAIT #140120190960712: nam='cell smart table scan' ela= 201 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766587377
WAIT #140120190960712: nam='cell smart table scan' ela= 137 cellhash#=379339958 p2=0 p3=0 obj#=163896 tim=1226766589330
WAIT #140120190960712: nam='cell smart table scan' ela= 127 cellhash#=2133459483 p2=0 p3=0 obj#=163896 tim=1226766593585
WAIT #140120190960712: nam='cell smart table scan' ela= 215 cellhash#=3176594409 p2=0 p3=0 obj#=163896 tim=1226766602986

This is the more familiar picture: reliable message and enq: KO - fast object checkpoint, followed by cell smart table scan.

The Verdict

Quite simple this time: yes, you can use Smart Scans on GTTs. Additionally, there are a few caveats around the use of GTTs when you are migrating your database to 12.1, but you can read about them in the Optimiser blog post linked above.

Posted in 12c Release 1, Exadata

JSON support in Exadata 12.1.2.1.0 and later

Posted by Martin Bach on July 8, 2015

Some time ago Oracle announced that RDBMS 12.1.0.2 has built-in support for JSON processing. A little later it was also mentioned that there is support for JSON in the Exadata storage servers for offloading. This is probably a lot more exciting to users of JSON than it is to me, as I'm not a developer. However, whenever an announcement such as the one I'm referring to is made, I like to see for myself how much of it is implemented in software. Like I said, I'm not a developer, so apologies for a silly example: what I'm showing you here can probably be done differently and is not the best use of an Exadata. But all I really wanted to test was whether JSON support actually exists. I am using cellsrv 12.1.2.1.0 and RDBMS 12.1.0.2.2 for this test.

JSON

I have to say I struggled a little bit to understand the use case for JSON, and therefore did what probably everyone does: I consulted the official documentation and oracle-base.com for Tim's views on JSON. Both proved useful to get started.

The Test

OK, so that was enough to get me started. I needed data, and a table to store it in. It appeared to me that an Apache log could be a useful source of JSON records, so I converted my webserver's log file to JSON using libee0 on OpenSuSE (yes I know, but it's a great virtualisation platform). The converted file was named httpd_access_log.json and had records such as these:

{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:05 +0100", "request": "HEAD /ol70 HTTP/1.1", "status": "404", "size": "", "f1": "", "useragent": "Python-urllib/2.7"}
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:25 +0100", "request": "GET / HTTP/1.1", "status": "403", "size": "989", "f1": "", "useragent": "Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0"}

Sorry for the wide output; it's an Apache log …

I then created the table to store the data. JSON appears to be pretty unstructured, so this will do:

SQL> create table jsontest (id number,
  2   jdata clob,
  3   constraint jsontest check (jdata is json)
  4  )
  5  lob (jdata) store as securefile (
  6   enable storage in row
  7  );

SQL> create sequence s_jsontest;

Sequence created

If you look closely you'll see that the JSON data is stored in an inline CLOB; that's one of the prerequisites for offloading LOBs in 12c.
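You can verify the in-row storage via user_lobs; the IN_ROW column should report YES for the securefile:

select table_name, column_name, securefile, in_row
  from user_lobs
 where table_name = 'JSONTEST';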

Loading JSON

Now I needed a way to get the data into the table. I think I could have used SQLLDR, but since I have rusty Perl scripting skills I gave DBD::Oracle on 12.1.0.2 a go. The following script inserts records slow-by-slow, or row-by-row, into the table and is probably not the most efficient way to do this. But one of the reasons I blog is so that I don't have to remember everything. If you ever wondered how to write a DBI/DBD::Oracle script, here's a starting point. Note the emphasis on "starting point": the script has been trimmed for readability and essential error checking is not shown. Whenever you work with data, make sure that your error handling is top-notch!

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use DBD::Oracle;
use Getopt::Long;

# these will be set by GetOpt::Long
my $service;            # has to be in tnsnames.ora
my $username;
my $jsonfile;

GetOptions (
  "service=s"   => \$service,
  "user=s"      => \$username,
  "jsonfile=s"  => \$jsonfile
);
die "usage: load_json.pl --service <servicename> --jsonfile [--user username] " if (!defined ($service ) || !defined ($jsonfile));

die "$jsonfile is not a file" unless ( -f $jsonfile );

print "connecting to service $service as user $username to load file $jsonfile\n";

# about to start...
my $dbh = DBI->connect ("dbi:Oracle:$service", "$username", "someCleverPasswordOrCatName")
  or die ("Cannot connect to service $service: DBI:errstr!");

print "connection to the database established, trying to load data...\n";

# prepare a cursor to loop over all entries in the file
my $sth = $dbh->prepare(q{
 insert into jsontest (id, jdata) values(s_jsontest.nextval, :json)
});

if (! open JSON, "$jsonfile")  {
  print "cannot open $jsonfile: $!\n";
  $dbh->disconnect();
  die "Cannot continue\n";
}

while (<JSON>) {
  chomp;
  $sth->bind_param(":json", $_);
  $sth->execute();
}

$dbh->disconnect();

close JSON;

print "done\n";

This script read the file and inserted all the data into the table. Again, essential error checking must be added; the script is far from complete. You also need to set the Perl environment variables to the perl installation in $ORACLE_HOME for it to find the DBI and DBD::Oracle drivers.

Offloading or not?

It turned out that the data I inserted was of course not enough to trigger a direct path read that could turn into a Smart Scan. A little inflation of the table was needed; nothing fancy, essentially doubling the segment per iteration until it was big enough (a sketch, repeat as needed):
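insert /*+ append */ into jsontest
select s_jsontest.nextval, jdata from jsontest;

-- a direct path insert requires a commit before the table can be queried again
commit;

Once that was done I started to get my feet wet with JSON queries: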

SQL> select jdata from jsontest where rownum < 6;

JDATA
--------------------------------------------------------------------------------
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:26
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:

Interesting. Here are a few more examples with my data set. Again, refer to oracle-base.com and the official documentation set for more information about JSON and querying it in the database. It's by no means an Exadata-only feature.

SQL> select count(*) from jsontest where json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
   2195968

SQL> select count(*) from jsontest where not json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
         0

And finally: can you offload JSON processing in Exadata? At least for some of the operations it should be possible, judging by the information in v$sqlfn_metadata:

SQL> select name,offloadable from v$sqlfn_metadata where name like '%JSON%'
  2  order by offloadable,name;

NAME                                               OFF
-------------------------------------------------- ---
JSON_ARRAY                                         NO
JSON_ARRAYAGG                                      NO
JSON_EQUAL                                         NO
JSON_OBJECT                                        NO
JSON_OBJECTAGG                                     NO
JSON_QUERY                                         NO
JSON_SERIALIZE                                     NO
JSON_TEXTCONTAINS2                                 NO
JSON_VALUE                                         NO
JSON                                               YES
JSON                                               YES
JSON_EXISTS                                        YES
JSON_QUERY                                         YES
JSON_VALUE                                         YES

14 rows selected.

The two entries named “JSON” are most likely “is JSON” and “is not JSON”.
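If so, they correspond to the conditions you can use directly in SQL, just like in the check constraint at the beginning:

select count(*) from jsontest where jdata is json;
select count(*) from jsontest where jdata is not json;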

And now with real data volumes on a real system using JSON_EXISTS:

SQL> select /*+ monitor am_I_offloaded */ count(*)
  2  from jsontest where json_exists(jsontest.jdata, '$.host' false on error);                                                                                        

   COUNT(*)
-----------
    2195968

Elapsed: 00:00:04.96

SQL> select * from table(dbms_xplan.display_cursor);                                                                                                        

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID  6j73xcww7hmcw, child number 0
-------------------------------------
select /*+ monitor am_I_offloaded */ count(*) from jsontest where
json_exists(jsontest.jdata, '$.host' false on error)

Plan hash value: 568818393

---------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |       |       | 91078 (100)|          |
|   1 |  SORT AGGREGATE            |          |     1 |   610 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| JSONTEST | 21960 |    12M| 91078   (1)| 00:00:04 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)
       filter(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)

So the execution plan looks promising: I can see "table access storage full" and a storage() predicate. Looking at V$SQL I get:

SQL> select sql_id, child_number,
  2  case when io_cell_offload_eligible_bytes = 0 then 'NO' else 'YES' end offloaded,
  3  io_cell_offload_eligible_bytes/power(1024,2) offload_eligible_mb,
  4  io_interconnect_bytes/power(1024,2) interconnect_mb,
  5  io_cell_offload_returned_bytes/power(1024,2) returned_mb,
  6  io_cell_offload_returned_bytes/io_cell_offload_eligible_bytes*100 offload_pct
  7   from v$sql where sql_id = '6j73xcww7hmcw';                                                                                                                                                   

SQL_ID        CHILD_NUMBER OFF OFFLOAD_ELIGIBLE_MB INTERCONNECT_MB RETURNED_MB OFFLOAD_PCT
------------- ------------ --- ------------------- --------------- ----------- -----------
6j73xcww7hmcw            0 YES         2606.695313     1191.731941 1191.724129 45.71781455

And to avoid any doubt, I have the SQL Trace as well:

PARSING IN CURSOR #140370400430072 len=120 dep=0 uid=65 oct=3 lid=65 tim=1784977054418 hv=1750582781 ad='5bfcebed8' sqlid='bfwd4t5n5gjgx'
select /*+ monitor am_I_offloaded */ count(*)   from jsontest where json_exists(jsontest.jdata, '$.host' false on error)
END OF STMT
PARSE #140370400430072:c=103984,e=272239,p=909,cr=968,cu=0,mis=1,r=0,dep=0,og=1,plh=568818393,tim=1784977054417
EXEC #140370400430072:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568818393,tim=1784977054587
WAIT #140370400430072: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=96305 tim=1784977054666
WAIT #140370400430072: nam='reliable message' ela= 826 channel context=27059892880 channel handle=27196561216 broadcast message=26855409216 obj#=96305 tim=1784977055727
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 159 name|mode=1263468550 2=65629 0=1 obj#=96305 tim=1784977055942
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 229 name|mode=1263468545 2=65629 0=2 obj#=96305 tim=1784977056265
WAIT #140370400430072: nam='cell smart table scan' ela= 196 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977057370
WAIT #140370400430072: nam='cell smart table scan' ela= 171 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977057884
WAIT #140370400430072: nam='cell smart table scan' ela= 188 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977058461
WAIT #140370400430072: nam='cell smart table scan' ela= 321 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977061623
WAIT #140370400430072: nam='cell smart table scan' ela= 224 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977062053
WAIT #140370400430072: nam='cell smart table scan' ela= 254 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977062487
WAIT #140370400430072: nam='cell smart table scan' ela= 7 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977062969
WAIT #140370400430072: nam='cell smart table scan' ela= 25 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977063016
WAIT #140370400430072: nam='cell smart table scan' ela= 81 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977063115
WAIT #140370400430072: nam='cell smart table scan' ela= 1134 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977065442
WAIT #140370400430072: nam='cell smart table scan' ela= 6 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977065883
WAIT #140370400430072: nam='cell smart table scan' ela= 14 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977065917
WAIT #140370400430072: nam='cell smart table scan' ela= 105 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066037
WAIT #140370400430072: nam='cell smart table scan' ela= 12 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066207
WAIT #140370400430072: nam='cell smart table scan' ela= 6605 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977072866
WAIT #140370400430072: nam='cell smart table scan' ela= 27 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977073877
WAIT #140370400430072: nam='cell smart table scan' ela= 29 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977074903
WAIT #140370400430072: nam='cell smart table scan' ela= 907 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977077783
WAIT #140370400430072: nam='cell smart table scan' ela= 28 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977078753
WAIT #140370400430072: nam='cell smart table scan' ela= 24 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977080860
WAIT #140370400430072: nam='cell smart table scan' ela= 1077 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977082935
...

Summary

So yes, it would appear as if JSON processing is offloaded.

Posted in 12c Release 1, Exadata

IO Resource Manager for Pluggable Databases in Exadata

Posted by Martin Bach on June 16, 2015

Another interesting topic that goes over and above the CDB Resource Manager plans I described earlier this week is the implementation of IORM plans for Pluggable Databases. Pluggable Databases are an interesting object of study, and I like to research things. When 12c came out there was initially no support for offloading; you need to be on cell software 12.1.x.x.x for full 12c support on Exadata. One aspect I was particularly interested in was PDBs and IORM. Limiting CPU on a per-PDB basis is already a cool thing, but in a hugely consolidated environment it also makes sense to limit I/O based on a consumer group. Another one of the perks of working on Exadata :)

According to what I found out, PDBs are treated as entities within the database, and they probably fall into the category of intra-database IORM. I have previously written about how DBRM plans filter down to the cells and become intra-database resource plans. The same seems to be happening here, too.

IORM metric definitions

When looking at IORM I'm old-fashioned and like to rely on the command line. More GUI-oriented people should consider using OEM 12 to get similar data, but in pretty pictures. Since OEM taps into cellcli under the covers, it is only fair to try to understand the underlying technology.

The cells provide a lot of performance information in the metriccurrent and metrichistory views. You are shown metrics based on a name and objectType. For IORM the following objectTypes are of interest in general:

[celladmin@enkcel04 ~]$ cellcli -e "list metricdefinition attributes objectType" | grep IORM | sort | uniq
	 IORM_CATEGORY
	 IORM_CONSUMER_GROUP
	 IORM_DATABASE
	 IORM_PLUGGABLE_DATABASE

This was executed on Exadata 12.1.2.1.1; if you try it on an 11.2.3.x Exadata release you won't see the IORM_PLUGGABLE_DATABASE objectType. The question is: which metrics are gathered in 12.1.2.1.1?

CELLCLI> list metricdefinition attributes name,description where objectType = 'IORM_PLUGGABLE_DATABASE'
	 PDB_FC_BY_ALLOCATED	 "Number of megabytes allocated in flash cache for this pluggable database"
	 PDB_FC_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash cache"
	 PDB_FC_IO_RQ       	 "Number of IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SEC   	 "Number of IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash cache per second"
	 PDB_FD_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash disks"
	 PDB_FD_IO_LOAD     	 "Average I/O load from this pluggable database for flash disks"
	 PDB_FD_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_TM       	 "The cumulative latency of reading blocks by this pluggable database from flash disks"
	 PDB_FD_IO_TM_RQ    	 "The rate which is the average latency of reading or writing blocks per request by this pluggable database from flash disks"
	 PDB_FD_IO_UTIL     	 "Percentage of flash resources utilized by this pluggable database"
	 PDB_FD_IO_WT_LG    	 "IORM wait time for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_LG_RQ 	 "Average IORM wait time per request for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM    	 "IORM wait time for small IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM_RQ 	 "Average IORM wait time per request for small IO requests issued to flash disks by this pluggable database"
	 PDB_IO_BY_SEC      	 "Number of megabytes of I/O per second for this pluggable database to hard disks"
	 PDB_IO_LOAD        	 "Average I/O load from this pluggable database for hard disks"
	 PDB_IO_RQ_LG       	 "Number of large IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_LG_SEC   	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_RQ_SM       	 "Number of small IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_SM_SEC   	 "Number of small IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_TM_LG       	 "The cumulative latency of reading or writing large blocks by this pluggable database from hard disks"
	 PDB_IO_TM_LG_RQ    	 "The rate which is the average latency of reading or writing large blocks per request by this pluggable database from hard disks"
	 PDB_IO_TM_SM       	 "The cumulative latency of reading or writing small blocks by this pluggable database from hard disks"
	 PDB_IO_TM_SM_RQ    	 "The rate which is the average latency of reading or writing small blocks per request by this pluggable database from hard disks"
	 PDB_IO_UTIL_LG     	 "Percentage of disk resources utilized by large requests from this pluggable database"
	 PDB_IO_UTIL_SM     	 "Percentage of disk resources utilized by small requests from this pluggable database"
	 PDB_IO_WT_LG       	 "IORM wait time for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_LG_RQ    	 "Average IORM wait time per request for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM       	 "IORM wait time for small IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM_RQ    	 "Average IORM wait time per request for small IO requests issued to hard disks by this pluggable database"

IORM metrics for PDBs

You still can't "join" metricdefinition to metriccurrent, but it is possible to use the objectType in metriccurrent, too. The way Oracle externalises information about PDBs is shown here:

CellCLI> list metriccurrent where objectType = 'IORM_PLUGGABLE_DATABASE' and metricObjectName like 'MBACHMT.*'
	 PDB_FC_BY_ALLOCATED	 MBACHMT.CDB$ROOT   	 1,843 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.PDB$SEED   	 41.500 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,772 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,374 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,280 MB
...

PDB_FC_BY_ALLOCATED translates to "Number of megabytes allocated in flash cache for this pluggable database", as per the definitions above. The metricObjectName is therefore made up of the database name (I continue using MBACHMT as my CDB) and the container name. CDB$ROOT stands for the root, PDB$SEED for the seed database, and then come the various PDBs you define. In my example I have "user PDBs" named swingbench{0,1,2}.
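The container names in the metricObjectName match what the database itself reports; on the RDBMS side you can list them from the root:

select con_id, name from v$containers order by con_id;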

Stressing it

With the basics covered it is time to run some stress testing. I have created a 6 GB table named IOTEST in all my swingbench* PDBs and will use the same script to issue 80 sessions against that table in each PDB. My CDB resource manager plan is still the same, repeated here for your convenience:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;
 
 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);
 
 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

If I execute the scripts concurrently (80 sessions connecting against each PDB) and prefix my scheduler invocation with the time command, I get the following results:

  • swingbench0 – 143s
  • swingbench1 – 223s
  • swingbench2 – 288s

Interestingly, there is no event that would show I/O throttling in 12.1.0.2.2:

SYS:MBACHMT2> select count(*), con_id, event from v$session where username = 'SOE' group by con_id, event;

   COUNT(*)      CON_ID EVENT
----------- ----------- ----------------------------------------------------------------
         80           4 cell smart table scan
         80           5 cell smart table scan
         80           3 cell smart table scan

So all of them are Smart-Scanning. The fact that some of the sessions are throttled is not visible from the wait interface, or at least I haven’t seen a way to externalise I/O throttling. But it does happen. Using one of my favourite tools, metric_iorm.pl (available from MOS), I noticed the following:

Database: MBACHMT
Utilization:     Small=0%    Large=18%
Flash Cache:     IOPS=13087
Disk Throughput: MBPS=361
Small I/O's:     IOPS=4.0    Avg qtime=0.0ms
Large I/O's:     IOPS=346    Avg qtime=1026ms
	Consumer Group: SWINGBENCH1.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=4%
	Flash Cache:     IOPS=3482
	Disk Throughput: MBPS=91
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=87.7    Avg qtime=1336ms
	Consumer Group: SWINGBENCH0.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=13%
	Flash Cache:     IOPS=8886
	Disk Throughput: MBPS=254
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=244    Avg qtime=906ms
	Consumer Group: CDB$ROOT._ORACLE_LOWPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.8
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT.ORA$AUTOTASK
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=1.4
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.4    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT._ORACLE_BACKGROUND_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.7
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=3.6    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: SWINGBENCH2.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=717
	Disk Throughput: MBPS=15
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=14.5    Avg qtime=1152ms

These are statistics from a single cell; this X2-2 has 3 of them. I have also gathered some of the raw stats here in case you are interested, again from a single cell:

CellCLI> list metriccurrent where name like 'PDB.*' and metricObjectName like 'MBACHMT.SW.*' and metricValue not like '0.*';
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,779 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,467 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,461 MB
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH0	 4,807,060 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH1	 4,835,038 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH2	 4,833,804 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH0	 12,789 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH1	 9,721 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH2	 5,182 IO/sec
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH0	 9,724 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH1	 6,093 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH2	 6,298 IO requests
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH0	 51.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH1	 30.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH2	 23.9 IO/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH0	 30.3
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH1	 30.4
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH2	 28.1
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH0	 69,803,464 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH1	 45,061,357 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH2	 40,433,099 us
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH0	 67 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH1	 54 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH2	 29 %
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH0	 830,669 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH1	 717,211 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH2	 221,666 ms
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH0	 380 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH1	 305 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH2	 151 MB/sec
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH0	 2.4
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH1	 2.6
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH2	 2.3
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH0	 105,784 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH1	 88,549 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH2	 61,617 IO requests
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH0	 365 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH1	 292 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH2	 145 IO/sec
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH0	 3,822,888,945 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH1	 3,355,167,650 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH2	 2,004,747,904 us
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH0	 27 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH1	 20 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH2	 9 %
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH0	 108,668,272 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH1	 105,099,717 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH2	 132,192,319 ms
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH0	 1,655 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH1	 2,979 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH2	 4,921 ms/request

Some of the numbers don’t seem to make sense at first glance, for example PDB_FD_IO_RQ_LG, where the values are nearly identical across the three PDBs. This is actually a feature (really!), because some metrics are cumulative while others are instantaneous:

CELLCLI> list metricdefinition where name = 'PDB_FD_IO_RQ_LG' detail
	 name:              	 PDB_FD_IO_RQ_LG
	 description:       	 "Number of large IO requests issued by this pluggable database to flash disks"
	 metricType:        	 Cumulative
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 "IO requests"

So this is a cumulative metric. Others, like PDB_IO_RQ_LG_SEC measure the state “as is”:

CELLCLI> list metricdefinition where name = 'PDB_IO_RQ_LG_SEC' detail
	 name:              	 PDB_IO_RQ_LG_SEC
	 description:       	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 metricType:        	 Rate
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 IO/sec
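
If you need a rate for a cumulative metric that has no ready-made _SEC counterpart, you can sample it twice and divide by the interval. A sketch using dcli (the cell_group file and celladmin user are assumptions specific to my lab):

# take two samples of a cumulative metric, 60 seconds apart
dcli -g cell_group -l celladmin "cellcli -e list metriccurrent PDB_FD_IO_RQ_LG" > sample1.txt
sleep 60
dcli -g cell_group -l celladmin "cellcli -e list metriccurrent PDB_FD_IO_RQ_LG" > sample2.txt
# rate per cell and metricObjectName = (value2 - value1) / 60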

Have fun!

Posted in 12c Release 1, Exadata | 2 Comments »

Testing 12c CDB Resource Plans and a little bit about OEM Express

Posted by Martin Bach on June 15, 2015

Inspired by Sue Lee’s presentation at Enkitec’s E4 conference I decided to re-run my test suite to work out how you can use Database and I/O Resource Manager in Oracle 12.1.0.2.2 to help consolidate databases in the new Multi-Tenant architecture. I should point out briefly that the use of Multi-Tenant as demonstrated in this article requires you to have a license. Tuning tools shown will also require you to be appropriately licensed…

Setup

I have created a Container Database (CDB), named MBACHMT (MBACH – Multi-Tenant), in which I want to run Swingbench to experiment with I/O Resource Manager. The same Pluggable Databases (PDBs) are also subject to the new CDB-level (Database) Resource Manager testing. In order to simplify the task I’ll just create a single PDB for now, install Swingbench’s Order Entry schema, and clone the PDB twice. The CDB is created using dbca on an Exadata X2-2 quarter rack. Once the CDB was in place I could create the first PDB. I don’t think the steps need a lot of explanation, so here they go without much commenting. First the PDB must be created, based on the SEED database; then I create a tablespace and a user account to host the actual data.

SYS:MBACHMT1> create pluggable database swingbench0 admin user admin identified by secret roles=(DBA);

Pluggable database created.

SYS:MBACHMT1> alter pluggable database swingbench0 open instance=all;

Pluggable database altered.

SYS:MBACHMT1> alter session set container = swingbench0;

Session altered.

SYS:MBACHMT1> create tablespace soe datafile size 5g;

Tablespace created.

SYS:MBACHMT1> create user soe identified by soe default tablespace soe;

User created.

SYS:MBACHMT1> grant connect to soe;

Grant succeeded.

SYS:MBACHMT1> grant execute on dbms_lock to soe;

Grant succeeded.

One thing requires an explanation, and that’s the “alter session set container = swingbench0” command. As you can see the SQLPROMPT is comprised of username – colon – container name. The container named does not change when using the SYS account to switch the context from CDB$ROOT to a PDB, hence you continue to see the MBACHMT1 prefix when in fact I am executing commands on the PDB level.
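
A prompt like this is typically set in a login.sql or glogin.sql; a minimal sketch (the exact settings used on this system are an assumption):

-- SQL*Plus predefined variables _user and _connect_identifier make up the prompt
set sqlprompt "_user:_connect_identifier> "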

The next step is to create a basic Order Entry schema. I wrote about this before, and so has Dominic Giles on his blog. For your reference, here is the command I used:

oewizard -scale 1 -dbap supersecret -u soe -p soe -cl -cs //enkscan2/swingbench0 -ts SOE -create

SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.949

Running in Lights Out Mode using config file : oewizard.xml

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.005
Data Generation Time                   0:00:21.380
DDL Creation Time                      0:00:52.221
Total Run Time                         0:01:13.609
Rows Inserted per sec                      566,237
Data Generated (MB) per sec                   46.1
Actual Rows Generated                   13,009,500


Post Creation Validation Report
===============================
The schema appears to have been created successfully.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA','INVENTORIES','PRODUCT_INFORMATION',
'PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX','PROD_CATEGORY_IX','INVENTORY_PK',
'INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK','ORD_SALES_REP_IX','ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX',
'ORD_WAREHOUSE_IX','ORDER_ITEMS_PK','ITEM_ORDER_IX','ITEM_PRODUCT_IX','WAREHOUSES_PK','WHS_LOCATION_IX',
'CUSTOMERS_PK','CUST_EMAIL_IX','CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX','ADDRESS_PK','ADDRESS_CUST_IX',
'CARD_DETAILS_PK','CARDDETAILS_CUST_IX'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code : 'ORDERENTRY'
Schema Created

With the schema in place I need a PDB Resource Plan, or in other words a resource manager plan enabled at the PDB level. The PDB Resource Plan is almost identical to a non-CDB DBRM plan, with a few restrictions mentioned in the Admin Guide chapter 44, just below figure 44-4. The one that affects me is the lack of multi-level resource plans. For this reason I’m going to use a simple plan based on the RATIO mgmt_mth of dbms_resource_manager.create_plan. Not having multi-level resource plans at your disposal might actually be a good thing: it prevents the creation of beautifully architected but incredibly complex plans that are difficult to understand, for me at least.

To keep it simple, my PDBs have just one purpose: execute Swingbench. As such there won’t be an additional application user; all I care about is the SOE account. I want it to be eligible for the lion’s share of CPU, so here’s the plan. You must make sure that your execution context is the new PDB (swingbench0); you can check by selecting sys_context(‘userenv’,’con_name’) from dual.

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 -- create a new resource consumer group to which we will later on add plan directives
 -- a consumer group is a logical construct grouping sessions to a similar/identical workload
 dbms_resource_manager.create_consumer_group('SWINGBENCH_GROUP', 'for swingbench processing');

 dbms_resource_manager.validate_pending_area();
 dbms_resource_manager.submit_pending_area();
end;
/

begin
 dbms_resource_manager.create_pending_area();

 -- when logging in as oracle user "SOE", map this session to the SWINGBENCH_GROUP
 dbms_resource_manager.set_consumer_group_mapping(
		dbms_resource_manager.oracle_user, 'SOE', 'SWINGBENCH_GROUP');
 dbms_resource_manager.submit_pending_area();
end;
/

begin
 -- must allow the SOE user to switch from OTHER_GROUPS to SWINGBENCH_GROUP. Forgetting this step
 -- is a common reason for DBRM not to work as expected
 dbms_resource_manager_privs.grant_switch_consumer_group('SOE','SWINGBENCH_GROUP', true);
end;
/

BEGIN
 dbms_resource_manager.clear_pending_area();
 dbms_resource_manager.create_pending_area();
 
 -- now create the plan in the first step. Note the mgmt_mth which essentially requires you
 -- to think of CPU shares, not percentages. It also enforces the requirement not to use
 -- multi-level plans
 -- thanks to @fritshoogland for making this obvious to me
 dbms_resource_manager.create_plan(
 	plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
        mgmt_mth => 'RATIO',
 	comment => 'sample DBRM plan for swingbench'
 );

 -- now define what the plan is about. Give the SYS_GROUP 3 shares
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  comment => 'sys_group is level 1',
  group_or_subplan => 'SYS_GROUP',
  mgmt_p1 => 3);

 -- the SWINGBENCH user gets 7 shares
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  group_or_subplan => 'SWINGBENCH_GROUP',
  comment => 'us before anyone else',
  mgmt_p1 => 7
 );

 -- finally anyone not in a previous consumer group will be mapped to the
 -- OTHER_GROUPS and get 1 share. 
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  group_or_subplan => 'OTHER_GROUPS',
  comment => 'all the rest',
  mgmt_p1 => 1
 );
 
 dbms_resource_manager.validate_pending_area();
 dbms_resource_manager.submit_pending_area();
end;
/

If you didn’t get any errors you can enable the plan in the PDB using the familiar “alter system set resource_manager_plan = ENKITEC_SWINGBENCH_PDB_PLAN;” command.
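
To double-check that the plan really is active, you can query v$rsrc_plan from within the PDB; a sketch:

SQL> alter session set container = swingbench0;
SQL> alter system set resource_manager_plan = 'ENKITEC_SWINGBENCH_PDB_PLAN';
SQL> select name, is_top_plan from v$rsrc_plan;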

The PDB is now ready for cloning, which requires it to be open read only. Once swingbench0 is in the correct open mode, clone it using “create pluggable database swingbench1 from swingbench0;” and “create pluggable database swingbench2 from swingbench0;”. Nice: no “RMAN> duplicate database to swingbench1 from active database” and all that … just a one-liner. Once the cloning is done, open the PDBs.
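
Put together, the whole sequence looks roughly like this (a sketch for a single instance; on RAC you would add the appropriate instances clause):

SQL> alter pluggable database swingbench0 close immediate;
SQL> alter pluggable database swingbench0 open read only;
SQL> create pluggable database swingbench1 from swingbench0;
SQL> create pluggable database swingbench2 from swingbench0;
SQL> alter pluggable database swingbench1 open;
SQL> alter pluggable database swingbench2 open;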

The CDB plan

With the PDBs all registered I am now able to define a CDB resource plan in CDB$ROOT. Again, check using “select sys_context(‘userenv’,’con_name’) from dual” that you are in the root, not a PDB. Here is the plan:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);

 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

SQL> alter system set RESOURCE_MANAGER_PLAN = 'FORCE:ENKITC_CDB_PLAN' scope=both sid='*';

System altered.

With this CDB plan I defined CPU shares and utilisation limits. There are other items worth investigating; refer to the documentation for those. Please take note that, except for swingbench0, I capped the maximum utilisation at 50 and 30 percent respectively. This will be interesting later.

Testing

No workload is 100% CPU-bound and I tried a few iterations before coming up with a suitable model to view the CDB Resource Plan in action. In the end all I needed was CPU-burning nonsense, and I found one way of burning CPU by calculating millions of square roots. I have written a small “launcher” script that can execute a SQL script against a (Pluggable) database x-many times; a sketch of the burner follows below. When exceeding the CPU capacity on the system I should be able to see the effect.
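
The burner script isn’t shown in the post; a minimal sketch of what it might look like (the file name and iteration count are assumptions):

-- burn_cpu.sql: keep one CPU busy calculating square roots
declare
  n number := 0;
begin
  for i in 1 .. 50000000 loop
    n := n + sqrt(i);
  end loop;
end;
/
exit

So I launched 20 sessions of my CPU-burning for-loops against each of the PDBs in instance two and connected against CDB$ROOT to see the effect: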

SYS:MBACHMT2> select count(*), con_id, inst_id, event from gv$session where username = 'SOE' group by con_id, inst_id, event;

   COUNT(*)      CON_ID     INST_ID EVENT
----------- ----------- ----------- ----------------------------------------------------------------
         20           3           2 resmgr:cpu quantum
         20           4           2 resmgr:cpu quantum
         20           5           2 resmgr:cpu quantum

And yes, that looks like it ;)

The timings for the executions were:

  • 232 seconds for swingbench0
  • 318 seconds for swingbench1
  • 509 seconds for swingbench2

A little bit of OEM Express

OEM Express was running at the time and it allows you to see the utilisation of your PDBs:

CDB Resource Plan with all 3 PDBs active

You can see the 3 PDBs working along. On the CPU bars to the right you can see the number of running sessions (green) and those waiting (beige). You can also see the entitlement as per shares (black vertical bar in “CPU Resource Limits”) and the utilisation limit (red vertical bar). The output almost perfectly matches the configuration.

Posted in 12c Release 1, Exadata | 1 Comment »

Smart Scan enhancements in Oracle Exadata 12c – inline LOBs

Posted by Martin Bach on May 25, 2015

As part of the 12c release of the database and the cell software (cellos) Oracle introduced the ability to perform Smart Scans on inline LOBs. The enhancement applies to Secure Files only. As always, if you would like to read up on Secure Files you can head over to Tim Hall’s website for the details.

To demonstrate the new behaviour I have used an 11.2.0.3 BP 22 database on Exadata 11.2.3.3.1 and a 12.1.0.2.2 database on Exadata 12.1.2.1.0. The test case revolves around the following table:

CREATE TABLE loboffload (
 id number primary key,
 comments clob)
enable row movement 
tablespace users
lob (comments) store as securefile (
 enable storage in row
);

This is the short version; Oracle fills in the blanks and converts the table DDL to:

SQL> select dbms_metadata.get_ddl('TABLE','LOBOFFLOAD') from dual;

DBMS_METADATA.GET_DDL('TABLE','LOBOFFLOAD')
--------------------------------------------------------------------------------

  CREATE TABLE "MARTIN"."LOBOFFLOAD"
   (    "ID" NUMBER,
        "COMMENTS" CLOB,
         PRIMARY KEY ("ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
 LOB ("COMMENTS") STORE AS SECUREFILE (
  TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING  NOCOMPRESS  KEEP_DUPLICATES
  STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))  ENABLE ROW
 MOVEMENT

The idea is that I am using the comments lob column to simulate a common approach in application design where you define a free text field on a GUI screen for the user to provide additional information about the record to be stored in the database. The design is not very practical though.

The important information is in the section about the LOB. Here you find the request to store the LOB as a SECUREFILE in-line with the table. Out of line LOBs are not offloadable as of 12.1.2.1.0 as far as I know.

To feed some data into the table I first generated a bit of data and then used the insert … append technique to grow it:

alter session enable parallel dml;

-- with nods to Jonathan Lewis who published this in one of his presentations on generating data
insert /*+ append parallel(4) */ into loboffload
with v1 as (
 SELECT rownum n FROM dual CONNECT BY level <= 10000
)
SELECT  rownum id,
  CASE
    WHEN mod(rownum,100000) = 0     THEN 'THIS WAS EXCEPTIONAL' 
    WHEN mod(rownum,10000) = 0      THEN 'THIS WAS VERY GOOD'
    WHEN mod(rownum,1000) = 0       THEN 'THIS WAS GOOD'
    WHEN mod(rownum,100) = 0        THEN 'OK'
    ELSE 'NO COMPLAINTS'
  END 
FROM v1,
  v1
WHERE rownum <= 1e6;

create sequence loboffload_s cache 10000 start with 1000001;


insert /*+ append parallel(4) */ into loboffload
select /*+ parallel(4) */ loboffload_s.nextval, comments from loboffload;

I ended up with 16 million rows. I used impdp over a network link to move the table to the other database.
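
A network-link import needs no intermediate dump file at all; a sketch of what such a call might look like (the database link name is an assumption, and the directory is only needed for the log file):

impdp martin/xxxxxx directory=DATA_PUMP_DIR network_link=v2_source \
  tables=loboffload logfile=imp_loboffload.log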

Test with 11.2.0.3

The test in 11.2.0.3 is used to confirm that inline LOBs are not offloadable. Here is the query and result:

MARTIN@ora11203> select /*+ monitor noparallel full(t) */ count(*) from loboffload t where comments like '%GOOD%';

  COUNT(*)
----------
     15840

Elapsed: 00:01:33.48

MARTIN@ora11203> -- removing irrelevant statistics from the output
MARTIN@ora11203> @mystats stop t=1
==========================================================================================
MyStats report : 25-MAY-2015 02:33:24
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  95.28
TIMER   CPU time used (seconds)                                                      93.00


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                     9,303
...
STAT    DB time                                                                      9,353
...
STAT    cell flash cache read hits                                                   9,454
STAT    cell physical IO interconnect bytes                                  1,233,436,672
...
STAT    consistent gets                                                            150,794
STAT    consistent gets - examination                                                    3
STAT    consistent gets direct                                                     150,558
STAT    consistent gets from cache                                                     236
STAT    consistent gets from cache (fastpath)                                          231
STAT    cursor authentications                                                           3
...
STAT    execute count                                                                   13
...
STAT    logical read bytes from cache                                           14,393,344
STAT    no work - consistent read gets                                             150,558
...
STAT    parse count (hard)                                                               1
STAT    parse count (total)                                                             13
STAT    parse time cpu                                                                   1
STAT    parse time elapsed                                                               1
STAT    physical read IO requests                                                    9,459
STAT    physical read bytes                                                  1,233,436,672
STAT    physical read requests optimized                                             9,454
STAT    physical read total IO requests                                              9,459
STAT    physical read total bytes                                            1,233,436,672
STAT    physical read total bytes optimized                                  1,233,395,712
STAT    physical read total multi block requests                                     9,255
STAT    physical reads                                                             150,566
STAT    physical reads cache                                                             8
STAT    physical reads direct                                                      150,558
...
STAT    table scan blocks gotten                                                   150,558
STAT    table scan rows gotten                                                  16,000,000
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1
...

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

So none of the essential cell% statistics were recorded, except for cell flash cache read hits. Since the cell scans statistic was not incremented (it would go up by 1 for the table segment in a Smart Scan), you can conclude that no Smart Scan happened here.

Test on 12.1.0.2/12.1.2.1.0

The first result was not too surprising, as it is the documented behaviour. The second test should hopefully reveal some offloading.

MARTIN@ORA12c> select /*+ monitor noparallel full(t) */ count(*) from loboffload t where comments like '%GOOD%';

  COUNT(*)
----------
     15840

Elapsed: 00:00:01.65

MARTIN@ORA12c> @mystats stop t=1
==========================================================================================
MyStats report : 25-MAY-2015 02:29:46
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  14.61
TIMER   CPU time used (seconds)                                                       0.03


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                         9
...
STAT    cell IO uncompressed bytes                                           1,234,296,832
STAT    cell blocks helped by minscn optimization                                  150,666
STAT    cell blocks processed by cache layer                                       150,671
STAT    cell blocks processed by data layer                                        150,671
STAT    cell blocks processed by txn layer                                         150,671
STAT    cell flash cache read hits                                                   1,143
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                1,234,296,832
STAT    cell physical IO interconnect bytes                                      6,273,368
STAT    cell physical IO interconnect bytes returned by smart scan               6,273,368
STAT    cell scans                                                                       1
...
STAT    consistent gets                                                            151,053
STAT    consistent gets direct                                                     150,671
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     382
STAT    consistent gets pin                                                            379
STAT    consistent gets pin (fastpath)                                                 379
...
STAT    parse count (total)                                                             13
STAT    physical read IO requests                                                    1,205
STAT    physical read bytes                                                  1,234,296,832
STAT    physical read requests optimized                                             1,143
STAT    physical read total IO requests                                              1,205
STAT    physical read total bytes                                            1,234,296,832
STAT    physical read total bytes optimized                                  1,171,423,232
STAT    physical read total multi block requests                                     1,189
STAT    physical reads                                                             150,671
STAT    physical reads direct                                                      150,671
...
STAT    table scan blocks gotten                                                    15,744
STAT    table scan disk non-IMC rows gotten                                      1,670,692
STAT    table scan rows gotten                                                   1,670,692
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1
...

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

Good news: what Oracle said is true. The same query that did not offload in 11.2.0.3 is now offloaded in 12.1.0.2 with Exadata 12.1.2.1.0. You find proof in the existence of all the cell% statistics, especially cell scans, plus the … processed by … layer statistics.

Posted in 12c Release 1, Exadata | 1 Comment »

Exadata Fast Data File Creation

Posted by Martin Bach on May 11, 2015

This post is the result of some testing I performed on Exadata data file creation. You may know that Exadata offloading incorporates SQL optimisations as well as some infrastructure work. For quite some time Exadata has allowed the DBA to create data files a lot quicker than on traditional systems. This has been documented before, for example by @mpnsh here.

The final comment on his blog entry was a remark that data file creation is quite fast, but that this is not true for online redo logs. Especially when duplicating production to a lower-tier environment you have to wait for the online redo logs (including all members across all threads) to be zeroed. This is no longer an issue with Fast Data File Creation. If your system is configured to use Write-Back Flash Cache (WBFC from now on) and you are on Exadata 11.2.3.3 then you can benefit from super-fast file creation, including online redo logs. Here is an example, taken from a SQL trace:

TKPROF: Release 12.1.0.2.0 
...
Trace file: /u01/app/oracle/diag/rdbms/mbach/MBACH1/trace/MBACH1_ora_124411.trc
Sort options: default

********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

SQL ID: fvt6psf2t3cdz Plan Hash: 0

alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.37       5.09          0          0          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.38       5.09          0          0          2           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file Mirror Read                           2        0.00          0.00
  control file sequential read                   23        0.00          0.01
  KSV master wait                                19        0.04          0.09
  ASM file metadata operation                     6        0.00          0.00
  CSS initialization                              1        0.00          0.00
  CSS operation: query                            6        0.00          0.00
  CSS operation: action                           2        0.00          0.00
  kfk: async disk IO                              2        0.00          0.00
  Disk file operations I/O                        3        0.00          0.00
  cell smart file creation                     3258        0.13          4.64
  log file single write                           2        0.00          0.00
  control file parallel write                    10        0.07          0.07
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1       17.61         17.61
********************************************************************************

I have tried to replicate the command issued in Martin Nash’s blog post. If memory serves me right he was waiting 37 seconds for this command to complete. The new method takes about 5 seconds – not bad! Interestingly there is no new wait event; the one we know (cell smart file creation) is used to indicate the work that has been done. In the case of Fast Data File Creation only the metadata about the new file is persisted in the WBFC; the actual formatting has not yet happened when the prompt returns.

Session Counters?

I have spent _a lot_ of time on Exadata-related session counters in 12.1.0.2 and earlier releases. I think they are fascinating, and since the wait events do not really show me what happened during the execution of the statement, I used the session counters instead. I am using Adrian Billington’s mystats tool for this:

SQL> @scripts/mystats start

SQL> alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m;

Database altered.

Elapsed: 00:00:04.87
SQL> @scripts/mystats stop t=1
...
STAT    cell flash cache read hits                                                      20
STAT    cell logical write IO requests                                                  14
STAT    cell overwrites in flash cache                                                  30
STAT    cell physical IO bytes eligible for predicate offload               17,179,869,184
STAT    cell physical IO bytes saved during optimized file creation         17,179,869,184
STAT    cell physical IO interconnect bytes                                      1,361,920
STAT    cell writes to flash cache                                                      30
...
STAT    physical write requests optimized                                               10
STAT    physical write total IO requests                                             4,126
STAT    physical write total bytes                                          17,180,197,888
STAT    physical write total bytes optimized                                       163,840
STAT    physical write total multi block requests                                    4,096

Let’s begin with the cell%-statistics. These indicate what happened on the storage layer. The command I executed was to create an online redo log group in thread 2, in disk groups DATA and RECO. Each file is approximately 4 GB in size.

SQL> select f.member, l.bytes/power(1024,2) m 
  2  from v$log l, v$logfile f
  3  where l.group# = f.group#
  4  and l.group# = 5;

MEMBER                                                                M
------------------------------------------------------------ ----------
+DATA/MBACH/ONLINELOG/group_5.454.879407723                        4096
+RECO/MBACH/ONLINELOG/group_5.1229.879407725                       4096

2 rows selected.

So why does the database report 17,179,869,184 “cell physical IO bytes saved during optimized file creation”, and almost the same figure again in “physical write total bytes”? The answer is ASM mirroring. In this system DATA and RECO are protected using ASM normal redundancy, doubling the writes.
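
A quick sanity check of the arithmetic:

  2 log members x 4,096 MB x 2 mirror copies (normal redundancy)
    = 16,384 MB
    = 17,179,869,184 bytes

This matches “cell physical IO bytes saved during optimized file creation” exactly; the slightly larger “physical write total bytes” presumably includes the handful of small control file and log file writes visible in the statistics above.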

What I found amusing is that the exact number of bytes eventually written is “eligible for predicate offload”. Until quite recently I only associated Smart Scans with this statistic counter.

You can also see a few writes (and overwrites) to (Write Back) Flash Cache.

Posted in 12c Release 1, Exadata, Performance | Leave a Comment »

Little things worth knowing: exp/imp vs expdp and impdp for HCC in Exadata

Posted by Martin Bach on March 30, 2015

Do you know the difference between exp/imp and expdp/impdp when it comes to importing HCC compressed data in Exadata?

If not, then follow me through two examples. This is on 11.2.0.3/11.2.3.3.1 but applies to all database releases you can have on Exadata. The task at hand is to export a table (which happens to be non-partitioned and HCC compressed for query high) and import it into a different user’s schema. This is quite a common approach when migrating data from a non-Exadata system into an Exadata system. You could for example pre-create the DDL for the tables and implement HCC before even importing a single row. When importing the data, the partitions’ HCC attributes will be honoured and data will be inserted compressed. Or won’t it?

The table

The table I want to export resides on our V2 system. Since I am (trying to be) a good citizen I want to use dbfs for the dump files. Beginning with 12.1.0.2 Grid Infrastructure you can also have ACFS by the way. Let’s start by creating the directory needed and some meta-information about the source table:

SQL> create or replace directory DBM01 as '/dbfs_direct/FS1/mbach/';

Directory created.

SQL> select owner, table_name, partitioned, num_rows, compress_for from dba_tables where table_name = 'T1_QH';

OWNER                          TABLE_NAME                     PAR   NUM_ROWS COMPRESS_FOR
------------------------------ ------------------------------ --- ---------- ------------
MARTIN                         T1_QH                          NO             QUERY HIGH

SQL> select bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

         M     BLOCKS
---------- ----------
        72       9216

The table is 72 MB in size, and HCC compressed (I actually ensured that it was by issuing an “alter table t1_qh move;” before the export started).

Data Pump

The first export uses expdp, followed by impdp to get the data back. I am remapping the schema so I don’t have to risk overwriting my source data.

[enkdb01:oracle:DBM011] /home/oracle/mbach
> expdp martin/xxxxxx directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=exp_t1_qh.log tables=t1_qh

Export: Release 11.2.0.3.0 - Production on Tue Mar 24 06:15:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "MARTIN"."SYS_EXPORT_TABLE_02":  martin/******** directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=exp_t1_qh.log tables=t1_qh
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 72 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "MARTIN"."T1_QH"                            9.675 GB 10000000 rows
Master table "MARTIN"."SYS_EXPORT_TABLE_02" successfully loaded/unloaded
******************************************************************************
Dump file set for MARTIN.SYS_EXPORT_TABLE_02 is:
  /dbfs_direct/FS1/mbach/exp_t1_qh.dmp
Job "MARTIN"."SYS_EXPORT_TABLE_02" successfully completed at 06:17:16

The interesting bit here is that the table on disk occupies around 72 MB, and yet expdp tells me the 10000000 rows occupy 9.7 GB. Can anyone guess why?

[enkdb01:oracle:DBM011] /home/oracle/mbach
> ls -lh /dbfs_direct/FS1/mbach/exp_t1_qh.dmp
-rw-r----- 1 oracle dba 9.7G Mar 24 06:17 /dbfs_direct/FS1/mbach/exp_t1_qh.dmp

Yes, 10GB. Data apparently is not exported in its compressed form. Now this table is going to be imported:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> impdp imptest/xxxxxx directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=imp_t1_qh.log \
> tables=martin.t1_qh remap_schema=martin:imptest 

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 06:23:44 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Master table "IMPTEST"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "IMPTEST"."SYS_IMPORT_TABLE_01":  imptest/******** directory=DBM01
  dumpfile=exp_t1_qh.dmp logfile=imp_t1_qh.log tables=martin.t1_qh remap_schema=martin:imptest
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "IMPTEST"."T1_QH"                           9.675 GB 10000000 rows
Job "IMPTEST"."SYS_IMPORT_TABLE_01" successfully completed at 06:29:53

This looks all right: all the data is back (the number of rows exported matches those imported). What about the segment size?

SQL> select owner, table_name, partitioned, num_rows, compress_for from dba_tables where table_name = 'T1_QH';

OWNER                          TABLE_NAME                     PAR   NUM_ROWS COMPRESS_FOR
------------------------------ ------------------------------ --- ---------- ------------
MARTIN                         T1_QH                          NO             QUERY HIGH
IMPTEST                        T1_QH                          NO             QUERY HIGH

2 rows selected.

SQL> select bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

         M     BLOCKS
---------- ----------
        72       9216
        72       9216

2 rows selected.

Identical down to the block.

Traditional Export/Import

Despite the fact that exp/imp are deprecated they are still included with 12.1.0.2, the current release at the time of writing. What if you did the same process with these instead? After all, many DBAs “grew up” with those tools and can use them in their sleep. This plus some initial deficits with Data Pump in 10g keep exp/imp high up in the list of tools we like.

Let’s export the table:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> exp martin/xxxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.log tables=t1_qh

Export: Release 11.2.0.3.0 - Production on Tue Mar 24 06:42:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                          T1_QH   10000000 rows exported
Export terminated successfully without warnings.

The files are more or less identical in size to the ones created before by Data Pump:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> ls -lh /dbfs_direct/FS1/mbach/*classic*
-rw-r--r-- 1 oracle dba 9.7G Mar 24 07:01 /dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp
-rw-r--r-- 1 oracle dba  537 Mar 24 07:01 /dbfs_direct/FS1/mbach/exp_t1_qh_classic.log

What about the import?

[enkdb01:oracle:DBM011] /home/oracle/mbach
> imp imptest/xxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/imp_t1_qh_classic.log \
> FROMUSER=martin TOUSER=imptest tables=t1_qh

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 07:27:42 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

Export file created by EXPORT:V11.02.00 via conventional path

Warning: the objects were exported by MARTIN, not by you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
. importing MARTIN's objects into IMPTEST
. . importing table                        "T1_QH"

And this takes a loooong time. What’s happening in the background? I first checked what the session was doing. That’s simple; have a look at the output.

SQL> @scripts/as

   INST_ID   SID    SERIAL# USERNAME      PROG       SQL_ID         CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
---------- ----- ---------- ------------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
         1   594         23 IMPTEST       imp@enkdb0 fgft1tcrr12ga      0               0      22723         .00 No  INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "

The EXECS counter was interesting, and indeed, after a couple of minutes it had gone up a lot:

   INST_ID   SID    SERIAL# USERNAME      PROG       SQL_ID         CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
---------- ----- ---------- ------------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
         1   594         23 IMPTEST       imp@enkdb0 fgft1tcrr12ga      0               0     639818         .00 Yes INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "

You can probably guess what’s coming next. After quite a while the import finished:

> imp imptest/xxxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/imp_t1_qh_classic.log \
> FROMUSER=martin TOUSER=imptest tables=t1_qh

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 07:27:42 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

Export file created by EXPORT:V11.02.00 via conventional path

Warning: the objects were exported by MARTIN, not by you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
. importing MARTIN's objects into IMPTEST
. . importing table                        "T1_QH"   10000000 rows imported
Import terminated successfully without warnings.

And now for the reason of this blog post:

SQL> select owner, bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

OWNER                                   M     BLOCKS
------------------------------ ---------- ----------
MARTIN                                 72       9216
IMPTEST                             11227    1437056

The newly imported table was not compressed at all when brought in via traditional import. The explanation lies in the INSERT statement shown above: imp performs conventional-path inserts, and HCC compression is only applied to data loaded via direct path operations, whereas Data Pump’s impdp loads via direct path and therefore honours the table’s compression attribute.
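
If you want to verify this at row level rather than inferring it from segment sizes, dbms_compression.get_compression_type can report the compression type of individual rows. A sketch (in 11.2, a return value of 1 denotes COMP_NOCOMPRESS and 4 COMP_FOR_QUERY_HIGH, if memory serves):

SQL> select dbms_compression.get_compression_type(user, 'T1_QH', rowid) comp_type
  2    from t1_qh
  3   where rownum <= 5;

Rows in the imp-loaded copy should come back as 1, those in the original HCC segment as 4.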

Posted in 11g Release 2, 12c Release 1, Exadata | 2 Comments »

Understanding enhancements to block cleanouts in Exadata part 2

Posted by Martin Bach on February 25, 2015

In part 1 of the series I tried to explain (probably a bit too verbosely when it came to session statistics) what the effect of delayed block cleanout is when using buffered I/O. In the final example the “dirty” blocks on disk had been cleaned out in the buffer cache, greatly reducing the amount of work to be done when reading them.

Time to catch up with the present, and with direct path reads. You probably noticed that the migration to 11.2 caused your I/O patterns to change. Suitably large segments are now read using direct path reads, not only during parallel query but potentially also during serial execution of a query. Since the blocks read during a direct path read do not end up in the buffer cache, there is an interesting side effect with regard to block cleanouts. The scenario is the same unrealistic yet reproducible one as presented in part 1 of this article.

Enter Direct Path Reads – non Exadata

To be absolutely sure I am getting results without any optimisation offered by Exadata I am running the example on different hardware. I am also using 11.2.0.4 because that’s the version I had on my lab server. The principles here apply between versions unless I am very mistaken.

I am repeating my test under realistic conditions, leaving _serial_direct_read at its default, “auto”. This means that I am almost certainly going to see direct path reads when scanning my table. As Christian Antognini has pointed out, direct path reads have an impact on the amount of work that has to be done. The test commences with an update in session 1 updating the whole table, followed by flushing the blocks in the buffer cache to disk to ensure they have an active transaction in the ITL part of the header. The selects in session 2 show the following behaviour: the first execution takes longer as expected, the second one is faster.
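
For reference, the session 1 statements are not reproduced here; a minimal sketch of the setup (the column name is a placeholder, and I am assuming the flush was done via ALTER SYSTEM):

-- session 1: update every row and keep the transaction open
update t1_100k set padding = padding;

-- write the dirty buffers to disk while the transaction stays active
alter system flush buffer_cache;

-- deliberately no commit: each block on disk now carries an open ITL entry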

SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:20.62

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell physical IO interconnect bytes                                    747,798,528
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                           1,019
STAT    consistent gets                                                            751,365
STAT    consistent gets - examination                                              667,178
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                 668,031
STAT    consistent gets from cache (fastpath)                                          850
STAT    data blocks consistent reads - undo records applied                        583,334
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                    8,631
STAT    physical read bytes                                                    747,798,528
STAT    physical read total IO requests                                              8,631
STAT    physical read total bytes                                              747,798,528
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              91,284
STAT    physical reads cache                                                         7,950
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                           170
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

The second execution shows slightly better performance as seen before in part 1. The relevant statistics are shown here:


SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:04.60

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell physical IO interconnect bytes                                    682,672,128
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             472
STAT    consistent gets                                                            750,250
STAT    consistent gets - examination                                              666,671
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                 666,916
STAT    consistent gets from cache (fastpath)                                          245
STAT    data blocks consistent reads - undo records applied                        583,334
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                      681
STAT    physical read bytes                                                    682,672,128
STAT    physical read total IO requests                                                681
STAT    physical read total bytes                                              682,672,128
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

The amount of IO is slightly lower, yet all the CR processing is still performed for every block in the table: each of them has an active transaction (active txn count during cleanout). As you would expect, the buffer cache holds practically no blocks from the segment:

SQL> select count(*), inst_id, status from gv$bh 
  2  where objd = (select data_object_id from dba_objects where object_name = 'T1_100K') 
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
     86284          2 free
    584983          1 free
         1          1 scur

Subsequent executions all take approximately the same amount of time: every execution has to perform the cleanouts and rollbacks, just as before. The only real difference to the buffered reads is that direct path reads are used to scan the table, and consequently there aren't any blocks from T1_100K in the buffer cache.

Commit

Committing in session 1 does not have much of an effect: the blocks read by the direct path read go to the PGA instead of the buffer cache. A quick demonstration: the same select was executed after the transaction in session 1 committed.

SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:04.87

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell physical IO interconnect bytes                                    682,672,128
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts only - consistent read gets                                       83,334
STAT    commit txn count during cleanout                                            83,334
STAT    consistent changes                                                             481
STAT    consistent gets                                                            166,916
STAT    consistent gets - examination                                               83,337
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                  83,582
STAT    consistent gets from cache (fastpath)                                          245
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                      681
STAT    physical read bytes                                                    682,672,128
STAT    physical read total IO requests                                                681
STAT    physical read total bytes                                              682,672,128
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

As you can see, the work done for the cleanout is repeated every time, even though active txn count during cleanout no longer appears in the output. And still, there aren't really any blocks pertaining to T1_100K in the buffer cache:

SQL> select count(*), inst_id, status from gv$bh where
  2   objd = (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
     86284          2 free
    584983          1 free
         1          1 scur
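
As an aside, this repeated work is a consequence of the direct path read: the cleaned-out block images live in the PGA and are never written back to disk. A buffered read would perform the cleanout in the buffer cache once, from where DBWR can eventually write the clean copies back. A minimal sketch, reusing the underscore parameter from earlier to force the buffered path:

-- force buffered (non-direct path) serial reads for this session only;
-- the cleanout now happens in the buffer cache and can persist
alter session set "_serial_direct_read" = never;
select /* buffered */ count(*) from t1_100k;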

OK, that should be enough for now. Over to the Exadata.

Direct Path Reads – Exadata

The same test again, but this time on an X2-2 quarter rack running database 12.1.0.2 and cellsrv 12.1.2.1.0. After the update of the entire table and the flushing of blocks to disk, here are the statistics for the first and second execution.

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:17.41

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks processed by cache layer                                        83,933
STAT    cell commit cache queries                                                   83,933
STAT    cell flash cache read hits                                                  14,011
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                    804,842,256
STAT    cell physical IO interconnect bytes returned by smart scan             682,904,336
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             796
STAT    consistent gets                                                            917,051
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                833,339
STAT    consistent gets examination (fastpath)                                      83,336
STAT    consistent gets from cache                                                 833,717
STAT    consistent gets pin                                                            378
STAT    consistent gets pin (fastpath)                                                 377
STAT    data blocks consistent reads - undo records applied                        750,002
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                   16,147
STAT    physical read bytes                                                    804,610,048
STAT    physical read requests optimized                                            14,011
STAT    physical read total IO requests                                             16,147
STAT    physical read total bytes                                              804,610,048
STAT    physical read total bytes optimized                                    114,778,112
STAT    physical read total multi block requests                                       851
STAT    physical reads                                                              98,219
STAT    physical reads cache                                                        14,885
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================


SQL> -- second execution, still no commit in session 1
SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:01.31

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks processed by cache layer                                        83,949
STAT    cell commit cache queries                                                   83,949
STAT    cell flash cache read hits                                                   1,269
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                    682,907,280
STAT    cell physical IO interconnect bytes returned by smart scan             682,907,280
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             791
STAT    consistent gets                                                            917,053
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                833,339
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 833,719
STAT    consistent gets pin                                                            380
STAT    consistent gets pin (fastpath)                                                 380
STAT    data blocks consistent reads - undo records applied                        750,002
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                    1,278
STAT    physical read bytes                                                    682,672,128
STAT    physical read requests optimized                                             1,269
STAT    physical read total IO requests                                              1,278
STAT    physical read total bytes                                              682,672,128
STAT    physical read total bytes optimized                                    678,494,208
STAT    physical read total multi block requests                                       885
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

In the example, table scans (direct read) indicates that the segment was read using a direct path read. Since this example was executed on an Exadata, the direct path read was performed as a Smart Scan (cell scans = 1). However, the Smart Scan was not very successful: although all 83,334 table blocks were opened by the cache layer (the first layer to touch a block on the cell during a Smart Scan), none of them passed the examination of the transaction layer; they have all been discarded. You can see that there was no saving from the Smart Scan here: cell physical IO bytes eligible for predicate offload is effectively equal to cell physical IO interconnect bytes returned by smart scan.

83,334 blocks were read during the table scan (table scan blocks gotten), all of which were read directly (physical reads direct). All of the blocks read had an active transaction (active txn count during cleanout) and all of these had to be rolled back (cleanouts and rollbacks – consistent read gets).

The same amount of work has to be done for every read.
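
If you want to track how deep the blocks made it into the cells' code layers for your own session, a query along these lines does the trick (a sketch against v$mystat; the statistic names are the ones shown in the reports above):

select sn.name, ms.value
  from v$statname sn
  join v$mystat   ms on ms.statistic# = sn.statistic#
 where sn.name like 'cell blocks processed by%layer'
 order by sn.name;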

Commit in session 1

After committing the transaction in session 1, the statistics for the select statement in session 2 are as follows (also note the execution time):

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:00.16

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks helped by commit cache                                          83,334
STAT    cell blocks processed by cache layer                                        83,334
STAT    cell blocks processed by data layer                                         83,334
STAT    cell blocks processed by txn layer                                          83,334
STAT    cell commit cache queries                                                   83,334
STAT    cell flash cache read hits                                                     663
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                     13,455,400
STAT    cell physical IO interconnect bytes returned by smart scan              13,455,400
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                     83,334
STAT    consistent changes                                                             791
STAT    consistent gets                                                             83,717
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     383
STAT    consistent gets pin                                                            380
STAT    consistent gets pin (fastpath)                                                 380
STAT    physical read IO requests                                                      663
STAT    physical read bytes                                                    682,672,128
STAT    physical read requests optimized                                               663
STAT    physical read total IO requests                                                663
STAT    physical read total bytes                                              682,672,128
STAT    physical read total bytes optimized                                    682,672,128
STAT    physical read total multi block requests                                       654
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

No more active transactions are found during the cleanout. The blocks, however, still have to be cleaned out every time the direct path read is performed. As a matter of design principle, Oracle cannot reuse the blocks it has read with a direct path read (or Smart Scan, for that matter): they are simply not cached.

But have a look at the execution time: that is quite a nice performance improvement. In fact, there are a few things worth mentioning:

  • The segment is successfully scanned via a Smart Scan. How can you tell?
    • cell scans = 1
    • cell physical IO interconnect bytes returned by smart scan is a lot lower than cell physical IO bytes eligible for predicate offload
    • You see cell blocks processed by (cache|txn|data) layer matching the number of blocks the table is made up of; previously you only saw blocks processed by the cache layer
  • The commit cache made this happen
    • cell commit cache queries
    • cell blocks helped by commit cache
    • cell transactions found in commit cache

The commit cache “lives” in the cells and probably caches recently committed transactions. If a block is read by the cell and an active transaction is found, the cell has a couple of options to avoid a round-trip of the block to the RDBMS layer for consistent read processing:

  1. If possible it uses the minscn optimisation. Oracle tracks the minimum SCN of all active transactions, and the cell software can compare the SCN from the ITL in the block with that lowest active SCN. If the minimum active SCN is greater than the SCN found in the block, the transaction must have completed already and it is safe to read the block. This optimisation is not shown in this blog post; you would see cell blocks helped by minscn optimization increase
  2. If the first optimisation does not help, the cells make use of the commit cache, visible as cell commit cache queries and, if a cache hit has been scored, cell blocks helped by commit cache. While the transaction in session 1 had not yet completed you could only see the commit cache queries; after the commit the queries were successful

If both of them fail, the block must be sent to the RDBMS layer for consistent read processing. In that case there is no difference to the treatment of direct path reads outside Exadata.
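
Both optimisations leave traces in the session counters, so you can check which of them (if any) helped. A sketch, again against v$mystat:

select sn.name, ms.value
  from v$statname sn
  join v$mystat   ms on ms.statistic# = sn.statistic#
 where sn.name like 'cell blocks helped by%'
    or sn.name like 'cell%commit cache%'
 order by sn.name;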

Summary

The commit cache is a great enhancement Exadata offers to databases running on the platform. Where non-Exadata deployments have to repeat the cleanout work every time a “dirty” block is read via direct path read, Exadata can avoid it altogether once the transaction has committed.

Posted in Exadata, Linux