Martins Blog

Trying to explain complex things in simple terms

Public appearances 2015

Posted by Martin Bach on March 23, 2015

I think I’ll turn this into a sticky post as soon as I find out how that works in WordPress :) This is going to be a list of my public appearances in 2015.

Upcoming Events

Paris Oracle Meetup

The next time I’ll be speaking publicly is on April 15 in Paris at the Paris Oracle Meetup. Many thanks go to @Ycolin and @BertrandDrouvot for inviting me to my first ever presentation in France! I am very honoured to present on two of my favourite topics:

  • An introduction to Exadata Smart Scans
  • Developing highly available applications in RAC 12c

DOAG Exaday

A couple of weeks later, on April 28, I am speaking at the DOAG Exaday 2015. The Exadata Days are an established conference format in the UK that has been attracting top-notch speakers, and it now makes its way to Germany. I am very keen to discuss Engineered Systems from a user’s point of view in Frankfurt. I will present my brand-new talk about Resource Manager in action (in German), and during the afternoon I’ll run a hands-on workshop covering the most common Exadata optimisations and how to make good use of them.

Enkitec Extreme Exadata Expo (E4)

E4 is the conference when it comes to Exadata and Big Data. I have been over twice and it’s been fantastic both times. And I would tell you the same even if I weren’t working for Enkitec: it truly is a serious tech conference (have a look at the agenda and speakers and I’m sure you’ll believe me). My part in the conference is my all-new, updated talk about Hybrid Columnar Compression internals.

Past conferences

I have attended the following events thus far:

  • Fill the Glass webinar with Cary Millsap
  • ClubOracle in London

Posted in Public Appearances | Leave a Comment »

Understanding enhancements to block cleanouts in Exadata part 2

Posted by Martin Bach on February 25, 2015

In part 1 of the series I tried to explain (probably a bit too verbosely when it came to session statistics) the effect of delayed block cleanout with buffered I/O. In the final example the “dirty” blocks on disk had been cleaned out in the buffer cache, greatly reducing the amount of work to be done when reading them.

Now let’s catch up with the present, and with direct path reads. You probably noticed that the migration to 11.2 caused your I/O patterns to change: suitably large segments are now read using direct path reads, not only during parallel query but also, potentially, during the serial execution of a query. Since the blocks read during a direct path read do not end up in the buffer cache, there is an interesting side effect to block cleanouts. The scenario is the same unrealistic yet reproducible one as presented in part 1 of this article.
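A quick way to confirm whether a serial scan was satisfied via direct path reads is to watch the corresponding session counter. A minimal sketch (the statistic name appears in the reports below; t1_100k is the test table from part 1):

```sql
-- Sample the counter, run the scan, then sample again: if the value
-- increased by 1, the scan used a direct path read rather than the
-- buffer cache.
select sn.name, ms.value
  from v$mystat ms
  join v$statname sn on (sn.statistic# = ms.statistic#)
 where sn.name = 'table scans (direct read)';

select count(*) from t1_100k;
```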

Enter Direct Path Reads – non Exadata

To be absolutely sure I am getting results without any optimisation offered by Exadata I am running the example on different hardware. I am also using 11.2.0.4 because that’s the version I had on my lab server. The principles here apply between versions unless I am very mistaken.

I am repeating my test under realistic conditions, leaving _serial_direct_read at its default, “auto”. This means I am almost certainly going to see direct path reads when scanning my table. As Christian Antognini has pointed out, direct path reads have an impact on the amount of work that has to be done. The test commences with an update in session 1 that touches the whole table, followed by a flush of the blocks in the buffer cache to disk to ensure they have an active transaction in the ITL part of the header. The selects in session 2 show the following behaviour: the first execution takes longer, as expected, the second one is faster.
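The exact setup statements aren’t shown in the post; a sketch of the setup as just described might look like the following. Note that pad_col is a placeholder, as the real column names of t1_100k are not shown:

```sql
-- Session 1: open a transaction against every row and do NOT commit.
update t1_100k set pad_col = pad_col;

-- Force the dirty blocks to disk so each block header carries an
-- active transaction in its ITL.
alter system flush buffer_cache;
```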

SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:20.62

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell physical IO interconnect bytes                                    747,798,528
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                           1,019
STAT    consistent gets                                                            751,365
STAT    consistent gets - examination                                              667,178
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                 668,031
STAT    consistent gets from cache (fastpath)                                          850
STAT    data blocks consistent reads - undo records applied                        583,334
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                    8,631
STAT    physical read bytes                                                    747,798,528
STAT    physical read total IO requests                                              8,631
STAT    physical read total bytes                                              747,798,528
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              91,284
STAT    physical reads cache                                                         7,950
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                           170
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

The second execution shows slightly better performance as seen before in part 1. The relevant statistics are shown here:


SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:04.60

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell physical IO interconnect bytes                                    682,672,128
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             472
STAT    consistent gets                                                            750,250
STAT    consistent gets - examination                                              666,671
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                 666,916
STAT    consistent gets from cache (fastpath)                                          245
STAT    data blocks consistent reads - undo records applied                        583,334
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                      681
STAT    physical read bytes                                                    682,672,128
STAT    physical read total IO requests                                                681
STAT    physical read total bytes                                              682,672,128
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

The amount of I/O is slightly lower. Still, all the CR processing has to be done for every block in the table: each one has an active transaction (active txn count during cleanout). As you would expect, the buffer cache holds no blocks from the segment:

SQL> select count(*), inst_id, status from gv$bh 
  2  where objd = (select data_object_id from dba_objects where object_name = 'T1_100K') 
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
     86284          2 free
    584983          1 free
         1          1 scur

Subsequent executions all take approximately the same amount of time. Every execution has to perform cleanouts and rollbacks: no real difference to the buffered reads from before, except that direct path reads are used to scan the table and, consequently, there aren’t any blocks from T1_100K in the buffer cache.

Commit

Committing in session 1 does not have much of an effect: the blocks read by the direct path read go to the PGA instead of the buffer cache. A quick demonstration: the same select has been executed after the transaction in session 1 committed.

SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:04.87

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell physical IO interconnect bytes                                    682,672,128
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts only - consistent read gets                                       83,334
STAT    commit txn count during cleanout                                            83,334
STAT    consistent changes                                                             481
STAT    consistent gets                                                            166,916
STAT    consistent gets - examination                                               83,337
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                  83,582
STAT    consistent gets from cache (fastpath)                                          245
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                      681
STAT    physical read bytes                                                    682,672,128
STAT    physical read total IO requests                                                681
STAT    physical read total bytes                                              682,672,128
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

As you can see the work done for the cleanout is repeated every time, even though active txn count during cleanout is no longer found in the output. And still, there aren’t any blocks pertaining to T1_100K in the buffer cache.

SQL> select count(*), inst_id, status from gv$bh where
  2   objd = (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
     86284          2 free
    584983          1 free
         1          1 scur

OK, that should be enough for now. Over to Exadata.

Direct Path Reads – Exadata

The same test again, but this time on an X2-2 quarter rack running database 12.1.0.2 and cellsrv 12.1.2.1.0. After the update of the entire table and the flushing of blocks to disk, here are the statistics for the first and second execution.

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:17.41

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks processed by cache layer                                        83,933
STAT    cell commit cache queries                                                   83,933
STAT    cell flash cache read hits                                                  14,011
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                    804,842,256
STAT    cell physical IO interconnect bytes returned by smart scan             682,904,336
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             796
STAT    consistent gets                                                            917,051
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                833,339
STAT    consistent gets examination (fastpath)                                      83,336
STAT    consistent gets from cache                                                 833,717
STAT    consistent gets pin                                                            378
STAT    consistent gets pin (fastpath)                                                 377
STAT    data blocks consistent reads - undo records applied                        750,002
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                   16,147
STAT    physical read bytes                                                    804,610,048
STAT    physical read requests optimized                                            14,011
STAT    physical read total IO requests                                             16,147
STAT    physical read total bytes                                              804,610,048
STAT    physical read total bytes optimized                                    114,778,112
STAT    physical read total multi block requests                                       851
STAT    physical reads                                                              98,219
STAT    physical reads cache                                                        14,885
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================


SQL> -- second execution, still no commit in session 1
SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:01.31

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks processed by cache layer                                        83,949
STAT    cell commit cache queries                                                   83,949
STAT    cell flash cache read hits                                                   1,269
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                    682,907,280
STAT    cell physical IO interconnect bytes returned by smart scan             682,907,280
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             791
STAT    consistent gets                                                            917,053
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                833,339
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 833,719
STAT    consistent gets pin                                                            380
STAT    consistent gets pin (fastpath)                                                 380
STAT    data blocks consistent reads - undo records applied                        750,002
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                    1,278
STAT    physical read bytes                                                    682,672,128
STAT    physical read requests optimized                                             1,269
STAT    physical read total IO requests                                              1,278
STAT    physical read total bytes                                              682,672,128
STAT    physical read total bytes optimized                                    678,494,208
STAT    physical read total multi block requests                                       885
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

In this example table scans (direct read) indicates that the segment was read using a direct path read. Since the example was executed on Exadata, the direct path read was performed as a Smart Scan (cell scans = 1). However, the Smart Scan was not very successful: although all 83,334 table blocks were opened by the cache layer (the first layer to touch a block on the cell during a Smart Scan), none of them passed the examination of the transaction layer; they were all passed back to the RDBMS layer instead. You can see that there was no saving from the Smart Scan here: cell physical IO bytes eligible for predicate offload is equal to cell physical IO interconnect bytes returned by smart scan.

83,334 blocks were read during the table scan (table scan blocks gotten), all of them directly (physical reads direct). Every block read had an active transaction (active txn count during cleanout), and every one of them had to be rolled back (cleanouts and rollbacks - consistent read gets).

The same amount of work has to be done for every read.
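If you don’t have the mystats script to hand, the relevant counters can be sampled directly; a minimal sketch using only statistic names that appear in the reports above (this is essentially what Adrian Billington’s script automates):

```sql
-- Sample the cleanout-related counters for the current session.
-- Run before and after the query under test and diff the values.
select sn.name, ms.value
  from v$mystat ms
  join v$statname sn on (sn.statistic# = ms.statistic#)
 where sn.name in (
        'active txn count during cleanout',
        'cleanout - number of ktugct calls',
        'cleanouts and rollbacks - consistent read gets',
        'physical reads direct')
 order by sn.name;
```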

Commit in session 1

After committing the transaction in session 1 the statistics for the select statement in session 2 are as follows (also note the execution time):

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:00.16

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks helped by commit cache                                          83,334
STAT    cell blocks processed by cache layer                                        83,334
STAT    cell blocks processed by data layer                                         83,334
STAT    cell blocks processed by txn layer                                          83,334
STAT    cell commit cache queries                                                   83,334
STAT    cell flash cache read hits                                                     663
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                     13,455,400
STAT    cell physical IO interconnect bytes returned by smart scan              13,455,400
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                     83,334
STAT    consistent changes                                                             791
STAT    consistent gets                                                             83,717
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     383
STAT    consistent gets pin                                                            380
STAT    consistent gets pin (fastpath)                                                 380
STAT    physical read IO requests                                                      663
STAT    physical read bytes                                                    682,672,128
STAT    physical read requests optimized                                               663
STAT    physical read total IO requests                                                663
STAT    physical read total bytes                                              682,672,128
STAT    physical read total bytes optimized                                    682,672,128
STAT    physical read total multi block requests                                       654
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

There are no more active transactions found during the cleanout. The blocks, however, still have to be cleaned out every time the direct path read is performed. As a matter of design principle Oracle cannot reuse the blocks it has read via a direct path read (or Smart Scan for that matter): they simply are not cached.

But have a look at the execution time: quite a nice performance improvement. In fact there are a few things worth mentioning:

  • The segment is successfully scanned via a Smart Scan. How can you tell?
    • cell scans = 1
    • cell physical IO interconnect bytes returned by smart scan is a lot lower than cell physical IO bytes eligible for predicate offload
    • You see cell blocks processed by (cache|transaction|data) layer matching the number of blocks the table is made up of. Previously you only saw blocks processed by the cache layer
  • The commit cache made this happen
    • cell commit cache queries
    • cell blocks helped by commit cache
    • cell transactions found in commit cache

The commit cache “lives” in the cells and probably caches recently committed transactions. If a block is read by the cell and an active transaction is found in it, the cell has a couple of options to avoid a round trip of the block to the RDBMS layer for consistent read processing:

  1. If possible, it uses the minscn optimisation. Oracle tracks the minimum SCN of all active transactions. The cell software can compare the SCN found in the block’s ITL with this lowest SCN: if the minimum SCN of all active transactions is greater than the SCN found in the block, the transaction must already have completed and it is safe to read the block. This optimisation is not shown in this blog post; you’d see cell blocks helped by minscn optimization increase
  2. If the first optimisation does not help, the cells make use of the commit cache, visible as cell commit cache queries and, if a cache hit has been scored, cell blocks helped by commit cache. While the transaction in session 1 had not yet completed you could only see the commit cache queries; after the commit the queries were successful

If both of them fail the block must be sent to the RDBMS layer for consistent read processing. In that case there is no difference to the treatment of direct path read blocks outside Exadata.
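These counters are plain session statistics, so you can watch them while the test runs. A quick check could look like this - a sketch only, and the statistic names are as they appear in my lab environment, they may vary slightly between versions:

```sql
-- Delta these values before and after the query to see
-- which cell-side optimisation (if any) helped.
select n.name, s.value
  from v$mystat s
  join v$statname n on (n.statistic# = s.statistic#)
 where n.name in ('cell commit cache queries',
                  'cell blocks helped by commit cache',
                  'cell transactions found in commit cache',
                  'cell blocks helped by minscn optimization')
 order by n.name;
```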

Summary

The commit cache is a great enhancement Exadata offers for databases using the platform. While non-Exadata deployments will have to clean out blocks that haven’t been subject to cleanouts every time the “dirty” block is read via direct path read, this can be avoided on Exadata if the transaction has committed.

Posted in Exadata, Linux

Understanding enhancements to block cleanouts in Exadata part 1

Posted by Martin Bach on February 23, 2015

Travel time is writing time and I have the perfect setting for a techie post. Actually I got quite excited about the subject causing the article to get a bit longer than initially anticipated. In this part you can read about block cleanouts when using buffered I/O. The next part will show how this works using direct path reads and Smart Scans.

The article ultimately aims at describing the enhancements Exadata brings to the table for direct path reads and delayed block cleanouts. Delayed block cleanouts are described in Jonathan Lewis’s “Oracle Core”, and in one of his blog posts, so here’s just a summary.

The delayed block cleanout

In a nutshell, Oracle’s database writer (the process persisting your transactions to disk) is free to write blocks to disk in batches whenever it sees fit. A commit on its own won’t trigger a write of the dirty (or modified) blocks to disk. If it did, the commit time would be proportional to the number of blocks affected by the last transaction; as it is, the commit command completes very quickly regardless of how much data has been affected. Unlike a rollback … It is entirely possible that a block modified by a transaction is written to disk before the transaction has completed. And once the commit has been acknowledged by Oracle there is no process that would read the blocks back into the buffer cache and clean them out - this happens later. It would also be quite inefficient to do so.

Defining “clean”

Now when the database writer writes to disk it is possible that the block just written has an active transaction recorded in its header. You can see this by dumping a block with an active transaction: the ITL in the header will reference the XID, the UBA and the number of rows affected, plus other information. The individual rows that are part of the transaction have their lock byte set, as you can see in the row directory. The number in the lb field refers back to an ITL entry further up in the block dump (don’t worry, I’ll show you an example shortly).

What happened in Oracle before direct path reads became a lot more common is this:

  • A user starts a transaction, for example by updating a large portion of the table
  • In the meantime the database writer flushes some blocks to disk, including some of the ones affected by the transaction.
  • The user commits a transaction
  • The next user queries the block (let’s assume it is not in the buffer cache)
  • The second session’s foreground process reads the block
    • Realises that it has an active transaction recorded in it
    • Checks if that transaction is still active
    • And clears the block out if not
  • The block with a valid CR image is now in the buffer cache

Huh?

Sounds too abstract? I agree - let’s have an example.

SQL> select table_name,num_rows,compression,partitioned from tabs
  2  where table_name = 'T1_100K';

TABLE_NAME                       NUM_ROWS COMPRESS PAR
------------------------------ ---------- -------- ---
T1_100K                            500000 DISABLED NO

SQL> update T1_100k set state = 'MODIFIED';

500000 rows updated.

An easy way to simulate the flushing of blocks to disk is a slightly brute-force approach of flushing the buffer cache. Only recommended in the lab, really. Let’s have a look at what the block looks like:
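In case you want to follow along, the brute-force flush boils down to two statements. This is a sketch for lab systems only - it requires the ALTER SYSTEM privilege and is a terrible idea in production:

```sql
-- lab only: write dirty buffers to disk, then empty the buffer cache
alter system checkpoint;
alter system flush buffer_cache;
```

With the cache empty, back to the block: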

SQL> select dbms_rowid.rowid_block_number(rowid), dbms_rowid.rowid_relative_fno(rowid)
  2  from t1_100k where rownum < 11;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940811                                    5
                             3940812                                    5
                             3940812                                    5
                             3940812                                    5
                             3940812                                    5

10 rows selected.

Elapsed: 00:00:00.01
SQL> alter system dump datafile 5 block 3940811;

System altered.

Elapsed: 00:00:00.01
SQL> select value from v$diag_info where name like 'Default%';

VALUE
---------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_94351.trc

Elapsed: 00:00:00.02

The block dump for block 3940811 on datafile 5 (users tablespace) is now in that trace file:

Block header dump:  0x017c21cb
 Object id on Block? Y
 seg/obj: 0x12a13  csc: 0x00.2191ff7  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x17c21c8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.02191d10
0x02   0x0009.005.00002575  0x0003688a.0ef1.1f  ----    6  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x017c21cb
data_block_dump,data header at 0x7f026361707c
===============
tsiz: 0x1f80
hsiz: 0x1e
pbl: 0x7f026361707c
     76543210
flag=--------
ntab=1
nrow=6
...
0xe:pti[0]      nrow=6  offs=0
0x12:pri[0]     offs=0x1b75
0x14:pri[1]     offs=0x176a
0x16:pri[2]     offs=0x135f
0x18:pri[3]     offs=0xf54
0x1a:pri[4]     offs=0xb49
0x1c:pri[5]     offs=0x73e
block_row_dump:
tab 0, row 0, @0x1b75
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
col  0: [ 2]  c1 02
col  1: [999]
 31 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
...

The significant pieces of information here are the ITL (interested transaction list) entries and the lock byte, which points to the second ITL entry. The absence of flags for ITL 0x02 indicates the transaction is ongoing. You can see it in v$transaction - this is a lab environment; in real databases you’d see more than one active transaction:

SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction;

    XIDUSN    XIDSLOT     XIDSQN STATUS            START_SCN
---------- ---------- ---------- ---------------- ----------
         9          5       9589 ACTIVE             35200741

This is indeed the transaction referenced in v$transaction and in ITL 0x02 above (9589 is the decimal representation of hexadecimal 2575; 9 and 5 don’t need converting). If you are curious now about some more internals then you might find this post by Arup Nanda interesting.
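If you would like to verify the conversion yourself, TO_NUMBER with a hexadecimal format mask does the trick:

```sql
-- 0x2575 = 9589 decimal, matching XIDSQN in v$transaction
select to_number('2575', 'XXXX') as xidsqn from dual;
```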

Now what happens if another session queries the table? Since session 1 has not yet committed, its changes must not be seen by any other session (that would be a dirty read, and we do not have dirty reads in the Oracle RDBMS). So to give the user a consistent view of the data the block must be rolled back. Before the test I made sure there wasn’t a block in the buffer cache for the object in question:

SQL> select count(*), inst_id, status from gv$bh where objd =
  2   (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
    524327          1 free

Let’s test:

SQL> @mystats start

SQL> select /* test004 */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:13.96

@mystats stop t=1

==========================================================================================
MyStats report : 22-FEB-2015 08:42:09
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  35.57
TIMER   CPU time used (seconds)                                                       3.21


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                       327
STAT    CPU used when call started                                                     327
STAT    CR blocks created                                                           83,334
STAT    DB time                                                                      1,401
STAT    Requests to/from client                                                         16
STAT    SQL*Net roundtrips to/from client                                               16
STAT    active txn count during cleanout                                            83,334
STAT    buffer is not pinned count                                                       2
STAT    bytes received via SQL*Net from client                                      13,279
STAT    bytes sent via SQL*Net to client                                             5,106
STAT    calls to get snapshot scn: kcmgss                                              758
STAT    calls to kcmgas                                                             83,332
STAT    calls to kcmgcs                                                                 30
STAT    cell flash cache read hits                                                   8,045
STAT    cell physical IO interconnect bytes                                    748,576,768
STAT    change write time                                                               48
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                         591,717
STAT    consistent gets                                                            757,980
STAT    consistent gets examination                                                674,254
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 757,980
STAT    consistent gets pin                                                         83,726
STAT    consistent gets pin (fastpath)                                              83,055
STAT    data blocks consistent reads - undo records applied                        590,917
STAT    db block changes                                                            84,134
STAT    db block gets                                                                2,301
STAT    db block gets from cache                                                     2,301
STAT    enqueue releases                                                                 4
STAT    enqueue requests                                                                 4
STAT    execute count                                                                   13
STAT    file io wait time                                                       11,403,214
STAT    free buffer requested                                                      174,766
STAT    gc local grants                                                            174,713
STAT    global enqueue gets sync                                                        43
STAT    global enqueue releases                                                         41
STAT    heap block compress                                                          4,549
STAT    hot buffers moved to head of LRU                                                 4
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    index fetch by key                                                               1
STAT    lob writes                                                                     375
STAT    lob writes unaligned                                                           375
STAT    logical read bytes from cache                                        6,228,221,952
STAT    messages sent                                                                    9
STAT    non-idle wait count                                                         17,446
STAT    non-idle wait time                                                           1,141
STAT    opened cursors cumulative                                                       13
STAT    parse count (hard)                                                               1
STAT    parse count (total)                                                             13
STAT    physical read IO requests                                                    8,715
STAT    physical read bytes                                                    748,576,768
STAT    physical read requests optimized                                             8,045
STAT    physical read total IO requests                                              8,715
STAT    physical read total bytes                                              748,576,768
STAT    physical read total bytes optimized                                     65,904,640
STAT    physical read total multi block requests                                       654
STAT    physical reads                                                              91,379
STAT    physical reads cache                                                        91,379
STAT    physical reads cache prefetch                                               82,664
STAT    recursive calls                                                              1,985
STAT    recursive cpu usage                                                              6
STAT    redo entries                                                                84,005
STAT    redo entries for lost write detection                                          671
STAT    redo size                                                                9,016,740
STAT    redo size for lost write detection                                       3,016,296
STAT    redo subscn max counts                                                      83,334
STAT    rows fetched via callback                                                        1
STAT    session cursor cache count                                                       4
STAT    session cursor cache hits                                                        9
STAT    session logical reads                                                      760,281
STAT    session pga memory                                                         -65,536
STAT    session uga memory                                                          65,488
STAT    session uga memory max                                                     148,312
STAT    shared hash latch upgrades - no wait                                         8,218
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (short tables)                                                       1
STAT    temp space allocated (bytes)                                             2,097,152
STAT    user I/O wait time                                                           1,141
STAT    user calls                                                                      22
STAT    workarea executions - optimal                                                    1
STAT    workarea memory allocated                                                      -43


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

I am using Adrian Billington’s mystats script here, which I can only recommend to you - it’s very decent (and that’s an understatement). In the example above it took two snapshots and calculated the change of the session counters in v$mystat during the query execution. Have a look at the cleanout% statistics here! These are the block cleanouts. As cleanouts generate redo you can see that recorded here too. This is a tricky interview question: can a select generate redo? It sure can! There is also a fair amount of physical I/O going on; after all, the buffer cache was empty.
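If all you want is to answer the interview question, a before/after look at the session’s redo size statistic is all it takes - a sketch along these lines:

```sql
-- snapshot the session's redo size, run the select, snapshot again:
-- the delta is redo generated on behalf of the block cleanouts
select s.value as redo_bytes
  from v$mystat s
  join v$statname n on (n.statistic# = s.statistic#)
 where n.name = 'redo size';

select count(*) from t1_100k;

-- repeat the first query and compare the two values
```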

The second execution of the query completes a lot faster due to the absence of physical I/O. Nevertheless, the session has to clean out all these blocks and perform consistent read processing again (remember, the transaction is still uncommitted, as seen in active txn%):

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:01.76

SQL> @mystats stop t=1

==========================================================================================
MyStats report : 22-FEB-2015 08:42:59
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                   8.55
TIMER   CPU time used (seconds)                                                       1.75


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                       181
STAT    CPU used when call started                                                     181
STAT    CR blocks created                                                           83,334
STAT    DB time                                                                        183
STAT    Requests to/from client                                                         16
STAT    SQL*Net roundtrips to/from client                                               16
STAT    active txn count during cleanout                                            83,334
STAT    buffer is not pinned count                                                       2
STAT    bytes received via SQL*Net from client                                      13,280
STAT    bytes sent via SQL*Net to client                                             5,177
STAT    calls to get snapshot scn: kcmgss                                              757
STAT    calls to kcmgas                                                             83,331
STAT    calls to kcmgcs                                                                 30
STAT    change write time                                                               28
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                         591,708
STAT    consistent gets                                                            757,980
STAT    consistent gets examination                                                674,254
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 757,980
STAT    consistent gets pin                                                         83,726
STAT    consistent gets pin (fastpath)                                              83,726
STAT    cursor authentications                                                           1
STAT    data blocks consistent reads - undo records applied                        590,917
STAT    db block changes                                                            84,125
STAT    db block gets                                                                2,895
STAT    db block gets from cache                                                     2,895
STAT    execute count                                                                   13
STAT    free buffer requested                                                       83,382
STAT    global enqueue gets sync                                                        38
STAT    global enqueue releases                                                         36
STAT    heap block compress                                                          4,549
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    index fetch by key                                                               1
STAT    lob writes                                                                     375
STAT    lob writes unaligned                                                           375
STAT    logical read bytes from cache                                        6,233,088,000
STAT    messages sent                                                                    7
STAT    non-idle wait count                                                             16
STAT    opened cursors cumulative                                                       13
STAT    parse count (total)                                                             13
STAT    recursive calls                                                              1,982
STAT    recursive cpu usage                                                              6
STAT    redo entries                                                                83,334
STAT    redo size                                                                6,000,400
STAT    rows fetched via callback                                                        1
STAT    session cursor cache hits                                                       12
STAT    session logical reads                                                      760,875
STAT    session pga memory                                                      -1,769,472
STAT    session uga memory                                                      -1,858,472
STAT    shared hash latch upgrades - no wait                                           524
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (short tables)                                                       1
STAT    user calls                                                                      22
STAT    workarea executions - optimal                                                    1
STAT    workarea memory allocated                                                      -83


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

I should have mentioned that these two examples were executed on an X2-2 quarter rack. To prevent Exadata from applying some clever optimisations I specifically disabled direct path reads and turned off Exadata features. Here is the view from the buffer cache again:
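In case you wonder how to disable those, something along these lines does it for the session. Note that _serial_direct_read is an underscore parameter - this is a lab sketch, not to be used without Oracle Support’s blessing:

```sql
-- force buffered reads and disable cell offload for this session
alter session set "_serial_direct_read" = never;
alter session set cell_offload_processing = false;
```

Now for the buffer cache: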

SQL> select count(*), inst_id, status from gv$bh where objd =
  2   (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
    351704          1 free
         1          1 scur
     83334          1 xcur
     83334          1 cr

4 rows selected.

For each subsequent execution of the select statement consistent read processing has to be conducted. The number of CR blocks in the buffer cache will increase. For the third execution I can see the following:

SQL> r
  1  select count(*), inst_id, status from gv$bh where objd =
  2   (select data_object_id from dba_objects where object_name = 'T1_100K')
  3* group by inst_id, status

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
    267526          1 free
         1          1 scur
     83334          1 xcur
    166668          1 cr

4 rows selected.

Things improve when the transaction commits.

The impact of a commit

What about a commit now? Committing in session 1 and running the query again in session 2 shows a very good response time. Please remember that I forced buffered reads; the next part deals with the same case using direct path reads instead (those are not buffered). The segment size would have qualified for direct path reads otherwise.

SQL> select /* test004 */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:00.18

==========================================================================================
MyStats report : 22-FEB-2015 09:40:08
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  19.73
TIMER   CPU time used (seconds)                                                       0.18


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                        25
STAT    CPU used when call started                                                      25
STAT    DB time                                                                         25
STAT    Heatmap SegLevel - Full Table Scan                                               1
STAT    Requests to/from client                                                         16
STAT    SQL*Net roundtrips to/from client                                               16
STAT    buffer is not pinned count                                                       2
STAT    bytes received via SQL*Net from client                                      13,280
STAT    bytes sent via SQL*Net to client                                             5,107
STAT    calls to get snapshot scn: kcmgss                                              757
STAT    calls to kcmgcs                                                                 30
STAT    consistent changes                                                             800
STAT    consistent gets                                                             83,729
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                  83,729
STAT    consistent gets pin                                                         83,726
STAT    consistent gets pin (fastpath)                                              83,726
STAT    db block changes                                                               800
STAT    db block gets                                                                2,301
STAT    db block gets from cache                                                     2,301
STAT    enqueue releases                                                                 3
STAT    enqueue requests                                                                 3
STAT    execute count                                                                   13
STAT    free buffer requested                                                           53
STAT    global enqueue gets sync                                                        39
STAT    global enqueue releases                                                         37
STAT    index fetch by key                                                               1
STAT    lob writes                                                                     375
STAT    lob writes unaligned                                                           375
STAT    logical read bytes from cache                                          704,757,760
STAT    no work - consistent read gets                                              83,334
STAT    non-idle wait count                                                             16
STAT    opened cursors cumulative                                                       13
STAT    parse count (total)                                                             13
STAT    recursive calls                                                              1,984
STAT    recursive cpu usage                                                              7
STAT    rows fetched via callback                                                        1
STAT    session cursor cache count                                                       4
STAT    session cursor cache hits                                                       10
STAT    session logical reads                                                       86,030
STAT    session pga memory                                                          65,536
STAT    session pga memory max                                                      65,536
STAT    session uga memory                                                          65,488
STAT    session uga memory max                                                     148,312
STAT    shared hash latch upgrades - no wait                                           525
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (short tables)                                                       1
STAT    temp space allocated (bytes)                                             2,097,152
STAT    user calls                                                                      22
STAT    workarea executions - optimal                                                    1
STAT    workarea memory allocated                                                      -40


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

Access to blocks in the buffer cache is quick! There was no need to clean out any blocks any more; the fastest way to do something is indeed not having to do it at all. Mind you, the block on disk still has the (now committed) transaction recorded!

SQL> alter system dump datafile 5 block 3940811;

System altered.

Elapsed: 00:00:00.01
SQL> select value from v$diag_info where name like 'Default%';

VALUE
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc

SQL> exit

$ vi /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc

Block header dump:  0x017c21cb
 Object id on Block? Y
 seg/obj: 0x12a13  csc: 0x00.2317620  itc: 3  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x17c21c8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.02191d10
0x02   0x0009.005.00002575  0x0003688a.0ef1.1f  --U-    6  fsc 0x0000.0232cfca
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x017c21cb
data_block_dump,data header at 0x7fc9f3cb407c
...

$ grep lb -B1 /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc
tab 0, row 0, @0x1b75
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 1, @0x176a
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 2, @0x135f
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 3, @0xf54
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 4, @0xb49
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--
tab 0, row 5, @0x73e
tl: 1035 fb: --H-FL-- lb: 0x2  cc: 6
--

The transaction recorded in ITL slot 2 is gone though:

SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction
  2  where xidusn = 9 and xidslot = 5 and xidsqn = 9589;

no rows selected

Did you notice the flag (--U-) in the second ITL entry? It wasn’t there before. According to Jonathan Lewis’s Oracle Core it indicates a fast commit.

Summary

Block cleanouts are simple with buffered I/O. Blocks can be cleaned out in the buffer cache and subsequently accessed without having to undergo another block cleanout.

Posted in Exadata, Linux | 1 Comment »

What happens in ASM if usable_file_mb is negative and you lose a failgroup

Posted by Martin Bach on February 16, 2015

Having read Harald von Breederode’s excellent post “Demystifying ASM REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB” again, I wanted to see what happens when you create a setup where your usable_file_mb is negative and you actually have to rebalance after a fatal failgroup error. I am using 12.1.0.2.0 on Oracle Linux 6.6/UEK3 in a KVM virtual machine, in case anyone is interested. I/O times aren’t stellar in that environment. It’s Oracle Restart, not clustered ASM.

Note: this post is only applicable if you are using ASM for data protection, i.e. normal or high redundancy. External redundancy is a different matter: if you lose a failgroup in a disk group with external redundancy defined then you are toast. The disk group will dismount on all ASM instances. No disk group = nothing to write to = crash of dependent databases.

Setup

Here is my setup. I created the VM with 2 volume groups, one for the operating system and another one for Oracle. After the installation of the database and Grid Infrastructure I have a disk group named DATA comprising 4 disks, each in their own failgroup (which is both the default and intentional):
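For illustration, a disk group with these properties could have been created along the following lines. This is a sketch only: the disk paths match my lab VM, and the compatible.* attribute values are an assumption on my part, not taken from the actual setup.

```sql
-- Sketch: normal redundancy, four disks, each ending up in its own
-- (default) failgroup. The compatible.* values are assumed for 12.1.
CREATE DISKGROUP data NORMAL REDUNDANCY
  DISK '/dev/vdc1', '/dev/vdd1', '/dev/vde1', '/dev/vdf1'
  ATTRIBUTE 'compatible.asm' = '12.1', 'compatible.rdbms' = '12.1';
```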

ASMCMD> lsdsk -k
Total_MB  Free_MB  OS_MB  Name       Failgroup  Failgroup_Type  Library  Label  UDID  Product  Redund   Path
    5119     4131   5119  DATA_0000  DATA_0000  REGULAR         System                         UNKNOWN  /dev/vdc1
    5119     4127   5119  DATA_0001  DATA_0001  REGULAR         System                         UNKNOWN  /dev/vdd1
    5119     4128   5119  DATA_0002  DATA_0002  REGULAR         System                         UNKNOWN  /dev/vde1
    5119     4139   5119  DATA_0003  DATA_0003  REGULAR         System                         UNKNOWN  /dev/vdf1
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576     20476    16525             5119            5703              0             N  DATA/
ASMCMD>

My database uses the datafiles as shown here:

SQL> select name, bytes from v$datafile;

NAME                                                              BYTES
------------------------------------------------------------ ----------
+DATA/NCDB/DATAFILE/system.258.871538097                      817889280
+DATA/NCDB/DATAFILE/sysaux.257.871538033                      629145600
+DATA/NCDB/DATAFILE/undotbs1.260.871538187                     62914560
+DATA/NCDB/DATAFILE/users.259.871538183                         5242880

ASMCMD must take the file information from v$asm_file; the “space” column seems to take the normal redundancy into account:


SQL> select block_size, blocks, bytes, space, redundancy from v$asm_file where file_number = 259 and incarnation = 871538183;

BLOCK_SIZE     BLOCKS      BYTES      SPACE REDUND
---------- ---------- ---------- ---------- ------
      8192        641    5251072   12582912 MIRROR

As others have explained already, the “space” column also takes into account ASM metadata, which is of course mirrored too. So to me those numbers make sense. If you look back at the asmcmd output you will notice that usable_file_mb is 5703 MB.
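Both figures are easy to verify by hand: with a 1 MB allocation unit the 5,251,072-byte users datafile rounds up to six AUs, which mirrored yields the 12,582,912-byte space value (the metadata overhead disappears in the rounding here), and for normal redundancy usable_file_mb works out as essentially (free_mb - req_mir_free_mb) / 2. A quick sketch of the arithmetic:

```python
import math

AU = 1024 * 1024  # 1 MB allocation unit, as shown in the lsdg output

# v$asm_file: bytes -> space = primary plus mirror copy, rounded up to AUs
file_bytes = 5251072
space = math.ceil(file_bytes / AU) * AU * 2
print(space)           # 12582912, matching the SPACE column

# lsdg: usable_file_mb for normal redundancy (simplified approximation)
free_mb, req_mir_free_mb = 16525, 5119
usable_file_mb = (free_mb - req_mir_free_mb) // 2
print(usable_file_mb)  # 5703, matching the lsdg output
```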

This is where it goes wrong

Let’s cause trouble for the sake of this blog post and to trigger failure. The following is STRONGLY discouraged in your environment! I have put this here so you can see the effect of “negative free space” without having to endure this yourself.

I am deliberately oversubscribing space in the DATA disk group, which is definitely Not A Good Thing(tm). Please monitor usable_file_mb for each disk group carefully and add storage when needed (you might be unpleasantly surprised how long it can take to add storage, so request it early). You will see shortly why it should never be negative:
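A simple check against v$asm_diskgroup can serve as a starting point for such monitoring. This is only a sketch, and the 10 GB threshold is an arbitrary value of my choosing:

```sql
-- Sketch: flag disk groups whose usable_file_mb is negative or below
-- an (arbitrary) 10 GB safety threshold.
SELECT name, type, free_mb, required_mirror_free_mb, usable_file_mb
FROM   v$asm_diskgroup
WHERE  usable_file_mb < 10240
ORDER  BY usable_file_mb;
```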

SQL> create tablespace iamtoolargetofit datafile size 6g;

Tablespace created.

[oracle@server1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576     20476     4212             5119            -453              0             N  DATA/

The new tablespace is bigger than usable_file_mb and turned that number negative. If I were now to lose a failgroup (the term “disk” is a bit misleading here) I’d be in Big Trouble. Since this is my controlled lab environment and I wanted to see what happens, I carried on.

On the host I use virsh detach-disk to remove an arbitrary one of my ASM disks. Note from the output further up that the repair timer was 0 for all disks. In other words, there won’t be a grace period to allow for transient failures; the disk will be dropped straight away.

# virsh detach-disk --domain server_12c_asm --target /var/lib/.../server1_asm_data02.qcow2 --current
Disk detached successfully

This removed disk /dev/vdd from the guest, which has not gone unnoticed. From the ASM alert.log:

NOTE: Standard client NCDB:NCDB:ASM registered, osid 7732, mbr 0x1, asmb 7728 (reg:3524888596)
2015-02-15 06:01:56.858000 -05:00
NOTE: client NCDB:NCDB:ASM mounted group 1 (DATA)
2015-02-15 08:23:29.188000 -05:00
NOTE: process _user9772_+asm (9772) initiating offline of disk 1.3916023153 (DATA_0001) with mask 0x7e in group 1 (DATA) with client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 10 for pid 23, osid 9772
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969c571, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 11 for pid 23, osid 9772
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/vdd1
 incarnation:0xe969c571 synchronous result:'I/O error'
 subsys:System krq:0x7fb991c5e9e8 bufp:0x7fb991c5f000 osderr1:0x0 osderr2:0x0
 IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: found another non-responsive disk 1.3916023153 (DATA_0001) that will be offlined
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: group DATA: updated PST location: disk 0003 (PST copy 2)
NOTE: PST update grp = 1 completed successfully

...

NOTE: process _b000_+asm (11355) initiating offline of disk 1.3916023153 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (1976690176) to 1 disk(s) in group 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_gmon_5836.trc:
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 4088
Additional information: 4294967295
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_gmon_5836.trc:
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 4088
Additional information: 4294967295
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_gmon_5836.trc:
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 4088
Additional information: 4294967295
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_gmon_5836.trc:

...


WARNING: Disk 1 (DATA_0001) in group 1 mode 0x15 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969c571, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 12 for pid 25, osid 11355

...

SUCCESS: PST-initiated drop disk in group 1(3645453725))
2015-02-15 08:24:04.910000 -05:00
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xd949359d (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=26, OS id=11364
NOTE: assigning ARB0 to group 1/0xd949359d (DATA) with 1 parallel I/O
NOTE: F1X0 on disk 3 (fmt 2) relocated at fcn 0.26087: AU 0 -> AU 4059
NOTE: 02/15/15 08:24:04 DATA.F1X0 copy 2 relocating from 1:10 to 2:10 at FCN 0.26087
NOTE: 02/15/15 08:24:04 DATA.F1X0 copy 3 relocating from 2:10 to 3:4059 at FCN 0.26087
NOTE: F1B1 fcn on disk 0 synced at fcn 0.26087
NOTE: F1B1 fcn on disk 2 synced at fcn 0.26087
2015-02-15 08:24:28.853000 -05:00
NOTE: restored redundancy of control and online logs in DATA
2015-02-15 08:24:33.219000 -05:00

...

And that went on for a little while. As with any rebalance you can see the progress in v$asm_operation:

SQL> r
  1* select * from v$asm_operation

GROUP_NUMBER OPERA PASS      STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE                                       CON_ID
------------ ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ----------- -------------------------------------------- ----------
           1 REBAL RESYNC    DONE         11         11          0          0          0           0                                                       0
           1 REBAL REBALANCE RUN          11         11        559       4529      12417           0                                                       0
           1 REBAL COMPACT   WAIT         11         11          0          0          0           0                                                       0

You can see detailed information about the disk dropping in asmcmd too:

ASMCMD> lsdsk -kpt
Total_MB  Free_MB  OS_MB  Name                Failgroup  Failgroup_Type  Library  Label  UDID  Product  Redund   Group_Num  Disk_Num      Incarn  Mount_Stat  Header_Stat  Mode_Stat  State    Create_Date  Mount_Date  Repair_Timer  Path
    5119     3237      0  _DROPPED_0001_DATA  DATA_0001  REGULAR         System                         UNKNOWN          1         1  3916023153  MISSING     UNKNOWN      OFFLINE    FORCING  15-FEB-15    15-FEB-15   0
    5119      323   5119  DATA_0000           DATA_0000  REGULAR         System                         UNKNOWN          1         0  3916023154  CACHED      MEMBER       ONLINE     NORMAL   15-FEB-15    15-FEB-15   0             /dev/vdc1
    5119      324   5119  DATA_0002           DATA_0002  REGULAR         System                         UNKNOWN          1         2  3916023152  CACHED      MEMBER       ONLINE     NORMAL   15-FEB-15    15-FEB-15   0             /dev/vde1
    5119      328   5119  DATA_0003           DATA_0003  REGULAR         System                         UNKNOWN          1         3  3916023156  CACHED      MEMBER       ONLINE     NORMAL   15-FEB-15    15-FEB-15   0             /dev/vdf1
ASMCMD>

Missing, forcing, offline: none of these sound too good… What was worse is this:

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y         512   4096  1048576     15357        0             5119           -2559              1             N  DATA/

And then the inevitable happened:

2015-02-15 08:31:55.491000 -05:00
ERROR: ORA-15041 thrown in ARB0 for group number 1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_arb0_11547.trc:
ORA-15041: diskgroup "DATA" space exhausted
WARNING: Resync encountered ORA-15041; continuing rebalance
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
WARNING: rebalance not completed for group 1/0xd949359d (DATA)
GMON updating for reconfiguration, group 1 at 20 for pid 25, osid 11773
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA

ORA-15041 is the dreaded error: space exhausted. This was to be expected, but I had to see it with my own eyes.

The ASM rebalance is not complete:

SQL> r
  1* select * from v$asm_operation

GROUP_NUMBER OPERA PASS      STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE                                       CON_ID
------------ ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ----------- -------------------------------------------- ----------
           1 REBAL RESYNC    DONE         11                                                                                                               0
           1 REBAL REBALANCE ERRS         11                                                         ORA-15041                                             0
           1 REBAL COMPACT   WAIT         11                                                                                                               0

Interestingly, the database remained up, but I couldn’t do anything with it, as these examples demonstrate:

SQL> alter tablespace IAMTOOLARGETOFIT add datafile size 100m;
alter tablespace IAMTOOLARGETOFIT add datafile size 100m
*
ERROR at line 1:
ORA-01119: error in creating database file '+DATA'
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted



SQL> create table t as select * from dba_objects;
create table t as select * from dba_objects
*
ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace USERS

Errors in file /u01/app/oracle/diag/rdbms/ncdb/NCDB/trace/NCDB_arc3_12194.trc:
ORA-19504: failed to create file "+DATA"
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted
ARC3: Error 19504 Creating archive log file to '+DATA'
2015-02-15 08:47:31.410000 -05:00
Unable to create archive log file '+DATA'

You are now in a bit of a situation … So be careful when usable_file_mb goes negative: with ASM normal redundancy the loss of only one failgroup does not cause the disk group to be dismounted, but you nevertheless end up in a very undesirable place.

Posted in 12c Release 1, Linux | Leave a Comment »

How to resolve the text behind v$views?

Posted by Martin Bach on January 21, 2015

This is a common problem of mine, and I never write the solution down (except now). For example, today I wanted to know what the valid values for _serial_direct_read were:

SQL> select * from v$parameter_valid_values where name ='_serial_direct_read';

no rows selected

OK, so if Oracle doesn’t tell me, maybe I can work it out myself? Getting the view text has worked in the past:

SQL> select view_name, text_vc from dba_views where view_name like '%PARAMETER_VALID_VALUES';

VIEW_NAME                          TEXT_VC
---------------------------------- ----------------------------------------------------------------------------------------------------
V_$PARAMETER_VALID_VALUES          select "NUM","NAME","ORDINAL","VALUE","ISDEFAULT","CON_ID"
                                   from v$parameter_valid_values
GV_$PARAMETER_VALID_VALUES         select "INST_ID","NUM","NAME","ORDINAL","VALUE","ISDEFAULT","CON_ID"
                                   from gv$parameter_valid_values

I’m sure I went about it the wrong way, but I couldn’t work out what that lower-case thingie in the view text actually was:

SQL> desc "v$parameter_valid_values"
ERROR:
ORA-04043: object "v$parameter_valid_values" does not exist

SQL> sho user
USER is "SYS"
SQL> desc "gv$parameter_valid_values"
ERROR:
ORA-04043: object "gv$parameter_valid_values" does not exist

SQL> select * from dba_views where view_name = '"gv$parameter_valid_values"';

no rows selected

SQL> select * from dba_objects where object_name = '"gv$parameter_valid_values"';

no rows selected

Yes, I’m pretty sure I got something wrong along the way.

Solutions

One possibility is to use dbms_xplan.display_cursor() – easy!

SQL> select * from v$parameter_valid_values where name = '_serial_direct_read';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  9hkygnf02nd8y, child number 0
-------------------------------------
select * from v$parameter_valid_values where name =
'_serial_direct_read'

Plan hash value: 1012408093

-------------------------------------------------------------------------
| Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                 |       |       |     1 (100)|
|*  1 |  FIXED TABLE FULL| X$KSPVLD_VALUES |     1 |    49 |     0   (0)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NAME_KSPVLD_VALUES"='_serial_direct_read' AND
              TRANSLATE("NAME_KSPVLD_VALUES",'_','#') NOT LIKE '#%' AND
              "INST_ID"=USERENV('INSTANCE')))

Now if I translate this I can write a query that shows me what I need. It also demonstrates that – just like v$parameter – underscore parameters aren’t shown in this view.

SQL> @desc X$KSPVLD_VALUES
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      ADDR                                     RAW(8)
    2      INDX                                     NUMBER
    3      INST_ID                                  NUMBER
    4      CON_ID                                   NUMBER
    5      PARNO_KSPVLD_VALUES                      NUMBER
    6      NAME_KSPVLD_VALUES                       VARCHAR2(64)
    7      ORDINAL_KSPVLD_VALUES                    NUMBER
    8      VALUE_KSPVLD_VALUES                      VARCHAR2(255)
    9      ISDEFAULT_KSPVLD_VALUES                  VARCHAR2(64)

SQL> select PARNO_KSPVLD_VALUES, NAME_KSPVLD_VALUES, ORDINAL_KSPVLD_VALUES, VALUE_KSPVLD_VALUES
  2  from X$KSPVLD_VALUES where NAME_KSPVLD_VALUES ='_serial_direct_read';

PARNO_KSPVLD_VALUES NAME_KSPVLD_VALUES             ORDINAL_KSPVLD_VALUES VALUE_KSPVLD_VALUES
------------------- ------------------------------ --------------------- ------------------------------
               2873 _serial_direct_read                                1 ALWAYS
               2873 _serial_direct_read                                2 AUTO
               2873 _serial_direct_read                                3 NEVER
               2873 _serial_direct_read                                4 TRUE
               2873 _serial_direct_read                                5 FALSE

There you go!

Another way is to use the 12c functionality in DBMS_UTILITY.EXPAND_SQL_TEXT. Reusing the example by Tom Kyte:

SQL> var x clob

SQL> exec dbms_utility.expand_sql_text( -
  2   input_sql_text => 'select * from V$PARAMETER_VALID_VALUES', -
  3   output_sql_text => :x)

print :x

X
--------------------------------------------------------------------------------------------------------
SELECT "A1"."NUM" "NUM","A1"."NAME" "NAME","A1"."ORDINAL" "ORDINAL","A1"."VALUE" "VALUE",
"A1"."ISDEFAULT" "ISDEFAULT","A1"."CON_ID" "CON_ID" FROM  (SELECT "A2"."NUM" "NUM","A2"."NAME"
"NAME","A2"."ORDINAL" "ORDINAL","A2"."VALUE" "VALUE","A2"."ISDEFAULT" "ISDEFAULT","A2"."CON_ID" "CON_ID"
FROM  (SELECT "A3"."INST_ID" "INST_ID","A3"."PARNO_KSPVLD_VALUES" "NUM","A3"."NAME_KSPVLD_VALUES"
"NAME","A3"."ORDINAL_KSPVLD_VALUES" "ORDINAL","A3"."VALUE_KSPVLD_VALUES"
"VALUE","A3"."ISDEFAULT_KSPVLD_VALUES" "ISDEFAULT","A3"."CON_ID" "CON_ID" FROM SYS."X$KSPVLD_VALUES" "A3"
WHERE TRANSLATE("A3"."NAME_KSPVLD_VALUES",'_','#') NOT LIKE '#%') "A2" WHERE
"A2"."INST_ID"=USERENV('INSTANCE')) "A1"

This seems to have worked in earlier versions too; one example is on Jonathan Lewis’s blog.

Update: the most obvious solution to this was to use v$fixed_view_definition! The view must have dropped at the cold end of my brain’s LRU list. As others have pointed out (thanks everyone for your comments!), this would be the way to query the object:

SQL> select VIEW_DEFINITION from V$FIXED_VIEW_DEFINITION where view_name like 'GV$PARAMETER_VALID_VALUES';

VIEW_DEFINITION
-------------------------------------------------------------------------------------------------------
SELECT INST_ID, PARNO_KSPVLD_VALUES, NAME_KSPVLD_VALUES, ORDINAL_KSPVLD_VALUES, VALUE_KSPVLD_VALUES,
 ISDEFAULT_KSPVLD_VALUES, CON_ID
 FROM X$KSPVLD_VALUES WHERE TRANSLATE(NAME_KSPVLD_VALUES,'_','#') NOT LIKE '#%'

Summary

It’s probably not what Oracle intended but DBMS_UTILITY.EXPAND_SQL_TEXT() worked really well. I came across the DBMS_XPLAN.DISPLAY_CURSOR() output by chance when I ran my diagnostic script at the wrong time but it, too, does the job.

Or I could have used Tanel Poder’s script, which I didn’t know about until now:


SQL> @pvalid _serial_direct_read
Display valid values for multioption parameters matching "_serial_direct_read"...

  PAR# PARAMETER                                                 ORD VALUE                          DEFAULT
------ -------------------------------------------------- ---------- ------------------------------ -------
  2873 _serial_direct_read                                         1 ALWAYS
       _serial_direct_read                                         2 AUTO
       _serial_direct_read                                         3 NEVER
       _serial_direct_read                                         4 TRUE
       _serial_direct_read                                         5 FALSE

Posted in 12c Release 1 | 5 Comments »

Adaptive plans and v$sql_plan and related views

Posted by Martin Bach on January 13, 2015

Adaptive plans are one of the coolest new optimiser features in Oracle 12c. If you haven’t seen or heard about them in detail I recommend the following resources:

There is a caveat with this though: if your tuning script relies on pulling information from v$sql_plan and related views, you get more information than you might want. I found out about this while working on our 12c New Features training class. This, and a lot more, will be part of it. Stay tuned :)

Consider the following example. I will use the following query in this article:

SELECT
  /* statement002 */
  /*+ gather_plan_statistics monitor */
  TRUNC(order_date, 'mm'),
  COUNT(TRUNC(order_date, 'mm'))
FROM orders o,
  order_items oi
WHERE oi.ORDER_ID    = o.order_id
AND o.CUSTOMER_CLASS = 'Prime'
AND o.WAREHOUSE_ID   = 501
AND o.ORDER_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-07-01'
GROUP BY TRUNC(order_date, 'mm')
ORDER BY 1;

These tables are part of the Swingbench SOE (order entry) schema. I have inflated the order_items table to twice its size for a total of 171,579,632 rows.
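For the record, the inflation itself is nothing sophisticated. A sketch along these lines doubles the row count; it assumes any unique or primary key constraints on order_items are dealt with beforehand, which is left out here:

```sql
-- Sketch only: double ORDER_ITEMS by re-inserting its own rows.
-- Assumes key constraints are disabled or remapped first;
-- APPEND requests a direct-path insert for speed.
INSERT /*+ APPEND */ INTO order_items SELECT * FROM order_items;
COMMIT;
```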

When executing this query on the X4-2 half rack in the lab I get this (sorry for the wide output!):

TRUNC(ORDER_DATE, COUNT(TRUNC(ORDER_DATE,'MM'))
----------------- -----------------------------
20120101 00:00:00                           472
20120201 00:00:00                           580
20120301 00:00:00                           614
20120401 00:00:00                           578

SQL> @x
Display execution plan for last statement for this session from library cache...

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3wb68w2kj6gdd, child number 0
-------------------------------------
SELECT   /* statement002 */   /*+ gather_plan_statistics monitor */
TRUNC(order_date, 'mm'),   COUNT(TRUNC(order_date, 'mm')) FROM orders
o,   order_items oi WHERE oi.ORDER_ID    = o.order_id AND
o.CUSTOMER_CLASS = 'Prime' AND o.WAREHOUSE_ID   = 501 AND o.ORDER_DATE
BETWEEN DATE '2012-01-01' AND DATE '2012-07-01' GROUP BY
TRUNC(order_date, 'mm') ORDER BY 1

Plan hash value: 812470616

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name             | Starts | E-Rows |E-Bytes| Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |                  |      1 |        |       | 29766 (100)|       |       |      4 |00:00:00.06 |   29354 |     17 |       |       |          |
|   1 |  SORT ORDER BY                                 |                  |      1 |    404 | 14948 | 29766   (1)|       |       |      4 |00:00:00.06 |   29354 |     17 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY                                |                  |      1 |    404 | 14948 | 29766   (1)|       |       |      4 |00:00:00.06 |   29354 |     17 |  1394K|  1394K|  634K (0)|
|*  3 |    FILTER                                      |                  |      1 |        |       |            |       |       |   2244 |00:00:00.05 |   29354 |     17 |       |       |          |
|   4 |     NESTED LOOPS                               |                  |      1 |   2996 |   108K| 29764   (1)|       |       |   2244 |00:00:00.05 |   29354 |     17 |       |       |          |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| ORDERS           |      1 |    434 | 13454 | 28462   (1)| ROWID | ROWID |    362 |00:00:00.03 |   28261 |     17 |       |       |          |
|*  6 |       INDEX RANGE SCAN                         | ORD_WAREHOUSE_IX |      1 |  28624 |       |    90   (0)|       |       |  28441 |00:00:00.02 |      90 |      0 |  1025K|  1025K|          |
|*  7 |      INDEX RANGE SCAN                          | ITEM_ORDER_IX    |    362 |      7 |    42 |     3   (0)|       |       |   2244 |00:00:00.01 |    1093 |      0 |  1025K|  1025K|          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(TIMESTAMP' 2012-07-01 00:00:00'>=TIMESTAMP' 2012-01-01 00:00:00')
   5 - filter(("O"."CUSTOMER_CLASS"='Prime' AND "O"."ORDER_DATE">=TIMESTAMP' 2012-01-01 00:00:00' AND "O"."ORDER_DATE"<=TIMESTAMP' 2012-07-01 00:00:00'))
   6 - access("O"."WAREHOUSE_ID"=501)
   7 - access("OI"."ORDER_ID"="O"."ORDER_ID")

   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan
   - 2 Sql Plan Directives used for this statement

@x is one of the many useful scripts written by Tanel Poder. I recommend you download tpt_public.zip and get familiar with what the scripts do; they are just great.

The Execution Plan

Please take note of the output of the execution plan. The query uses an adaptive plan (see the Note section). The join between orders and order_items is performed using a nested loop (step 4 and the following ones). You can also use v$sql to show that this is an adaptive plan:

SQL> select sql_id, child_number, executions, is_reoptimizable, is_resolved_adaptive_plan
  2  from v$sql where sql_id = '3wb68w2kj6gdd';

SQL_ID        CHILD_NUMBER EXECUTIONS I I
------------- ------------ ---------- - -
3wb68w2kj6gdd            0          1 Y Y

It is indeed already resolved. Using Jonathan Lewis’s recent notes on reoptimisation I checked what the optimiser worked out:

SQL> select sql_id, child_number, hint_id, hint_text, reparse
  2  from v$sql_reoptimization_hints where sql_id = '3wb68w2kj6gdd' and child_number = 0;

SQL_ID        CHILD_NUMBER    HINT_ID HINT_TEXT                                             REPARSE
------------- ------------ ---------- -------------------------------------------------- ----------
3wb68w2kj6gdd            0          1 OPT_ESTIMATE (@"SEL$1" GROUP_BY ROWS=4.000000 )             1
3wb68w2kj6gdd            0          2 OPT_ESTIMATE (@"SEL$1" JOIN ("OI"@"SEL$1" "O"@"SEL          1
                                      $1") ROWS=1.000000 )

Nice! That also provides some insights that could be useful. You can map the hint_text to v$sql_plan.object_alias; it contains the query block name. I have actually executed slight variations of the query in preparation …

Back to my problem

In the past I have used v$sql_plan (indirectly, via scripts written by far more clever people than me) to work out each step in the plan. v$sql_plan has ID, PARENT_ID and DEPTH columns that make it easier to work out where in the plan you are. Using my old approach I got stuck; consider this:

SQL> r
  1  select
  2    lpad(' ',sp.depth*1,' ')
  3    || sp.operation AS operation,
  4    sp.OPTIONS,
  5    sp.object#,
  6    sp.object_name,
  7    sp.object_alias,
  8    sp.object_type
  9  FROM v$sql_plan sp
 10* where sql_id = '3wb68w2kj6gdd' and child_number = 0

OPERATION                                OPTIONS                                OBJECT# OBJECT_NAME               OBJECT_ALI OBJECT_TYP
---------------------------------------- ----------------------------------- ---------- ------------------------- ---------- ----------
SELECT STATEMENT
 SORT                                    ORDER BY
  HASH                                   GROUP BY
   FILTER
    HASH JOIN
     NESTED LOOPS
      STATISTICS COLLECTOR
       TABLE ACCESS                      BY GLOBAL INDEX ROWID BATCHED            28865 ORDERS                    O@SEL$1    TABLE
        INDEX                            RANGE SCAN                               29329 ORD_WAREHOUSE_IX          O@SEL$1    INDEX
      INDEX                              RANGE SCAN                               29302 ITEM_ORDER_IX             OI@SEL$1   INDEX
     INDEX                               FAST FULL SCAN                           29302 ITEM_ORDER_IX             OI@SEL$1   INDEX

11 rows selected.

If you compare this with the DBMS_XPLAN output from above then you notice there is a lot more information in the query against v$sql_plan… In fact, that’s the same output as you get when calling DBMS_XPLAN.DISPLAY_CURSOR(… format => '+ADAPTIVE'):

SQL> select * from table(dbms_xplan.display_cursor('3wb68w2kj6gdd',0,'+ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3wb68w2kj6gdd, child number 0
-------------------------------------
SELECT   /* statement002 */   /*+ gather_plan_statistics monitor */
TRUNC(order_date, 'mm'),   COUNT(TRUNC(order_date, 'mm')) FROM orders
o,   order_items oi WHERE oi.ORDER_ID    = o.order_id AND
o.CUSTOMER_CLASS = 'Prime' AND o.WAREHOUSE_ID   = 501 AND o.ORDER_DATE
BETWEEN DATE '2012-01-01' AND DATE '2012-07-01' GROUP BY
TRUNC(order_date, 'mm') ORDER BY 1

Plan hash value: 812470616

---------------------------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                                        | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                                 |                  |       |       | 29766 (100)|          |       |       |
|     1 |  SORT ORDER BY                                   |                  |   404 | 14948 | 29766   (1)| 00:00:02 |       |       |
|     2 |   HASH GROUP BY                                  |                  |   404 | 14948 | 29766   (1)| 00:00:02 |       |       |
|  *  3 |    FILTER                                        |                  |       |       |            |          |       |       |
|- *  4 |     HASH JOIN                                    |                  |  2996 |   108K| 29764   (1)| 00:00:02 |       |       |
|     5 |      NESTED LOOPS                                |                  |  2996 |   108K| 29764   (1)| 00:00:02 |       |       |
|-    6 |       STATISTICS COLLECTOR                       |                  |       |       |            |          |       |       |
|  *  7 |        TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| ORDERS           |   434 | 13454 | 28462   (1)| 00:00:02 | ROWID | ROWID |
|  *  8 |         INDEX RANGE SCAN                         | ORD_WAREHOUSE_IX | 28624 |       |    90   (0)| 00:00:01 |       |       |
|  *  9 |       INDEX RANGE SCAN                           | ITEM_ORDER_IX    |     7 |    42 |     3   (0)| 00:00:01 |       |       |
|-   10 |      INDEX FAST FULL SCAN                        | ITEM_ORDER_IX    |     7 |    42 |     3   (0)| 00:00:01 |       |       |
---------------------------------------------------------------------------------------------------------------------------------------
...
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan (rows marked '-' are inactive)
   - 2 Sql Plan Directives used for this statement

Note the line: this is an adaptive plan (rows marked ‘-‘ are inactive). But how does DBMS_XPLAN know that these lines are hidden? There doesn’t seem to be a view v$sql_plan_hidden_lines. I tried a few things and eventually traced the call to DBMS_XPLAN.DISPLAY_CURSOR. In the trace I found the trick Oracle uses:

=====================
PARSING IN CURSOR #140127730386096 len=298 dep=2 uid=75 oct=3 lid=75 tim=767822822383 hv=580989905 ad='e9e6f7570' sqlid='fg4skgcja2cyj'
SELECT EXTRACTVALUE(VALUE(D), '/row/@op'), EXTRACTVALUE(VALUE(D), '/row/@dis'), EXTRACTVALUE(VALUE(D), '/row/@par'), EXTRACTVALUE(VALUE(D),
'/row/@prt'), EXTRACTVALUE(VALUE(D), '/row/@dep'), EXTRACTVALUE(VALUE(D), '/row/@skp') FROM TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE(:B1 ), 
'/*/display_map/row'))) D
END OF STMT
...
STAT #140127732564416 id=1 cnt=11 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=0 pr=0 pw=0 time=691 us cost=1 size=8935 card=1)'
STAT #140127732564416 id=2 cnt=11 pid=1 pos=1 obj=0 op='VIEW  (cr=0 pr=0 pw=0 time=1223 us cost=0 size=8935 card=1)'
STAT #140127732564416 id=3 cnt=11 pid=2 pos=1 obj=0 op='NESTED LOOPS  (cr=0 pr=0 pw=0 time=1116 us cost=0 size=957 card=1)'
STAT #140127732564416 id=4 cnt=11 pid=3 pos=1 obj=0 op='FIXED TABLE FIXED INDEX X$KQLFXPL (ind:4) (cr=0 pr=0 pw=0 time=902 us cost=0 size=853 card=1)'
STAT #140127732564416 id=5 cnt=11 pid=3 pos=2 obj=0 op='FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=134 us cost=0 size=104 card=1)'

So that’s it! The X$ views map to v$sql and v$sql_plan unless I am very mistaken. V$SQL_PLAN has only one column that contains XML: other_xml. Using this information I thought there had to be something in there … and indeed, there is:

SQL> select xmltype(other_xml)
  2  from v$sql_plan
  3  where sql_id = '3wb68w2kj6gdd' and child_number = 0
  4  and other_xml is not null;

XMLTYPE(OTHER_XML)
---------------------------------------------------------------------------------------------------------------------------
<other_xml>
  <info type="db_version">12.1.0.2</info>
  <info type="parse_schema"><![CDATA["SOE"]]></info>
  <info type="dynamic_sampling" note="y">2</info>
  <info type="plan_hash_full">166760258</info>
  <info type="plan_hash">812470616</info>
  <info type="plan_hash_2">3729130925</info>
  <info type="adaptive_plan" note="y">yes</info>
  <spd>
    <cv>0</cv>
    <cu>2</cu>
  </spd>
  <outline_data>
    <hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint>
    <hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('12.1.0.2')]]></hint>
    <hint><![CDATA[DB_VERSION('12.1.0.2')]]></hint>
    <hint><![CDATA[ALL_ROWS]]></hint>
    <hint><![CDATA[OUTLINE_LEAF(@"SEL$1")]]></hint>
    <hint><![CDATA[INDEX_RS_ASC(@"SEL$1" "O"@"SEL$1" ("ORDERS"."WAREHOUSE_ID" "ORDERS"."ORDER_STATUS"))]]></hint>
    <hint><![CDATA[BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "O"@"SEL$1")]]></hint>
    <hint><![CDATA[INDEX(@"SEL$1" "OI"@"SEL$1" ("ORDER_ITEMS"."ORDER_ID"))]]></hint>
    <hint><![CDATA[LEADING(@"SEL$1" "O"@"SEL$1" "OI"@"SEL$1")]]></hint>
    <hint><![CDATA[USE_NL(@"SEL$1" "OI"@"SEL$1")]]></hint>
    <hint><![CDATA[USE_HASH_AGGREGATION(@"SEL$1")]]></hint>
  </outline_data>
  <display_map>
    <row op="1" dis="1" par="0" prt="0" dep="1" skp="0"/>
    <row op="2" dis="2" par="1" prt="0" dep="2" skp="0"/>
    <row op="3" dis="3" par="2" prt="0" dep="3" skp="0"/>
    <row op="4" dis="3" par="3" prt="0" dep="3" skp="1"/>
    <row op="5" dis="4" par="3" prt="0" dep="4" skp="0"/>
    <row op="6" dis="4" par="4" prt="0" dep="4" skp="1"/>
    <row op="7" dis="5" par="4" prt="5" dep="5" skp="0"/>
    <row op="8" dis="6" par="5" prt="0" dep="6" skp="0"/>
    <row op="9" dis="7" par="4" prt="0" dep="5" skp="0"/>
    <row op="10" dis="7" par="3" prt="0" dep="3" skp="1"/>
  </display_map>
</other_xml>
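As a quick sanity check outside the database, the display_map section can be decoded with a few lines of Python. This is only a sketch using the standard library; the XML is abbreviated to the first few rows of the output above, and the attribute names follow the other_xml column:

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the display_map section shown above;
# attribute names are as they appear in v$sql_plan.other_xml.
other_xml = """
<other_xml>
  <display_map>
    <row op="1" dis="1" par="0" prt="0" dep="1" skp="0"/>
    <row op="2" dis="2" par="1" prt="0" dep="2" skp="0"/>
    <row op="3" dis="3" par="2" prt="0" dep="3" skp="0"/>
    <row op="4" dis="3" par="3" prt="0" dep="3" skp="1"/>
    <row op="5" dis="4" par="3" prt="0" dep="4" skp="0"/>
  </display_map>
</other_xml>
"""

def parse_display_map(xml_text):
    """Return one dict of integer attributes per <row>, like the XMLTABLE call."""
    root = ET.fromstring(xml_text)
    return [{k: int(v) for k, v in row.attrib.items()}
            for row in root.findall("./display_map/row")]

rows = parse_display_map(other_xml)
# skp="1" marks the plan lines DBMS_XPLAN hides for the resolved adaptive plan
visible = [r for r in rows if r["skp"] == 0]
for r in visible:
    print(f"op={r['op']} displayed as id {r['dis']} at depth {r['dep']}")
```

The same filter on skp is what the join against v$sql_plan further down implements in SQL.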

The SQL statement from the trace was not of much use to me; the following seemed better suited to working out what was happening. I added what I think the abbreviations stand for:

WITH display_map AS
  (SELECT X.*
  FROM v$sql_plan,
    XMLTABLE ( '/other_xml/display_map/row' passing XMLTYPE(other_xml ) COLUMNS 
      op  NUMBER PATH '@op',    -- operation
      dis NUMBER PATH '@dis',   -- display
      par NUMBER PATH '@par',   -- parent
      prt NUMBER PATH '@prt',   -- ?
      dep NUMBER PATH '@dep',   -- depth
      skp NUMBER PATH '@skp' )  -- skip
  AS X
  WHERE sql_id     = '&sql_id'
  AND child_number = &sql_child
  AND other_xml   IS NOT NULL
  )
SELECT * from display_map;

Enter value for sql_id: 3wb68w2kj6gdd
Enter value for sql_child: 0

        OP        DIS        PAR        PRT        DEP        SKP
---------- ---------- ---------- ---------- ---------- ----------
         1          1          0          0          1          0
         2          2          1          0          2          0
         3          3          2          0          3          0
         4          3          3          0          3          1
         5          4          3          0          4          0
         6          4          4          0          4          1
         7          5          4          5          5          0
         8          6          5          0          6          0
         9          7          4          0          5          0
        10          7          3          0          3          1

10 rows selected.

Well, that’s a starting point. Now all I have to do is join the display map to v$sql_plan cleverly. After a little bit of fiddling with the query this seems to work:

WITH display_map AS
  (SELECT X.*
  FROM v$sql_plan,
    XMLTABLE ( '/other_xml/display_map/row' passing XMLTYPE(other_xml ) COLUMNS 
      op  NUMBER PATH '@op',    -- operation
      dis NUMBER PATH '@dis',   -- display
      par NUMBER PATH '@par',   -- parent
      prt NUMBER PATH '@prt',   -- ?
      dep NUMBER PATH '@dep',   -- depth
      skp NUMBER PATH '@skp' )  -- skip
  AS X
  WHERE sql_id     = '&sql_id'
  AND child_number = &sql_child
  AND other_xml   IS NOT NULL
  )
SELECT 
  -- new ID, depth, parent etc from display_map
  NVL(m.dis, 0) AS new_id,
  m.par         AS new_parent,
  m.dep         AS new_depth,
  -- plan formatting, as usual
  lpad(' ',m.dep*1,' ')
  || sp.operation AS operation,
  sp.OPTIONS,
  sp.object#,
  sp.object_name,
  sp.object_alias,
  sp.object_type
FROM v$sql_plan sp
LEFT OUTER JOIN display_map m
ON (sp.id = m.op)
WHERE sp.sql_Id        = '&sql_id'
AND sp.child_number    = &sql_child
AND NVL(m.skp,0)      <> 1
ORDER BY NVL(dis,0);

    NEW_ID NEW_PARENT  NEW_DEPTH OPERATION                      OPTIONS                                OBJECT# OBJECT_NAME               OBJECT_ALI OBJECT_TYP
---------- ---------- ---------- ------------------------------ ----------------------------------- ---------- ------------------------- ---------- ----------
         0                       SELECT STATEMENT
         1          0          1  SORT                          ORDER BY
         2          1          2   HASH                         GROUP BY
         3          2          3    FILTER
         4          3          4     NESTED LOOPS
         5          4          5      TABLE ACCESS              BY GLOBAL INDEX ROWID BATCHED            28865 ORDERS                    O@SEL$1    TABLE
         6          5          6       INDEX                    RANGE SCAN                               29329 ORD_WAREHOUSE_IX          O@SEL$1    INDEX
         7          4          5      INDEX                     RANGE SCAN                               29302 ITEM_ORDER_IX             OI@SEL$1   INDEX

8 rows selected.

This seems to match what Oracle produces:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3wb68w2kj6gdd, child number 0
-------------------------------------
SELECT   /* statement002 */   /*+ gather_plan_statistics monitor */
TRUNC(order_date, 'mm'),   COUNT(TRUNC(order_date, 'mm')) FROM orders
o,   order_items oi WHERE oi.ORDER_ID    = o.order_id AND
o.CUSTOMER_CLASS = 'Prime' AND o.WAREHOUSE_ID   = 501 AND o.ORDER_DATE
BETWEEN DATE '2012-01-01' AND DATE '2012-07-01' GROUP BY
TRUNC(order_date, 'mm') ORDER BY 1

Plan hash value: 812470616

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |                  |       |       | 29766 (100)|          |       |       |
|   1 |  SORT ORDER BY                                 |                  |   404 | 14948 | 29766   (1)| 00:00:02 |       |       |
|   2 |   HASH GROUP BY                                |                  |   404 | 14948 | 29766   (1)| 00:00:02 |       |       |
|*  3 |    FILTER                                      |                  |       |       |            |          |       |       |
|   4 |     NESTED LOOPS                               |                  |  2996 |   108K| 29764   (1)| 00:00:02 |       |       |
|*  5 |      TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| ORDERS           |   434 | 13454 | 28462   (1)| 00:00:02 | ROWID | ROWID |
|*  6 |       INDEX RANGE SCAN                         | ORD_WAREHOUSE_IX | 28624 |       |    90   (0)| 00:00:01 |       |       |
|*  7 |      INDEX RANGE SCAN                          | ITEM_ORDER_IX    |     7 |    42 |     3   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------------------

With this done it should be possible to add other diagnostic information too: just join the additional views and add the relevant columns. Hope this helps! I haven’t performed extensive testing on the approach but wanted to put it out here for more clever people to tell me where I’m wrong.

Posted in 12c Release 1 | Leave a Comment »

Installing Oracle 12.1.0.2 RAC on Oracle Linux 7-part 2

Posted by Martin Bach on January 12, 2015

In the first part of the article series you could read how a kickstart file made the installation of Oracle Linux 7 a lot more bearable. In this part of the series it’s all about configuring the operating system. The installation of Grid Infrastructure and the Oracle database is for another set of posts.

There are quite a few differences between Oracle Linux 6 and 7

To me the transition from Oracle Linux 6 to 7 feels like the step from Solaris 9 to 10 did at the time: a lot has changed. It’s fair to say that it was announced quite some time ago that the network stack commands we know and love are deprecated and might go away. Even with Oracle Linux 6 there was a threat that NetworkManager would be the only tool to modify your network settings (which thankfully was not the case). A lot of the Linux community’s efforts have now come to fruition, and it’s time to adjust to the future, even when it’s painful (and it is, at least a bit).

Configuring the network

The warning has been out there quite a while, but now it seems to have come true: no more system-config-network-tui to configure the network! No more ifconfig! Oh dear, quite a bit of learning to be done. Luckily someone else has done all the legwork and documented the changes. A good example is this one:

https://dougvitale.wordpress.com/2011/12/21/deprecated-linux-networking-commands-and-their-replacements/

So first of all, don’t fear: although all network interfaces are configured using NetworkManager now, you can still use a command line tool: nmtui. After trying it out I have to say I’m not really convinced of its usability. What appears better is nmcli, the NetworkManager command line tool. Its use is quite confusing, and it appears to me as if the whole NetworkManager toolset was developed for laptop users, not servers. But I digress. I have a few interfaces in my RAC VM; the first was configured during the installation, eth[1-3] aren’t configured yet.

[root@localhost ~]# nmcli connection show
NAME         UUID                                  TYPE            DEVICE
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
[root@localhost ~]# nmcli device status
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  disconnected  --
eth2    ethernet  disconnected  --
eth3    ethernet  disconnected  --
lo      loopback  unmanaged     --
[root@localhost ~]#

At this point I have used eth0 as the management network (similar to the way Exadata does) and will use the other networks for the database. eth1 will act as the public network, eth2 and eth3 will be private.

Although the network interfaces can be named differently for device name persistence, I stick with the old naming for now. I don’t want to run into trouble with the installer just yet. On physical hardware you are very likely to see very different network interface names; the kernel uses a naming scheme identifying where the cards are (on the main board, or in extension cards, for example). I’ll write another post about that soon.

Using dnsmasq (on the host) I configure my hosts for these addresses:

[root@ol62 ~]# grep rac12pri /etc/hosts
192.168.100.107	rac12pri1.example.com		rac12pri1
192.168.100.108	rac12pri1-vip.example.com	rac12pri1-vip
192.168.100.109	rac12pri2.example.com		rac12pri2
192.168.100.110	rac12pri2-vip.example.com	rac12pri2-vip
192.168.100.111	rac12pri-scan.example.com	rac12pri-scan
192.168.100.112	rac12pri-scan.example.com	rac12pri-scan
192.168.100.113	rac12pri-scan.example.com	rac12pri-scan

Configuring an interface is actually not too hard once you get the hang of it, although it took me a little while to do so. It almost appears as if something that was simple and easy to use has been made difficult.

[root@localhost ~]# nmcli con add con-name eth1 ifname eth1 type ethernet ip4 192.168.100.107/24 gw4 192.168.100.1
[root@localhost ~]# nmcli con add con-name eth2 ifname eth2 type ethernet ip4 192.168.101.107/24
[root@localhost ~]# nmcli con add con-name eth3 ifname eth3 type ethernet ip4 192.168.102.107/24 

[root@localhost ~]# nmcli con show
NAME         UUID                                  TYPE            DEVICE
eth2         ccc7f592-b563-4b9d-a36b-2b45809e4643  802-3-ethernet  eth2
eth1         ae897dee-42ff-4ccd-843b-7c97ba0d5315  802-3-ethernet  eth1
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
eth3         b6074c9a-dcc4-4487-9a8a-052e4c60bbca  802-3-ethernet  eth3

I can now verify the IP addresses using the “ip” tool (ifconfig is not installed; I haven’t yet checked whether there is a compatibility package).

[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:6e:6f:67 brd ff:ff:ff:ff:ff:ff
    inet 192.168.150.111/24 brd 192.168.150.255 scope global eth0
    inet6 fe80::5054:ff:fe6e:6f67/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:96:ad:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.107/24 brd 192.168.100.255 scope global eth1
    inet6 fe80::5054:ff:fe96:ad88/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:c1:cc:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.107/24 brd 192.168.101.255 scope global eth2
    inet6 fe80::5054:ff:fec1:cc8e/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:7e:59:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.102.107/24 brd 192.168.102.255 scope global eth3
    inet6 fe80::5054:ff:fe7e:5945/64 scope link
       valid_lft forever preferred_lft forever

Now what’s left is setting the hostname, which is a simple call to hostnamectl --static set-hostname rac12pri1. nmcli gives you an interface for changing the hostname as well. I repeated the steps for node 2; they are identical except for the IP addresses of course.

So that concludes the network setup.

Managing linux daemons

If you are curious about which services are set to start at a given runlevel, there’s another surprise:

[root@rac12pri2 ~]# chkconfig --list

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

      If you want to list systemd services use 'systemctl list-unit-files'.
      To see services enabled on particular target use
      'systemctl list-dependencies [target]'.

iprdump        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprinit        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprupdate      	0:off	1:off	2:on	3:on	4:on	5:on	6:off
netconsole     	0:off	1:off	2:off	3:off	4:off	5:off	6:off
network        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
pmcd           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmie           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmlogger       	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmmgr          	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmproxy        	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmwebd         	0:off	1:off	2:off	3:off	4:off	5:off	6:off
rhnsd          	0:off	1:off	2:on	3:on	4:on	5:on	6:off
[root@rac12pri2 ~]#

If you have just got familiar with upstart then there is some bad news: upstart has now been replaced with systemd… This might be the right time to read up on it if you aren’t familiar with it yet:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/chap-Managing_Services_with_systemd.html

Things are a little different with systemd, so here is an example of how to enable and start the NTP service. It has to be installed first if it isn’t already. You should also add the -x flag in /etc/sysconfig/ntpd. First I would like to see if the service is available. You use systemctl for this: instead of a chkconfig --list ntpd you call systemctl as shown:

[root@rac12pri ~]# systemctl list-units --type service --all | grep ntpd
ntpd.service                                                                              loaded inactive dead    Network Time Service
ntpdate.service                                                                           loaded inactive dead    Set time via NTP

I have to get used to the new syntax: previously you used “service <whatever> status” and then, if needed, pressed backspace a few times and changed status to start. The new syntax is closer to human language but less practical: systemctl status <service>. Changing status to start requires more typing.
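For reference, here are the old and new invocations side by side, based on the commands used in this post:

```
service ntpd status    ->  systemctl status ntpd.service
service ntpd start     ->  systemctl start ntpd.service
chkconfig ntpd on      ->  systemctl enable ntpd.service
chkconfig --list       ->  systemctl list-unit-files
```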

The check proved that the service exists (i.e. the NTP package is installed), but it is not started. We can change this:

[root@rac12pri ~]# systemctl enable ntpd.service
[root@rac12pri ~]# systemctl start ntpd.service
[root@rac12pri ~]# systemctl status ntpd.service
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since Tue 2014-12-16 15:38:47 GMT; 1s ago
Process: 5179 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 5180 (ntpd)
CGroup: /system.slice/ntpd.service
└─5180 /usr/sbin/ntpd -u ntp:ntp -g -x

Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 8 eth1 fe80::5054:ff:fe96:ad88 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 9 eth2 fe80::5054:ff:fec1:cc8e UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 10 eth3 fe80::5054:ff:fe7e:5945 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 11 eth0 fe80::5054:ff:fe6e:6f67 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listening on routing socket on fd #28 for interface updates
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c016 06 restart
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c012 02 freq_set ntpd 0.000 PPM
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c011 01 freq_not_set
Dec 16 15:38:47 rac12pri systemd[1]: Started Network Time Service.
Dec 16 15:38:48 rac12pri ntpd[5180]: 0.0.0.0 c614 04 freq_mode
[root@rac12pri ~]#

The call to “systemctl enable” replaces the invocation of chkconfig that made ntpd start automatically as a service (chkconfig ntpd on). Starting the service does not produce any output, hence the need to check its status.
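Coming back to the -x flag mentioned earlier: it goes on the OPTIONS line in /etc/sysconfig/ntpd, which is how it ends up in the process listing shown above:

```
# /etc/sysconfig/ntpd
# -g is there by default; -x makes ntpd slew the clock instead of stepping it
OPTIONS="-g -x"
```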

There is a slight caveat with the use of NTP: it is not the default timekeeping service. Another tool, named chronyd, is used instead.

This causes a problem after the next reboot: chronyd will be started, NTPd won’t be. The Red Hat documentation therefore has a section on how to switch:

[root@rac12pri ~]# systemctl stop chronyd
[root@rac12pri ~]# systemctl disable chronyd
[root@rac12pri ~]# systemctl status chronyd

Storage

Shared storage is provided by KVM. I am using the SSDs in my lab to create a few “LUNs”. These must explicitly be made “shareable” to be accessible by more than one guest. Since 12.1.0.2.0 Oracle installs a database for the cluster health monitor by default. Currently I use the following setup for my lab 12.1.0.2 clusters:

  1. +CHM (external redundancy) – 1x 15GB
  2. +OCR (normal redundancy) – 3x 2 GB
  3. +DATA (external redundancy) – 1 x 15GB
  4. +RECO (external redundancy) – 1 x 10 GB

If you use the guided installation of Grid Infrastructure the installer will prompt you for a single disk group only. This means that the CHM database as well as the OCR and voting files will be installed in that disk group. I prefer to separate them though, which is why I create a second disk group, OCR, after the installation has completed and move the voting files and OCR out of +CHM.

DATA and RECO are standard Exadata disk groups and I like to keep things consistent for myself.

I use fdisk to partition the future ASM disks with 1 partition spanning the whole LUN.

Other tasks

A lot of the other pre-installation tasks can actually be performed during the kickstart installation. I still like to run SELinux in permissive mode even though, according to Requirements for Installing Oracle Database 12.1 on RHEL6 or OL6 64-bit (x86-64) (Doc ID 1529864.1), SELinux can be set to “enforcing”. The directive in the kickstart file is

selinux --permissive

You shouldn’t have to install additional packages manually: all packages to be installed should go into the %packages section of the kickstart file. Simply copy the package names from the official documentation and paste them below the last package in the section. There is one exception to the rule: cvuqdisk must be installed from the Oracle installation media.
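As an illustration, the tail end of a %packages section might look like this. This is an abbreviated sketch only; take the authoritative package list from the official installation guide:

```
%packages
@base
binutils
gcc
ksh
libaio
libaio-devel
sysstat
%end
```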

Settings for /etc/sysctl.conf and /etc/security/limits.conf can also be made in the kickstart file as shown in the first part of this series.

Storage to be made available to RAC must have its permissions set. Since there isn’t an ASMLib for Oracle Linux 7, to my knowledge, udev will have to be used; my udev configuration file, too, is in the first part.
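For illustration only, a udev rule to set those permissions could look like the following. The device match, owner and group are assumptions for a KVM guest; the actual file I use is in part 1 of this series:

```
# /etc/udev/rules.d/61-asm.rules (sketch: adjust the KERNEL match to your LUNs)
KERNEL=="vd[b-e]1", OWNER="grid", GROUP="asmdba", MODE="0660"
```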

To make sure the user and group IDs for the oracle and grid accounts are the same on both nodes I create the accounts in the kickstart file as well. Passwords are deliberately not set; they may evolve and I can’t possibly remember them all :)

User equivalence can be set up using a technique I have already described in an earlier blog post. Although the user equivalence setup can be deferred to when you install Grid Infrastructure, I still perform it beforehand so I can run the cluster verification tool with the -fixup option.

Posted in 12c Release 1, KVM, Linux, RAC | 3 Comments »

Installing Oracle 12.1.0.2 RAC on Oracle Linux 7-part 1

Posted by Martin Bach on January 5, 2015

Now that 12.1.0.2 is certified on Red Hat Enterprise Linux 7 and its derivatives it’s time to test the installation of RAC on such a system.

The installation of the OS is different from Oracle Linux 5 and 6: with those distributions it was very straightforward to install the operating system, but the method has changed significantly in release 7. I won’t cover the complete installation here, as always Tim Hall was quicker than me, but it makes me wonder who signed off the user interface for the partitioning “wizard”… I personally think that the kickstart partitioning information is a lot easier to understand.

http://oracle-base.com/articles/linux/oracle-linux-7-installation.php

After walking through the steps in the graphical installer I thought that I’d rather kickstart the whole lot. If you have not yet used kickstart to create Oracle/Red Hat/CentOS hosts then now is the time to start admiring the technology!

My host runs Oracle Linux 6.6 and there aren’t any packages in use that aren’t part of the distribution, with one exception: I have compiled dnsmasq from the current source since the stock version had a bug when specifying the --bind-interfaces command line option while other dnsmasq processes were already running. libvirt uses dnsmasq extensively for its own networking. (I have since upgraded the host to Oracle Linux 7 and the problem is no longer present.)

I don’t like repetitive tasks

Anything repetitive is not the best use of anyone’s time, especially not mine. I prefer to run a single command, do something different, and come back with the OS already installed. For the installation to work I am relying on PXE booting to pass the kernel and initial RAM disk; TFTP is in charge of that. A kickstart file is used for the silent installation. Let’s start off with the Preboot Execution Environment (I am putting in a shameless plug here for my 12c book, which covers this subject extensively if you need more background). Here is my default configuration file in pxelinux.cfg:

default menu.c32
prompt 0

menu title PXE Boot Menu

label oracle-linux 6.6 x86-64
    menu label oracle-linux 6.6 x86-64
    kernel /tftpboot/webroot/images/ol/6/6/vmlinuz
    append initrd=/tftpboot/webroot/images/ol/6/6/initrd.img ks=http://192.168.150.1/ol66.ks ksdevice=eth0

label oracle-linux 7.0 x86-64
    menu label oracle-linux 7.0 x86-64
    kernel /tftpboot/webroot/images/ol/7/0/vmlinuz
    append initrd=/tftpboot/webroot/images/ol/7/0/initrd.img inst.ks=http://192.168.150.1/ol70.ks inst.repo=http://192.168.150.1/ol70

The next step is to ensure that you have the images you are pointing to.

[root@ol62 0]# cp -iv /media/ol70/images/pxeboot/* .
`/media/ol70/images/pxeboot/initrd.img' -> `./initrd.img'
`/media/ol70/images/pxeboot/TRANS.TBL' -> `./TRANS.TBL'
`/media/ol70/images/pxeboot/upgrade.img' -> `./upgrade.img'
`/media/ol70/images/pxeboot/vmlinuz' -> `./vmlinuz'
[root@ol62 0]# pwd
/tftpboot/webroot/images/ol/7/0

The Preboot eXecution Environment (PXE) supplies the ks (kickstart) directive to the boot process. The initial RAM disk and kernel are pulled via TFTP; don't forget to enable xinetd and tftp if you are not using dnsmasq like me, or this won't work. You also need a DHCP server that provides the VM's NIC with an IP address and a reference to the PXE boot configuration. If you are using a lab server then dnsmasq can do this for you; I believe admins in real life might not do it that way… The official Red Hat documentation also has a nice section on how to configure your environment for PXE boot with the ISC DHCP package. Here is the pxe.conf file I used successfully (with the newly compiled dnsmasq; again, the stock Oracle Linux 6.x version can't handle --bind-interfaces at the time of writing):

[root@ol62 ~]# cat /etc/dnsmasq.d/pxe.conf
tftp-root=/tftpboot/webroot
enable-tftp

dhcp-boot=pxelinux.0
dhcp-option=vendor:PXEClient,6,2b
dhcp-no-override

pxe-prompt="Press F8 for boot menu", 3
pxe-service=X86PC, "Boot from network", pxelinux
pxe-service=X86PC, "Boot from local hard disk", 0

#log-queries
#log-dhcp

domain=example.com
dhcp-range=192.168.150.50,192.168.150.150,12h

This file then went into /etc/dnsmasq.d/ as you can see, and I restarted the daemon to make it read the file. Initially I found the log-queries and log-dhcp output quite useful for working out what was happening; the messages appear in /var/log/messages. There are many of them, so you might want to turn the logging off again as soon as you have fixed any potential problems.

Kickstarting

My kickstart file assumes that the installation tree is available via http from 192.168.150.1/ol70. It's just a loop-mounted ISO image, so it is easy to implement on my host.

My kickstart file is not really special. It simply sets the partitioning schema, creating two volume groups: rootvg for system-related things such as swap, boot and root, plus oraclevg for the Oracle binaries. I based it primarily on the anaconda-ks.cfg file in /root that was created during the installation. The kickstart documentation is also rather nice. Since there is no preinstall RPM available right now I resorted to a %post script to add the Oracle-specific settings:

%post --log=/root/ks-post.log

cd /etc/yum.repos.d
mv public-yum-ol7.repo public-yum-ol7.repo.ignore
wget http://192.168.150.1/ol70_local.repo

cat >> /etc/sysctl.conf <<EOF
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
kernel.panic_on_oops = 1
EOF

sysctl --system

cat >> /etc/security/limits.conf <<EOF
oracle   soft   nofile    1024
oracle   hard   nofile    65536
oracle   soft   nproc     2047
oracle   hard   nproc     16384
oracle   soft   stack     10240
oracle   hard   stack     32768
oracle   soft   memlock   60397977
oracle   hard   memlock   60397977

grid   soft   nofile    1024
grid   hard   nofile    65536
grid   soft   nproc     2047
grid   hard   nproc     16384
grid   soft   stack     10240
grid   hard   stack     32768
grid   soft   memlock   60397977
grid   hard   memlock   60397977
EOF

cat > /etc/udev/rules.d/61-asm.rules <<EOF
KERNEL=="vd[cdefgh]1", OWNER="grid", GROUP="asmdba", MODE="0660"
EOF

gid=10000
for group in asmdba asmadmin kmdba dgdba backupdba dba oinstall; do
  groupadd -g $(( gid += 1 )) $group ;
done

useradd -u 10000 -g oinstall -G asmdba,kmdba,dgdba,backupdba,dba oracle
useradd -u 10001 -g oinstall -G asmdba,asmadmin,dba grid

mkdir -p /u01/app/oracle
mkdir -p /u01/app/grid

chown -R grid:oinstall /u01
chown -R oracle:oinstall /u01/app/oracle

This is of course just an example: you need to adjust it to your environment. I am also assuming a separation of duties, which not everyone will (or indeed should) apply in their environment. There are also quite a few individual packages in the %packages list right now that I'll replace with a call to the preinstall RPM once it's available. Oracle Linux 7 reflects the fact that many system daemons use a conf.d directory to separate customisations from the main configuration file. Oracle 12.1.0.2 does not check for this: when I tried moving my Oracle-specific settings into /etc/sysctl.d/50-oracle.conf, OUI could not find and parse the file and subsequently complained about an unknown configuration. So we are back to the old days.
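To illustrate the partitioning schema mentioned at the top of this section, a kickstart storage section creating the two volume groups could look roughly like this. All sizes and logical volume names are assumptions for illustration, not my actual file, and I keep /boot as a plain partition here since anaconda prefers it that way:

```
# hypothetical layout: rootvg for the OS, oraclevg for the Oracle binaries
zerombr
clearpart --all --initlabel
part /boot --fstype=xfs --size=500
part pv.01 --size=20480
part pv.02 --size=40960 --grow
volgroup rootvg pv.01
volgroup oraclevg pv.02
logvol swap --vgname=rootvg --name=swaplv --size=8192
logvol / --vgname=rootvg --name=rootlv --fstype=xfs --size=10240
logvol /u01 --vgname=oraclevg --name=u01lv --fstype=xfs --size=30720 --grow
```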

Now all I need to do is to call virt-install on the command line and wait for the KVM environment to be created.
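The virt-install call takes roughly the following shape. Every name, size and the network below are placeholders for illustration rather than my actual command line, which is why this sketch only builds and echoes the call for review instead of executing it:

```shell
# Sketch only: all names, sizes and the network are placeholders.
# --pxe makes the new VM boot from the network so the kickstart kicks in.
cmd="virt-install --name ol7vm --ram 4096 --vcpus 2 \
  --disk pool=default,size=25 \
  --network network=default \
  --os-variant rhel7 --pxe --noautoconsole"
echo "$cmd"
```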

Posted in Linux, Oracle | 1 Comment »

RAC One Node 12c-revisited on ODA 12.1.2.1.0 featuring database on ACFS

Posted by Martin Bach on December 18, 2014

Thanks to Andy Colvin (@acolvin) and the Enkitec lab I have been able to get my hands on an ODA X4-2. Since that's a lot quieter than turning on my own lab server, and also a lot faster, I used the environment to test RAC One Node in 12.1.0.2.1. I recently had a question from a customer about the usefulness of the solution and what it brings to the HA table. Well, here you go.

There is just a slight caveat: to make use of the high availability features in RAC One Node you need an application that has been written to take advantage of them; using Fast Connection Failover or Application Continuity is the way to go. Although I have written sample code to exploit the functionality of FCF/FAN and Application Continuity, it recently came to my attention that the very popular swingbench benchmark can do the same. I had seen the connection pool tab in my demos but it didn't "click". Now I would like to show you what happens to your application during common scenarios in RAC One Node.

Scenario 1: Database relocation

For this blog post I’d like to try a database live migration from the first to the second host in the cluster. The database has been created using oakcli, primarily because I wanted to get more familiar with it. This is ODA release 12.1.2.1.0, the latest and greatest at the time of writing. I also love the fact that the ODA comes with quite a few of these: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz. To be precise, each node has 2s24c48t of Ivy Bridge Xeons. Nice.

[oracle@server1 ~]$ /opt/oracle/oak/bin/oakcli show version
Version
-------
12.1.2.1.0

As root (indeed!) I invoked oakcli, the primary ODA command line tool. The first time around I was bitten by not having read the documentation: if you specify the -version flag, oakcli will create a new RDBMS home on both cluster nodes. That was _not_ what I had in mind. On the other hand I am now familiar with checking the environment (various permutations of oakcli show …) and deleting an Oracle home. This command created the database in the end:

[root@server1 ~]# oakcli create database -db RON -oh OraDb12102_home1

The command will prompt you for the database type (single instance, RAC One Node, clustered): good stuff, simple to use and, more importantly, it worked first time.

Databases in ACFS

Now remember that from ODA release 12.1.2 onwards 11.2.0.4 and 12.1.x databases will be created on ACFS. In my case that looks slightly odd, I guess I have to get used to it first:

SQL> select name from v$datafile;

NAME
---------------------------------------------------------------------------------------------
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_system_b93xdj44_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_sysaux_b93xdmhk_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_undotbs1_b93xdosz_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_undotbs2_b93xdypp_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_users_b93xdzjj_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_soe_tbs_b93yl62m_.dbf

6 rows selected.

SQL> select name from v$tempfile;

NAME 
----------------------------------------------------------------------------------------
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_temp_b93xdpl2_.tmp

SQL> select member from v$logfile

MEMBER
-----------------------------------------------------------------------------
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_1_b93xd3tb_.log
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_2_b93xd9m5_.log
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_3_b93y0hrq_.log
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_4_b93y0pf9_.log

SQL> show parameter control_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      /u01/app/oracle/oradata/datast
                                                 ore/RON/RON/controlfile/o1_mf_
                                                 b93xd3ll_.ctl
SQL> show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /u01/app/oracle/fast_recovery_
                                                 area/datastore/RON
db_recovery_file_dest_size           big integer 465G

The funny bit is that /u02/app/oracle/oradata/datastore contains snapshots… This is visible in the data file location where you find an .ACFS/snaps component. I believe the naming convention is to use .ACFS (invisible) followed by the snapshot name. You can view the snapshot detail using acfsutil:

[grid@server1 ~]$ acfsutil snap info RON /u02/app/oracle/oradata/datastore/
snapshot name:               RON
snapshot location:           /u02/app/oracle/oradata/datastore/.ACFS/snaps/RON
RO snapshot or RW snapshot:  RW
parent name:                 /u02/app/oracle/oradata/datastore/
snapshot creation time:      Wed Dec 17 15:26:24 2014

Interestingly the /u01 data store does not have a snapshot:

[grid@server1 ~]$ acfsutil snap info /u01/app/oracle/oradata/datastore
    number of snapshots:  0
    snapshot space usage: 0  ( 0.00 )

But I digress…

Database setup and configuration

After the database has been created, it is defined in Clusterware as follows:

[oracle@server1 ~]$ srvctl config database -d ron
Database unique name: RON
Database name: RON
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile: /u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/spfileRON.ora
Password file: /u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/orapwRON
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: 
Mount point paths: /u01/app/oracle/fast_recovery_area/datastore,/u01/app/oracle/oradata/datastore,/u02/app/oracle/oradata/datastore
Services: RON_racone
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: RON
Candidate servers: server1,server2
OSDBA group: dba
OSOPER group: racoper
Database instances: 
Database is administrator managed
[oracle@server1 ~]$ 

The mandatory service is defined like this:

[oracle@server1 ~]$ srvctl config service -d ron
Service name: RON_racone
Server pool: 
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: 
Failover method: 
TAF failover retries: 
TAF failover delay: 
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition: 
Pluggable database name: 
Maximum lag time: ANY
SQL Translation Profile: 
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Session State Consistency: 
GSM Flags: 0
Service is enabled
Preferred instances: RON_2
Available instances: 
[oracle@server1 ~]$ 

With RAC One Node it is even more important to connect to the service, and not to the instance.
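A sketch of a matching tnsnames.ora entry follows; the SCAN host name and port are assumptions on my part (the swingbench connect string later in the post uses the same shape in EZConnect syntax):

```
RON_RACONE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = RON_racone)
    )
  )
```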

Getting ready for the testing

The situation before the test is shown here:

[oracle@server1 ~]$ srvctl status database -d ron
Instance RON_2 is running on node server2
Online relocation: INACTIVE
[oracle@server1 ~]$ srvctl status service -d ron
Service RON_racone is running on instance(s) RON_2

For swingbench to benefit from FCF it needs to be configured so that it uses a connection pool. I have used the swingbench FAQ (http://www.dominicgiles.com/swingbenchfaq.html) to do so. My connection is defined as follows:

   <Connection>
      <UserName>soe</UserName>
      <Password>soe</Password>
      <ConnectString>//cluster-scan/RON_racone</ConnectString>
      <DriverType>Oracle jdbc Driver</DriverType>
        <ConnectionPooling>
            <PooledInitialLimit>50</PooledInitialLimit>
            <PooledMinLimit>25</PooledMinLimit>
            <PooledMaxLimit>100</PooledMaxLimit>
            <PooledInactivityTimeOut>50</PooledInactivityTimeOut>
            <PooledConnectionWaitTimeout>45</PooledConnectionWaitTimeout>
            <PooledPropertyCheckInterval>10</PooledPropertyCheckInterval>
            <PooledAbandonedConnectionTimeout>120</PooledAbandonedConnectionTimeout>
        </ConnectionPooling>
      <Properties> 
         <Property Key="StatementCaching">50</Property>
         <Property Key="FastFailover">true</Property>
         <Property Key="OnsConfiguration">nodes=server1:6200,server2:6200</Property>
         <Property Key="AppContinuityDriver">true</Property>
         <Property Key="FetchSize">20</Property>
      </Properties>
   </Connection>   

This part and the rest of the configuration is based largely on this post: Application Continuity in Oracle Database 12c (12.1.0.2).

Now – finally – for the test!

Starting charbench will trigger the creation of 100 sessions on the currently active instance (I set NumberOfUsers to 100) and they will be chugging along nicely. Here is the output after a couple of minutes:

[oracle@server1 bin]$ ./charbench -c oda_soe.xml 
Author  :        Dominic Giles
Version :        2.5.0.952

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

7:24:28 AM      100     17242   291                                            

I saw about 290 TPS for a total of around 18,000 TPM over some time. Let's start the live migration. But first I'd like to consult the crystal ball to see what might happen (the predict command is new in 12.1.0.2):

[oracle@server1 ~]$ srvctl predict database -db RON -verbose
Database ron will be started on node server1
Database ron will be stopped on node server2
Service ron_racone will be started on node server1

Well I guess that might be correct, but let’s try:

[oracle@server1 ~]$ srvctl relocate database -d ron -verbose -node server1
Configuration updated to two instances
Instance RON_1 started
Services relocated
Waiting for up to 30 minutes for instance RON_2 to stop ...
Instance RON_2 stopped
Configuration updated to one instance
[oracle@server1 ~]$ 

While this command executed I didn't lose a single connection: all 100 SOE connections remained established. It also takes a few seconds for the cache to warm up, during which the transaction rate dips a little:

[oracle@server1 bin]$ ./charbench -c oda_soe.xml
Author  :        Dominic Giles
Version :        2.5.0.952

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

7:29:11 AM      100     13135   191                                            

One minute after this the throughput is back to normal.

[oracle@server1 bin]$ ./charbench -c oda_soe.xml
Author  :        Dominic Giles
Version :        2.5.0.952

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

7:30:50 AM      100     16800   295          

Online relocation with RAC One Node is certainly a possibility and works nicely if your application is ready for connection pools and the new way of connecting to the database.

Preview of the next article in the series

In the next article I'd like to add a couple of things I haven't had time to test yet: 12.1.0.x extends the concept of the Database Resident Connection Pool to Java applications (and other middle-tier components), which would allow me to scale even further. I'd also like to show you what happens when the current RAC One Node instance fails. Stay tuned!

Posted in 12c Release 1, Linux | Tagged: , | 1 Comment »

Adventures in RAC: gc buffer busy acquire and release

Posted by Martin Bach on December 16, 2014

It seems that I’m getting more and more drawn into the world of performance analysis, and since I sometimes tend to forget things I need to write them down. I almost enjoy the “getting there” more than ultimately solving the problem. You pick up quite a few things on the way.

This environment is Exadata 12.1.1.1.1/Oracle 12.1.0.2 but as with so many things the fact that the database is on Exadata shouldn’t matter.

So here is one of those posts; this time I'm writing up what I saw related to gc buffer busy acquire.

gc buffer busy acquire?

Whenever I see a wait event I haven’t dealt with extensively in the past I try to provoke behaviour to study it more closely. But first you need to know the event’s meaning. One option is to check v$event_name:

SQL> select name, parameter1, parameter2, parameter3, wait_class
  2  from v$event_name where name = 'gc buffer busy acquire';

NAME                           PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
------------------------------ ---------- ---------- ---------- -------------------
gc buffer busy acquire         file#      block#     class#     Cluster

So when you see this event in ASH/traces/v$session etc you know it’s a cluster wait and potentially limiting your processing throughput. It also tells you the file#, block# and class# of the buffer which you can link to v$bh. This view allows you to find the data object ID given these input parameters.

Using https://orainternals.wordpress.com/2012/04/19/gc-buffer-busy-acquire-vs-release/ as a source I worked out that the event has to do with acquiring a buffer (=block) in RAC (gc = global cache) on the local instance. If the block you need is on the remote instance you wait for it to be released, and the wait event is gc buffer busy release.

Since Oracle will clone blocks in buffer caches for consistent reads and use a shared lock on these for reading I thought that waiting can only happen if someone requested a block in XCUR (exclusive current) mode. So with that working hypothesis I went to work.

How to test

I started off writing a small Java class that creates a connection pool against my RAC database. I initially used the default service name in the connect descriptor, but had to find out that dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE did not trace my sessions. In the end I created a true RAC service with CLB and RLB goals against both instances, and I ended up with traces in the diagnostic_dest.
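Creating such a service boils down to a single srvctl command. Here is a sketch with placeholder database, service and instance names; the clbgoal and rlbgoal options set the connection and runtime load balancing goals, and the command is only echoed here so nothing is created by accident:

```shell
# Placeholder names throughout; -notification TRUE enables FAN events
# which Fast Connection Failover relies on.
cmd="srvctl add service -db mydb -service rac_svc \
  -preferred inst1,inst2 \
  -clbgoal SHORT -rlbgoal SERVICE_TIME -notification TRUE"
echo "$cmd"
```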

After setting up the UCP connection pool the code will create a number of threads that each will pull a connection from the pool, do some work (*) and hand it back to the pool as good citizens should do.

(*) The "do some work" bit is this:

...
                        try {

                                PreparedStatement pstmt = conn.prepareStatement(
                                  "select /* bufferbusy001 */ id, to_char(sysdate,'hh24:mi:ss') d " +
                                  "from t1 where id = ? for update");

                                int randomID = new Random().nextInt((1450770 - 1450765) + 1) + 1450765;
                                System.out.println("thread " + mThreadID + " now querying for ID " + randomID);
                                pstmt.setInt(1, randomID);

                                ResultSet rs = pstmt.executeQuery();

                                while (rs.next()) {
                                        System.out.println("Thread " + mThreadID + " reporting an id of "
                                        + rs.getInt("id") + ". Now it is " + rs.getString("d"));
                                }

                                rs.close();
                                pstmt.close();
                                conn.rollback();
                                conn.close();
                                conn = null;

                                Thread.sleep(2000);
                        } catch (Exception e) {
                                e.printStackTrace();
                        }
...

I think that’s how a Java developer would do it (with more error handling of course) but then I’m not a Java developer. It did work though! What I considered most important was to generate contention on a single block. Using dbms_rowid I could find out which IDs belong to (a random) block:

SQL> select * from (
  2    select id,DBMS_ROWID.ROWID_BLOCK_NUMBER (rowid, 'BIGFILE') as block
  3      from t1
  4 ) where block = 11981654;

        ID      BLOCK
---------- ----------
   1450765   11981654
   1450766   11981654
   1450767   11981654
   1450768   11981654
   1450769   11981654
   1450770   11981654

6 rows selected.

So if I manage to randomly select from the table where ID in the range …765 to …770 then I should be ok and just hit that particular block.
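The random pick in the Java snippet above is simply an inclusive draw between the lowest and the highest ID of that block. The same logic in a couple of lines of shell, purely to sanity-check the range arithmetic:

```shell
# Mirrors new Random().nextInt((HI - LO) + 1) + LO: inclusive at both ends
LO=1450765
HI=1450770
ID=$(( LO + RANDOM % (HI - LO + 1) ))
echo "$ID"
```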

It turned out that the SQL statement completed so quickly I had to considerably ramp up the number of sessions in the pool to see anything. I went up from 10 to 500 before I could notice a change. Most of the statements are too quick to even be caught in ASH: Tanel's ashtop script showed pretty much nothing except the occasional ON-CPU sample and the odd log file sync event. Snapper also reported sessions in an idle state.

SQL> r
  1  select count(*), inst_id, status, sql_id, event, state
  2  from gv$session where module = 'BufferBusy'
  3* group by inst_id, status, sql_id, event, state

  COUNT(*)    INST_ID STATUS   SQL_ID        EVENT                          STATE
---------- ---------- -------- ------------- ------------------------------ -------------------
       251          1 INACTIVE               SQL*Net message from client    WAITING
       248          2 INACTIVE               SQL*Net message from client    WAITING

2 rows selected.

That’s what you see for most of the time.

Let me trace this for you

So in order to get any meaningful idea about the occurrence (or absence) of the gc buffer busy acquire event, I added a MODULE to my sessions so that I could later run trcsess to combine the traces. Here is the resulting raw trace, or rather an excerpt from it:

=====================
PARSING IN CURSOR #140650659166120 len=96 dep=0 uid=65 oct=3 lid=65 tim=4170152514049 hv=1500360262 ad='5b58a4a10' sqlid='6a5jfvpcqvbk6'
select /* bufferbusy001 */ id, to_char(sysdate,'hh24:mi:ss') d from t1 where id = :1  for update
END OF STMT
PARSE #140650659166120:c=0,e=5598,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2291732815,tim=4170152514046
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 12250 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152526533
WAIT #140650659166120: nam='buffer busy waits' ela= 1890 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152528475
WAIT #140650659166120: nam='enq: TX - row lock contention' ela= 188606 name|mode=1415053318 usn<<16 | slot=1179674 sequence=1485 obj#=20520 tim=4170152717199
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 1590 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152718839
WAIT #140650659166120: nam='enq: TX - row lock contention' ela= 36313 name|mode=1415053318 usn<<16 | slot=1245199 sequence=1894 obj#=20520 tim=4170152755340
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 1268 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152756655
WAIT #140650659166120: nam='buffer busy waits' ela= 668 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152757363
WAIT #140650659166120: nam='KJC: Wait for msg sends to complete' ela= 11 msg=26941469232 dest|rcvr=65539 mtype=8 obj#=20520 tim=4170152757492
EXEC #140650659166120:c=1999,e=243530,p=0,cr=9,cu=4,mis=0,r=0,dep=0,og=1,plh=2291732815,tim=4170152757651
WAIT #140650659166120: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152757709
FETCH #140650659166120:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=2291732815,tim=4170152757749
STAT #140650659166120 id=1 cnt=1 pid=0 pos=1 obj=0 op='FOR UPDATE  (cr=9 pr=0 pw=0 time=243443 us)'
STAT #140650659166120 id=2 cnt=2 pid=1 pos=1 obj=0 op='BUFFER SORT (cr=3 pr=0 pw=0 time=60 us)'
STAT #140650659166120 id=3 cnt=1 pid=2 pos=1 obj=48863 op='INDEX RANGE SCAN I_T1$SEC1 (cr=3 pr=0 pw=0 time=37 us cost=3 size=6 card=1)'
WAIT #140650659166120: nam='SQL*Net message from client' ela= 260 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152758109
CLOSE #140650659166120:c=0,e=5,dep=0,type=1,tim=4170152758141
XCTEND rlbk=1, rd_only=0, tim=4170152758170
WAIT #0: nam='gc buffer busy acquire' ela= 3764 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152761976
WAIT #0: nam='buffer busy waits' ela= 1084 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152763104
WAIT #0: nam='log file sync' ela= 246 buffer#=119491 sync scn=19690898 p3=0 obj#=20520 tim=4170152763502
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152763536

Result! There are gc buffer busy acquire events recorded. I can't rule out TX row lock contention since, with all those threads and only 6 IDs to choose from, there was going to be some locking on the same ID caused by the "for update" clause.

Now I am reasonably confident that I worked out at least one scenario causing a gc buffer busy acquire. You might also find the location of the blocks in the buffer cache interesting:

SYS:dbm011> select count(*), inst_id, block#, status
  2  from gv$bh where block# = 11981654
  3  group by inst_id, block#, status order by inst_id, status;

   COUNT(*)     INST_ID      BLOCK# STATUS
----------- ----------- ----------- ----------
          9           1    11981654 cr
          1           1    11981654 xcur
          9           2    11981654 cr

There is one buffer in XCUR mode on the first instance, plus nine CR copies on each instance for that block.

Making it worse

Now I didn’t want to stop there, I was interested in what would happen under CPU load. During my career I noticed cluster waits appear primarily when you are CPU-bound (all other things being equal). This could be the infamous middle-tier-connection-pool-mismanagement or an execution plan going wrong with hundreds of users performing nested loop joins when they should hash-join large data sets… This is usually the point where OEM users ask the DBAs to do something against that “sea of grey” in the performance pages.

As with every cluster technology, an overloaded CPU does not help. Well, I guess that's true for all computing. To increase the CPU load I created 10 dd sessions reading from /dev/zero and writing to /dev/null. Sounds silly, but each of these hogs one CPU core at 100%. With 10 out of 12 cores on node 1 fully occupied that way I relaunched my test. The hypothesis that CPU overload has an effect was proven right by suddenly finding ASH samples of my session.
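The load generator itself is a throwaway one-liner per core. A sketch follows, with the core count adjusted to taste and a cleanup step added so the hogs don't outlive the experiment:

```shell
# Each dd pegs one core copying /dev/zero to /dev/null.
pids=""
for i in $(seq 1 10); do
  dd if=/dev/zero of=/dev/null bs=1M &
  pids="$pids $!"
done
# let them run for as long as the test needs (only briefly here) ...
sleep 2
# ... then tear the load back down
kill $pids
wait $pids 2>/dev/null || true
```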

SQL> @ash/ashtop sql_id,session_state,event "sql_id='6a5jfvpcqvbk6'" sysdate-5/1440 sysdate

    Total
  Seconds     AAS %This   SQL_ID        SESSION EVENT
--------- ------- ------- ------------- ------- ----------------------------------------
      373     1.2   79% | 6a5jfvpcqvbk6 WAITING enq: TX - row lock contention
       54      .2   11% | 6a5jfvpcqvbk6 WAITING gc buffer busy release
       20      .1    4% | 6a5jfvpcqvbk6 ON CPU
       11      .0    2% | 6a5jfvpcqvbk6 WAITING gc buffer busy acquire
       11      .0    2% | 6a5jfvpcqvbk6 WAITING gc current block busy
        1      .0    0% | 6a5jfvpcqvbk6 WAITING gc current block 2-way

6 rows selected.

Using ASQLMON I can even see where time is spent:

SQL> @scripts/ash/asqlmon 6a5jfvpcqvbk6 % sysdate-1 sysdate

    SECONDS Activity Visual       Line ID Parent ASQLMON_OPERATION                   SESSION EVENT                                         AVG_P3 OBJ_ALIAS_ ASQLMON_PR
----------- -------- ------------ ------- ------ ----------------------------------- ------- ---------------------------------------- ----------- ---------- ----------
         38    2.4 % |          |       0        SELECT STATEMENT                    ON CPU                                                883065
          1     .1 % |          |       1      0  FOR UPDATE                         WAITING gc current block 2-way                      33554433  [SEL$1]
         25    1.6 % |          |       1      0                                     ON CPU                                            5369727.36  [SEL$1]
         17    1.1 % |          |       1      0                                     WAITING buffer busy waits                                  1  [SEL$1]
        109    6.8 % |#         |       1      0                                     WAITING gc buffer busy release                             1  [SEL$1]
         31    1.9 % |          |       1      0                                     WAITING gc buffer busy acquire                             1  [SEL$1]
         27    1.7 % |          |       1      0                                     WAITING gc current block busy                       33554433  [SEL$1]
        768   48.0 % |#####     |       1      0                                     WAITING enq: TX - row lock contention            6685.143229  [SEL$1]
          3     .2 % |          |       2      1   BUFFER SORT                       ON CPU                                                     0
          2     .1 % |          |       3      2    INDEX RANGE SCAN [I_T1$SEC1]     ON CPU                                                     0 T1@SEL$1 [ [A:]
                                                                                                                                                  SEL$1]     "ID"=:1


          2     .1 % |          |       0        SELECT STATEMENT                    ON CPU                                            16777216.5
          2     .1 % |          |       1      0  FOR UPDATE                         WAITING gc current block busy                       33554433  [SEL$1]
         24    1.5 % |          |       1      0                                     WAITING write complete waits                               0  [SEL$1]
          9     .6 % |          |       1      0                                     WAITING gc buffer busy acquire                             1  [SEL$1]
         30    1.9 % |          |       1      0                                     WAITING gc buffer busy release                             1  [SEL$1]
          9     .6 % |          |       1      0                                     WAITING buffer busy waits                                  1  [SEL$1]
          7     .4 % |          |       1      0                                     ON CPU                                           158.8571429  [SEL$1]
        496   31.0 % |###       |       1      0                                     WAITING enq: TX - row lock contention            6396.395161  [SEL$1]
                   % |          |       2      1   BUFFER SORT
                   % |          |       3      2    INDEX RANGE SCAN [I_T1$SEC1]                                                                  T1@SEL$1 [ [A:]
                                                                                                                                                  SEL$1]     "ID"=:1

Further Reading

I’m sure there is a wealth of resources available out there, in my case Riyaj’s blog helped me a lot. He even tagged posts with gc buffer busy: https://orainternals.wordpress.com/tag/gc-buffer-busy

Have a look at the Oaktable World 2014 agenda and watch Tanel Poder's session attentively. You'd be surprised how many scripts he has made publicly available to troubleshoot performance. Like snapper? It's only the tip of the iceberg. And if you can, you should really attend his advanced troubleshooting seminar.

Posted in 12c Release 1, RAC | 2 Comments »

 