Martins Blog

Trying to explain complex things in simple terms

Public appearances 2015

Posted by Martin Bach on March 23, 2015

This is going to be a list of my upcoming public appearances in 2015.

Upcoming Events

UKOUG Systems Event

I’ll be at the UKOUG Systems Event in London on May 20, a well-established one-day event where many great speakers have presented in the past. After a short detour to Birmingham the event is now back in London, and I am thrilled to be there. My presentation is titled “Migrating to Exadata the easy way” and incorporates a fair bit of new research.

Enkitec Extreme Exadata Expo (E4)

E4 is the conference when it comes to Exadata and Big Data. I have been over twice, and it has been more than fantastic. I would tell you the same even if I wasn’t working for Enkitec; it truly is a serious tech conference (have a look at the agenda and speakers, then I’m sure you’ll believe me). My part in the conference is my all-new and updated talk about Hybrid Columnar Compression internals.


UKOUG Database Server and RAC Cloud, Infrastructure and Availability SIG

I will be speaking at the combined RDBMS and RAC SIG in Reading on July 1 this year. It will be a first for my “Resource Manager in Action” presentation. I am hoping to see you there!

Past conferences

I have attended the following events thus far:

  • Fill the Glass webinar with Cary Millsap
  • ClubOracle in London
  • Paris Oracle Meetup
  • DOAG Exaday

Posted in Public Appearances

If you use swingbench take note of sbutil

Posted by Martin Bach on May 19, 2015

This is going to be a very short post for a change. I have used Swingbench extensively and really love the tool. Many thanks to Dominic Giles!

Recently he announced a new tool on his blog that you can use to inflate your data volume. So instead of using the “-scale” argument when executing oewizard you can keep the defaults and create as much data as you like later on. Here is an example, which is the reason for this post.

Setting the Scene

If you have not yet downloaded the latest Swingbench version and JDK 8, do this first. You can get Swingbench from the tool’s landing page and the Java software from java.sun.com. I used Swingbench 2.5.0.971 (stable), as reported by the wizard below. With both deployed, I created the Order Entry schema in lights-out mode:

[oracle@lab bin]$ ./oewizard -dbap secret -u soe -p soe -cl -cs //lab/martin -ts soe_tbs -create -dba system
SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.971

Running in Lights Out Mode using config file : oewizard.xml
The following statement failed : GRANT EXECUTE ON dbms_lock TO soe : Due to : ORA-01031: insufficient privileges

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.006
Data Generation Time                   0:01:06.110
DDL Creation Time                      0:00:30.346
Total Run Time                         0:01:36.465
Rows Inserted per sec                      183,121
Data Generated (MB) per sec                   14.9
Actual Rows Generated                   13,002,138

Post Creation Validation Report
===============================
The schema appears to have been created successfully.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA','INVENTORIES',
'PRODUCT_INFORMATION','PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX','PROD_CATEGORY_IX',
'INVENTORY_PK','INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK','ORD_SALES_REP_IX',
'ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX','ORD_WAREHOUSE_IX','ORDER_ITEMS_PK','ITEM_ORDER_IX',
'ITEM_PRODUCT_IX','WAREHOUSES_PK','WHS_LOCATION_IX','CUSTOMERS_PK','CUST_EMAIL_IX',
'CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX','ADDRESS_PK','ADDRESS_CUST_IX',
'CARD_DETAILS_PK','CARDDETAILS_CUST_IX'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code : 'ORDERENTRY'
Schema Created
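
A quick note on the failed grant in the output above: the wizard connects as SYSTEM, which cannot grant execute on the SYS-owned dbms_lock package. If you want the grant in place you can issue it yourself as SYSDBA once the schema exists; a minimal sketch:

SQL> grant execute on dbms_lock to soe;

Grant succeeded.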

More Data!

The next step is to invoke sbutil:

[oracle@lab bin]$ ./sbutil -u soe -p soe  -cs //lab/martin -soe parallel 12 -dup 4
Getting table Info
Got table information. Completed in : 0:00:01.558
Dropping Indexes
Dropped Indexes. Completed in : 0:00:03.252
Creating copies of tables
Created copies of tables. Completed in : 0:00:00.042
Begining data duplication
Completed Iteration 4. Completed in : 0:00:07.942
Creating  Constraints
Created  Constraints. Completed in : 0:00:28.737
Creating  Indexes
Created  Indexes. Completed in : 0:00:54.541
Updating Metadata and Recompiling Code
Updated Metadata. Completed in : 0:00:00.754
Updating Sequences
Updated Sequences. Completed in : 0:00:00.792
Determining New Row Counts
Got New Row Counts. Completed in : 0:00:03.132
Completed Data Duplication in 0 hour(s) 2 minute(s) 8 second(s) 454 millisecond(s)
----------------------------------------------------------------------------------------------------------
|Table Name          |  Original Row Count|       Original Size|       New Row Count|            New Size|
----------------------------------------------------------------------------------------------------------
|ORDER_ITEMS         |           4,288,582|              280 MB|          17,154,328|              1.1 GB|
|CUSTOMERS           |           1,000,000|              136 MB|           4,000,000|            552.4 MB|
|CARD_DETAILS        |           1,500,000|               80 MB|           6,000,000|            337.8 MB|
|ORDERS              |           1,429,790|              160 MB|           5,719,160|              664 MB|
|ADDRESSES           |           1,500,000|              184 MB|           6,000,000|            571.8 MB|
----------------------------------------------------------------------------------------------------------
|Total               |                    |              840 MB|                    |              3.2 GB|
----------------------------------------------------------------------------------------------------------
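
If you want to verify the new sizes yourself, a query along these lines against dba_segments should roughly match the table above (assuming you kept the default schema name SOE):

SQL> select segment_name, round(sum(bytes)/power(1024,2)) mb
  2  from dba_segments
  3  where owner = 'SOE' and segment_type = 'TABLE'
  4  group by segment_name
  5  order by mb desc;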

How Cool is That?

Really, it is. And look at the timings. This didn’t take too long, even on my poor old AMD 6200 series lab server.

Posted in Performance

Additional information on Oracle 12c big table caching

Posted by Martin Bach on May 13, 2015

Teaching is one of the things I like doing, and currently I am investigating the Oracle 12c features around caching data in the various memory areas. Since the In-Memory (cost) option has been discussed by other, far more knowledgeable people, I would like to share some findings about big table caching here.

Some Background

In Oracle 12c you have two additional options to cache information: full database caching and big table caching. The first is great if you have a massively big machine with lots and lots of DRAM, plus a clever OS that can deal with the inevitable ccNUMA setup you will have to tackle, and maybe you don’t want to pay for the In-Memory option. This post is not about full database caching though, but rather about the other possibility to cache blocks.

This other option is to cache just a few tables in an area within the buffer cache. That’s right: after the keep and recycle pools you can now tweak the default buffer cache. In fact you tell Oracle by means of an initialisation parameter how much of the default buffer cache can be used to cache full scans: anywhere from zero percent (the default) up to 90%, leaving at least 10% for OLTP workloads. I haven’t really investigated the pre-12c caching mechanism in detail (hey, it works!) but I read that full scans are not cached by default to avoid thrashing the cache.

Big Table Caching

Now what? We can cache full scans too. Interesting, so let’s try it. I am using 12.1.0.2.2 on an ODA, but that shouldn’t really matter. The SGA is 24 GB in size, and my segment to be scanned (a non-partitioned table without indexes) is about 20 GB in size. I want 50% of the buffer cache allocated for big table caching.
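
If you want to follow along, the parameter can be changed on the fly; a minimal sketch (it is dynamic as far as I can tell, but verify against the documentation for your release):

SQL> alter system set db_big_table_cache_percent_target = 50;

System altered.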

SQL> select component, current_size/power(1024,2) size_mb
  2  from v$sga_dynamic_components where current_size <> 0;

COMPONENT                                                           SIZE_MB
---------------------------------------------------------------- ----------
shared pool                                                            2688
large pool                                                              192
java pool                                                               128
DEFAULT buffer cache                                                  20992
Shared IO Pool                                                          512

SQL> show parameter db_big_table_cache_percent_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_big_table_cache_percent_target    string      50

SQL> select * from v$bt_scan_cache;

BT_CACHE_ALLOC BT_CACHE_TARGET OBJECT_COUNT MEMORY_BUF_ALLOC MIN_CACHED_TEMP     CON_ID
-------------- --------------- ------------ ---------------- --------------- ----------
    .500334108              50            0                0            1000          0

SQL> select bytes/power(1024,2) m from dba_segments
  2  where owner = 'MARTIN' and segment_name = 'T1';

         M
----------
     20864

Now let’s see if we can make use of this. 50% of the roughly 20 GB buffer cache equates to about 10 GB usable for the scan cache. If I start a query against T1 in another session I can see the object count increase.

SQL> r
  1* select * from v$bt_scan_cache

BT_CACHE_ALLOC BT_CACHE_TARGET OBJECT_COUNT MEMORY_BUF_ALLOC MIN_CACHED_TEMP     CON_ID
-------------- --------------- ------------ ---------------- --------------- ----------
    .500334108              50            1          1292363            1000          0

Subsequent queries against T1 will increase the temperature and potentially the cached information. The “object temperature” is a method Oracle uses to determine the suitability of an object to be cached. In my very basic example there is only one table which has been full-scanned so far. Later on I’ll add T2 to the mix. The temperature and other object information are reflected in the second view, v$bt_scan_obj_temps:

SQL> select * from v$bt_scan_obj_temps;

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39461      2669763        4000 MEM_PART         1292711          0

Session Counters!

I have developed a habit of looking into session counters when running queries to see if there is anything of interest. Using snapper with the before/after snapshot I got this:

SQL> @snapper4 all,end 5 1 163
Sampling SID 163 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
    163, MARTIN    , STAT, user calls                                                ,             3,        .06,         ,             ,          ,           ,          3 per execution
...
    163, MARTIN    , STAT, physical read total IO requests                           ,         10752,     224.51,         ,             ,          ,           ,     10.75k per execution
    163, MARTIN    , STAT, physical read total multi block requests                  ,         10752,     224.51,         ,             ,          ,           ,     10.75k per execution
    163, MARTIN    , STAT, physical read total bytes                                 ,   11262763008,    235.17M,         ,             ,          ,           ,     11.26G per execution
...
    163, MARTIN    , STAT, consistent gets                                           ,       2666722,     55.68k,         ,             ,          ,           ,      2.67M per execution
    163, MARTIN    , STAT, consistent gets from cache                                ,       1292769,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets pin                                       ,       1292769,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets pin (fastpath)                            ,       1292768,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets direct                                    ,       1373953,     28.69k,         ,             ,          ,           ,      1.37M per execution
...
    163, MARTIN    , STAT, physical reads                                            ,       1374849,     28.71k,         ,             ,          ,           ,      1.37M per execution
    163, MARTIN    , STAT, physical reads cache                                      ,           128,       2.67,         ,             ,          ,           ,        128 per execution
    163, MARTIN    , STAT, physical reads direct                                     ,       1374721,     28.71k,         ,             ,          ,           ,      1.37M per execution
    163, MARTIN    , STAT, physical read IO requests                                 ,         10752,     224.51,         ,             ,          ,           ,      1.05M bytes per request
    163, MARTIN    , STAT, physical read bytes                                       ,   11262763008,    235.17M,         ,             ,          ,           ,     11.26G per execution
...
    163, MARTIN    , STAT, data warehousing scanned objects                          ,             1,        .02,         ,             ,          ,           ,          1 per execution
    163, MARTIN    , STAT, data warehousing scanned blocks                           ,       2666668,     55.68k,         ,             ,          ,           ,      2.67M per execution
    163, MARTIN    , STAT, data warehousing scanned blocks - memory                  ,       1292715,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, data warehousing scanned blocks - disk                    ,       1373953,     28.69k,         ,             ,          ,           ,      1.37M per execution
...
    163, MARTIN    , STAT, table scans (short tables)                                ,             1,        .02,         ,             ,          ,           ,          1 per execution
    163, MARTIN    , STAT, table scan rows gotten                                    ,      16000006,    334.09k,         ,             ,          ,           ,        16M per execution
    163, MARTIN    , STAT, table scan disk non-IMC rows gotten                       ,      16000006,    334.09k,         ,             ,          ,           ,        16M per execution
    163, MARTIN    , STAT, table scan blocks gotten                                  ,       2666668,     55.68k,         ,             ,          ,           ,      2.67M per execution
...
    163, MARTIN    , STAT, execute count                                             ,             1,        .02,         ,             ,          ,           ,          1 executions per parse
...
    163, MARTIN    , TIME, parse time elapsed                                        ,            26,      .54us,      .0%, [          ],          ,           ,
    163, MARTIN    , TIME, DB CPU                                                    ,       5682136,   118.65ms,    11.9%, [@@        ],          ,           ,
    163, MARTIN    , TIME, sql execute elapsed time                                  ,      33220099,   693.66ms,    69.4%, [#######   ],          ,           ,
    163, MARTIN    , TIME, DB time                                                   ,      33220306,   693.66ms,    69.4%, [#######   ],          ,           ,       -.32 % unaccounted time
    163, MARTIN    , WAIT, db file scattered read                                    ,         24636,   514.41us,      .1%, [          ],         1,        .02,    24.64ms average wait
    163, MARTIN    , WAIT, direct path read                                          ,      27536664,   574.98ms,    57.5%, [WWWWWW    ],     10694,      223.3,     2.57ms average wait
    163, MARTIN    , WAIT, SQL*Net message to client                                 ,             3,      .06us,      .0%, [          ],         2,        .04,      1.5us average wait
    163, MARTIN    , WAIT, SQL*Net message from client                               ,      14826292,   309.58ms,    31.0%, [WWWW      ],         2,        .04,      7.41s average wait

--  End of Stats snap 1, end=2015-05-11 10:13:13, seconds=47.9

Don’t worry about the wide output; all you need are the STATISTIC and DELTA columns. This can all be a bit overwhelming at first, so let me guide you through it; statistics that are not important for the discussion have already been removed.

First of all you see the physical reads. 10,752 IO requests were issued, all of them multi-block requests (physical read total IO requests and physical read total multi block requests, repeated also in physical read IO requests), amounting to roughly 10.5 GB read from disk. Hmm, isn’t that about 50% of my buffer cache? It looks very much as if 10 GB of the 20 GB table were read from disk using direct path reads, and the other 10 GB came from memory.
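
The snapper numbers support this theory (assuming the default 8 KB block size; snapper itself works out 1.05M bytes per request):

physical read total bytes: 11,262,763,008 bytes / 10,752 requests  ~=  1 MB per multi-block read
scanned blocks - disk:      1,373,953 blocks x 8 KB                ~= 10.5 GB from disk
scanned blocks - memory:    1,292,715 blocks x 8 KB                ~=  9.9 GB from the big table cache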

What is interesting is the next set of counters, “data warehousing scanned %”, which I haven’t seen until now. In fact, if I run @TanelPoder’s statn.sql script in my session I can see there are more of them than appear in the snapper output:

SQL> @statn.sql warehousing

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       237 ED          1896 data warehousing scanned objects                                          5
       238 EE          1904 data warehousing scanned blocks                                    13333340
       239 EF          1912 data warehousing scanned blocks - memory                            6462815
       240 F0          1920 data warehousing scanned blocks - flash                                   0
       241 F1          1928 data warehousing scanned blocks - disk                              6870525
       242 F2          1936 data warehousing scanned blocks - offload                                 0
       243 F3          1944 data warehousing evicted objects                                          0
       244 F4          1952 data warehousing evicted objects - cooling                                0
       245 F5          1960 data warehousing evicted objects - replace                                0
       246 F6          1968 data warehousing cooling action                                           0

10 rows selected.

Interesting! I’m quite glad Oracle gave us so many statistics describing the way big table caching works. I have queried T1 five times in my session, scanning 13,333,340 blocks in the process. The distribution between memory and disk is 6,462,815 to 6,870,525, almost 50:50. It looks as if flash and offloading can be used for this as well (note to self: repeat the test on Exadata). That makes sense given the example just shown: the segment to be scanned is 20 GB, of which 10 GB are in the buffer cache. If the rest of the segment can be scanned using direct path reads, as in the above example, then it is more than likely that the scan can be offloaded as well.
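
The cumulative counters also add up neatly to the five executions:

 6,462,815 blocks (memory) + 6,870,525 blocks (disk) = 13,333,340 blocks scanned
13,333,340 blocks / 5 executions                     =  2,666,668 blocks per scan

2,666,668 is exactly the per-execution figure reported as “data warehousing scanned blocks” in the snapper output above.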

I can even see if there was space pressure on the big table cache: the data warehousing evicted% statistics hint at space management.

Purely from memory?

Looking at the session counters above, my table is simply too big to be cached in its entirety, and I would like to see more scans served entirely from memory :) So I created a table called T2, roughly 25% of the size of T1, using the sample clause.
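
The exact DDL is not shown here, but a minimal sketch of what it might have looked like (the owner is MARTIN as in the earlier dba_segments query; storage attributes are omitted):

SQL> create table martin.t2 as
  2  select * from martin.t1 sample (25);

Table created.

Initial scans against T2 showed disk scans only (policy = DISK):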

SQL> r
  1* select * from v$bt_scan_obj_temps

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39461      2669763        8000 MEM_PART         1293081          0
         4      39465       668107        5000 DISK                   0          0

But after the 9th scan (each scan increased the temperature by 1000) the situation changed:

SQL> r
  1* select * from v$bt_scan_obj_temps

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39465       668107        9000 MEM_ONLY          668107          0
         4      39461      2669763        8000 MEM_PART          624974          0

The smaller table now fits into memory and took over! The result is visible in the execution time:

SQL> r
  1* select min(date_created) from t2

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:05.39

...

SQL> r
  1* select min(date_created) from t2

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:01.25

So to prove the point I reconnected to the database and ran my test again:


SQL> select min(date_created) from t2;

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:01.36

SQL> @statn warehousing

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       237 ED          1896 data warehousing scanned objects                                          1
       238 EE          1904 data warehousing scanned blocks                                      666966
       239 EF          1912 data warehousing scanned blocks - memory                             666966
       240 F0          1920 data warehousing scanned blocks - flash                                   0
       241 F1          1928 data warehousing scanned blocks - disk                                    0
       242 F2          1936 data warehousing scanned blocks - offload                                 0
       243 F3          1944 data warehousing evicted objects                                          0
       244 F4          1952 data warehousing evicted objects - cooling                                0
       245 F5          1960 data warehousing evicted objects - replace                                0
       246 F6          1968 data warehousing cooling action                                           0

10 rows selected.

So the popularity of the object plays a big role in working out which segment to cache as well. This is an interesting feature worth testing in your development environments. I have no idea whether or not it requires a separate license, so please make sure you check with Oracle before using it to avoid surprises.

Posted in 12c Release 1, Performance

Exadata Fast Data File Creation

Posted by Martin Bach on May 11, 2015

This post is the result of some testing I performed on Exadata data file creation. You may know that Exadata offloading incorporates SQL optimisations as well as some infrastructure work. For quite some time Exadata has allowed the DBA to create data files a lot quicker than on traditional systems. This has been documented before, for example by @mpnsh here.

The final comment on his blog entry was a remark that data file creation is quite fast, but that this is not true for online redo logs. Especially in environments where you duplicate production to a lower-tier environment you have to wait for the online redo logs (including all members across all threads) to be zeroed. This is no longer an issue with Fast Data File Creation. If your system is configured to use Write-Back Flash Cache (WBFC from now on) and you are on Exadata 11.2.3.3 then you can benefit from super-fast file creation, including online redo logs. Here is an example, taken from a SQL trace:

TKPROF: Release 12.1.0.2.0 
...
Trace file: /u01/app/oracle/diag/rdbms/mbach/MBACH1/trace/MBACH1_ora_124411.trc
Sort options: default

********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

SQL ID: fvt6psf2t3cdz Plan Hash: 0

alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.37       5.09          0          0          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.38       5.09          0          0          2           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file Mirror Read                           2        0.00          0.00
  control file sequential read                   23        0.00          0.01
  KSV master wait                                19        0.04          0.09
  ASM file metadata operation                     6        0.00          0.00
  CSS initialization                              1        0.00          0.00
  CSS operation: query                            6        0.00          0.00
  CSS operation: action                           2        0.00          0.00
  kfk: async disk IO                              2        0.00          0.00
  Disk file operations I/O                        3        0.00          0.00
  cell smart file creation                     3258        0.13          4.64
  log file single write                           2        0.00          0.00
  control file parallel write                    10        0.07          0.07
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1       17.61         17.61
********************************************************************************

I have tried to replicate the command issued in Martin Nash’s blog post. If memory serves me right he was waiting for 37 seconds for this command to complete. The new method takes about 5 seconds; not bad! Interestingly there is no new wait event: the one we know, cell smart file creation, is used to indicate the work that has been done. In the case of Fast Data File Creation only the metadata about the new file is persisted in the WBFC; the actual formatting has not yet happened when the prompt returns.
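
In case you are unsure whether your cells use write-back flash cache, you can check the cell attributes; a sketch from memory (the cell name is made up, and you would run this on each cell or via dcli):

CellCLI> list cell attributes name,flashCacheMode
         exa01cel01     WriteBack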

Session Counters?

I have spent _a lot_ of time on Exadata-related session counters in 12.1.0.2 and earlier releases. I think they are fascinating, and since the wait events do not really show me what happened during the execution of the statement I used the session counters instead. I am using Adrian Billington’s mystats tool for this:

SQL> @scripts/mystats start

SQL> alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m;

Database altered.

Elapsed: 00:00:04.87
SQL> @scripts/mystats stop t=1
...
STAT    cell flash cache read hits                                                      20
STAT    cell logical write IO requests                                                  14
STAT    cell overwrites in flash cache                                                  30
STAT    cell physical IO bytes eligible for predicate offload               17,179,869,184
STAT    cell physical IO bytes saved during optimized file creation         17,179,869,184
STAT    cell physical IO interconnect bytes                                      1,361,920
STAT    cell writes to flash cache                                                      30
...
STAT    physical write requests optimized                                               10
STAT    physical write total IO requests                                             4,126
STAT    physical write total bytes                                          17,180,197,888
STAT    physical write total bytes optimized                                       163,840
STAT    physical write total multi block requests                                    4,096

Let’s begin with the cell%-statistics. These indicate what happened on the storage layer. The command I executed was to create an online redo log group in thread 2, in disk groups DATA and RECO. Each file is approximately 4 GB in size.

SQL> select f.member, l.bytes/power(1024,2) m 
  2  from v$log l, v$logfile f
  3  where l.group# = f.group#
  4  and l.group# = 5;

MEMBER                                                                M
------------------------------------------------------------ ----------
+DATA/MBACH/ONLINELOG/group_5.454.879407723                        4096
+RECO/MBACH/ONLINELOG/group_5.1229.879407725                       4096

2 rows selected.

So why does the database report 17,179,869,184 bytes “saved during optimized file creation”, and almost exactly the same figure again in “physical write total bytes”? The answer is ASM mirroring. In this system DATA and RECO are protected using ASM normal redundancy, doubling the writes.
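
The arithmetic works out exactly:

2 log file members x 4,096 MB   =  8,192 MB requested
8,192 MB x 2 ASM mirror copies  = 16,384 MB = 17,179,869,184 bytes

That is precisely the figure in “cell physical IO bytes saved during optimized file creation”; the slightly higher “physical write total bytes” additionally includes a few hundred KB of what are presumably the control file and log file header writes visible in the wait events earlier.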

What I found amusing is that the exact number of bytes saved is also reported as “eligible for predicate offload”. Until quite recently I had only associated Smart Scans with this statistic counter.

You can also see a few writes (and overwrites) to (Write Back) Flash Cache.

Posted in 12c Release 1, Exadata, Performance

Little things worth knowing: Is there a penalty in establishing a connection to Oracle using the MAA connection string? Part 2

Posted by Martin Bach on April 23, 2015

In the very lengthy previous post about the MAA connection string I wanted to explain the use of the connection string as promoted by Oracle. I deliberately kept the first part simple: both the primary and the standby cluster were up, and although the database was operating in the primary role on what I called the standby cluster (again, it’s probably not a good idea to include the intended role in infrastructure names) there was no penalty establishing a connection.

As readers of that blog entry pointed out, that is of course only one part of the story (you may have guessed from the TNS alias MAA_TEST1 … there are more to come). When you define the connection string this way, you cater for the situation where something goes wrong. Let’s try a few more scenarios. Here is the current Data Guard situation:

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  CDB   - Primary database
    Error: ORA-16810: multiple errors or warnings detected for the database

    STDBY - Physical standby database 
      Error: ORA-12543: TNS:destination host unreachable

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 71 seconds ago)
DGMGRL> show database 'CDB'

Database - CDB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    CDB1
      Error: ORA-16737: the redo transport service for standby database "STDBY" has an error

    CDB2
      Error: ORA-16737: the redo transport service for standby database "STDBY" has an error

Database Status:
ERROR

As you can see, CDB is operating in the primary role, and redo transport is broken. The simple reason for the broken redo transport is that the standby cluster in its entirety is down. For the sake of completeness I have repeated the MAA connection string here:

MAA_TEST1 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
       (DESCRIPTION=
         (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Scenario 1: shut down of the standby cluster, database in primary role on primary cluster

This does not seem to be much of a problem. Using the same test as in part 1 I get no noteworthy penalty at all:

[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 23 10:11:53 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 09:59:48 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.246s
user	0m0.020s
sys	0m0.025s
[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 23 10:12:12 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 10:11:52 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB1

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.171s
user	0m0.018s
sys	0m0.018s
[oracle@lab tns]$ 

Trying from a 12.1.0.1.0 client on a different machine:

[oracle@oracledev ~]$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit
> EOF

SQL*Plus: Release 12.1.0.1.0 Production on Thu Apr 23 10:13:50 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 10:12:11 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB1

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.427s
user	0m0.012s
sys	0m0.027s
[oracle@oracledev ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.1.0 Production on Thu Apr 23 10:13:52 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 10:13:50 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.158s
user	0m0.012s
sys	0m0.016s

That seems OK, and can easily be explained by the (LOAD_BALANCE=off)(FAILOVER=on) in the DESCRIPTION_LIST. But what if we do it the other way around? Let’s start by reversing the roles:

DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> switchover to 'STDBY'
Performing switchover NOW, please wait...
New primary database "STDBY" is opening...
Oracle Clusterware is restarting database "CDB" ...
Switchover succeeded, new primary is "STDBY"

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    CDB   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 48 seconds ago)


Scenario 2: black-out of the primary cluster, database in primary role on standby cluster

The next scenario is the worst case: a data centre has gone down, and none of the hosts in the other cluster (rac12pri) is reachable. To simulate this I simply shut down rac12pri1 and rac12pri2; that’s as dead as it gets. The SCAN will still resolve in DNS, but there is neither a SCAN VIP nor a SCAN listener anywhere to answer on the primary cluster.

As you saw in the output leading to this paragraph, the primary database is started on rac12sby, nodes rac12sby1 and rac12sby2. With the connection string in its current format there is, for the first time, a connection penalty:

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit;
> EOF

SQL*Plus: Release 12.1.0.2.0 Production on Tue Apr 21 07:17:36 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Mon Apr 20 2015 10:39:29 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m35.770s
user    0m0.024s
sys     0m0.043s

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit;
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Tue Apr 21 07:18:39 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 21 2015 07:18:09 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m36.219s
user    0m0.021s
sys     0m0.022s

During testing with 12c these connect times ranged between 30 and 38 seconds. Here is proof that I truly cannot connect to the primary cluster:

[oracle@lab tns]$ ping rac12pri-scan
PING rac12pri-scan.example.com (192.168.100.111) 56(84) bytes of data.
From lab.example.com (192.168.100.1) icmp_seq=1 Destination Host Unreachable
From lab.example.com (192.168.100.1) icmp_seq=2 Destination Host Unreachable
From lab.example.com (192.168.100.1) icmp_seq=3 Destination Host Unreachable
From lab.example.com (192.168.100.1) icmp_seq=4 Destination Host Unreachable
^C
--- rac12pri-scan.example.com ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4001ms

[oracle@rac12sby1 ~]$ time telnet rac12pri-scan 1521
Trying 192.168.100.112...
telnet: connect to address 192.168.100.112: No route to host
Trying 192.168.100.113...
telnet: connect to address 192.168.100.113: No route to host
Trying 192.168.100.111...
telnet: connect to address 192.168.100.111: No route to host

real	0m9.022s
user	0m0.001s
sys	0m0.003s

I found it interesting that the telnet command took almost no time at all to complete, yet trying to connect using SQL*Net seems to take forever. Why could that be? Trace!

Getting to the bottom of this

Tracing can help sometimes, so I tried that. I try and keep my working TNS configuration separate from the production entries in $ORACLE_HOME. Using ~/tns I defined sqlnet.ora to read:

  • diag_adr_enabled = off
  • trace_level_client = support
  • trace_directory_client = /home/oracle/tns
  • log_directory_client = /home/oracle/tns
  • trace_unique_client = true

And connected again. I noticed time increases whenever nsc2addr was invoked:

[oracle@lab tns]$ grep "nsc2addr.*DESC" cli_6630.trc | nl
     1  (1947452928) [23-APR-2015 11:07:45:376] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     2  (1947452928) [23-APR-2015 11:07:48:383] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     3  (1947452928) [23-APR-2015 11:07:51:390] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     4  (1947452928) [23-APR-2015 11:07:54:396] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     5  (1947452928) [23-APR-2015 11:07:57:402] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     6  (1947452928) [23-APR-2015 11:08:00:408] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     7  (1947452928) [23-APR-2015 11:08:03:414] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     8  (1947452928) [23-APR-2015 11:08:06:421] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     9  (1947452928) [23-APR-2015 11:08:09:427] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    10  (1947452928) [23-APR-2015 11:08:12:433] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    11  (1947452928) [23-APR-2015 11:08:15:439] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    12  (1947452928) [23-APR-2015 11:08:18:445] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    13  (1947452928) [23-APR-2015 11:08:21:452] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.120)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))

Sorry for the lengthy listing, but that way it shows best! The primary SCAN resolves to 192.168.100.111, 112 and 113, the standby SCAN to 119, 120 and 121.

What you can see clearly by looking at the timestamps is that the transport_connect_timeout works perfectly well: a new address is tried every 3 seconds. What surprises me is that each SCAN VIP is tried 4 times before the second SCAN is attempted (lines 1-12). I thought I had that limited to 3 retries … although it adds up if RETRY_COUNT means retries on top of the initial attempt: one initial pass plus three retries over the three SCAN VIPs gives 12 attempts, and at 3 seconds each that accounts for the roughly 36 seconds measured above. Line 13 is the first reference to the second SCAN, and as soon as that is referenced a connection is established.

That can’t be it, can it?

Hmm, maybe Oracle went a bit too far by adding 3 retries to every single SCAN VIP. The whole process can be sped up by trying each SCAN VIP only once before failing over to the second SCAN. This can be done with the following connection string:

MAA_TEST2 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
     (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
       (DESCRIPTION=
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= 
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Trying this works a lot better from a timing perspective:

[oracle@lab tns]$ time sqlplus system/secret@maa_test2<<EOF
exit
EOF


SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 23 11:35:17 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 11:08:21 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m9.829s
user	0m0.041s
sys	0m0.022s
[oracle@lab tns]$ 

Visible in the SQL*Net trace:

[oracle@lab tns]$ grep "nsc2addr.*DESC" cli_7305.trc | nl 
     1	(3474875904) [23-APR-2015 11:35:17:935] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     2	(3474875904) [23-APR-2015 11:35:20:940] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     3	(3474875904) [23-APR-2015 11:35:23:948] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     4	(3474875904) [23-APR-2015 11:35:26:956] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.119)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))

That was the solution for the “slow” connectivity shared by the person who asked me the question in Paris, but I hadn’t been able to find proof as to why it works better: this is it. Now the question remains why the MAA string in the Application Continuity white paper is more complex. I think what’s failing here is CONNECT_TIMEOUT. According to the tnsnames.ora reference in 12c the purpose of the parameter is “to specify the timeout duration in seconds for a client to establish an Oracle Net connection to an Oracle database”. You could read this to mean that after CONNECT_TIMEOUT is reached the failover to the next SCAN occurs. However, looking at the trace, it appears that the connect timeout behaves much like the transport timeout, and the lower of the two is used.
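
The trace timestamps make the roughly ten seconds easy to account for:

11:35:17  192.168.100.113 (primary SCAN VIP)  - 3 second transport timeout
11:35:20  192.168.100.112 (primary SCAN VIP)  - 3 second transport timeout
11:35:23  192.168.100.111 (primary SCAN VIP)  - 3 second transport timeout
11:35:26  192.168.100.119 (standby SCAN VIP)  - connection established

Three failed attempts at 3 seconds each plus the successful connection add up to the 0m9.829s measured above.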

There is a high probability that the syntax is in the document for a reason, although it escapes me right now. I guess the penalty when creating new connections with connection pools doesn’t really matter that much, because you create the pool (where you “pay” the penalty) only once and subsequently pull connections from it, which should be very fast.

Posted in 12c Release 1, Linux

Little things worth knowing: Is there a penalty in establishing a connection to Oracle using the MAA connection string?

Posted by Martin Bach on April 20, 2015

Sorry for the long title!

During my session about “advanced RAC programming features” at the last Paris Oracle Meetup I was asked a question about the MAA connection string. I showed an example taken from the Application Continuity white paper (http://www.oracle.com/technetwork/database/options/clustering/application-continuity-wp-12c-1966213.pdf). Someone from the audience asked me if I had experienced any problems with it, such as very slow connection timeouts. I haven’t, but wanted to double-check anyway. This is a simplified test using a SQL*Plus connection, since it is easier to time than the creation of a connection pool. If you know of a way to reliably time pool creation in Java/UCP, let me know and I’ll test it.

My system is 12.1.0.2 on Oracle Linux 7.1 with UEK (3.8.13-55.1.6.el7uek.x86_64) and the following patches on the RDBMS and Grid Infrastructure side:

[oracle@rac12sby1 ~]$ opatch lspatches -oh /u01/app/12.1.0.2/grid
19872484;WLM Patch Set Update: 12.1.0.2.2 (19872484)
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)
19769479;OCW Patch Set Update : 12.1.0.2.2 (19769479)
19769473;ACFS Patch Set Update : 12.1.0.2.2 (19769473)

[oracle@rac12sby1 ~]$ opatch lspatches -oh /u01/app/oracle/product/12.1.0.2/dbhome_1
19877336;Database PSU 12.1.0.2.2, Oracle JavaVM Component (Jan2015)
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)
19769479;OCW Patch Set Update : 12.1.0.2.2 (19769479)

This is the January System Patch by the way, 20132450. At first I wanted to blog about the patch application but it was so uneventful I decided against it.

I have two clusters, rac12pri and rac12sby, consisting of 2 nodes each. Database CDB is located on rac12pri, STDBY on rac12sby. Normally I am against naming databases by function, as you might end up using STDBY in the primary role after a data centre migration, and people tend to find the use of STDBY as a primary database odd. In this case I hope it helps in understanding the concept, and it’s a lab environment anyway …

Creating the broker configuration

There is nothing special about creating the broker configuration; just remember to define the broker configuration files on shared storage and to enable automatic standby file management. I also recommend standby redo logs on both the primary and standby databases. Once you have the configuration in place, check the database(s) in verbose mode to get the broker connection string. You can copy and paste the connection string into SQL*Plus to ensure that every instance can be started thanks to a static listener registration (Data Guard will restart databases during switchover and failover operations, which is bound to fail unless the databases are statically registered with the listener). Here is what I mean:

DGMGRL> show instance verbose 'CDB1';

Instance 'CDB1' of database 'CDB'

  Host Name: rac12pri1
  PFILE:     
  Properties:
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.108)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB_DGMGRL)(INSTANCE_NAME=CDB1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Instance Status:
SUCCESS

Now I can take the static connection identifier and use it (be sure to append “as sysdba” at the end):

[oracle@rac12pri1 ~]$ sqlplus 

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:50:09 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter user-name: sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.108)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB_DGMGRL)(INSTANCE_NAME=CDB1)(SERVER=DEDICATED))) as sysdba
Enter password: 

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>

Once this works for all instances in the configuration you are a big step further along in your Data Guard setup!
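
For reference, the static registration the broker relies on typically looks like the following sketch in the listener configuration (the names match this environment, but treat it as an illustration; your homes and SIDs will differ):

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = CDB_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME = CDB1)
    )
  )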

Service setup

I have created a new service named rootsrv on both databases.

[oracle@rac12sby1 ~]$ srvctl add service -d STDBY -s rootsrv -role primary -policy automatic -preferred STDBY1,STDBY2

[oracle@rac12pri1 ~]$ srvctl add service -d CDB -s rootsrv -role primary -policy automatic -preferred CDB1,CDB2

To make it easier to follow along, here is the current status of the configuration (in real life you might want to use a different protection mode):

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    CDB   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 28 seconds ago)

As per its definition, the service is started on the primary but not on the standby. In fact, I can’t start it on the standby:

[oracle@rac12sby1 ~]$ srvctl status service -d STDBY -s rootsrv
Service rootsrv is running on instance(s) STDBY1,STDBY2

[oracle@rac12pri1 ~]$ srvctl status service -d CDB -s rootsrv
Service rootsrv is not running.
[oracle@rac12pri1 ~]$ srvctl start service -d CDB -s rootsrv
PRCD-1084 : Failed to start service rootsrv
PRCR-1079 : Failed to start resource ora.cdb.rootsrv.svc
CRS-2800: Cannot start resource 'ora.cdb.db' as it is already in the INTERMEDIATE state on server 'rac12pri1'
CRS-2632: There are no more servers to try to place resource 'ora.cdb.rootsrv.svc' on that would satisfy its placement policy
CRS-2800: Cannot start resource 'ora.cdb.db' as it is already in the INTERMEDIATE state on server 'rac12pri2'
[oracle@rac12pri1 ~]$

So there is no more need for database triggers to start and stop services depending on the database's role (this was another question I had during my talk).

The MAA connection string

The MAA connection string as taken from the white paper and slightly adapted is this one:

MAA_TEST1 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
       (DESCRIPTION=
         (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Please refer to the white paper for an explanation of the various parameters.

Notice that both the primary and standby SCANs are referenced in there, both pointing to "rootsrv", which is currently active on STDBY. To get some proper timing of the connection delay I use the following little snippet:


$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit
> EOF

Testing on the “primary cluster” first, the local database is operating in the standby role:

[oracle@rac12pri1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:41:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:53 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.432s
user    0m0.026s
sys     0m0.028s

[oracle@rac12pri1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:31:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:31:24 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.415s
user    0m0.023s
sys     0m0.025s

That was quick: not even a second. And as you can see, the connection is set up against database STDBY on the other cluster.

Next on the standby cluster:

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:33:26 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:22 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.613s
user    0m0.025s
sys     0m0.019s

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:33:52 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:50 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.199s
user    0m0.024s
sys     0m0.021s

So this was quick too. And finally, on a different machine altogether (my lab server):

[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv', 'instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 08:13:50 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 08:13:47 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.387s
user	0m0.020s
sys	0m0.018s

Does it change after a role reversal? I think I’ll cover that in a different post …

Appendix: setting up Data Guard for RAC

This is something I always forget so it’s here for reference. I use an identical tnsnames.ora file across all nodes, here it is:

CDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CDB)
    )
  ) 

STDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  ) 

# used for DG and RMAN duplicate
STDBY_NON_SCAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  )

CDB_NON_SCAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CDB)
    )
  )

The non-SCAN TNS entries are a quick workaround for the fact that I registered the SIDs statically with the node listeners only, not with the SCAN listeners. If there is a more elegant way of statically registering a database, please let me know! I couldn't find any documentation on how to do this. I guess adding a SID_LIST_LISTENER_SCANx would do, too. Anyway, here is an example of the local node listener configuration (not the complete file contents). The ORACLE_SID and host names must be adapted for each node in the cluster.

LISTENER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
    )
  )

# for broker
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=STDBY)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME=STDBY1))
    (SID_DESC=
      (GLOBAL_DBNAME=STDBY_DGMGRL)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME=STDBY1))
  )

After changing the listener configuration you need to reload the local node listener.
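
A minimal sketch, assuming the default node listener name LISTENER; run this as the listener owner on every node whose listener.ora you changed:

[oracle@rac12sby1 ~]$ lsnrctl reload LISTENER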

Posted in 12c Release 1, Data Guard, Linux | Tagged: , | 7 Comments »

Little things worth knowing: what does opatchauto actually do?

Posted by Martin Bach on April 16, 2015

I recently applied system patch 20132450 to my 12.1.0.2.0 installation on a 2 node RAC system on Oracle Linux 7.1. While ensuring that OPatch is the latest version available I came across an interesting command line option in opatchauto. It is called “-generateSteps”.

[oracle@rac12sby1 ~]$ $ORACLE_HOME/OPatch/opatchauto apply -help
OPatch Automation Tool
Copyright (c) 2015, Oracle Corporation.  All rights reserved.


DESCRIPTION
    Apply a System Patch to Oracle Home. User specified the patch
    location or the current directory will be taken as the patch location.
    opatchauto must run from the GI Home as root user.

SYNTAX
<GI_HOME>/OPatch/opatchauto apply
                            [-analyze]
                            [-database <database names> ]
                            [-generateSteps]
                            [-invPtrLoc <Path to oraInst.loc> ]
                            [-jre <LOC> ] 
                            [-norestart ]
                            [-nonrolling ]
                            [-ocmrf <OCM response file location> ]
                            [-oh <OH_LIST> ]
                            [ <Patch Location> ]

OPTIONS


       -analyze
              This option runs all the required prerequisite checks to confirm
              the patchability of the system without actually patching or 
              affecting the system in any way.

       -database
              Used to specify the RDBMS home(s) to be patched. Option value 
              is oracle database name separated by comma.

       -generateSteps
              Generate the manual steps of apply session. These steps are what
              'opatchauto apply' does actually.

       -invPtrLoc
              Used to locate the oraInst.loc file when the installation used 
              the -invPtrLoc. This should be the path to the oraInst.loc file.

       -jre
              This option uses JRE (java) from the
              specified location instead of the default location
              under Oracle Home.

       -norestart
              This option tells opatchauto not to restart the Grid Infrastructure
              stack and database home resources after patching.

       -nonrolling
              This option makes the patching session run in 'nonrolling' mode.
              It is required that the stack on the local node is running while
              it must be stopped on all the remaining nodes before the patching
              session starts.

       -ocmrf 
              This option specifies the absolute path to the OCM response file. 
              It is required if the target Oracle Home doesn't have OCM 
              installed and configured.

       -oh
              This option specifies Oracle Home(s) to be patched. This can be a
              RAC home, a Grid Home or comma separated list of multiple homes.


PARAMETERS
      Patch Location
                    Path to the location for the patch. If the patch 
                    location is not specified, then the current
                    directory is taken as the patch location.


Example:
      To patch GI home and all RAC homes: 
      '<GI_HOME>/OPatch/opatchauto apply'

      To patch multiple homes:
      '<GI_HOME>/OPatch/opatchauto apply -oh <GI_HOME>,<RAC_HOME1>,<RAC_HOME2>'

      To patch databases running from RAC homes only:
      '<RAC_HOME>/OPatch/opatchauto apply -database db1,db2...dbn'

      To patch software-only installation:
      '<RAC_HOME>/OPatch/opatchauto apply -oh <RAC_HOME>' OR
      '<GI_HOME>/OPatch/opatchauto apply -oh <GI_HOME>'


opatchauto succeeded.
[oracle@rac12sby1 ~]$ 

This looked quite interesting, so I wanted to see what it does with the system patch. I don't have a database installed in my cluster yet; it's going to be the standby site for another cluster (rac12pri). Unfortunately opatchauto still skips RDBMS homes without a database in them in my tests, so I ended up using the "-oh" flag in this case. Here's the result, formatted for better readability. In reality each generated command is on a single line:

[root@rac12sby1 ~]# opatchauto apply /u01/patches/20132450 \
-oh /u01/app/12.1.0.2/grid -generateSteps -ocmrf /u01/patches/ocm.rsp
OPatch Automation Tool
Copyright (c) 2015, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.5
OUI version        : 12.1.0.2.0
Running from       : /u01/app/12.1.0.2/grid

Invoking opatchauto utility "generateapplysteps"

To apply the patch, Please do the following manual actions:

Step 1 As the "oracle" user on the host "rac12sby1", please run the following commands:
  Action 1.1 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch version -oh /u01/app/12.1.0.2/grid -invPtrLoc \
/u01/app/12.1.0.2/grid/oraInst.loc -v2c 12.1.0.1.5

Step 2 As the "oracle" user on the host "rac12sby1", please run the following commands:
  Action 2.1 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch lsinventory -invPtrLoc \
/u01/app/12.1.0.2/grid/oraInst.loc -oh /u01/app/12.1.0.2/grid

Step 3 As the "oracle" user on the host "rac12sby1", please run the following commands:
  Action 3.1 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckComponents \
-ph /u01/patches/20132450/19769473 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

  Action 3.2 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckComponents \
-ph /u01/patches/20132450/19769479 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

  Action 3.3 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckComponents \
-ph /u01/patches/20132450/19769480 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

  Action 3.4 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckComponents \
-ph /u01/patches/20132450/19872484 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

Step 4 As the "oracle" user on the host "rac12sby1", please run the following commands:
  Action 4.1 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckConflictAgainstOH \
-ph /u01/patches/20132450/19769473 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

  Action 4.2 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckConflictAgainstOH \
-ph /u01/patches/20132450/19769479 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

  Action 4.3 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckConflictAgainstOH \
-ph /u01/patches/20132450/19769480 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

  Action 4.4 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch prereq CheckConflictAgainstOH \
-ph /u01/patches/20132450/19872484 -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc \
-oh /u01/app/12.1.0.2/grid

Step 5 As the "root" user on the host "rac12sby1", please run the following commands:
  Action 5.1 [root@rac12sby1]#
  /u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib \
-I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl -prepatch

Step 6 As the "oracle" user on the host "rac12sby1", please run the following commands:
  Action 6.1 [oracle@rac12sby1]$
  echo /u01/patches/20132450/19769473 > /tmp/OraGI12Home1_patchList

  Action 6.2 [oracle@rac12sby1]$
  echo /u01/patches/20132450/19769479 >> /tmp/OraGI12Home1_patchList

  Action 6.3 [oracle@rac12sby1]$
  echo /u01/patches/20132450/19769480 >> /tmp/OraGI12Home1_patchList

  Action 6.4 [oracle@rac12sby1]$
  echo /u01/patches/20132450/19872484 >> /tmp/OraGI12Home1_patchList

  Action 6.5 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch napply -phBaseFile /tmp/OraGI12Home1_patchList \
-local  -invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc -oh /u01/app/12.1.0.2/grid -silent

Step 7 As the "root" user on the host "rac12sby1", please run the following commands:
  Action 7.1 [root@rac12sby1]#
  /u01/app/12.1.0.2/grid/rdbms/install/rootadd_rdbms.sh

Step 8 As the "root" user on the host "rac12sby1", please run the following commands:
  Action 8.1 [root@rac12sby1]#
  /u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib \
-I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl -postpatch


Step 9 As the "oracle" user on the host "rac12sby1", please run the following commands:
  Action 9.1 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch lsinventory \
-invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc -oh /u01/app/12.1.0.2/grid | grep 19769473

  Action 9.2 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch lsinventory \
-invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc -oh /u01/app/12.1.0.2/grid | grep 19769479

  Action 9.3 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch lsinventory \
-invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc -oh /u01/app/12.1.0.2/grid | grep 19769480

  Action 9.4 [oracle@rac12sby1]$
  /u01/app/12.1.0.2/grid/OPatch/opatch lsinventory \
-invPtrLoc /u01/app/12.1.0.2/grid/oraInst.loc -oh /u01/app/12.1.0.2/grid | grep 19872484

You could of course use oplan, which is bundled in $ORACLE_HOME/OPatch/oplan/oplan, to generate a much more detailed profile for the patch application. For a first glance at the activity, opatchauto -generateSteps seems quite useful.

The steps all rang a bell from my 11.2.0.1 days when this was the procedure to patch Grid Infrastructure (thank god that's over). What I didn't recognise was rootadd_rdbms.sh. Looking at the file I can see it changes ownership and permissions for oradism (important!) and some other executables in $ORACLE_HOME, and fixes potential problems with "fs.file-max*" in /etc/sysctl.conf.
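
If you are curious you can verify part of its work yourself: after the script has run, oradism should be owned by root with the setuid bit set. A sketch, using the Grid Home path from this post:

[oracle@rac12sby1 ~]$ ls -l /u01/app/12.1.0.2/grid/bin/oradism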

Posted in 12c Release 1, Linux | Tagged: , | Leave a Comment »

Little things worth knowing: direct path inserts and referential integrity

Posted by Martin Bach on April 13, 2015

This is another post to remind myself that Oracle evolves, and what I thought I knew might no longer be relevant. So double-checking instead of assuming should become a habit!

Today’s example: direct path inserts. I seemed to remember from Oracle 9i that a direct path insert ignores referential integrity. This is still confirmed in the 9i Release 2 Concepts Guide, chapter 19 “Direct Path Insert”. Quoting from there:

During direct-path INSERT operations, Oracle appends the inserted data after existing data in the table. Data is written directly into datafiles, bypassing the buffer cache. Free space in the existing data is not reused, and referential integrity constraints are ignored.

That sounds a bit harsh by today's standards, so it's worth a test. On Oracle 12.1.0.2 I created a parent/child relationship, admittedly a rather crude one:

SQL> create table parent (id, vc) as
  2   select distinct data_object_id, subobject_name from dba_objects
  3   where data_object_id is not null;

Table created.

SQL> alter table parent add constraint pk_parent primary key (id);

Table altered.

SQL> create table child (id number, p_id number not null, vc varchar2(100),
  2  constraint pk_child primary key (id),
  3  constraint fk_parent_child foreign key (p_id) references parent (id));

Table created.
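
The inserts that follow use a sequence named s_child which I did not show being created; a minimal definition would be:

SQL> create sequence s_child;

Sequence created.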

Now when I try to insert data using a direct path insert it fails:

SQL> insert /*+ append */ into child select s_child.nextval, -1, 'this will not work' from parent;
insert /*+ append */ into child select s_child.nextval, -1, 'this will not work' from parent
*
ERROR at line 1:
ORA-02291: integrity constraint (MARTIN.FK_PARENT_CHILD) violated - parent key not found

This is brilliant and what I expected, since it prevents me from writing a lot of garbage into my table. If you have developed with Oracle then you probably know about deferrable constraints. An existing constraint can't be changed to "initially deferrable", which is why I have to drop and recreate it before trying the insert again:

SQL> alter table child drop constraint FK_PARENT_CHILD;

Table altered.

Elapsed: 00:00:00.04
SQL> alter table child add constraint FK_PARENT_CHILD foreign key (p_id) references parent (id)
  2  deferrable initially deferred;

Table altered.

Elapsed: 00:00:00.02
SQL> insert /*+ append */ into child select s_child.nextval, -1, 'this will not work' from parent;

7640 rows created.

Elapsed: 00:00:00.30
SQL> commit;
commit
*
ERROR at line 1:
ORA-02091: transaction rolled back
ORA-02291: integrity constraint (MARTIN.FK_PARENT_CHILD) violated - parent key not found

So this constraint fires as well! Good news for me. If the data is all right then the insert will of course succeed.

The trouble is that Oracle "silently" ignores the direct path part of the load and falls back to a conventional insert! I didn't initially put this into the post, but thanks to Oren for adding it in the comments section (make sure to have a look).

Back to the original, immediately enforced constraint definition: inserting valid data into the table yields the expected result.

SQL> alter table child drop constraint FK_PARENT_CHILD;

Table altered.

SQL> alter table child add constraint FK_PARENT_CHILD foreign key (p_id) references parent (id);

Table altered.

SQL> insert /*+ append */ into child select s_child.nextval, 2, 'this should work however' from parent;

7640 rows created.

SQL> commit;

Commit complete.

Summary

Again: the insert against the deferred constraint wasn't really a "direct path insert", see the comments. It really pays off to verify and update "old knowledge" from time to time! I still prefer valid data over garbage, and when it comes to ILM I can "move" my table to get the required compression result.

Have fun!

Posted in 12c Release 1 | 4 Comments »

Little things worth knowing: exp/imp vs expdp and impdp for HCC in Exadata

Posted by Martin Bach on March 30, 2015

Do you know the difference between exp/imp and expdp/impdp when it comes to importing HCC compressed data in Exadata?

If not, then follow me through two examples. This is on 11.2.0.3/11.2.3.3.1 but applies to all database releases you can have on Exadata. The task at hand is to export a table (which happens to be non-partitioned and HCC compressed for query high) and import it into a different user's schema. This is quite a common approach when migrating data from a non-Exadata system to an Exadata system. You could for example pre-create the tables from DDL and implement HCC before even importing a single row (see the sketch below). When importing the data, the segments' HCC attributes will be honoured and data will be inserted compressed. Or won't it?
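
Pre-creating a segment with HCC in place is just a matter of adding the compression clause to the DDL. A minimal sketch; the table and column names are made up for illustration:

SQL> create table imptest.t1_pre (id number, payload varchar2(100))
  2  compress for query high;

Table created.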

The table

The table I want to export resides on our V2 system. Since I am (trying to be) a good citizen I want to use DBFS for the dump files. Beginning with Grid Infrastructure 12.1.0.2 you can also use ACFS, by the way. Let's start by creating the directory needed and gathering some meta-information about the source table:

SQL> create or replace directory DBM01 as '/dbfs_direct/FS1/mbach/';

Directory created.

SQL> select owner, table_name, partitioned, num_rows, compress_for from dba_tables where table_name = 'T1_QH';

OWNER                          TABLE_NAME                     PAR   NUM_ROWS COMPRESS_FOR
------------------------------ ------------------------------ --- ---------- ------------
MARTIN                         T1_QH                          NO             QUERY HIGH

SQL> select bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

         M     BLOCKS
---------- ----------
        72       9216

The table is 72 MB in size and HCC compressed (I actually ensured it was by issuing an "alter table t1_qh move;" before the export started).

Data Pump

The first export uses expdp, followed by impdp to get the data back. I am remapping the schema so I don’t have to risk overwriting my source data.

[enkdb01:oracle:DBM011] /home/oracle/mbach
> expdp martin/xxxxxx directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=exp_t1_qh.log tables=t1_qh

Export: Release 11.2.0.3.0 - Production on Tue Mar 24 06:15:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "MARTIN"."SYS_EXPORT_TABLE_02":  martin/******** directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=exp_t1_qh.log tables=t1_qh
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 72 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "MARTIN"."T1_QH"                            9.675 GB 10000000 rows
Master table "MARTIN"."SYS_EXPORT_TABLE_02" successfully loaded/unloaded
******************************************************************************
Dump file set for MARTIN.SYS_EXPORT_TABLE_02 is:
  /dbfs_direct/FS1/mbach/exp_t1_qh.dmp
Job "MARTIN"."SYS_EXPORT_TABLE_02" successfully completed at 06:17:16

The interesting bit here is that the table on disk occupies around 72 MB, and yet expdp tells me the 10000000 rows occupy 9.7 GB. Can anyone guess why?

[enkdb01:oracle:DBM011] /home/oracle/mbach
> ls -lh /dbfs_direct/FS1/mbach/exp_t1_qh.dmp
-rw-r----- 1 oracle dba 9.7G Mar 24 06:17 /dbfs_direct/FS1/mbach/exp_t1_qh.dmp

Yes, 10GB. Data apparently is not exported in its compressed form. Now this table is going to be imported:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> impdp imptest/xxxxxx directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=imp_t1_qh.log \
> tables=martin.t1_qh remap_schema=martin:imptest 

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 06:23:44 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Master table "IMPTEST"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "IMPTEST"."SYS_IMPORT_TABLE_01":  imptest/******** directory=DBM01
  dumpfile=exp_t1_qh.dmp logfile=imp_t1_qh.log tables=martin.t1_qh remap_schema=martin:imptest
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "IMPTEST"."T1_QH"                           9.675 GB 10000000 rows
Job "IMPTEST"."SYS_IMPORT_TABLE_01" successfully completed at 06:29:53

This looks all right: all the data is back (the number of rows exported matches the number imported). What about the segment size?

SQL> select owner, table_name, partitioned, num_rows, compress_for from dba_tables where table_name = 'T1_QH';

OWNER                          TABLE_NAME                     PAR   NUM_ROWS COMPRESS_FOR
------------------------------ ------------------------------ --- ---------- ------------
MARTIN                         T1_QH                          NO             QUERY HIGH
IMPTEST                        T1_QH                          NO             QUERY HIGH

2 rows selected.

SQL> select bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

         M     BLOCKS
---------- ----------
        72       9216
        72       9216

2 rows selected.

Identical down to the block.
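
If you want to be extra thorough, you can verify the compression type of individual rows with DBMS_COMPRESSION.GET_COMPRESSION_TYPE, which maps a ROWID to one of the package's COMP_* constants. A sketch; compare the result against the constants documented for your release:

SQL> select dbms_compression.get_compression_type('IMPTEST', 'T1_QH', rowid) comp_type
  2    from imptest.t1_qh
  3   where rownum = 1;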

Traditional Export/Import

Despite the fact that exp/imp are deprecated, they are still included with 12.1.0.2, the current release at the time of writing. What if you did the same process with these instead? After all, many DBAs "grew up" with those tools and can use them in their sleep. This, plus some initial deficits of Data Pump in 10g, keeps exp/imp high up in the list of tools we like.

Let’s export the table:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> exp martin/xxxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.log tables=t1_qh

Export: Release 11.2.0.3.0 - Production on Tue Mar 24 06:42:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Tes
Export done in US7ASCII character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                          T1_QH   10000000 rows exported
Export terminated successfully without warnings.

The dump file is more or less identical in size to the one created before by Data Pump:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> ls -lh /dbfs_direct/FS1/mbach/*classic*
-rw-r--r-- 1 oracle dba 9.7G Mar 24 07:01 /dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp
-rw-r--r-- 1 oracle dba  537 Mar 24 07:01 /dbfs_direct/FS1/mbach/exp_t1_qh_classic.log

What about the import?

[enkdb01:oracle:DBM011] /home/oracle/mbach
> imp imptest/xxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/imp_t1_qh_classic.log \
> FROMUSER=martin TOUSER=imptest tables=t1_qh

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 07:27:42 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Tes

Export file created by EXPORT:V11.02.00 via conventional path

Warning: the objects were exported by MARTIN, not by you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
. importing MARTIN's objects into IMPTEST
. . importing table                        "T1_QH"

And this takes a loooong time. What's happening in the background? I first checked what the session was doing. That's simple: have a look at the output.

SQL> @scripts/as

   INST_ID   SID    SERIAL# USERNAME      PROG       SQL_ID         CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
---------- ----- ---------- ------------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
         1   594         23 IMPTEST       imp@enkdb0 fgft1tcrr12ga      0               0      22723         .00 No  INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "

The #execs figure was interesting, and indeed, after a couple of minutes the counter had gone up a lot:

   INST_ID   SID    SERIAL# USERNAME      PROG       SQL_ID         CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
---------- ----- ---------- ------------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
         1   594         23 IMPTEST       imp@enkdb0 fgft1tcrr12ga      0               0     639818         .00 Yes INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "

You can probably guess what's coming next. After quite a while the import finished:

> imp imptest/xxxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/imp_t1_qh_classic.log \
> FROMUSER=martin TOUSER=imptest tables=t1_qh

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 07:27:42 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Tes

Export file created by EXPORT:V11.02.00 via conventional path

Warning: the objects were exported by MARTIN, not by you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
. importing MARTIN's objects into IMPTEST
. . importing table                        "T1_QH"   10000000 rows imported
Import terminated successfully without warnings.

And now for the reason for this blog post:

SQL> select owner, bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

OWNER                                   M     BLOCKS
------------------------------ ---------- ----------
MARTIN                                 72       9216
IMPTEST                             11227    1437056

The newly imported table was not compressed at all when loaded via the traditional path.
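
If you end up in this situation, the same technique I used to compress the source table helps: moving the table rewrites the segment with the desired compression attributes. A sketch (remember that a move renders existing indexes unusable, so plan to rebuild them):

SQL> alter table imptest.t1_qh move compress for query high;

Table altered.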

Posted in 11g Release 2, 12c Release 1, Exadata | Tagged: , , , , | Leave a Comment »

Understanding enhancements to block cleanouts in Exadata part 2

Posted by Martin Bach on February 25, 2015

In part 1 of the series I tried to explain (probably in too much detail when it came to session statistics) what the effect of delayed block cleanout is with buffered I/O. In the final example the "dirty" blocks on disk had been cleaned out in the buffer cache, greatly reducing the amount of work to be done when reading them.

Now let's catch up with the present, and with direct path reads. You probably noticed that the migration to 11.2 caused your I/O patterns to change: suitably large segments are now read using direct path reads not only during parallel query but potentially also during serial query execution. Since the blocks read during a direct path read do not end up in the buffer cache, there is an interesting side effect on block cleanouts. The scenario is the same unrealistic yet reproducible one presented in part 1 of this article.
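
If you want to experiment with this behaviour yourself, the decision can be influenced with an unsupported underscore parameter. A sketch, strictly for lab use; the default, "auto", is what I use in the tests below:

SQL> alter session set "_serial_direct_read" = always;

Session altered.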

Enter Direct Path Reads – non Exadata

To be absolutely sure I am getting results without any optimisation offered by Exadata I am running the example on different hardware. I am also using 11.2.0.4 because that's the version I had on my lab server. The principles apply across versions unless I am very mistaken.

I am repeating my test under realistic conditions, leaving _serial_direct_read at its default, "auto". This means I am almost certainly going to see direct path reads when scanning my table. As Christian Antognini has pointed out, direct path reads have an impact on the amount of work that has to be done. The test commences with an update in session 1 touching the whole table, followed by flushing the blocks in the buffer cache to disk to ensure they have an active transaction in the ITL part of the header. The selects in session 2 show the following behaviour: the first execution takes longer, as expected; the second one is faster.
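
For reference, the setup in session 1 looks roughly like this. The column name vc is an assumption since the table DDL isn't shown, and the transaction deliberately stays open:

SQL> update t1_100k set vc = vc;

500000 rows updated.

SQL> alter system flush buffer_cache;

System altered.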

SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:20.62

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell physical IO interconnect bytes                                    747,798,528
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                           1,019
STAT    consistent gets                                                            751,365
STAT    consistent gets - examination                                              667,178
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                 668,031
STAT    consistent gets from cache (fastpath)                                          850
STAT    data blocks consistent reads - undo records applied                        583,334
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                    8,631
STAT    physical read bytes                                                    747,798,528
STAT    physical read total IO requests                                              8,631
STAT    physical read total bytes                                              747,798,528
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              91,284
STAT    physical reads cache                                                         7,950
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                           170
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

The second execution shows slightly better performance as seen before in part 1. The relevant statistics are shown here:


SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:04.60

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell physical IO interconnect bytes                                    682,672,128
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             472
STAT    consistent gets                                                            750,250
STAT    consistent gets - examination                                              666,671
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                 666,916
STAT    consistent gets from cache (fastpath)                                          245
STAT    data blocks consistent reads - undo records applied                        583,334
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                      681
STAT    physical read bytes                                                    682,672,128
STAT    physical read total IO requests                                                681
STAT    physical read total bytes                                              682,672,128
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

The amount of I/O is slightly lower. Still, all the CR processing is done for every block in the table; each has an active transaction (active txn count during cleanout). As you would expect, the buffer cache holds no blocks from the segment:

SQL> select count(*), inst_id, status from gv$bh 
  2  where objd = (select data_object_id from dba_objects where object_name = 'T1_100K') 
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
     86284          2 free
    584983          1 free
         1          1 scur

Subsequent executions all take approximately the same amount of time: every execution has to perform cleanouts and rollbacks. There is really no difference to the buffered reads seen before, except that direct path reads are used to scan the table and there aren't any blocks from T1_100K in the buffer cache.

Commit

Committing in session 1 does not have much of an effect: the blocks read by the direct path read go to the PGA instead of the buffer cache. A quick demonstration: the same select has been executed after the transaction in session 1 committed.

SQL> select /* not_exadata */ count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:04.87

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell physical IO interconnect bytes                                    682,672,128
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts only - consistent read gets                                       83,334
STAT    commit txn count during cleanout                                            83,334
STAT    consistent changes                                                             481
STAT    consistent gets                                                            166,916
STAT    consistent gets - examination                                               83,337
STAT    consistent gets direct                                                      83,334
STAT    consistent gets from cache                                                  83,582
STAT    consistent gets from cache (fastpath)                                          245
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                      681
STAT    physical read bytes                                                    682,672,128
STAT    physical read total IO requests                                                681
STAT    physical read total bytes                                              682,672,128
STAT    physical read total multi block requests                                       673
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

As you can see, the work done for the cleanout is repeated every time, even though active txn count during cleanout is no longer found in the output (the transaction has committed, so commit txn count during cleanout appears instead). And still, there aren't really any blocks pertaining to T1_100K in the buffer cache.

SQL> select count(*), inst_id, status from gv$bh where
  2   objd = (select data_object_id from dba_objects where object_name = 'T1_100K')
  3  group by inst_id, status;

  COUNT(*)    INST_ID STATUS
---------- ---------- ----------
     86284          2 free
    584983          1 free
         1          1 scur

OK, that should be enough for now; over to the Exadata.

Direct Path Reads – Exadata

The same test again, but this time on an X2-2 quarter rack running database 12.1.0.2 and cellsrv 12.1.2.1.0. After the update of the entire table and the flushing of blocks to disk, here are the statistics for the first and second execution.

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:17.41

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks processed by cache layer                                        83,933
STAT    cell commit cache queries                                                   83,933
STAT    cell flash cache read hits                                                  14,011
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                    804,842,256
STAT    cell physical IO interconnect bytes returned by smart scan             682,904,336
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             796
STAT    consistent gets                                                            917,051
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                833,339
STAT    consistent gets examination (fastpath)                                      83,336
STAT    consistent gets from cache                                                 833,717
STAT    consistent gets pin                                                            378
STAT    consistent gets pin (fastpath)                                                 377
STAT    data blocks consistent reads - undo records applied                        750,002
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                   16,147
STAT    physical read bytes                                                    804,610,048
STAT    physical read requests optimized                                            14,011
STAT    physical read total IO requests                                             16,147
STAT    physical read total bytes                                              804,610,048
STAT    physical read total bytes optimized                                    114,778,112
STAT    physical read total multi block requests                                       851
STAT    physical reads                                                              98,219
STAT    physical reads cache                                                        14,885
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================


SQL> -- second execution, still no commit in session 1
SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:01.31

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table
...
------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    active txn count during cleanout                                            83,334
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks processed by cache layer                                        83,949
STAT    cell commit cache queries                                                   83,949
STAT    cell flash cache read hits                                                   1,269
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                    682,907,280
STAT    cell physical IO interconnect bytes returned by smart scan             682,907,280
STAT    cell scans                                                                       1
STAT    cleanout - number of ktugct calls                                           83,334
STAT    cleanouts and rollbacks - consistent read gets                              83,334
STAT    consistent changes                                                             791
STAT    consistent gets                                                            917,053
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                833,339
STAT    consistent gets examination (fastpath)                                      83,337
STAT    consistent gets from cache                                                 833,719
STAT    consistent gets pin                                                            380
STAT    consistent gets pin (fastpath)                                                 380
STAT    data blocks consistent reads - undo records applied                        750,002
STAT    immediate (CR) block cleanout applications                                  83,334
STAT    physical read IO requests                                                    1,278
STAT    physical read bytes                                                    682,672,128
STAT    physical read requests optimized                                             1,269
STAT    physical read total IO requests                                              1,278
STAT    physical read total bytes                                              682,672,128
STAT    physical read total bytes optimized                                    678,494,208
STAT    physical read total multi block requests                                       885
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

In the example, table scans (direct read) indicates that the segment was read using a direct path read. Since this example was executed on an Exadata, the direct path read was performed as a Smart Scan (cell scans = 1). However, the Smart Scan was not very successful: although all 83,334 table blocks were opened by the cache layer (the first layer to touch a block on the cell during a Smart Scan), none of them passed the examination of the transaction layer; they have all been discarded. You can see that there was no saving from the Smart Scan here: cell physical IO bytes eligible for predicate offload is equal to cell physical IO interconnect bytes returned by smart scan.

83,334 blocks were read during the table scan (table scan blocks gotten), all of which were read directly (physical reads direct). All of the blocks read had an active transaction (active txn count during cleanout) and all of these had to be rolled back (cleanouts and rollbacks – consistent read gets).

Since blocks read via direct path read are not cached, the same amount of work has to be done for every subsequent read.

Commit in session 1
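
For reference, this is the only action in session 1 at this point (a minimal sketch; session 1 is the session holding the open transaction):

SQL> -- session 1: complete the open transaction; subsequent Smart Scans
SQL> -- can then resolve the affected blocks without shipping them back
SQL> commit;

Commit complete.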

After committing the transaction in session 1, the statistics for the select statement in session 2 look as follows (also note the execution time):

SQL> select count(*) from t1_100k;

  COUNT(*)
----------
    500000

Elapsed: 00:00:00.16

SQL> @scripts/mystats stop r=physical|cleanout|consistent|cell|table

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    cell IO uncompressed bytes                                             682,672,128
STAT    cell blocks helped by commit cache                                          83,334
STAT    cell blocks processed by cache layer                                        83,334
STAT    cell blocks processed by data layer                                         83,334
STAT    cell blocks processed by txn layer                                          83,334
STAT    cell commit cache queries                                                   83,334
STAT    cell flash cache read hits                                                     663
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  682,672,128
STAT    cell physical IO interconnect bytes                                     13,455,400
STAT    cell physical IO interconnect bytes returned by smart scan              13,455,400
STAT    cell scans                                                                       1
STAT    cell transactions found in commit cache                                     83,334
STAT    consistent changes                                                             791
STAT    consistent gets                                                             83,717
STAT    consistent gets direct                                                      83,334
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     383
STAT    consistent gets pin                                                            380
STAT    consistent gets pin (fastpath)                                                 380
STAT    physical read IO requests                                                      663
STAT    physical read bytes                                                    682,672,128
STAT    physical read requests optimized                                               663
STAT    physical read total IO requests                                                663
STAT    physical read total bytes                                              682,672,128
STAT    physical read total bytes optimized                                    682,672,128
STAT    physical read total multi block requests                                       654
STAT    physical reads                                                              83,334
STAT    physical reads direct                                                       83,334
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                    83,334
STAT    table scan disk non-IMC rows gotten                                        500,000
STAT    table scan rows gotten                                                     500,000
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1


------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

There are no more active transactions found during the cleanout. The blocks, however, still have to be cleaned out every time the direct path read is performed. As a matter of design principle Oracle cannot reuse the blocks it has read with a direct path read (or a Smart Scan, for that matter): they simply are not cached.
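
If you want to convince yourself that the buffer cache is indeed bypassed, a quick sanity check along the following lines should do. This is a sketch, not part of the original test; it assumes the table is named T1_100K and that you can query V$BH:

-- count cached buffers belonging to T1_100K; after a pure direct
-- path read this should remain at (or very close to) zero
select count(*)
  from v$bh
 where objd = (select data_object_id
                 from user_objects
                where object_name = 'T1_100K');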

But have a look at the execution time: that is quite a nice performance improvement. In fact, there are a few things worth mentioning:

  • The segment is successfully scanned via a Smart Scan. How can you tell? (You can also query these counters directly; see the sketch after this list.)
    • cell scans = 1
    • cell physical IO interconnect bytes returned by smart scan is a lot lower than cell physical IO bytes eligible for predicate offload
    • cell blocks processed by (cache|txn|data) layer now matches the number of blocks the table is made up of; previously only the cache layer processed the blocks
  • The commit cache made this happen, visible in three statistics:
    • cell commit cache queries
    • cell blocks helped by commit cache
    • cell transactions found in commit cache
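
As referenced in the list above, rather than eyeballing the full report you can compute the interconnect saving for your session directly. This is a quick illustrative helper, not part of the original test:

-- percentage of offload-eligible bytes that still travelled across
-- the interconnect: the lower the value, the better the Smart Scan
select round(100 * max(decode(sn.name,
             'cell physical IO interconnect bytes returned by smart scan',
             ms.value)) /
       nullif(max(decode(sn.name,
             'cell physical IO bytes eligible for predicate offload',
             ms.value)), 0), 1) as pct_returned
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name in ('cell physical IO interconnect bytes returned by smart scan',
                   'cell physical IO bytes eligible for predicate offload');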

The commit cache “lives” in the cells and probably caches recently committed transactions. If a block is read by the cell and an active transaction is found in it, the cell has a couple of options to avoid a round-trip of the block to the RDBMS layer for consistent read processing:

  1. If possible it uses the minscn optimisation. Oracle tracks the minimum SCN of all active transactions, and the cell software can compare the SCN from the ITL in the block with that lowest SCN. If the lowest active SCN is greater than the SCN found in the block, the transaction must have completed already and it is safe to read the block. This optimisation is not shown in this blog post; had it applied you would see cell blocks helped by minscn optimization increase.
  2. If the first optimisation does not help, the cells make use of the commit cache, visible as cell commit cache queries and, when a cache hit has been scored, cell blocks helped by commit cache. While the transaction in session 1 had not completed you could only see the commit cache queries; after the commit the queries were successful.

If both of these fail, the block must be sent to the RDBMS layer for consistent read processing. In that case there is no difference to the way direct path read blocks are treated outside Exadata.
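
If you want to see which of the two optimisations kicked in for your session, the relevant counters can be pulled from V$MYSTAT as well. A sketch (the statistic names exist in V$STATNAME everywhere; they simply stay at 0 outside Exadata):

-- session counters for the minscn and commit cache optimisations
select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where sn.statistic# = ms.statistic#
   and sn.name in ('cell blocks helped by minscn optimization',
                   'cell commit cache queries',
                   'cell blocks helped by commit cache',
                   'cell transactions found in commit cache');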

Summary

The commit cache is a great enhancement Exadata offers to databases running on the platform. While non-Exadata deployments have to clean out blocks that have not yet been subject to a cleanout every single time such a “dirty” block is read via a direct path read, this work can be avoided on Exadata once the transaction has committed.

Posted in Exadata, Linux | Leave a Comment »

 