Martins Blog

Trying to explain complex things in simple terms

RAC One Node 12c-revisited on ODA 12.1.2.1.0 featuring database on ACFS

Posted by Martin Bach on December 18, 2014

Thanks to Andy Colvin (@acolvin) and the Enkitec lab I have been able to get my hands on an ODA X4-2. And since that’s a lot quieter than turning on my own lab server, and also a lot faster, I used the environment to test RAC One Node in 12.1.0.2.1. I recently had a question from a customer about the usefulness of the solution and what it brings to the HA table. Well, here you go.

There is just a slight caveat: to make use of the high availability features in RAC One Node you need an application that has been written to take advantage of them. Using Fast Connection Failover or Application Continuity is the way to go. Although I have written sample code to exploit the functionality of FCF/FAN and Application Continuity, it has recently come to my attention that the very popular swingbench benchmark can do the same. I saw the connection pool tab in my demos but it didn’t “click”. Now I would like to show you what happens with your application during common scenarios in RAC One Node.

Scenario 1: Database relocation

For this blog post I’d like to try a database live migration from the first to the second host in the cluster. The database has been created using oakcli, primarily because I wanted to get more familiar with it. This is ODA release 12.1.2.1.0, the latest and greatest at the time of writing. I also love the fact that the ODA comes with quite a few of these: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz. To be precise, each node has 2s24c48t (2 sockets, 24 cores, 48 threads) of Ivy Bridge Xeons. Nice.

[oracle@server1 ~]$ /opt/oracle/oak/bin/oakcli show version
Version
-------
12.1.2.1.0

As root (indeed!) I invoked oakcli, the primary ODA command line tool. The first time around I was bitten by not having read the documentation: if you specify the -version flag oakcli will create a new RDBMS home on both cluster nodes. That was _not_ what I had in mind. On the other hand I am now familiar with checking the environment (various permutations of oakcli show…) and deleting an Oracle home. This command created the database in the end:

[root@server1 ~]# oakcli create database -db RON -oh OraDb12102_home1

The command will prompt you for the database type (single instance, RAC One Node, clustered) – good stuff, simple to use, and more importantly: it worked the first time.

Databases in ACFS

Now remember that from ODA release 12.1.2 onwards, 11.2.0.4 and 12.1.x databases will be created on ACFS. In my case that looks slightly odd; I guess I’ll have to get used to it first:

SQL> select name from v$datafile;

NAME
---------------------------------------------------------------------------------------------
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_system_b93xdj44_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_sysaux_b93xdmhk_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_undotbs1_b93xdosz_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_undotbs2_b93xdypp_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_users_b93xdzjj_.dbf
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_soe_tbs_b93yl62m_.dbf

6 rows selected.

SQL> select name from v$tempfile;

NAME 
----------------------------------------------------------------------------------------
/u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/datafile/o1_mf_temp_b93xdpl2_.tmp

SQL> select member from v$logfile

MEMBER
-----------------------------------------------------------------------------
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_1_b93xd3tb_.log
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_2_b93xd9m5_.log
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_3_b93y0hrq_.log
/u01/app/oracle/oradata/datastore/RON/RON/onlinelog/o1_mf_4_b93y0pf9_.log

SQL> show parameter control_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      /u01/app/oracle/oradata/datast
                                                 ore/RON/RON/controlfile/o1_mf_
                                                 b93xd3ll_.ctl
SQL> show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /u01/app/oracle/fast_recovery_
                                                 area/datastore/RON
db_recovery_file_dest_size           big integer 465G

The funny bit is that /u02/app/oracle/oradata/datastore contains snapshots… This is visible in the data file location, where you find an .ACFS/snaps component. I believe the naming convention is the (hidden) .ACFS directory, followed by snaps and the snapshot name. You can view the snapshot detail using acfsutil:

[grid@server1 ~]$ acfsutil snap info RON /u02/app/oracle/oradata/datastore/
snapshot name:               RON
snapshot location:           /u02/app/oracle/oradata/datastore/.ACFS/snaps/RON
RO snapshot or RW snapshot:  RW
parent name:                 /u02/app/oracle/oradata/datastore/
snapshot creation time:      Wed Dec 17 15:26:24 2014

Interestingly the /u01 data store does not have a snapshot:

[grid@server1 ~]$ acfsutil snap info /u01/app/oracle/oradata/datastore
    number of snapshots:  0
    snapshot space usage: 0  ( 0.00 )

But I digress…

Database setup and configuration

After the database has been created, it is defined in Clusterware as follows:

[oracle@server1 ~]$ srvctl config database -d ron
Database unique name: RON
Database name: RON
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile: /u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/spfileRON.ora
Password file: /u02/app/oracle/oradata/datastore/.ACFS/snaps/RON/RON/orapwRON
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: 
Mount point paths: /u01/app/oracle/fast_recovery_area/datastore,/u01/app/oracle/oradata/datastore,/u02/app/oracle/oradata/datastore
Services: RON_racone
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: RON
Candidate servers: server1,server2
OSDBA group: dba
OSOPER group: racoper
Database instances: 
Database is administrator managed
[oracle@server1 ~]$ 

The mandatory service is defined like this:

[oracle@server1 ~]$ srvctl config service -d ron
Service name: RON_racone
Server pool: 
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: 
Failover method: 
TAF failover retries: 
TAF failover delay: 
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition: 
Pluggable database name: 
Maximum lag time: ANY
SQL Translation Profile: 
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Session State Consistency: 
GSM Flags: 0
Service is enabled
Preferred instances: RON_2
Available instances: 
[oracle@server1 ~]$ 

With RAC One Node it is even more important to connect to the service, and not to the instance.
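A service-based connect descriptor might look like this (a sketch using the SCAN and service name from this post; the alias name is made up):

```
RON_RACONE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = RON_racone))
  )
```

Connecting via the SID instead would tie the session to one particular instance and defeat the whole point of online relocation.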

Getting ready for the testing

The situation before the test is shown here:

[oracle@server1 ~]$ srvctl status database -d ron
Instance RON_2 is running on node server2
Online relocation: INACTIVE
[oracle@server1 ~]$ srvctl status service -d ron
Service RON_racone is running on instance(s) RON_2

For swingbench to benefit from FCF it needs to be configured so that it uses a connection pool. I have used the swingbench FAQ (http://www.dominicgiles.com/swingbenchfaq.html) to do so. My connection is defined as follows:

   <Connection>
      <UserName>soe</UserName>
      <Password>soe</Password>
      <ConnectString>//cluster-scan/RON_racone</ConnectString>
      <DriverType>Oracle jdbc Driver</DriverType>
        <ConnectionPooling>
            <PooledInitialLimit>50</PooledInitialLimit>
            <PooledMinLimit>25</PooledMinLimit>
            <PooledMaxLimit>100</PooledMaxLimit>
            <PooledInactivityTimeOut>50</PooledInactivityTimeOut>
            <PooledConnectionWaitTimeout>45</PooledConnectionWaitTimeout>
            <PooledPropertyCheckInterval>10</PooledPropertyCheckInterval>
            <PooledAbandonedConnectionTimeout>120</PooledAbandonedConnectionTimeout>
        </ConnectionPooling>
      <Properties> 
         <Property Key="StatementCaching">50</Property>
         <Property Key="FastFailover">true</Property>
         <Property Key="OnsConfiguration">nodes=server1:6200,server2:6200</Property>
         <Property Key="AppContinuityDriver">true</Property>
         <Property Key="FetchSize">20</Property>
      </Properties>
   </Connection>   

This part and the rest of the configuration are based largely on this post: Application Continuity in Oracle Database 12c (12.1.0.2).

Now – finally – for the test!

Starting charbench will trigger the creation of 100 sessions on the currently active instance (I set NumberOfUsers to 100) and they will be chugging along nicely. Here is the output after a couple of minutes:

[oracle@server1 bin]$ ./charbench -c oda_soe.xml 
Author  :        Dominic Giles
Version :        2.5.0.952

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

7:24:28 AM      100     17242   291                                            

I saw about 290 TPS for a total of around 18,000 TPM over some time. Let’s start the live migration. But first, I’d like to consult the crystal ball to see what might happen (new in 12.1.0.2):

[oracle@server1 ~]$ srvctl predict database -db RON -verbose
Database ron will be started on node server1
Database ron will be stopped on node server2
Service ron_racone will be started on node server1

Well I guess that might be correct, but let’s try:

[oracle@server1 ~]$ srvctl relocate database -d ron -verbose -node server1
Configuration updated to two instances
Instance RON_1 started
Services relocated
Waiting for up to 30 minutes for instance RON_2 to stop ...
Instance RON_2 stopped
Configuration updated to one instance
[oracle@server1 ~]$ 

While this command executed I didn’t lose a single connection – 100 SOE connections were always established. It also takes a few seconds for the cache to warm up, during which the transaction rate dips a little bit:

[oracle@server1 bin]$ ./charbench -c oda_soe.xml
Author  :        Dominic Giles
Version :        2.5.0.952

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

7:29:11 AM      100     13135   191                                            

One minute after this the throughput is back to normal.

[oracle@server1 bin]$ ./charbench -c oda_soe.xml
Author  :        Dominic Giles
Version :        2.5.0.952

Results will be written to results.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

7:30:50 AM      100     16800   295          

Online relocation with RAC One Node is certainly a possibility and works nicely if your application is ready for connection pools and the new way of connecting to the database.

Preview of the next article in the series

In the next article I’d like to add a couple of things I haven’t had time to test yet: 12.1.0.x extends the concept of the database resident connection pool to Java applications (and other middle-tier components), which would allow me to scale even further. I’d also like to show you what happens if the current RAC One Node instance fails. Stay tuned!

Posted in 12c Release 1, Linux

Adventures in RAC: gc buffer busy acquire and release

Posted by Martin Bach on December 16, 2014

It seems that I’m getting more and more drawn into the world of performance analysis, and since I sometimes tend to forget things I need to write them down. I almost enjoy the “getting there” more than ultimately solving the problem. You pick up quite a few things on the way.

This environment is Exadata 12.1.1.1.1/Oracle 12.1.0.2 but as with so many things the fact that the database is on Exadata shouldn’t matter.

So here is one of those posts; this time I’m writing up what I saw related to gc buffer busy acquire.

gc buffer busy acquire?

Whenever I see a wait event I haven’t dealt with extensively in the past I try to provoke behaviour to study it more closely. But first you need to know the event’s meaning. One option is to check v$event_name:

SQL> select name, parameter1, parameter2, parameter3, wait_class
  2  from v$event_name where name = 'gc buffer busy acquire';

NAME                           PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
------------------------------ ---------- ---------- ---------- -------------------
gc buffer busy acquire         file#      block#     class#     Cluster

So when you see this event in ASH, traces, v$session etc. you know it’s a cluster wait and potentially limiting your processing throughput. It also tells you the file#, block# and class# of the buffer, which you can link to v$bh. This view allows you to find the data object ID given these input parameters.
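Mapping such a buffer back to its segment can be sketched like this (the file# and block# come from the wait event’s parameters; the literals here are just placeholders):

```sql
-- file# and block# are the wait event's P1/P2; substitute your own values
select bh.inst_id, bh.status, o.owner, o.object_name
  from gv$bh bh
  join dba_objects o on (o.data_object_id = bh.objd)
 where bh.file#  = 6
   and bh.block# = 11981654;
```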

Using https://orainternals.wordpress.com/2012/04/19/gc-buffer-busy-acquire-vs-release/ as a source I worked out that the event has to do with acquiring a buffer (=block) in RAC (gc = global cache) on the local instance. If the block you need is on the remote instance you wait for it to be released, and the wait event is gc buffer busy release.

Since Oracle will clone blocks in buffer caches for consistent reads and use a shared lock on these for reading I thought that waiting can only happen if someone requested a block in XCUR (exclusive current) mode. So with that working hypothesis I went to work.

How to test

I started off writing a small Java class that creates a connection pool against my RAC database. I initially used the default service name in the connect descriptor but had to find out that dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE did not trace my sessions. In the end I created a true RAC service with CLB and RLB goals against both instances, and I ended up with traces in the diagnostic_dest.
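For reference, and because I did not keep the exact commands: creating such a service would look roughly like this (database, instance and service names are made up for illustration):

```
srvctl add service -db mydb -service bufferbusy_svc \
    -preferred mydb1,mydb2 \
    -clbgoal LONG -rlbgoal SERVICE_TIME
srvctl start service -db mydb -service bufferbusy_svc
```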

After setting up the UCP connection pool the code will create a number of threads that each will pull a connection from the pool, do some work (*) and hand it back to the pool as good citizens should do.

(*) The “do some work” bit is this:

...
                        try {

                                PreparedStatement pstmt = conn.prepareStatement(
                                  "select /* bufferbusy001 */ id, to_char(sysdate,'hh24:mi:ss') d " +
                                  "from t1 where id = ? for update");

                                int randomID = new Random().nextInt((1450770 - 1450765) + 1) + 1450765;
                                System.out.println("thread " + mThreadID + " now querying for ID " + randomID);
                                pstmt.setInt(1, randomID);

                                ResultSet rs = pstmt.executeQuery();

                                while (rs.next()) {
                                        System.out.println("Thread " + mThreadID + " reporting an id of "
                                        + rs.getInt("id") + ". Now it is " + rs.getString("d"));
                                }

                                rs.close();
                                pstmt.close();
                                conn.rollback();
                                conn.close();
                                conn = null;

                                Thread.sleep(2000);
                        } catch (Exception e) {
                                e.printStackTrace();
                        }
...

I think that’s how a Java developer would do it (with more error handling of course) but then I’m not a Java developer. It did work though! What I considered most important was to generate contention on a single block. Using dbms_rowid I could find out which IDs belong to (a random) block:

SQL> select * from (
  2    select id,DBMS_ROWID.ROWID_BLOCK_NUMBER (rowid, 'BIGFILE') as block
  3      from t1
  4 ) where block = 11981654;

        ID      BLOCK
---------- ----------
   1450765   11981654
   1450766   11981654
   1450767   11981654
   1450768   11981654
   1450769   11981654
   1450770   11981654

6 rows selected.

So if I manage to randomly select from the table where ID in the range …765 to …770 then I should be ok and just hit that particular block.

It turned out that the SQL statement completed so quickly that I had to considerably ramp up the number of sessions in the pool to see anything. I went up from 10 to 500 before I could notice a change. Most of the statements are too quick to even be caught in ASH. Tanel’s ashtop script showed pretty much nothing except ON CPU occasionally, as well as the odd log file sync event. Snapper also reported sessions in idle state.

SQL> r
  1  select count(*), inst_id, status, sql_id, event, state
  2  from gv$session where module = 'BufferBusy'
  3* group by inst_id, status, sql_id, event, state

  COUNT(*)    INST_ID STATUS   SQL_ID        EVENT                          STATE
---------- ---------- -------- ------------- ------------------------------ -------------------
       251          1 INACTIVE               SQL*Net message from client    WAITING
       248          2 INACTIVE               SQL*Net message from client    WAITING

2 rows selected.

That’s what you see for most of the time.

Let me trace this for you

So in order to get any meaningful idea about the occurrence (or absence) of the gc buffer busy acquire event I added a MODULE to my sessions so that I could later run trcsess to combine the traces. Here is the resulting raw trace, or rather an excerpt from it:

=====================
PARSING IN CURSOR #140650659166120 len=96 dep=0 uid=65 oct=3 lid=65 tim=4170152514049 hv=1500360262 ad='5b58a4a10' sqlid='6a5jfvpcqvbk6'
select /* bufferbusy001 */ id, to_char(sysdate,'hh24:mi:ss') d from t1 where id = :1  for update
END OF STMT
PARSE #140650659166120:c=0,e=5598,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2291732815,tim=4170152514046
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 12250 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152526533
WAIT #140650659166120: nam='buffer busy waits' ela= 1890 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152528475
WAIT #140650659166120: nam='enq: TX - row lock contention' ela= 188606 name|mode=1415053318 usn<<16 | slot=1179674 sequence=1485 obj#=20520 tim=4170152717199
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 1590 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152718839
WAIT #140650659166120: nam='enq: TX - row lock contention' ela= 36313 name|mode=1415053318 usn<<16 | slot=1245199 sequence=1894 obj#=20520 tim=4170152755340
WAIT #140650659166120: nam='gc buffer busy acquire' ela= 1268 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152756655
WAIT #140650659166120: nam='buffer busy waits' ela= 668 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152757363
WAIT #140650659166120: nam='KJC: Wait for msg sends to complete' ela= 11 msg=26941469232 dest|rcvr=65539 mtype=8 obj#=20520 tim=4170152757492
EXEC #140650659166120:c=1999,e=243530,p=0,cr=9,cu=4,mis=0,r=0,dep=0,og=1,plh=2291732815,tim=4170152757651
WAIT #140650659166120: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152757709
FETCH #140650659166120:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=2291732815,tim=4170152757749
STAT #140650659166120 id=1 cnt=1 pid=0 pos=1 obj=0 op='FOR UPDATE  (cr=9 pr=0 pw=0 time=243443 us)'
STAT #140650659166120 id=2 cnt=2 pid=1 pos=1 obj=0 op='BUFFER SORT (cr=3 pr=0 pw=0 time=60 us)'
STAT #140650659166120 id=3 cnt=1 pid=2 pos=1 obj=48863 op='INDEX RANGE SCAN I_T1$SEC1 (cr=3 pr=0 pw=0 time=37 us cost=3 size=6 card=1)'
WAIT #140650659166120: nam='SQL*Net message from client' ela= 260 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152758109
CLOSE #140650659166120:c=0,e=5,dep=0,type=1,tim=4170152758141
XCTEND rlbk=1, rd_only=0, tim=4170152758170
WAIT #0: nam='gc buffer busy acquire' ela= 3764 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152761976
WAIT #0: nam='buffer busy waits' ela= 1084 file#=6 block#=11981654 class#=1 obj#=20520 tim=4170152763104
WAIT #0: nam='log file sync' ela= 246 buffer#=119491 sync scn=19690898 p3=0 obj#=20520 tim=4170152763502
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=20520 tim=4170152763536

Result! There are gc buffer busy acquire events recorded. I can’t rule out TX row lock contention since with all those threads and only 6 IDs to choose from there was going to be some locking on the same ID caused by the “for update” clause.
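As an aside, combining the individual trace files via the MODULE set earlier boils down to a single trcsess call, something along these lines (the output file name is a placeholder):

```
trcsess output=bufferbusy.trc module=BufferBusy *.trc
```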

Now I am reasonably confident that I worked out at least one scenario causing a gc buffer busy acquire. You might also find the location of the blocks in the buffer cache interesting:

SYS:dbm011> select count(*), inst_id, block#, status
  2  from gv$bh where block# = 11981654
  3  group by inst_id, block#, status order by inst_id, status;

   COUNT(*)     INST_ID      BLOCK# STATUS
----------- ----------- ----------- ----------
          9           1    11981654 cr
          1           1    11981654 xcur
          9           2    11981654 cr

There is the one block in XCUR mode and 9 in CR mode in the buffer cache for that block.

Making it worse

Now I didn’t want to stop there; I was interested in what would happen under CPU load. During my career I noticed cluster waits appear primarily when you are CPU-bound (all other things being equal). This could be the infamous middle-tier connection pool mismanagement, or an execution plan going wrong with hundreds of users performing nested loop joins when they should hash-join large data sets… This is usually the point where OEM users ask the DBAs to do something about that “sea of grey” in the performance pages.

As with every cluster technology, an overloaded CPU does not help. Well, I guess that’s true for all computing. To increase the CPU load I created 10 dd sessions to read from /dev/zero and write to /dev/null. Sounds silly, but one of these hogs one CPU core at 100%. With 10 out of 12 cores occupied that way on node 1 I relaunched my test. The hypothesis that CPU overload has an effect was proven right by suddenly finding ASH samples of my session.
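The CPU hogs were plain dd processes; a sketch of how to start ten of them (in the original test they were simply left running while the benchmark ran, here the sketch cleans up after itself):

```shell
# Start ten dd processes copying /dev/zero to /dev/null; each one
# keeps a CPU core busy until it is killed.
pids=()
for i in $(seq 1 10); do
    dd if=/dev/zero of=/dev/null bs=1M &
    pids+=($!)
done
echo "started ${#pids[@]} dd hogs"
sleep 2                      # stand-in for the benchmark window
kill "${pids[@]}" 2>/dev/null
wait 2>/dev/null
```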

SQL> @ash/ashtop sql_id,session_state,event "sql_id='6a5jfvpcqvbk6'" sysdate-5/1440 sysdate

    Total
  Seconds     AAS %This   SQL_ID        SESSION EVENT
--------- ------- ------- ------------- ------- ----------------------------------------
      373     1.2   79% | 6a5jfvpcqvbk6 WAITING enq: TX - row lock contention
       54      .2   11% | 6a5jfvpcqvbk6 WAITING gc buffer busy release
       20      .1    4% | 6a5jfvpcqvbk6 ON CPU
       11      .0    2% | 6a5jfvpcqvbk6 WAITING gc buffer busy acquire
       11      .0    2% | 6a5jfvpcqvbk6 WAITING gc current block busy
        1      .0    0% | 6a5jfvpcqvbk6 WAITING gc current block 2-way

6 rows selected.

Using ASQLMON I can even see where time is spent:

SQL> @scripts/ash/asqlmon 6a5jfvpcqvbk6 % sysdate-1 sysdate

    SECONDS Activity Visual       Line ID Parent ASQLMON_OPERATION                   SESSION EVENT                                         AVG_P3 OBJ_ALIAS_ ASQLMON_PR
----------- -------- ------------ ------- ------ ----------------------------------- ------- ---------------------------------------- ----------- ---------- ----------
         38    2.4 % |          |       0        SELECT STATEMENT                    ON CPU                                                883065
          1     .1 % |          |       1      0  FOR UPDATE                         WAITING gc current block 2-way                      33554433  [SEL$1]
         25    1.6 % |          |       1      0                                     ON CPU                                            5369727.36  [SEL$1]
         17    1.1 % |          |       1      0                                     WAITING buffer busy waits                                  1  [SEL$1]
        109    6.8 % |#         |       1      0                                     WAITING gc buffer busy release                             1  [SEL$1]
         31    1.9 % |          |       1      0                                     WAITING gc buffer busy acquire                             1  [SEL$1]
         27    1.7 % |          |       1      0                                     WAITING gc current block busy                       33554433  [SEL$1]
        768   48.0 % |#####     |       1      0                                     WAITING enq: TX - row lock contention            6685.143229  [SEL$1]
          3     .2 % |          |       2      1   BUFFER SORT                       ON CPU                                                     0
          2     .1 % |          |       3      2    INDEX RANGE SCAN [I_T1$SEC1]     ON CPU                                                     0 T1@SEL$1 [ [A:]
                                                                                                                                                  SEL$1]     "ID"=:1


          2     .1 % |          |       0        SELECT STATEMENT                    ON CPU                                            16777216.5
          2     .1 % |          |       1      0  FOR UPDATE                         WAITING gc current block busy                       33554433  [SEL$1]
         24    1.5 % |          |       1      0                                     WAITING write complete waits                               0  [SEL$1]
          9     .6 % |          |       1      0                                     WAITING gc buffer busy acquire                             1  [SEL$1]
         30    1.9 % |          |       1      0                                     WAITING gc buffer busy release                             1  [SEL$1]
          9     .6 % |          |       1      0                                     WAITING buffer busy waits                                  1  [SEL$1]
          7     .4 % |          |       1      0                                     ON CPU                                           158.8571429  [SEL$1]
        496   31.0 % |###       |       1      0                                     WAITING enq: TX - row lock contention            6396.395161  [SEL$1]
                   % |          |       2      1   BUFFER SORT
                   % |          |       3      2    INDEX RANGE SCAN [I_T1$SEC1]                                                                  T1@SEL$1 [ [A:]
                                                                                                                                                  SEL$1]     "ID"=:1

Further Reading

I’m sure there is a wealth of resources available out there, in my case Riyaj’s blog helped me a lot. He even tagged posts with gc buffer busy: https://orainternals.wordpress.com/tag/gc-buffer-busy

Have a look at the Oaktable World 2014 agenda and watch Tanel Poder’s session attentively. You’d be surprised how many scripts he has made publicly available to troubleshoot performance. Like snapper? It’s only the tip of the iceberg. And if you can, you should really attend his advanced troubleshooting seminar.

Posted in 12c Release 1, RAC

UKOUG post conference geek update part 1 – ACFS for Oracle databases

Posted by Martin Bach on December 12, 2014

One of the many interesting things I heard at the conference this time around was that Oracle’s future direction includes the use of database files on ACFS. When ACFS came out this was strictly ruled out, but it has been possible for a little while now, I believe since 12.1.0.1.0. With the Oracle Database Appliance (ODA) using this deployment option and hearing about it at the conference, a little further investigation was in order. During one of the presentations @OracleRACPM Markus Michalewicz had a reference on his slides to a script that I didn’t know. The script is called gDBClone, and I wanted to see how it works. The idea is that the script can be used to create a snap-clone of a database if the source is on ACFS and in archivelog mode.

As it turned out there were a few hurdles along the way and I will point them out so you don’t run into the same issues.

UPDATE: these hurdles might be there since Oracle databases aren’t supported on ACFS in Oracle Restart: https://docs.oracle.com/database/121/OSTMG/asmfilesystem.htm#OSTMG95961 Please consider yourself warned! For the rest of the article let’s pretend that this is a clustered environment. The article is therefore purely educational and no encouragement to do this in real life.

The script and associated white paper assume that you have two systems: production and dev/test/uat. The goal of the clone procedure is to be able to create a copy of your live database on your dev/test/uat cluster. This database can then be used as the source for ACFS snapshots. This can be represented in ASCII art:


+------+   clone   +------------------+
| PROD |   ---->   |      MASTER      |
+------+           |      /     \     |  snap #1
                   |     /       \    |  snap #2
                   | CLONE1    CLONE2 |
                   +------------------+

It is by no means required to follow this approach, and if nothing else you can use the clone script to run an RMAN duplication in a single command. I once wrote a script to do the same, but it was a truly complex thing to do.
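For comparison, the core of such a single-command duplication in plain RMAN is an active database duplicate. A minimal sketch, assuming the auxiliary instance MASTER has been started in NOMOUNT and both connect strings resolve (all names here are illustrative):

```
$ rman target sys@ORCL auxiliary sys@MASTER

RMAN> duplicate target database to MASTER from active database;
```

What a wrapper script adds on top of this one statement is the surrounding housekeeping.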

For this article I’ll assume that you clone PROD (named ORCL) to MASTER, and snap MASTER to CLONE1.

The setup

Since my “travel-lab” is not a RAC cluster I opted for an installation of 12.1.0.2.1 for Oracle Restart and the database to keep it simple.

ACFS storage will be provided by an ADVM volume from disk group DATA. In my case this was quite easy to accomplish. Since this was a 12.1 system anyway, I created my DATA disk group with ASM, RDBMS and ADVM compatibility set to 12.1.0.2.0.
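If the disk group already exists, the compatibility attributes can be raised afterwards (they can be advanced but never lowered); a sketch:

```sql
ALTER DISKGROUP data SET ATTRIBUTE 'compatible.asm'   = '12.1.0.2.0';
ALTER DISKGROUP data SET ATTRIBUTE 'compatible.rdbms' = '12.1.0.2.0';
ALTER DISKGROUP data SET ATTRIBUTE 'compatible.advm'  = '12.1.0.2.0';
```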

You create the ASM Dynamic Volume Manager (ADVM) volume on top of the ASM disk group, in my case on DATA. This is a simple task and can be performed by a variety of tools; I opted for a call to asmcmd:

ASMCMD> volcreate
usage: volcreate -G <diskgroup> -s <size> [ --column <number> ] [ --width <stripe_width> ]
[--redundancy {high|mirror|unprotected} ] [--primary {hot|cold}] [--secondary {hot|cold}] <volume>
help:  help volcreate
ASMCMD> volcreate -G data -s 10G volume1
ASMCMD> volinfo --all
Diskgroup Name: DATA

         Volume Name: VOLUME1
         Volume Device: /dev/asm/volume1-162
         State: ENABLED
         Size (MB): 10240
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

Initially I got a message that ASM could not communicate with the (ASM) Volume driver.

SQL> /* ASMCMD */alter diskgroup data add volume 'volume1' size 10G
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver
ERROR: /* ASMCMD */alter diskgroup data add volume 'volume1' size 10G

This was solved by modprobe calls for the Oracle kernel modules.

[root@server5 ~]# modprobe  oracleacfs
[root@server5 ~]# modprobe  oracleadvm
[root@server5 ~]# modprobe  oracleoks
[root@server5 ~]# lsmod | grep oracle
oracleadvm            507006  7
oracleacfs           3307457  1
oracleoks             505749  2 oracleadvm,oracleacfs

There is a known issue with the execution of udev rules (/etc/udev/rules.d/55-usm.rules) that might delay the setting of permissions. On my system a udevadm trigger solved it. Still odd (Oracle Linux 6.6 / UEK3 3.8.13-44.1.1.el6uek.x86_64), especially since a call to “acfsdriverstate supported” stated that the driver was supported.

Once the volume is created it needs to be formatted using ACFS. This can be done in this way:

[oracle@server5 ~]$  mkfs -t acfs /dev/asm/volume1-162
mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/volume1-162
mkfs.acfs: volume size               = 10737418240  (  10.00 GB )
mkfs.acfs: Format complete.

Trying to register the file system in the ACFS registry pointed me to the first problem with the procedure on Oracle Restart:

[root@server5 ~]# acfsutil registry -a /dev/asm/volume1-162 /u01/oradata
Usage: srvctl <command> <object> [<options>]
    commands: enable|disable|start|stop|status|add|remove|modify|update|getenv|setenv|
       unsetenv|config|upgrade|downgrade
    objects: database|service|asm|diskgroup|listener|home|ons
For detailed help on each command and object and its options use:
  srvctl <command> -help [-compatible] or
  srvctl <command> <object> -help [-compatible]
PRKO-2012 : filesystem object is not supported in Oracle Restart
acfsutil registry: ACFS-03111: unable to add ACFS mount /u01/oradata within Oracle Registry

Interesting, but not a problem in the lab. I usually mount file systems where they logically belong, in this case /u01/oradata. Spoiler alert: this is not what you are supposed to do if you want to use the gDBClone script.

To cut a long story short, the script assumes the mount point for a snapped or cloned database to be /acfs. It also assumes that your system is a RAC environment, and I found it not to work well in 12.1 at all due to the way it tries to get the database version from the OCR (OLR) profile. First the new ACFS file system is mounted, then made accessible to the oracle user:

[root@server5 ~]# mount -t acfs /dev/asm/volume1-162 /acfs
[root@server5 ~]# chown -R oracle:dba /acfs
[root@server5 ~]# mount | grep acfs
/dev/asm/volume1-162 on /acfs type acfs (rw)

Creating the MASTER database

My source system resides in ASM, and there is no way of creating copy-on-write (COW) clones in ASM. As a result, the MASTER database must first be moved to ACFS, from where you can take storage snapshots.

A quick hack was required since I only had one machine, so I created the source database (PROD in the ASCII art example) as "orcl" using the following call to dbca, then set it to archivelog mode:

dbca -silent -createDatabase -templateName General_Purpose.dbc  -gdbname orcl \
> -sid orcl -sysPassword pwd1 -systemPassword pwd2  -emConfiguration none  \
> -storageType ASM  -asmsnmpPassword pwd3 -diskGroupName data -recoveryGroupName reco  \
> -totalMemory 2048

The next step is to create the MASTER database. The gDBClone script checks in its checkIfACFS() function whether a database is on ACFS. If a database is found to be on ASM (by checking the SPFILE location in the Clusterware profile) it must be CLONEd as opposed to SNAPped. Here is the command to clone ORCL to MASTER; you must set your environment to an RDBMS home before executing the script.

# ./gDBClone clone -sdbname orcl -sdbhost server5 -sdbport 1521 -tdbname master -acfs /acfs -debug

It uses an RMAN duplicate under the covers. The arguments are almost self-explanatory: they specify the location of the source database (which can be remote) and where you want to store the clone. Since I desperately want to store the clone on ACFS I specified it in the command. The -debug flag prints more verbose output; a lot of information is also found in /var/log/gDBClone/. Note that the script is to be run as root :(
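The ASM-versus-ACFS decision in checkIfACFS() can be sketched in a few lines of shell. This is my illustration of the idea, not the script's actual Perl code, and the sample SPFILE paths are made up: a location beginning with "+" lives in ASM, anything else is treated as a file system path.

```shell
#!/bin/bash
# Sketch of the checkIfACFS() idea: an SPFILE location beginning with '+'
# lives in ASM (e.g. +DATA/...), so the database must be CLONEd; otherwise
# it is on a file system such as ACFS and can be SNAPped.
storage_action() {
  case "$1" in
    +*) echo "CLONE" ;;   # ASM - no copy-on-write snapshots possible
    *)  echo "SNAP"  ;;   # file system path - an ACFS snapshot will do
  esac
}

storage_action "+DATA/ORCL/spfileorcl.ora"      # prints CLONE
storage_action "/acfs/MASTER/spfileMASTER.ora"  # prints SNAP
```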

A few tricks were necessary to get there in Oracle Restart environments; I'll feed them back to Oracle to see if they can be added to the script. You probably won't encounter these problems when using a clustered 11.2 installation.

At one point the script checks the VERSION flag in the database resource profile (crsctl stat res ora.<dbname>.db -p | grep '^VERSION'), and since that field no longer shows up in 12.1, the variable is undefined in Perl and the execution fails. The problem with Oracle Restart is related to setting the database type to SINGLE on the local host (srvctl add database … -c SINGLE -x …): Oracle Restart doesn't understand those switches. Also, in 12c you can't set the db_unique_name to the name of a database already registered in the OCR. Moving the step that registers the database further down in the execution helped.
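The missing-VERSION problem can be made concrete with a defensive rendition of the lookup. This is a hypothetical shell version of the Perl logic; the sample profiles are made up, and the srvctl fallback is only a suggestion:

```shell
#!/bin/bash
# Sketch: parse VERSION from a 'crsctl stat res ... -p' style profile and
# fall back gracefully when the field is absent (as it is in 12.1).
resource_version() {
  local profile="$1" v
  v=$(printf '%s\n' "$profile" | awk -F= '$1 == "VERSION" { print $2 }')
  if [ -n "$v" ]; then
    echo "$v"
  else
    echo "unknown"   # a real script could fall back to 'srvctl config database'
  fi
}

profile_112="NAME=ora.orcl.db
VERSION=11.2.0.4.0"
profile_121="NAME=ora.master.db"

resource_version "$profile_112"   # prints 11.2.0.4.0
resource_version "$profile_121"   # prints unknown
```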

Please don't get me wrong: the script is very neat in that it allows you to run an RMAN duplicate command over the network, potentially creating the backup on the fly. In 12c RMAN will pull existing backup pieces instead of creating a backup on the fly to reduce its impact on the production database. Oh, and I almost forgot: you can even convert the database to a clustered database if it is not one already.

Creating an ACFS clone database

Next up is the creation of the CLONE database. My first attempts were unsuccessful. The main trick seems to be to keep the ACFS mount point directly underneath / (the root file system); mounting it elsewhere caused the script to fail. If you can read Perl, check the $acfs variable and the checkIfACFS() routine to understand why. Also, the ADVM compatibility of the ASM disk group containing the volume has to be at least 11.2.0.3.0 or you get this error:

[root@server5 ~]# ./gDBClone snap -sdbname master -tdbname clone1
2014-12-12 11:03:53: I Getting host info...
2014-12-12 11:03:53: I Starting.....
2014-12-12 11:03:53: I Validating environment.....

Enter the source MASTER SYSDBA password:
2014-12-12 11:03:56: I Getting OH version...
2014-12-12 11:04:04: I Checking SCAN listener server5.example.com:1521...
2014-12-12 11:04:05: I Checking database CLONE1 existence...
2014-12-12 11:04:05: I Checking registered instance CLONE1 ...
2014-12-12 11:04:10: I Checking if the Source Database MASTER it's on ASM
2014-12-12 11:04:10: I Source Database MASTER it's on ACFS
2014-12-12 11:04:10: I Checking snapshot CLONE1 existence
2014-12-12 11:04:10: I Setting up clone environment....
2014-12-12 11:04:10: I Starting auxiliary listener....
2014-12-12 11:05:10: I Creating ACFS snapshot.....
2014-12-12 11:05:10: I Start/stop MASTER to check consistency.
2014-12-12 11:05:42: I Checking if the source database MASTER is stored on an ACFS snapshot
acfsutil snap create: ACFS-03048: Snapshot operation could not complete.
acfsutil snap create: ACFS-03174: The Oracle ASM Dynamic Volume Manager (Oracle ADVM) compatibility
    attribute for the disk group is less than 11.2.0.3.0.
2014-12-12 11:05:42: E Error getting ACFS snapshot

This can be fixed quite easily provided that you don't break anything: raise the attribute from the ASM instance, e.g. alter diskgroup data set attribute 'compatible.advm'='12.1.0.2.0'. Remember that compatibility can be raised but never lowered. With everything in place, the clone is quick:

[root@server5 ~]# ./gDBClone snap -sdbname master -tdbname clone1
2014-12-12 11:20:17: I Getting host info...
2014-12-12 11:20:17: I Starting.....
2014-12-12 11:20:17: I Validating environment.....

Enter the source MASTER SYSDBA password:
2014-12-12 11:20:19: I Getting OH version...
2014-12-12 11:20:27: I Checking SCAN listener server5.example.com:1521...
2014-12-12 11:20:27: I Checking database CLONE1 existence...
2014-12-12 11:20:28: I Checking registered instance CLONE1 ...
2014-12-12 11:20:31: I Checking if the Source Database MASTER it's on ASM
2014-12-12 11:20:31: I Source Database MASTER it's on ACFS
2014-12-12 11:20:31: I Checking snapshot CLONE1 existence
2014-12-12 11:20:31: I Setting up clone environment....
2014-12-12 11:20:31: I Starting auxiliary listener....
2014-12-12 11:21:31: I Creating ACFS snapshot.....
2014-12-12 11:21:31: I Start/stop MASTER to check consistency.
2014-12-12 11:21:58: I Checking if the source database MASTER is stored on an ACFS snapshot
2014-12-12 11:21:58: I Setting up snapshot database.....
2014-12-12 11:21:58: I Creating Clone parameter files
2014-12-12 11:22:01: I Activating clone database.....
PRKO-2002 : Invalid command line option: -j
2014-12-12 11:23:04: I Successfully created clone "CLONE1" database

Now what does this database look like? I remember a conversation with Andy Colvin about this, since he saw it on our ODA first. Consider this:

[oracle@server5 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Fri Dec 12 11:51:22 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_system_b8p3pxs7_.dbf
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_sysaux_b8p3pz4f_.dbf
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_undotbs1_b8p3q2q0_.dbf
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_users_b8p3qo2o_.dbf

Now the interesting thing is when you look into that top-level ACFS directory:

SQL> !ls -la /acfs/
total 76
drwxr-xr-x.  5 oracle dba       4096 Dec 12 10:35 .
dr-xr-xr-x. 27 root   root      4096 Dec 12 10:30 ..
drwx------.  2 oracle dba      65536 Dec 12 10:31 lost+found
drwxr-xr-x.  3 oracle oinstall  4096 Dec 12 10:52 MASTER

No .ACFS entry! Even more interestingly, you can actually see what's in the .ACFS directory when referring to it directly:

SQL> !ls -l /acfs/.ACFS
total 8
drwxrwx---. 6 root root 4096 Dec 12 10:31 repl
drwxr-xr-x. 3 root root 4096 Dec 12 11:21 snaps

What I want to do next is to run a few performance benchmarks on a database in ACFS to see how it holds up. But that’s for another day…

Posted in 12c Release 1, Automatic Storage Management, Linux

Little things worth knowing: creating the Swingbench order entry schema “lights out”

Posted by Martin Bach on December 2, 2014

This is a little note, primarily to myself I guess, about the creation of the order entry schema (part of Swingbench, written by Dominic Giles) when no VNC sessions are available (although you can almost always use port forwarding :). Instead, you can create the schema on the command line. I always execute commands on remote systems in screen for increased peace of mind: should the network connection drop, the order entry generation will continue as if nothing ever happened.
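As a sketch, such a "lights out" run could be wrapped like this. The session name, log file and the abbreviated oewizard options are placeholder assumptions; substitute your own full call. The sketch only prints what it would run, so it is safe to try anywhere:

```shell
#!/bin/bash
# Sketch: run a long command detached so that a dropped SSH session does not
# kill it. Prefers screen; falls back to nohup when screen is not installed.
CMD='./oewizard -cl -create -s'   # placeholder - put your full oewizard call here

if command -v screen >/dev/null 2>&1; then
  launcher="screen -dmS oewizard bash -c '$CMD > oewizard.log 2>&1'"
else
  launcher="nohup $CMD > oewizard.log 2>&1 &"
fi
echo "would run: $launcher"
```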

Like many others I use Swingbench during trainings and presentations to have some activity on a system. Very useful for demonstrating ASH and OEM, and many other things too!

This article references Swingbench 2.5, the latest version at the time of writing. It requires a JDK to be present; I simply used the one bundled with Oracle 12.1.0.2.0, which is my target database version. You start by downloading Swingbench from Dominic's website and unzipping it into a directory from where you want to run it. It is usually a good idea to keep the load generator on a different (set of) machine(s); as long as you can connect to the database that will house the SOE schema, you're fine.

Creating the SOE schema

Before you can run swingbench (or minibench/charbench) you need to create the SOE schema, and the oewizard is the tool of choice. If you haven't done so already, add java to your path. Then you can have a look at the available options:

$ JAVA_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1/jdk
$ export PATH=$PATH:$JAVA_HOME/bin

$ ./oewizard -h
usage: parameters:
 -allindexes             build all indexes for schema
 -bigfile                use big file tablespaces
 -c <filename>           wizard config file
 -cl                     run in character mode
 -compositepart          use a composite paritioning model if it exisits
 -compress               use default compression model if it exists
 -create                 create benchmarks schema
 -cs <connectString>     connectring for database
 -dba <username>         dba username for schema creation
 -dbap <password>        password for schema creation
 -debug                  turn on debugging output
 -debugf                 turn on debugging output to file (debug.log)
 -df <datafile>          datafile name used to create schema in
 -drop                   drop benchmarks schema
 -dt <driverType>        driver type (oci|thin)
 -g                      run in graphical mode (default)
 -generate               generate data for benchmark if available
 -h,--help               print this message
 -hashpart               use hash paritioning model if it exists
 -hcccompress            use HCC compression if it exisits
 -nocompress             don't use any database compression
 -noindexes              don't build any indexes for schema
 -nopart                 don't use any database partitioning
 -normalfile             use normal file tablespaces
 -oltpcompress           use OLTP compression if it exisits
 -p <password>           password for benchmark schema
 -part                   use default paritioning model if it exists
 -pkindexes              only create primary keys for schema
 -rangepart              use a range paritioning model if it exisits
 -s                      run in silent mode
 -scale <scale>          mulitiplier for default config
 -sp <soft partitions>   the number of softparitions used. Defaults to cpu
                         count
 -tc <thread count>      the number of threads(parallelism) used to
                         generate data. Defaults to cpus*2
 -ts <tablespace>        tablespace to create schema in
 -u <username>           username for benchmark schema
 -v                      run in verbose mode when running from command
                         line
 -version <version>      version of the benchmark to run

Note that not all partitioning schemes are valid in this case, and neither are all compression methods. The following command creates the schema the way I want it:

./oewizard -allindexes -cl -create -cs //cluster-scan/sbench -dba martin \
> -dbap notTellingYouHere -ts soe_tbs -part -p soe -scale 8 -tc 8 -u soe -v

SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.949

Running in Lights Out Mode using config file : oewizard.xml
Connecting to : jdbc:oracle:thin:@//cluster-scan/sbench
Connected
Starting script ../sql/soedgcreateuser.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 104 millisecond(s)
Starting run
Starting script ../sql/soedgdrop2.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 55 millisecond(s)
Starting script ../sql/soedgcreatetableshash2.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 520 millisecond(s)
Starting script ../sql/soedgviews.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 19 millisecond(s)
Starting script ../sql/soedgsqlset.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 84 millisecond(s)
Inserting data into table ADDRESSES_11499978
Inserting data into table ADDRESSES_3999993

-- more data

Completed processing table LOGON_7943271 in 0:00:25
Connection cache closed
Starting script ../sql/soedganalyzeschema2.sql
Script completed in 0 hour(s) 1 minute(s) 20 second(s) 701 millisecond(s)
Starting script ../sql/soedgconstraints2.sql
Script completed in 0 hour(s) 1 minute(s) 41 second(s) 520 millisecond(s)
Starting script ../sql/soedgindexes2.sql
Script completed in 0 hour(s) 0 minute(s) 44 second(s) 685 millisecond(s)
Starting script ../sql/soedgsequences2.sql
Script completed in 0 hour(s) 0 minute(s) 3 second(s) 53 millisecond(s)
Starting script ../sql/soedgpackage2_header.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 215 millisecond(s)
Starting script ../sql/soedgpackage2_body.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 127 millisecond(s)
Starting script ../sql/soedgsetupmetadata.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 614 millisecond(s)

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.005
Data Generation Time                   0:10:38.185
DDL Creation Time                      0:03:51.630
Total Run Time                         0:14:29.823
Rows Inserted per sec                      151,713
Data Generated (MB) per sec                   12.1
Actual Rows Generated                   97,718,288

Connecting to : jdbc:oracle:thin:@//cluster-scan/sbench
Connected

Post Creation Validation Report
===============================
The creation of the schema appears to have been unsuccessful. See the following sections for further details.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA','INVENTORIES',
'PRODUCT_INFORMATION','PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX','ORD_WAREHOUSE_IX','ORDER_ITEMS_PK','ITEM_ORDER_IX',
'ITEM_PRODUCT_IX','CUST_EMAIL_IX','CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX',
'CARDDETAILS_CUST_IX''PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX',
'PROD_CATEGORY_IX','INVENTORY_PK','INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK',
'ORD_SALES_REP_IX','WAREHOUSES_PK','WHS_LOCATION_IX','CUSTOMERS_PK','ADDRESS_PK',
'ADDRESS_CUST_IX',
'CARD_DETAILS_PK'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code :

Invalid Objects (1)
================
Invalid Tables :
Invalid Indexes :
Invalid Views :
Invalid Sequences :
Invalid Code : 'ORDERENTRY'

Missing Objects (0)
================
Missing Tables :
Missing Indexes : 
Missing Views :
Missing Sequences :
Missing Code :
Schema Created

To compile the package, SYS needs to grant execute on DBMS_LOCK to SOE (grant execute on dbms_lock to soe;).

This results in a schema with the following number of rows:

SQL> declare
  2   v_cnt number;
  3  begin
  4   for i in (select table_name from dba_tables where owner='SOE') loop
  5    execute immediate 'select count(*) from soe.' || i.table_name into v_cnt;
  6    dbms_output.put_line(i.table_name || ' - ' || to_char(v_cnt, '999,999,999'));
  7   end loop;
  8  end;
  9  /
ORDER_ITEMS -   34,316,243
ORDERS -   11,438,320
INVENTORIES -      896,853
PRODUCT_INFORMATION -        1,000
LOGON -   19,063,872
PRODUCT_DESCRIPTIONS -        1,000
ORDERENTRY_METADATA -            4
CUSTOMERS -    8,000,000
ADDRESSES -   12,000,000
CARD_DETAILS -   12,000,000
WAREHOUSES -        1,000

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.70

SQL> compute sum of m on report
SQL> break on report

SQL> select sum(bytes)/power(1024,2) m, segment_name, segment_type
  2   from dba_segments where owner = 'SOE'
  3  group by segment_name, segment_type
  4  order by 1;

              M SEGMENT_NAME                   SEGMENT_TYPE
--------------- ------------------------------ ------------------
            .06 WAREHOUSES                     TABLE
            .06 PRODUCT_INFORMATION_PK         INDEX
            .06 ORDERENTRY_METADATA            TABLE
            .13 WHS_LOCATION_IX                INDEX
            .13 PROD_CATEGORY_IX               INDEX
            .13 WAREHOUSES_PK                  INDEX
            .13 PROD_SUPPLIER_IX               INDEX
            .13 PROD_NAME_IX                   INDEX
            .13 PRD_DESC_PK                    INDEX
            .25 PRODUCT_INFORMATION            TABLE
            .31 PRODUCT_DESCRIPTIONS           TABLE
          16.69 INV_WAREHOUSE_IX               INDEX
          16.69 INV_PRODUCT_IX                 INDEX
          18.00 INVENTORY_PK                   INDEX
          88.00 INVENTORIES                    TABLE
         169.00 CUST_ACCOUNT_MANAGER_IX        INDEX
         177.06 CUSTOMERS_PK                   INDEX
         204.06 CUST_DOB_IX                    INDEX
         242.19 ORD_SALES_REP_IX               INDEX
         253.38 ORDER_PK                       INDEX
         254.31 CUST_FUNC_LOWER_NAME_IX        INDEX
         265.06 ORD_CUSTOMER_IX                INDEX
         265.69 ADDRESS_PK                     INDEX
         265.75 CARD_DETAILS_PK                INDEX
         276.88 CARDDETAILS_CUST_IX            INDEX
         278.13 ORD_WAREHOUSE_IX               INDEX
         278.50 ADDRESS_CUST_IX                INDEX
         291.88 ORD_ORDER_DATE_IX              INDEX
         347.81 CUST_EMAIL_IX                  INDEX
         717.88 ITEM_PRODUCT_IX                INDEX
         788.94 ITEM_ORDER_IX                  INDEX
         864.00 LOGON                          TABLE PARTITION
         968.00 CARD_DETAILS                   TABLE PARTITION
       1,352.00 CUSTOMERS                      TABLE PARTITION
       1,408.00 ADDRESSES                      TABLE PARTITION
       1,544.00 ORDERS                         TABLE PARTITION
       2,368.00 ORDER_ITEMS                    TABLE PARTITION
---------------
      13,721.38
37 rows selected.

Happy benchmarking!

Posted in 12c Release 1, Linux

Concurrent statistics gathering in 12.1.0.2

Posted by Martin Bach on November 10, 2014

Prompted by an actual task at hand I spent some time investigating an 11.2.0.2 feature: concurrent statistics gathering. It has been on my to-do list for quite some time, but so far I didn't have a use case, and use cases make things so much easier. The question was: how can I gather statistics on a really large, partitioned table? Previously you could resort to the degree argument in dbms_stats.gather_table_stats to ensure that statistics were gathered in parallel. This is all good, but sometimes you need more oomph. Some DBAs wrote scripts to execute individual statistics gathering jobs against partitions in parallel, using the tabname and partname arguments in dbms_stats.gather_table_stats(). But that requires manual effort, and the not-quite-so-new concurrent option is so much nicer. Let me take you along for the ride… Actually I have to tell the story starting with the happy ending, as I hit a few snags along the way. This is 12.1.0.2.1 on Oracle Linux 6.5.

Enabling concurrent statistics gathering

A flag needs to be set to enable the new concurrent mechanism for gathering statistics. The initial blog post on the subject set the flag globally:

https://blogs.oracle.com/optimizer/entry/gathering_optimizer_statistics_is_one

Scroll down to "configuration and settings" for the details. I would personally rather set this at table level, but despite the Oracle documentation suggesting it is possible, it is not. You can check the header of DBMS_STATS to see the different values that can be passed as PNAME to set_table_prefs().

After enabling the concurrent flag at the database level (exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE')), you can query it:

SQL> select dbms_stats.get_prefs('CONCURRENT')
  2  from dual;

DBMS_STATS.GET_PREFS('CONCURRENT')
------------------------------------------------------------------------------------
TRUE

SQL> select dbms_stats.get_prefs('CONCURRENT',user,'BIGTAB_PART') flag from dual;

FLAG
-------------------------------------------------------------------------------------
TRUE

We should be ready to use the concurrent method now. Here is some background information about the table in question:

SQL>  select partition_name, num_rows, sample_size, last_analyzed 
  2   from user_tab_partitions where table_name = 'BIGTAB_PART'
  3  ;

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254
SYS_P1253
SYS_P1252
SYS_P1251
SYS_P1250
SYS_P1249
SYS_P1248
SYS_P1247
SYS_P1246
SYS_P1245
P_MAN

11 rows selected.

SQL> select partition_name, bytes/power(1024,2) m from user_segments where segment_name = 'BIGTAB_PART';

PARTITION_NAME                          M
------------------------------ ----------
SYS_P1245                            8064
SYS_P1246                            8064
SYS_P1247                            8064
SYS_P1248                            8064
P_MAN                                8064
SYS_P1249                            8064
SYS_P1250                            8064
SYS_P1251                            8064
SYS_P1252                            8064
SYS_P1253                            8064
SYS_P1254                               8

11 rows selected.

So each partition is about 8 GB in size. A little helper script prints all the table preferences:

getting table preferences for table BIGTAB_PART
value for CASCADE is DBMS_STATS.AUTO_CASCADE
value for DEGREE is NULL
value for ESTIMATE_PERCENT is DBMS_STATS.AUTO_SAMPLE_SIZE
value for METHOD_OPT is FOR ALL COLUMNS SIZE AUTO
value for NO_INVALIDATE is DBMS_STATS.AUTO_INVALIDATE
value for GRANULARITY is AUTO
value for PUBLISH is TRUE
value for INCREMENTAL is FALSE
value for STALE_PERCENT is 10
value for CONCURRENT is TRUE

Now you can simply execute the call to dbms_stats.gather_table_stats(user, 'BIGTAB_PART', degree=>4) and, thanks to all the defaults, the gathering of statistics will be performed by multiple jobs. In order for this to work you need job_queue_processes set to > 4 (the default is 1000 in 12.1.0.2) and a Resource Manager plan must be active. Refer to the Oracle blog post referenced earlier for more information about statement queueing and the execution of these jobs. The post also recommends setting parallel_adaptive_multi_user to false. Just be careful changing PX parameters: these can have undesired side effects.
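The two prerequisites can be condensed into a sketch. The hard-coded values below stand in for what you would really fetch from v$parameter and the active Resource Manager plan; they are assumptions for illustration only:

```shell
#!/bin/bash
# Sketch: prerequisites for concurrent statistics gathering. In real life,
# query v$parameter for job_queue_processes and check the active RM plan.
job_queue_processes=1000        # placeholder value (the 12.1.0.2 default)
resource_manager_plan="DEFAULT_PLAN"   # placeholder - must be non-empty

concurrent_possible() {
  [ "$job_queue_processes" -gt 4 ] && [ -n "$resource_manager_plan" ]
}

if concurrent_possible; then
  echo "concurrent statistics gathering possible"
else
  echo "prerequisites not met"
fi
```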

Using the diagnostic queries taken from the Oracle blog post you can see what's going on. In my case, six jobs were created:

select job_name, state, comments from dba_scheduler_jobs where job_class like 'CONC%'

JOB_NAME             STATE           COMMENTS
-------------------- --------------- ----------------------------------------------------------------------------------------------------
ST$T3606_6_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1254";"MARTIN"."BIGTAB_PART"
ST$T3606_5_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1252";"MARTIN"."BIGTAB_PART"."SYS_P1253"
ST$T3606_4_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1250";"MARTIN"."BIGTAB_PART"."SYS_P1251"
ST$T3606_3_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1248";"MARTIN"."BIGTAB_PART"."SYS_P1249"
ST$T3606_2_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1246";"MARTIN"."BIGTAB_PART"."SYS_P1247"
ST$T3606_1_B2        RUNNING         "MARTIN"."BIGTAB_PART"."P_MAN";"MARTIN"."BIGTAB_PART"."SYS_P1245"

6 rows selected.

By the way, if you are interested in the payload, you can use DBMS_METADATA:

SQL> get get_job
  1* SELECT DBMS_METADATA.GET_DDL('PROCOBJ', '&1') from dual;
SQL>

You pass it the job name and you get some pretty cryptic looking output :)

Interlude

Actually, you have to see this…

DBMS_METADATA.GET_DDL('PROCOBJ','ST$T3982_1')
-----------------------------------------------------------------------------------------


BEGIN
dbms_scheduler.create_job('"ST$T3982_1"',
job_type=>'PLSQL_BLOCK', job_action=>
'declare c dbms_stats.CContext := dbms_stats.CContext(); begin c.extend(30); c(1):=''ST$T3982_1;1;855;1;FALSE;FALSE;''; c(2):=q''#"MARTIN";#''; c(3):=q''#"BIGTAB_PART";#''; c(4):=q''#"SYS_P1250";#''; c(5):=''0;''; c(6):=''FALSE;''; c(7):=q''#FOR ALL
COLUMNS SIZE AUTO;#''; c(8):=''NULL;''; c(9):=''PARTITION;''; c(10):=''TRUE;''; c(11):=q''#NULL;#''; c(12):=q''#NULL;#''; c(13):=q''#NULL;#''; c(14):=''NULL;''; c(15):=''DATA;''; c(16):=''FALSE;''; c(17):=''GLOBAL AND PARTITION;''; c(18):=''TRUE;'';
c(19):=''0;''; c(20):='';''; c(21):=''FALSE;''; c(22):=''FALSE;''; c(23):=''4;''; c(24):=''ST$T3982;''; c(25):=''TRUE;''; c(26):=''FALSE;''; c(27):=''4.562199;''; c(28):=''228.109937;''; c(29):=''0;''; c(30):=''0;''; dbms_stats.gather_table_stats('''
','''',context=>c); end;'
, number_of_arguments=>0,
start_date=>NULL, repeat_interval=>
NULL
, end_date=>NULL,
job_class=>'"CONC_ST$T3982"', enabled=>FALSE, auto_drop=>TRUE,comments=>
'"MARTIN"."BIGTAB_PART"."SYS_P1250"'
);
dbms_scheduler.set_attribute('"ST$T3982_1"','raise_events',38);
dbms_scheduler.enable('"ST$T3982_1"');
COMMIT;
END;

This looks better when you parse it a bit more so that it reads:

DECLARE
  c dbms_stats.CContext := dbms_stats.CContext();
BEGIN
  c.extend(30);
  c(1) :='ST$T3982_1;1;855;1;FALSE;FALSE;';
  c(2) :=q'#"MARTIN";#';
  c(3) :=q'#"BIGTAB_PART";#';
  c(4) :=q'#"SYS_P1250";#';
  c(5) :='0;';
  c(6) :='FALSE;';
  c(7) :=q'#FOR ALL COLUMNS SIZE
AUTO;#';
  c(8) :='NULL;';
  c(9) :='PARTITION;';
  c(10):='TRUE;';
  c(11):=q'#NULL;#';
  c(12):=q'#NULL;#';
  c(13):=q'#NULL;#';
  c(14):='NULL;';
  c(15):='DATA;';
  c(16):='FALSE;';
  c(17):='GLOBAL AND PARTITION;';
  c(18):='TRUE;';
  c(19):='0;';
  c(20):=';';
  c(21):='FALSE;';
  c(22):='FALSE;';
  c(23):='4;';
  c(24):='ST$T3982;';
  c(25):='TRUE;';
  c(26):='FALSE;';
  c(27):='4.562199;';
  c(28):='228.109937;';
  c(29):='0;';
  c(30):='0;';
  dbms_stats.gather_table_stats('','',context=>c);
END;

Interesting!

The concurrent statistics gathering works for incremental statistics as well and is a good way to get the initial set of stats it needs anyway (see Doc ID 1541543.1 for background on incremental statistics gathering).

The end result is statistics!

SQL> select partition_name, num_rows, sample_size, last_analyzed
  2   from user_tab_partitions where table_name = 'BIGTAB_PART';

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254                              32          32 07-NOV-14
SYS_P1253                        25600000    25600000 07-NOV-14
SYS_P1252                        25600000    25600000 07-NOV-14
SYS_P1251                        25600000    25600000 07-NOV-14
SYS_P1250                        25600000    25600000 07-NOV-14
SYS_P1249                        25600000    25600000 07-NOV-14
SYS_P1248                        25600000    25600000 07-NOV-14
SYS_P1247                        25600000    25600000 07-NOV-14
SYS_P1246                        25600000    25600000 07-NOV-14
SYS_P1245                        25600000    25600000 07-NOV-14
P_MAN                            25599968    25599968 07-NOV-14

11 rows selected.

Stats not gathered concurrently if partitions are small

This all looks pretty straightforward, except that it wasn't when I first tried. I set all the parameters as documented but still had no jobs running in the background. The first table I tried to gather statistics on concurrently was partitioned exactly like BIGTAB_PART, with the difference that all partitions were compressed for QUERY HIGH. This means each partition takes a lot less space. I suspected that segment size might be related, so I compressed BIGTAB_PART; here is the result:

SQL> select partition_name, bytes/power(1024,2) m from user_segments 
  2   where segment_name = 'BIGTAB_PART';

PARTITION_NAME                                   M
------------------------------ -------------------
P_MAN                                       321.75
SYS_P1245                                   321.00
SYS_P1246                                   322.50
SYS_P1247                                   321.50
SYS_P1248                                   320.75
SYS_P1249                                   322.25
SYS_P1250                                   322.25
SYS_P1251                                   322.25
SYS_P1252                                   322.25
SYS_P1253                                   322.00
SYS_P1254                                     8.25

11 rows selected.

When trying to gather statistics concurrently on the compressed table (without changing the table preferences at all), no concurrent jobs were started. So if you gather statistics concurrently and don't see multiple jobs for intra-table statistics gathering, remember that your partitions might simply be too small.
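My working hypothesis can be expressed in a few lines of shell. The threshold is internal to dbms_stats and, as far as I can tell, undocumented; the 1024 MB figure below is a pure assumption for illustration, chosen only because the 8 GB partitions qualified while the ~322 MB compressed ones did not:

```shell
#!/bin/bash
# Sketch: whether a partition is "big enough" to get its own concurrent stats
# job. THRESHOLD_MB is an assumed figure, NOT a documented value.
THRESHOLD_MB=1024

stats_mode() {
  local size_mb="$1"
  if [ "$size_mb" -ge "$THRESHOLD_MB" ]; then
    echo "own job"
  else
    echo "gathered inline"
  fi
}

stats_mode 8064   # uncompressed partition: got its own job
stats_mode 322    # compressed partition: no separate job observed
```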

Posted in 11g Release 2, 12c Release 1, Linux

Little things worth knowing: ORACLE_SID, instance_name and the database names in Clusterware

Posted by Martin Bach on November 4, 2014

If you ever wanted to know how Clusterware works with registered database resources, read on! It takes a little while to get your head around the concepts of the ORACLE_SID, the instance_name and the database name, and how Clusterware deals with all of them. Although this post has been written against 12.1.0.2.0 on Linux, it should be applicable to 11.2 Clusterware as well. Oh, and by Clusterware I mean Grid Infrastructure of course ;)

Why would you want to care?

Most deployments I have seen use the simple formula: ORACLE_SID = instance_name = db_name, especially in single instance deployments. RAC One Node and RAC databases are slightly different, as their instances are usually named db_name<n>, where <n> is the number of the instance in the cluster. What, however, if you want to have separate SID, instance name and database names? I keep things simple for now and don’t throw in a different db_unique_name…

Assume for the first part of the article that you have created a database manually, i.e. without using dbca. In this case you define the database properties using the ORACLE_SID, the init.ora file (for db_name and instance_name), and the create database statement. Here is an example for my database, DB:

[oracle@oraclelinux ~]$ grep DB /etc/oratab
DB1:/u01/app/oracle/product/12.1.0.2/dbhome_1:N

And for the most basic init.ora:

[oracle@oraclelinux ~]$ cat /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDB1.ora

db_name = DB
instance_name = DB1

db_create_file_dest = '+data'
compatible = 11.2.0.0.0
undo_management=AUTO
db_block_size = 8192
db_files = 300
processes = 500
sga_target=2G
filesystemio_options=setall
recyclebin = off
audit_file_dest=/u01/app/oracle/admin/DB/adump
control_files='+DATA/DB/CONTROLFILE/current.257.862665191'
diagnostic_dest=/u01/app/oracle
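For completeness: the db_name itself is fixed at creation time by the create database statement, which must match the init.ora. A minimal, hypothetical sketch (not the exact statement I ran; with db_create_file_dest set, Oracle Managed Files take care of the file locations):

```sql
-- db_name "DB" must match the db_name in the init.ora
create database DB
  character set AL32UTF8
  default temporary tablespace temp
  undo tablespace undotbs1;
```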

So let’s start with the SID

Question 1: can you easily change the ORACLE_SID? The answer is: yes, you can. From the point of view of starting the database, the SID is only used to look up the configuration. Let’s change the ORACLE_SID from DB1 to DBN (keeping the correct ORACLE_HOME). Without changing any of the configuration files you will get this:

[oracle@oraclelinux ~]$ export ORACLE_SID=DBN
[oracle@oraclelinux ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Nov 4 07:34:42 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora'
SQL>

What if we change these?

[oracle@oraclelinux dbs]$ mv initDB1.ora initDBN.ora
[oracle@oraclelinux dbs]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Nov 4 07:36:17 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 2147483648 bytes
Fixed Size		    2926472 bytes
Variable Size		 1677723768 bytes
Database Buffers	  452984832 bytes
Redo Buffers		   13848576 bytes
Database mounted.
Database opened.
SQL>

Checking the background processes you can see that they are named …DBN:

[oracle@oraclelinux dbs]$ ps -ef | grep smon
oracle    1519     1  0 07:02 ?        00:00:00 asm_smon_+ASM
oracle    5073     1  0 07:36 ?        00:00:00 ora_smon_DBN
oracle    5425  1586  0 07:37 pts/0    00:00:00 grep smon
[oracle@oraclelinux dbs]$

Now in this case we have an instance_name of DB1, and an ORACLE_SID of DBN instead of DB1. The database name is still set to DB.
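The mechanics behind the error and the fix above boil down to a file-name lookup: at startup the instance looks for $ORACLE_HOME/dbs/init${ORACLE_SID}.ora (or the spfile equivalent), and nothing else ties the SID to a database. A hypothetical illustration with throwaway paths:

```shell
# Throwaway demo directory standing in for an Oracle home (assumption,
# not a real installation).
ORACLE_HOME=/tmp/fakehome
mkdir -p "$ORACLE_HOME/dbs"
touch "$ORACLE_HOME/dbs/initDB1.ora"     # the pfile as originally named

ORACLE_SID=DBN
PFILE="$ORACLE_HOME/dbs/init${ORACLE_SID}.ora"
# with the old file name the lookup fails - this is your LRM-00109
[ -f "$PFILE" ] || echo "no $PFILE"

# rename the pfile, just like in the demo above, and the lookup succeeds
mv "$ORACLE_HOME/dbs/initDB1.ora" "$PFILE"
[ -f "$PFILE" ] && echo "startup would read $PFILE"
```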

Registering a resource in Clusterware

Question 2: how can you register such a construct in Clusterware? This is an Oracle Restart configuration, so I can’t show you how it works in RAC, but I can show you the registration and management for the single instance deployment. Keeping the above settings, you can use the srvctl add database command to add the database to Clusterware. Here is an example. If you used dbca to create the database, it would use the same technique. Note that I have updated /etc/oratab, replacing DB1 with DBN to reflect the new ORACLE_SID.

[oracle@oraclelinux ~]$ srvctl add database -h

Adds a database configuration to be managed by Oracle Restart.

Usage: srvctl add database -db <db_unique_name> -oraclehome <oracle_home> [-domain <domain_name>] [-spfile <spfile>] [-pwfile <password_file_path>] [-role {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY | FAR_SYNC}] [-startoption <start_options>] [-stopoption <stop_options>] [-dbname <db_name>] [-instance <inst_name>] [-policy {AUTOMATIC | MANUAL | NORESTART}] [-diskgroup "<diskgroup_list>"]
    -db <db_unique_name>           Unique name for the database
    -oraclehome <path>             Oracle home path
    -domain <domain>               Domain for database. Must be set if database has DB_DOMAIN set.
    -spfile <spfile>               Server parameter file path
    -pwfile <password_file_path>   Password file path
    -role <role>                   Role of the database (PRIMARY, PHYSICAL_STANDBY, LOGICAL_STANDBY, SNAPSHOT_STANDBY, FAR_SYNC)
    -startoption <start_options>   Startup options for the database. Examples of startup options are OPEN, MOUNT, or "READ ONLY".
    -stopoption <stop_options>     Stop options for the database. Examples of shutdown options are NORMAL, TRANSACTIONAL, IMMEDIATE, or ABORT.
    -dbname <db_name>              Database name (DB_NAME), if different from the unique name given by the -db option
    -instance <inst_name>          Instance name
    -policy <dbpolicy>             Management policy for the database (AUTOMATIC, MANUAL, or NORESTART)
    -diskgroup "<diskgroup_list>"  Comma separated list of disk group names
    -verbose                       Verbose output
    -help                          Print usage

[oracle@oraclelinux ~]$ srvctl add database -db DB -oraclehome /u01/app/oracle/product/12.1.0.2/dbhome_1 \
> -pwfile /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora -role PRIMARY \
> -instance DBN -policy MANUAL -diskgroup "DATA,RECO"

Now the database is registered and I can query its configuration:

[oracle@oraclelinux ~]$ srvctl config database -d DB
Database unique name: DB
Database name:
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile:
Password file: /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: MANUAL
Disk Groups: DATA,RECO
Services:
OSDBA group:
OSOPER group:
Database instance: DBN

How does Clusterware connect to my database?

Question 3: How does Clusterware connect to the database to perform administrative tasks such as shutting it down and starting it up?

The answer to that question is found in the oraagent log file. Its location changed in 12c: in 11.2 you found all relevant logs in $GRID_HOME/log/$(hostname -s), in 12c they moved to the diagnostic_dest. In my case the oracle account “owns” both Clusterware and the RDBMS stack, so that’ll be in $ORACLE_BASE/diag. The complete path to the agent log in my environment is /u01/app/oracle/diag/crs/oraclelinux/crs/trace, and the file to check is ohasd_oraagent_oracle.trc. In RAC the file locations are different; in Oracle Restart there isn’t a CRSD process.

Since this file gets a little verbose even with low log levels, it’s a good idea to note down when you performed your test. In the output to follow I skipped all the bits that aren’t relevant to this discussion, otherwise you probably wouldn’t see the wood for the trees.
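Filtering by resource name is usually all it takes to cut the trace down to size. A sketch, run against a tiny fabricated sample rather than a real trace file (the sample lines below mimic the format but are made up):

```shell
# Fabricated sample trace - NOT real oraagent output, just the same shape.
TRC=/tmp/sample_oraagent.trc
cat > "$TRC" <<'EOF'
2014-11-04 07:49:11.128266 : AGFW: {0:0:1726} ora.db.db 1 1 state changed from: UNKNOWN to: STARTING
2014-11-04 07:49:11.166499 : CLSDYNAM: [ora.LISTENER.lsnr] unrelated line
2014-11-04 07:49:24.231865 : AGFW: {0:0:1726} ora.db.db 1 1 state changed from: STARTING to: ONLINE
EOF

# keep only the lines for the database resource
grep 'ora\.db\.db' "$TRC"
```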

2014-11-04 07:49:11.128239 :    AGFW:3548362496: {0:0:1726} Agent received the message: RESOURCE_START[ora.db.db 1 1] ID 4098:4393

2014-11-04 07:49:11.128266 :    AGFW:3548362496: {0:0:1726} ora.db.db 1 1 state changed from: UNKNOWN to: STARTING

2014-11-04 07:49:11.130019 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] clsnUtils::cmdIdIsStart CmdId:257
2014-11-04 07:49:11.130051 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] DbAgent:getOracleSid 13 oracle_sid = DBN
2014-11-04 07:49:11.130059 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] DbAgent:getOracleSid  oracle_sid = DBN
2014-11-04 07:49:11.130121 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] Utils::getOracleHomeAttrib oracle_home:/u01/app/oracle/product/12.1.0.2/dbhome_1
2014-11-04 07:49:11.130213 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] ConnectionPool::ConnectionPool 2 m_oracleHome:/u01/app/oracle/product/12.1.0.2/dbhome_1, m_oracleSid:DBN, m_usrOraEnv:  m_instanceType:1 m_instanceVersion:12.1.0.2.0
2014-11-04 07:49:11.130254 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] DbAgent::setOracleSidAttrib updating GEN_USR_ORA_INST_NAME to DBN
2014-11-04 07:49:11.130267 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] clsnUtils::setResAttrib nonPerX current value GEN_USR_ORA_INST_NAME value 

2014-11-04 07:49:11.132339 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] ConfigFile::parse mmap name:prod nameWithCase:prod value:/u01/app/oracle/product/12.1.0.2/dbhome_1:N comment:line added by Agent StartCompleted

2014-11-04 07:49:11.135691 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] ConfigFile::parse mmap name:dbn nameWithCase:DBN value:/u01/app/oracle/product/12.1.0.2/dbhome_1:N comment:line added by Agent

2014-11-04 07:49:11.166499 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] start dependency = hard(ora.DATA.dg,ora.RECO.dg) weak(type:ora.listener.type,uniform:ora.ons)

2014-11-04 07:49:11.269849 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] DbAgent:getConnection oracleHome:/u01/app/oracle/product/12.1.0.2/dbhome_1 oracleSid:DBN usrOraEnv: resVersion:12.1.0.2.0
2014-11-04 07:49:11.269958 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] InstConnection:InstConnection: init:c80b40b0 oracleHome:/u01/app/oracle/product/12.1.0.2/dbhome_1 oracleSid:DBN instanceType:1 instanceVersion:12.1.0.2.0
2014-11-04 07:49:11.270631 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] clsnInstConnection::makeConnectStr UsrOraEnv ,ORACLE_BASE= m_oracleHome /u01/app/oracle/product/12.1.0.2/dbhome_1 Crshome /u01/app/oracle/product/12.1.0.2/grid
2014-11-04 07:49:11.271977 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle)(ARGV0=oracleDBN)(ENVS='ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1,ORACLE_SID=DBN,LD_LIBRARY_PATH=,ORACLE_BASE=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(CONNECT_DATA=(SID=DBN))))
2014-11-04 07:49:11.275192 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] Container:start oracle home /u01/app/oracle/product/12.1.0.2/dbhome_1
2014-11-04 07:49:11.275657 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] InstConnection::connectInt: server not attached
2014-11-04 07:49:12.315345 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] InstConnection:connectInt connected

2014-11-04 07:49:24.231865 :    AGFW:3548362496: {0:0:1726} ora.db.db 1 1 state changed from: STARTING to: ONLINE

This should be enough information to get us started. Now what’s happening here? First of all the Agent Framework (AGFW) receives the command to start the database. Oracle Clusterware manages resources, and the database is one of them. Each resource to be managed has to have a profile in Clusterware, and the profile determines what Clusterware does with the resource. The profile for the DB resource is shown here for reference.

[oracle@oraclelinux ~]$ crsctl stat res ora.db.db -p
NAME=ora.db.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
ACTIONS=startoption,group:"asmdba",user:"oracle",group:"oinstall"
ACTION_SCRIPT=
ACTION_START_OPTION=
ACTION_TIMEOUT=600
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=never
CARDINALITY=1
CHECK_INTERVAL=1
CHECK_TIMEOUT=30
CLEAN_TIMEOUT=60
CLUSTER_DATABASE=false
DATABASE_TYPE=SINGLE
DB_UNIQUE_NAME=DB
DEGREE=1
DELETE_TIMEOUT=60
DESCRIPTION=Oracle Database resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
GEN_AUDIT_FILE_DEST=/u01/app/oracle/admin/DB/adump
GEN_START_OPTIONS=
GEN_USR_ORA_INST_NAME=DBN
HOSTING_MEMBERS=
INSTANCE_FAILOVER=1
INTERMEDIATE_TIMEOUT=0
LOAD=1
LOGGING_LEVEL=1
MANAGEMENT_POLICY=MANUAL
MODIFY_TIMEOUT=60
NLS_LANG=
OFFLINE_CHECK_INTERVAL=0
ONLINE_RELOCATION_TIMEOUT=0
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
ORACLE_HOME_OLD=
OS_CRASH_THRESHOLD=0
OS_CRASH_UPTIME=0
PLACEMENT=balanced
PWFILE=/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora
RANK=0
RELOCATE_ACTION=0
RELOCATE_BY_DEPENDENCY=1
RESTART_ATTEMPTS=1
ROLE=PRIMARY
SCRIPT_TIMEOUT=60
SERVER_CATEGORY=
SERVER_POOLS=
SERVER_POOLS_PQ=
SPFILE=
START_CONCURRENCY=0
START_DEPENDENCIES=hard(ora.DATA.dg,ora.RECO.dg) weak(type:ora.listener.type,uniform:ora.ons)
START_TIMEOUT=600
STOP_CONCURRENCY=0
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.RECO.dg)
STOP_TIMEOUT=600
TYPE_VERSION=3.3
UPTIME_THRESHOLD=1h
USER_WORKLOAD=yes
USE_STICKINESS=0
USR_ORA_DB_NAME=
USR_ORA_DOMAIN=
USR_ORA_ENV=
USR_ORA_FLAGS=
USR_ORA_INST_NAME=DBN
USR_ORA_OPEN_MODE=open
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate

Back to the start sequence. After the start command has been received, the configuration is looked up in /etc/oratab. Any start dependencies (the DATA and RECO disk groups, the listener and the ONS daemon, as shown in the profile) are validated and satisfied if needed. The most interesting information is found in makeConnectStr: using the ORACLE_SID and the associated ORACLE_HOME, the connection string is created dynamically:

2014-11-04 07:49:11.271977 :CLSDYNAM:3552564992: [ora.db.db]{0:0:1726} [start] makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/prod
uct/12.1.0.2/dbhome_1/bin/oracle)(ARGV0=oracleDBN)(ENVS='ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1,ORACLE_SID=DBN,LD_LIBRARY_PATH=,ORACLE_BASE=')(ARG
S='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(CONNECT_DATA=(SID=DBN))))
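To make the structure of that bequeath descriptor easier to see, here is an illustrative (and abridged) reconstruction of its shape from ORACLE_HOME and ORACLE_SID. This is my own sketch of the pattern visible in the log, not Oracle’s code:

```shell
# Values taken from the trace above; the descriptor is a local (beq)
# connection with the SID baked into ARGV0, ENVS and CONNECT_DATA.
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
ORACLE_SID=DBN

CONNSTR="(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)\
(PROGRAM=${ORACLE_HOME}/bin/oracle)\
(ARGV0=oracle${ORACLE_SID})\
(ENVS='ORACLE_HOME=${ORACLE_HOME},ORACLE_SID=${ORACLE_SID}')\
(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')\
(CONNECT_DATA=(SID=${ORACLE_SID}))))"

echo "$CONNSTR"
```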

This string is used to connect to the database instance and start it up. Oracle isn’t able to magically correct configuration errors: if you had defined the instance as DB1 in the srvctl add database command, Clusterware wouldn’t be able to start the instance, just as you wouldn’t be able to start it with SQL*Plus. The oraagent_oracle log file on RAC and ohasd_oraagent_oracle.trc in Oracle Restart are good starting points when checking for issues. Another useful flag is SRVM_TRACE:

[oracle@oraclelinux ~]$ grep SRVM_TRACE $(which srvctl)
if [ "X${SRVM_TRACE}" != "X" ]
  if [ "${SRVM_TRACE}" != "false" ]  && [ "${SRVM_TRACE}" != "FALSE" ]

When set, this produces some rather verbose output but can help in cases where the resource won’t start. I’ll close this article with a traced call to get the status of the DB database and leave the interpretation of the output as an exercise for the reader.

[oracle@oraclelinux ~]$ SRVM_TRACE=TRUE srvctl status database -db DB
[main] [ 2014-11-04 08:11:36.482 GMT ] [OPSCTLDriver.setInternalDebugLevel:325]  tracing is true at level 2 to file null
[main] [ 2014-11-04 08:11:36.488 GMT ] [OPSCTLDriver.main:162]  SRVCTL arguments : args[0]=status args[1]=database args[2]=-db args[3]=DB
[main] [ 2014-11-04 08:11:36.490 GMT ] [SRVMContext.init:114]  Performing SRVM Context init. Init Counter=1
[main] [ 2014-11-04 08:11:36.527 GMT ] [Library.load:194]  library.load
[main] [ 2014-11-04 08:11:36.529 GMT ] [sPlatform.isHybrid:66]  osName=Linux osArch=amd64 JVM=64 rc=false
[main] [ 2014-11-04 08:11:36.530 GMT ] [sPlatform.isHybrid:66]  osName=Linux osArch=amd64 JVM=64 rc=false
[main] [ 2014-11-04 08:11:36.531 GMT ] [Library.load:271]  Loading  library /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libsrvmhas12.so
[main] [ 2014-11-04 08:11:36.546 GMT ] [SRVMContext.init:131]  SRVM Context init-ed
[main] [ 2014-11-04 08:11:36.568 GMT ] [Version.isPre:610]  version to be checked 12.1.0.2.0 major version to check against 10
[main] [ 2014-11-04 08:11:36.574 GMT ] [Version.isPre:621]  isPre.java: Returning FALSE
[main] [ 2014-11-04 08:11:36.580 GMT ] [OCR.loadLibrary:328]
 Inside constructor of OCR
[main] [ 2014-11-04 08:11:36.584 GMT ] [OCR.loadLibrary:336]  Going to load the ocr library
[main] [ 2014-11-04 08:11:36.585 GMT ] [Library.load:194]  library.load
[main] [ 2014-11-04 08:11:36.585 GMT ] [sPlatform.isHybrid:66]  osName=Linux osArch=amd64 JVM=64 rc=false
[main] [ 2014-11-04 08:11:36.587 GMT ] [sPlatform.isHybrid:66]  osName=Linux osArch=amd64 JVM=64 rc=false
[main] [ 2014-11-04 08:11:36.588 GMT ] [Library.load:271]  Loading  library /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libsrvmocr12.so
[main] [ 2014-11-04 08:11:36.589 GMT ] [OCR.loadLibrary:338]  loaded ocr libraries
[main] [ 2014-11-04 08:11:36.590 GMT ] [SRVMContext.init:114]  Performing SRVM Context init. Init Counter=2
[main] [ 2014-11-04 08:11:36.590 GMT ] [OCR.isCluster:1000]  Calling OCRNative for isCluster()
[main] [ 2014-11-04 08:11:36.592 GMT ] [OCR.isCluster:1004]  OCR Result status = true
[main] [ 2014-11-04 08:11:36.593 GMT ] [OCR.isCluster:1016]  Bolean result = false
[main] [ 2014-11-04 08:11:36.594 GMT ] [CommandLineParser.<init>:488]  Is this a cluster? : false
[main] [ 2014-11-04 08:11:36.599 GMT ] [CommandLineParser.parse:678]  parsing cmdline args
[main] [ 2014-11-04 08:11:36.601 GMT ] [CommandLineParser.antlrParse:500]  parsing cmdline args using antlr
[main] [ 2014-11-04 08:11:36.603 GMT ] [CommandLineParser.antlrParse:558]  input args are status database -db %OPTVAL%DB
[main] [ 2014-11-04 08:11:36.641 GMT ] [SrvExprParser.optverb:826]  antlr: Match verb
[main] [ 2014-11-04 08:11:36.642 GMT ] [SrvExprParser.optnoun:873]  antlr: Match noun
[main] [ 2014-11-04 08:11:36.644 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for output
[main] [ 2014-11-04 08:11:36.644 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.645 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for compatible
[main] [ 2014-11-04 08:11:36.645 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.646 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for HELP
[main] [ 2014-11-04 08:11:36.647 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.648 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for help
[main] [ 2014-11-04 08:11:36.648 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.649 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for file
[main] [ 2014-11-04 08:11:36.649 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.650 GMT ] [SrvExprParser.optval:906]  antlr: Call initOptVals for status database
[main] [ 2014-11-04 08:11:36.650 GMT ] [SrvExprParser.optval:946]  antlr: Getting OPTION VALUE
[main] [ 2014-11-04 08:11:36.651 GMT ] [SrvExprParser.optval:948]  antlr: Set -db value: %OPTVAL%DB
[main] [ 2014-11-04 08:11:36.652 GMT ] [CmdOptVals.checkAndSet:734]  Calling checkAndSet for -db = -db
[main] [ 2014-11-04 08:11:36.653 GMT ] [CmdOptVals.checkAndSet:802]  Calling checkAndSet for OptEnum db = DB
[main] [ 2014-11-04 08:11:36.654 GMT ] [OptVal.check:180]  Value DB for option db will be set.
[main] [ 2014-11-04 08:11:36.655 GMT ] [OptVal.check:221]  db = DB
[main] [ 2014-11-04 08:11:36.655 GMT ] [SrvExprParser.cmdline:529]  antlr: Set verb and noun and options
[main] [ 2014-11-04 08:11:36.656 GMT ] [SrvExprParser.srvrule:219]  antlr: Return optvals
[main] [ 2014-11-04 08:11:36.656 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for file
[main] [ 2014-11-04 08:11:36.657 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.658 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for help
[main] [ 2014-11-04 08:11:36.658 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.659 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for db
[main] [ 2014-11-04 08:11:36.660 GMT ] [OptVal.isSet:150]  m_value is DB
[main] [ 2014-11-04 08:11:36.660 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for database
[main] [ 2014-11-04 08:11:36.661 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.661 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for db
[main] [ 2014-11-04 08:11:36.662 GMT ] [OptVal.isSet:150]  m_value is DB
[main] [ 2014-11-04 08:11:36.663 GMT ] [CommandLineParser.checkOptionalCombOptions:813]  Loop for combList(0)
[main] [ 2014-11-04 08:11:36.663 GMT ] [CommandLineParser.checkOptionalCombOptions:815]  Set isMatched to true
[main] [ 2014-11-04 08:11:36.664 GMT ] [CommandLineParser.checkOptionalCombOptions:829]  Check option -db
[main] [ 2014-11-04 08:11:36.665 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for db
[main] [ 2014-11-04 08:11:36.666 GMT ] [OptVal.isSet:150]  m_value is DB
[main] [ 2014-11-04 08:11:36.666 GMT ] [CommandLineParser.checkOptionalCombOptions:813]  Loop for combList(1)
[main] [ 2014-11-04 08:11:36.667 GMT ] [CommandLineParser.checkOptionalCombOptions:815]  Set isMatched to true
[main] [ 2014-11-04 08:11:36.667 GMT ] [CommandLineParser.checkOptionalCombOptions:829]  Check option -thisversion
[main] [ 2014-11-04 08:11:36.668 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for thisversion
[main] [ 2014-11-04 08:11:36.668 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.669 GMT ] [CommandLineParser.checkOptionalCombOptions:846]  Set isMatched to false
[main] [ 2014-11-04 08:11:36.671 GMT ] [CommandLineParser.checkOptionalCombOptions:813]  Loop for combList(2)
[main] [ 2014-11-04 08:11:36.672 GMT ] [CommandLineParser.checkOptionalCombOptions:815]  Set isMatched to true
[main] [ 2014-11-04 08:11:36.672 GMT ] [CommandLineParser.checkOptionalCombOptions:829]  Check option -thishome
[main] [ 2014-11-04 08:11:36.673 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for thishome
[main] [ 2014-11-04 08:11:36.674 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.675 GMT ] [CommandLineParser.checkOptionalCombOptions:846]  Set isMatched to false
[main] [ 2014-11-04 08:11:36.675 GMT ] [CmdOptVals.checkRefChoices:1066]  in checkRefChoices
[main] [ 2014-11-04 08:11:36.676 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for output
[main] [ 2014-11-04 08:11:36.677 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.678 GMT ] [OPSCTLDriver.main:215]  Attempting to register interrupt handler for OPSCTL driver
[main] [ 2014-11-04 08:11:36.681 GMT ] [InterruptHandler.registerInterruptHandler:60]  Registering shutdown hook.....
[main] [ 2014-11-04 08:11:36.682 GMT ] [InterruptHandler.registerInterruptHandler:62]  Registered shutdown hook.....
[main] [ 2014-11-04 08:11:36.683 GMT ] [SRVCTLHandler.registerInterruptHandler:86]  Registered interrupt handler for OPSCTL driver
[main] [ 2014-11-04 08:11:36.684 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for compatible
[main] [ 2014-11-04 08:11:36.684 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.685 GMT ] [OPSCTLDriver.execute:345]  checking for -H option: print verbose command syntax
[main] [ 2014-11-04 08:11:36.685 GMT ] [CmdOptVals.isOptionSet:917]  Check isOptionSet for HELP
[main] [ 2014-11-04 08:11:36.687 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for help
[main] [ 2014-11-04 08:11:36.687 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.688 GMT ] [CmdOptVals.isOptionSet:917]  Check isOptionSet for HELP_H
[main] [ 2014-11-04 08:11:36.688 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for HELP
[main] [ 2014-11-04 08:11:36.689 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:36.711 GMT ] [CRSCache.<clinit>:74]  CRSCache: useCache is false
[main] [ 2014-11-04 08:11:36.712 GMT ] [OPSCTLDriver.execute:411]  executing 2-word command verb=6 noun=101
[main] [ 2014-11-04 08:11:36.746 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for db
[main] [ 2014-11-04 08:11:36.748 GMT ] [OptVal.isSet:150]  m_value is DB
[main] [ 2014-11-04 08:11:36.763 GMT ] [DatabaseFactoryImpl.getInstance:320]  s_instance is null
[main] [ 2014-11-04 08:11:36.765 GMT ] [DatabaseFactoryImpl.getInstance:323]  before returning object
[main] [ 2014-11-04 08:11:36.765 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for db
[main] [ 2014-11-04 08:11:36.766 GMT ] [OptVal.isSet:150]  m_value is DB
[main] [ 2014-11-04 08:11:36.767 GMT ] [DatabaseFactoryImpl.getDatabase:995]  Before calling get database with name and version
[main] [ 2014-11-04 08:11:36.815 GMT ] [HASContext.<init>:125]  moduleInit = 67
[main] [ 2014-11-04 08:11:36.816 GMT ] [SRVMContext.init:114]  Performing SRVM Context init. Init Counter=3
[main] [ 2014-11-04 08:11:36.816 GMT ] [HASContext.<init>:136]  Initialised SRVMContext
[main] [ 2014-11-04 08:11:36.882 GMT ] [EntityOperations.isEntityRegisteredCRSD:353]  entity: ora.db.db, type: 1, registered: true
[main] [ 2014-11-04 08:11:36.896 GMT ] [CRSCache.getAttributesFromCache:220]  updated: ora.db.db [<DATABASE_TYPE:SINGLE>, <ORACLE_HOME:/u01/app/oracle/product/12.1.0.2/dbhome_1>, <VERSION:12.1.0.2.0>]
[main] [ 2014-11-04 08:11:36.898 GMT ] [DatabaseFactoryImpl.internalGetDatabase:1115]  dbVersion4 = 12.1.0.2
[main] [ 2014-11-04 08:11:36.901 GMT ] [DatabaseFactoryImpl.internalGetDatabase:1129]  db type SINGLE
[main] [ 2014-11-04 08:11:36.905 GMT ] [Version.isPre:610]  version to be checked 12.1.0.2.0 major version to check against 11
[main] [ 2014-11-04 08:11:36.906 GMT ] [Version.isPre:621]  isPre.java: Returning FALSE
[main] [ 2014-11-04 08:11:36.907 GMT ] [Version.isPre:642]  version to be checked 12.1.0.2.0 major version to check against 11 minor version to check against 2
[main] [ 2014-11-04 08:11:36.908 GMT ] [Version.isPre:651]  isPre: Returning FALSE for major version check
[main] [ 2014-11-04 08:11:36.908 GMT ] [DatabaseFactoryImpl.getSingleInstanceDatabase:1923]  Getting single instance database for DB
[main] [ 2014-11-04 08:11:36.920 GMT ] [DatabaseAction.<init>:80]  Created Database Action
[main] [ 2014-11-04 08:11:36.932 GMT ] [CRSCache.getAttributesFromCache:220]  updated: ora.db.db [<DB_UNIQUE_NAME:DB>]
[main] [ 2014-11-04 08:11:36.933 GMT ] [DatabaseImpl.crsResource:5254]  calling parent's crsResource
[main] [ 2014-11-04 08:11:36.980 GMT ] [EntityOperations.isEntityRegisteredCRSD:353]  entity: ora.db.db, type: 1, registered: true
[main] [ 2014-11-04 08:11:36.987 GMT ] [CRSCache.getAttributesFromCache:229]  CRS cache: ora.db.db [<DATABASE_TYPE:SINGLE>]
[main] [ 2014-11-04 08:11:36.990 GMT ] [Version.isPre:610]  version to be checked 12.1.0.2.0 major version to check against 11
[main] [ 2014-11-04 08:11:36.991 GMT ] [Version.isPre:621]  isPre.java: Returning FALSE
[main] [ 2014-11-04 08:11:36.991 GMT ] [Version.isPre:691]  version to be checked 12.1.0.2.0 major version to check against 11 minor version to check against 2 patchset version to check against 2
[main] [ 2014-11-04 08:11:36.992 GMT ] [Version.isPre:701]  isPre: Returning FALSE for major version check
[main] [ 2014-11-04 08:11:36.993 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:36.994 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: all
[main] [ 2014-11-04 08:11:37.031 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<STATE:ONLINE>, <TARGET:ONLINE>, <ACL:owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx>, <ACTIONS:startoption,group:"asmdba",user:"oracle",group:"oinstall">, <ACTION_FAILURE_TEMPLATE:>, <ACTION_SCRIPT:>, <ACTION_START_OPTION:>, <ACTION_TIMEOUT:600>, <ACTIVE_PLACEMENT:0>, <AGENT_FILENAME:%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%>, <ALERT_TEMPLATE:>, <ALIAS_NAME:>, <AUTO_START:never>, <CARDINALITY:1>, <CARDINALITY_ID:0>, <CHECK_INTERVAL:1>, <CHECK_TIMEOUT:30>, <CLEAN_TIMEOUT:60>, <CLUSTER_DATABASE:false>, <CONFIG_VERSION:1>, <CREATION_SEED:37>, <DATABASE_TYPE:SINGLE>, <DB_UNIQUE_NAME:DB>, <DEFAULT_TEMPLATE:PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%) ELEMENT(DATABASE_TYPE= %DATABASE_TYPE%)>, <DEGREE:1>, <DEGREE_ID:0>, <DELETE_TIMEOUT:60>, <DESCRIPTION:Oracle Database resource>, <ENABLED:1>, <FAILOVER_DELAY:0>, <FAILURE_INTERVAL:60>, <FAILURE_THRESHOLD:1>, <GEN_AUDIT_FILE_DEST:/u01/app/oracle/admin/DB/adump>, <GEN_START_OPTIONS:>, <GEN_USR_ORA_INST_NAME:DBN>, <HOSTING_MEMBERS:>, <ID:ora.db.db>, <INSTANCE_COUNT:1>, <INSTANCE_FAILOVER:1>, <INTERMEDIATE_TIMEOUT:0>, <LOAD:1>, <LOGGING_LEVEL:1>, <MANAGEMENT_POLICY:MANUAL>, <MODIFY_TIMEOUT:60>, <NAME:ora.db.db>, <NLS_LANG:>, <NOT_RESTARTING_TEMPLATE:>, <OFFLINE_CHECK_INTERVAL:0>, <ONLINE_RELOCATION_TIMEOUT:0>, <ORACLE_HOME:/u01/app/oracle/product/12.1.0.2/dbhome_1>, <ORACLE_HOME_OLD:>, <OS_CRASH_THRESHOLD:0>, <OS_CRASH_UPTIME:0>, <PLACEMENT:balanced>, <PROFILE_CHANGE_TEMPLATE:>, <PWFILE:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora>, <RANK:0>, <RELOCATE_ACTION:0>, <RELOCATE_BY_DEPENDENCY:1>, <RESTART_ATTEMPTS:1>, <ROLE:PRIMARY>, <SCRIPT_TIMEOUT:60>, <SERVER_CATEGORY:>, <SERVER_POOLS:>, <SERVER_POOLS_PQ:>, <SPFILE:>, <START_CONCURRENCY:0>, <START_DEPENDENCIES:hard(ora.DATA.dg,ora.RECO.dg) 
weak(type:ora.listener.type,uniform:ora.ons)>, <START_TIMEOUT:600>, <STATE_CHANGE_TEMPLATE:>, <STOP_CONCURRENCY:0>, <STOP_DEPENDENCIES:hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.RECO.dg)>, <STOP_TIMEOUT:600>, <TYPE:ora.database.type>, <TYPE_VERSION:3.3>, <UPTIME_THRESHOLD:1h>, <USER_WORKLOAD:yes>, <USE_STICKINESS:0>, <USR_ORA_DB_NAME:>, <USR_ORA_DOMAIN:>, <USR_ORA_ENV:>, <USR_ORA_FLAGS:>, <USR_ORA_INST_NAME:DBN>, <USR_ORA_OPEN_MODE:open>, <USR_ORA_OPI:false>, <USR_ORA_STOP_MODE:immediate>, <VERSION:12.1.0.2.0>]
[main] [ 2014-11-04 08:11:37.032 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STATE, value=ONLINE
[main] [ 2014-11-04 08:11:37.033 GMT ] [DatabaseImpl.getDBInstances:2356]  name=TARGET, value=ONLINE
[main] [ 2014-11-04 08:11:37.034 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACL, value=owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
[main] [ 2014-11-04 08:11:37.035 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTIONS, value=startoption,group:"asmdba",user:"oracle",group:"oinstall"
[main] [ 2014-11-04 08:11:37.035 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_FAILURE_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.036 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_SCRIPT, value=
[main] [ 2014-11-04 08:11:37.037 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_START_OPTION, value=
[main] [ 2014-11-04 08:11:37.038 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.038 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTIVE_PLACEMENT, value=0
[main] [ 2014-11-04 08:11:37.039 GMT ] [DatabaseImpl.getDBInstances:2356]  name=AGENT_FILENAME, value=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
[main] [ 2014-11-04 08:11:37.039 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ALERT_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.041 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ALIAS_NAME, value=
[main] [ 2014-11-04 08:11:37.042 GMT ] [DatabaseImpl.getDBInstances:2356]  name=AUTO_START, value=never
[main] [ 2014-11-04 08:11:37.042 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CARDINALITY, value=1
[main] [ 2014-11-04 08:11:37.043 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CARDINALITY_ID, value=0
[main] [ 2014-11-04 08:11:37.045 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CHECK_INTERVAL, value=1
[main] [ 2014-11-04 08:11:37.047 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CHECK_TIMEOUT, value=30
[main] [ 2014-11-04 08:11:37.047 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CLEAN_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.048 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CLUSTER_DATABASE, value=false
[main] [ 2014-11-04 08:11:37.049 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CONFIG_VERSION, value=1
[main] [ 2014-11-04 08:11:37.050 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CREATION_SEED, value=37
[main] [ 2014-11-04 08:11:37.051 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DATABASE_TYPE, value=SINGLE
[main] [ 2014-11-04 08:11:37.053 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DB_UNIQUE_NAME, value=DB
[main] [ 2014-11-04 08:11:37.053 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DEFAULT_TEMPLATE, value=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%) ELEMENT(DATABASE_TYPE= %DATABASE_TYPE%)
[main] [ 2014-11-04 08:11:37.054 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DEGREE, value=1
[main] [ 2014-11-04 08:11:37.058 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DEGREE_ID, value=0
[main] [ 2014-11-04 08:11:37.059 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DELETE_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.060 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DESCRIPTION, value=Oracle Database resource
[main] [ 2014-11-04 08:11:37.061 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ENABLED, value=1
[main] [ 2014-11-04 08:11:37.061 GMT ] [DatabaseImpl.getDBInstances:2356]  name=FAILOVER_DELAY, value=0
[main] [ 2014-11-04 08:11:37.062 GMT ] [DatabaseImpl.getDBInstances:2356]  name=FAILURE_INTERVAL, value=60
[main] [ 2014-11-04 08:11:37.063 GMT ] [DatabaseImpl.getDBInstances:2356]  name=FAILURE_THRESHOLD, value=1
[main] [ 2014-11-04 08:11:37.063 GMT ] [DatabaseImpl.getDBInstances:2356]  name=GEN_AUDIT_FILE_DEST, value=/u01/app/oracle/admin/DB/adump
[main] [ 2014-11-04 08:11:37.064 GMT ] [DatabaseImpl.getDBInstances:2356]  name=GEN_START_OPTIONS, value=
[main] [ 2014-11-04 08:11:37.064 GMT ] [DatabaseImpl.getDBInstances:2356]  name=GEN_USR_ORA_INST_NAME, value=DBN
[main] [ 2014-11-04 08:11:37.066 GMT ] [DatabaseImpl.getDBInstances:2356]  name=HOSTING_MEMBERS, value=
[main] [ 2014-11-04 08:11:37.066 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ID, value=ora.db.db
[main] [ 2014-11-04 08:11:37.067 GMT ] [DatabaseImpl.getDBInstances:2356]  name=INSTANCE_COUNT, value=1
[main] [ 2014-11-04 08:11:37.068 GMT ] [DatabaseImpl.getDBInstances:2356]  name=INSTANCE_FAILOVER, value=1
[main] [ 2014-11-04 08:11:37.068 GMT ] [DatabaseImpl.getDBInstances:2356]  name=INTERMEDIATE_TIMEOUT, value=0
[main] [ 2014-11-04 08:11:37.069 GMT ] [DatabaseImpl.getDBInstances:2356]  name=LOAD, value=1
[main] [ 2014-11-04 08:11:37.069 GMT ] [DatabaseImpl.getDBInstances:2356]  name=LOGGING_LEVEL, value=1
[main] [ 2014-11-04 08:11:37.071 GMT ] [DatabaseImpl.getDBInstances:2356]  name=MANAGEMENT_POLICY, value=MANUAL
[main] [ 2014-11-04 08:11:37.072 GMT ] [DatabaseImpl.getDBInstances:2356]  name=MODIFY_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.072 GMT ] [DatabaseImpl.getDBInstances:2356]  name=NAME, value=ora.db.db
[main] [ 2014-11-04 08:11:37.073 GMT ] [DatabaseImpl.getDBInstances:2356]  name=NLS_LANG, value=
[main] [ 2014-11-04 08:11:37.074 GMT ] [DatabaseImpl.getDBInstances:2356]  name=NOT_RESTARTING_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.075 GMT ] [DatabaseImpl.getDBInstances:2356]  name=OFFLINE_CHECK_INTERVAL, value=0
[main] [ 2014-11-04 08:11:37.076 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ONLINE_RELOCATION_TIMEOUT, value=0
[main] [ 2014-11-04 08:11:37.082 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ORACLE_HOME, value=/u01/app/oracle/product/12.1.0.2/dbhome_1
[main] [ 2014-11-04 08:11:37.083 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ORACLE_HOME_OLD, value=
[main] [ 2014-11-04 08:11:37.084 GMT ] [DatabaseImpl.getDBInstances:2356]  name=OS_CRASH_THRESHOLD, value=0
[main] [ 2014-11-04 08:11:37.085 GMT ] [DatabaseImpl.getDBInstances:2356]  name=OS_CRASH_UPTIME, value=0
[main] [ 2014-11-04 08:11:37.086 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PLACEMENT, value=balanced
[main] [ 2014-11-04 08:11:37.086 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PROFILE_CHANGE_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.087 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PWFILE, value=/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora
[main] [ 2014-11-04 08:11:37.087 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RANK, value=0
[main] [ 2014-11-04 08:11:37.088 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RELOCATE_ACTION, value=0
[main] [ 2014-11-04 08:11:37.089 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RELOCATE_BY_DEPENDENCY, value=1
[main] [ 2014-11-04 08:11:37.090 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RESTART_ATTEMPTS, value=1
[main] [ 2014-11-04 08:11:37.090 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ROLE, value=PRIMARY
[main] [ 2014-11-04 08:11:37.091 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SCRIPT_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.095 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SERVER_CATEGORY, value=
[main] [ 2014-11-04 08:11:37.097 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SERVER_POOLS, value=
[main] [ 2014-11-04 08:11:37.098 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SERVER_POOLS_PQ, value=
[main] [ 2014-11-04 08:11:37.098 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SPFILE, value=
[main] [ 2014-11-04 08:11:37.099 GMT ] [DatabaseImpl.getDBInstances:2356]  name=START_CONCURRENCY, value=0
[main] [ 2014-11-04 08:11:37.099 GMT ] [DatabaseImpl.getDBInstances:2356]  name=START_DEPENDENCIES, value=hard(ora.DATA.dg,ora.RECO.dg) weak(type:ora.listener.type,uniform:ora.ons)
[main] [ 2014-11-04 08:11:37.100 GMT ] [DatabaseImpl.getDBInstances:2356]  name=START_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.101 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STATE_CHANGE_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.101 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STOP_CONCURRENCY, value=0
[main] [ 2014-11-04 08:11:37.102 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STOP_DEPENDENCIES, value=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.RECO.dg)
[main] [ 2014-11-04 08:11:37.103 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STOP_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.104 GMT ] [DatabaseImpl.getDBInstances:2356]  name=TYPE, value=ora.database.type
[main] [ 2014-11-04 08:11:37.105 GMT ] [DatabaseImpl.getDBInstances:2356]  name=TYPE_VERSION, value=3.3
[main] [ 2014-11-04 08:11:37.106 GMT ] [DatabaseImpl.getDBInstances:2356]  name=UPTIME_THRESHOLD, value=1h
[main] [ 2014-11-04 08:11:37.107 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USER_WORKLOAD, value=yes
[main] [ 2014-11-04 08:11:37.108 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USE_STICKINESS, value=0
[main] [ 2014-11-04 08:11:37.109 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_DB_NAME, value=
[main] [ 2014-11-04 08:11:37.110 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_DOMAIN, value=
[main] [ 2014-11-04 08:11:37.110 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_ENV, value=
[main] [ 2014-11-04 08:11:37.111 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_FLAGS, value=
[main] [ 2014-11-04 08:11:37.112 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_INST_NAME, value=DBN
[main] [ 2014-11-04 08:11:37.112 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_OPEN_MODE, value=open
[main] [ 2014-11-04 08:11:37.114 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_OPI, value=false
[main] [ 2014-11-04 08:11:37.114 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_STOP_MODE, value=immediate
[main] [ 2014-11-04 08:11:37.115 GMT ] [DatabaseImpl.getDBInstances:2356]  name=VERSION, value=12.1.0.2.0
[main] [ 2014-11-04 08:11:37.126 GMT ] [CRSCache.getAttributesFromCache:229]  CRS cache: ora.db.db [<DATABASE_TYPE:SINGLE>]
[main] [ 2014-11-04 08:11:37.130 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.131 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [HOSTING_MEMBERS]
[main] [ 2014-11-04 08:11:37.134 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<HOSTING_MEMBERS:>]
[main] [ 2014-11-04 08:11:37.136 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.136 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS]
[main] [ 2014-11-04 08:11:37.158 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS:>]
[main] [ 2014-11-04 08:11:37.169 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.170 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS]
[main] [ 2014-11-04 08:11:37.175 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS:>]
[main] [ 2014-11-04 08:11:37.176 GMT ] [DatabaseImpl.serverPoolsHelper:5977]  list of server pool names =
[main] [ 2014-11-04 08:11:37.178 GMT ] [DatabaseImpl.crsResource:5254]  calling parent's crsResource
[main] [ 2014-11-04 08:11:37.201 GMT ] [EntityOperations.isEntityRegisteredCRSD:353]  entity: ora.db.db, type: 1, registered: true
[main] [ 2014-11-04 08:11:37.218 GMT ] [CRSNative.searchEntities:658]  found 0 entities
[main] [ 2014-11-04 08:11:37.218 GMT ] [EntityOperations.fetchRunningNodeNames:1161]  Looking for nodes: resource ora.db.db
[main] [ 2014-11-04 08:11:37.242 GMT ] [CRSNative.searchEntities:658]  found 1 entities
[main] [ 2014-11-04 08:11:37.244 GMT ] [EntityOperations.fetchRunningNodeNames:1193]  	riid : ora.db.db 1 1 node: oraclelinux, state ONLINE, istate STABLE
[main] [ 2014-11-04 08:11:37.251 GMT ] [DatabaseImpl.getDBInstances:2411]  Adding running node oraclelinux
[main] [ 2014-11-04 08:11:37.252 GMT ] [DatabaseImpl.getDBInstances:2415]  asList=[oraclelinux]
[main] [ 2014-11-04 08:11:37.252 GMT ] [DatabaseImpl.crsResource:5254]  calling parent's crsResource
[main] [ 2014-11-04 08:11:37.263 GMT ] [EntityOperations.isEntityRegisteredCRSD:353]  entity: ora.db.db, type: 1, registered: true
[main] [ 2014-11-04 08:11:37.286 GMT ] [CRSNative.searchEntities:658]  found 1 entities
[main] [ 2014-11-04 08:11:37.289 GMT ] [DatabaseImpl.getDBInstances:2421]  size of resultData = 1
[main] [ 2014-11-04 08:11:37.290 GMT ] [DatabaseImpl.getDBInstances:2454]  riID=ora.db.db 1 1 result={LAST_SERVER=oraclelinux}
[main] [ 2014-11-04 08:11:37.292 GMT ] [DatabaseImpl.getDBInstances:2481]  not getting any value from GEN
[main] [ 2014-11-04 08:11:37.292 GMT ] [DatabaseImpl.getDBInstances:2485]  both gen and config, instance name = DBN
[main] [ 2014-11-04 08:11:37.293 GMT ] [CRSCache.getAttributesFromCache:229]  CRS cache: ora.db.db [<DATABASE_TYPE:SINGLE>]
[main] [ 2014-11-04 08:11:37.294 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.295 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [HOSTING_MEMBERS]
[main] [ 2014-11-04 08:11:37.304 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<HOSTING_MEMBERS:>]
[main] [ 2014-11-04 08:11:37.304 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.305 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS]
[main] [ 2014-11-04 08:11:37.327 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS:>]
[main] [ 2014-11-04 08:11:37.340 GMT ] [Util.<init>:103]  Util Instance created.
[main] [ 2014-11-04 08:11:37.342 GMT ] [DatabaseImpl.getDBInstances:2543]  instName=DBN nodeName=oraclelinux
[main] [ 2014-11-04 08:11:37.346 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.347 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS_PQ]
[main] [ 2014-11-04 08:11:37.351 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS_PQ:>]
[main] [ 2014-11-04 08:11:37.352 GMT ] [StatusAction.internalExecuteDatabase:491]  number of instances = 1
[main] [ 2014-11-04 08:11:37.353 GMT ] [StatusAction.internalExecuteDatabase:494]  instance name = DBN
[main] [ 2014-11-04 08:11:37.354 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for thisversion
[main] [ 2014-11-04 08:11:37.354 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.355 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for thishome
[main] [ 2014-11-04 08:11:37.356 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.357 GMT ] [DatabaseImpl.crsResource:5254]  calling parent's crsResource
[main] [ 2014-11-04 08:11:37.371 GMT ] [EntityOperations.isEntityRegisteredCRSD:353]  entity: ora.db.db, type: 1, registered: true
[main] [ 2014-11-04 08:11:37.398 GMT ] [CRSNative.searchEntities:658]  found 1 entities
[main] [ 2014-11-04 08:11:37.399 GMT ] [CRSNative.fetchStatusByNodesName:594]  	node: oraclelinux, status ONLINE, istate STABLE
[main] [ 2014-11-04 08:11:37.401 GMT ] [StatusAction$StatusDescription.<init>:4977]  StatusOnNode: node oraclelinux, value: RUNNING_ON_NODE
[main] [ 2014-11-04 08:11:37.402 GMT ] [Version.isPre:610]  version to be checked 12.1.0.2.0 major version to check against 11
[main] [ 2014-11-04 08:11:37.402 GMT ] [Version.isPre:621]  isPre.java: Returning FALSE
[main] [ 2014-11-04 08:11:37.403 GMT ] [Version.isPre:691]  version to be checked 12.1.0.2.0 major version to check against 11 minor version to check against 2 patchset version to check against 2
[main] [ 2014-11-04 08:11:37.404 GMT ] [Version.isPre:701]  isPre: Returning FALSE for major version check
[main] [ 2014-11-04 08:11:37.405 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.405 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: all
[main] [ 2014-11-04 08:11:37.424 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<STATE:ONLINE>, <TARGET:ONLINE>, <ACL:owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx>, <ACTIONS:startoption,group:"asmdba",user:"oracle",group:"oinstall">, <ACTION_FAILURE_TEMPLATE:>, <ACTION_SCRIPT:>, <ACTION_START_OPTION:>, <ACTION_TIMEOUT:600>, <ACTIVE_PLACEMENT:0>, <AGENT_FILENAME:%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%>, <ALERT_TEMPLATE:>, <ALIAS_NAME:>, <AUTO_START:never>, <CARDINALITY:1>, <CARDINALITY_ID:0>, <CHECK_INTERVAL:1>, <CHECK_TIMEOUT:30>, <CLEAN_TIMEOUT:60>, <CLUSTER_DATABASE:false>, <CONFIG_VERSION:1>, <CREATION_SEED:37>, <DATABASE_TYPE:SINGLE>, <DB_UNIQUE_NAME:DB>, <DEFAULT_TEMPLATE:PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%) ELEMENT(DATABASE_TYPE= %DATABASE_TYPE%)>, <DEGREE:1>, <DEGREE_ID:0>, <DELETE_TIMEOUT:60>, <DESCRIPTION:Oracle Database resource>, <ENABLED:1>, <FAILOVER_DELAY:0>, <FAILURE_INTERVAL:60>, <FAILURE_THRESHOLD:1>, <GEN_AUDIT_FILE_DEST:/u01/app/oracle/admin/DB/adump>, <GEN_START_OPTIONS:>, <GEN_USR_ORA_INST_NAME:DBN>, <HOSTING_MEMBERS:>, <ID:ora.db.db>, <INSTANCE_COUNT:1>, <INSTANCE_FAILOVER:1>, <INTERMEDIATE_TIMEOUT:0>, <LOAD:1>, <LOGGING_LEVEL:1>, <MANAGEMENT_POLICY:MANUAL>, <MODIFY_TIMEOUT:60>, <NAME:ora.db.db>, <NLS_LANG:>, <NOT_RESTARTING_TEMPLATE:>, <OFFLINE_CHECK_INTERVAL:0>, <ONLINE_RELOCATION_TIMEOUT:0>, <ORACLE_HOME:/u01/app/oracle/product/12.1.0.2/dbhome_1>, <ORACLE_HOME_OLD:>, <OS_CRASH_THRESHOLD:0>, <OS_CRASH_UPTIME:0>, <PLACEMENT:balanced>, <PROFILE_CHANGE_TEMPLATE:>, <PWFILE:/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora>, <RANK:0>, <RELOCATE_ACTION:0>, <RELOCATE_BY_DEPENDENCY:1>, <RESTART_ATTEMPTS:1>, <ROLE:PRIMARY>, <SCRIPT_TIMEOUT:60>, <SERVER_CATEGORY:>, <SERVER_POOLS:>, <SERVER_POOLS_PQ:>, <SPFILE:>, <START_CONCURRENCY:0>, <START_DEPENDENCIES:hard(ora.DATA.dg,ora.RECO.dg) 
weak(type:ora.listener.type,uniform:ora.ons)>, <START_TIMEOUT:600>, <STATE_CHANGE_TEMPLATE:>, <STOP_CONCURRENCY:0>, <STOP_DEPENDENCIES:hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.RECO.dg)>, <STOP_TIMEOUT:600>, <TYPE:ora.database.type>, <TYPE_VERSION:3.3>, <UPTIME_THRESHOLD:1h>, <USER_WORKLOAD:yes>, <USE_STICKINESS:0>, <USR_ORA_DB_NAME:>, <USR_ORA_DOMAIN:>, <USR_ORA_ENV:>, <USR_ORA_FLAGS:>, <USR_ORA_INST_NAME:DBN>, <USR_ORA_OPEN_MODE:open>, <USR_ORA_OPI:false>, <USR_ORA_STOP_MODE:immediate>, <VERSION:12.1.0.2.0>]
[main] [ 2014-11-04 08:11:37.425 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STATE, value=ONLINE
[main] [ 2014-11-04 08:11:37.426 GMT ] [DatabaseImpl.getDBInstances:2356]  name=TARGET, value=ONLINE
[main] [ 2014-11-04 08:11:37.427 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACL, value=owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
[main] [ 2014-11-04 08:11:37.427 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTIONS, value=startoption,group:"asmdba",user:"oracle",group:"oinstall"
[main] [ 2014-11-04 08:11:37.428 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_FAILURE_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.428 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_SCRIPT, value=
[main] [ 2014-11-04 08:11:37.429 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_START_OPTION, value=
[main] [ 2014-11-04 08:11:37.429 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.430 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTIVE_PLACEMENT, value=0
[main] [ 2014-11-04 08:11:37.431 GMT ] [DatabaseImpl.getDBInstances:2356]  name=AGENT_FILENAME, value=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
[main] [ 2014-11-04 08:11:37.432 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ALERT_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.432 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ALIAS_NAME, value=
[main] [ 2014-11-04 08:11:37.433 GMT ] [DatabaseImpl.getDBInstances:2356]  name=AUTO_START, value=never
[main] [ 2014-11-04 08:11:37.434 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CARDINALITY, value=1
[main] [ 2014-11-04 08:11:37.435 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CARDINALITY_ID, value=0
[main] [ 2014-11-04 08:11:37.435 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CHECK_INTERVAL, value=1
[main] [ 2014-11-04 08:11:37.436 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CHECK_TIMEOUT, value=30
[main] [ 2014-11-04 08:11:37.436 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CLEAN_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.437 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CLUSTER_DATABASE, value=false
[main] [ 2014-11-04 08:11:37.438 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CONFIG_VERSION, value=1
[main] [ 2014-11-04 08:11:37.439 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CREATION_SEED, value=37
[main] [ 2014-11-04 08:11:37.439 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DATABASE_TYPE, value=SINGLE
[main] [ 2014-11-04 08:11:37.440 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DB_UNIQUE_NAME, value=DB
[main] [ 2014-11-04 08:11:37.441 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DEFAULT_TEMPLATE, value=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%) ELEMENT(DATABASE_TYPE= %DATABASE_TYPE%)
[main] [ 2014-11-04 08:11:37.442 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DEGREE, value=1
[main] [ 2014-11-04 08:11:37.444 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DEGREE_ID, value=0
[main] [ 2014-11-04 08:11:37.444 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DELETE_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.445 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DESCRIPTION, value=Oracle Database resource
[main] [ 2014-11-04 08:11:37.446 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ENABLED, value=1
[main] [ 2014-11-04 08:11:37.447 GMT ] [DatabaseImpl.getDBInstances:2356]  name=FAILOVER_DELAY, value=0
[main] [ 2014-11-04 08:11:37.447 GMT ] [DatabaseImpl.getDBInstances:2356]  name=FAILURE_INTERVAL, value=60
[main] [ 2014-11-04 08:11:37.448 GMT ] [DatabaseImpl.getDBInstances:2356]  name=FAILURE_THRESHOLD, value=1
[main] [ 2014-11-04 08:11:37.449 GMT ] [DatabaseImpl.getDBInstances:2356]  name=GEN_AUDIT_FILE_DEST, value=/u01/app/oracle/admin/DB/adump
[main] [ 2014-11-04 08:11:37.450 GMT ] [DatabaseImpl.getDBInstances:2356]  name=GEN_START_OPTIONS, value=
[main] [ 2014-11-04 08:11:37.450 GMT ] [DatabaseImpl.getDBInstances:2356]  name=GEN_USR_ORA_INST_NAME, value=DBN
[main] [ 2014-11-04 08:11:37.451 GMT ] [DatabaseImpl.getDBInstances:2356]  name=HOSTING_MEMBERS, value=
[main] [ 2014-11-04 08:11:37.452 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ID, value=ora.db.db
[main] [ 2014-11-04 08:11:37.452 GMT ] [DatabaseImpl.getDBInstances:2356]  name=INSTANCE_COUNT, value=1
[main] [ 2014-11-04 08:11:37.453 GMT ] [DatabaseImpl.getDBInstances:2356]  name=INSTANCE_FAILOVER, value=1
[main] [ 2014-11-04 08:11:37.454 GMT ] [DatabaseImpl.getDBInstances:2356]  name=INTERMEDIATE_TIMEOUT, value=0
[main] [ 2014-11-04 08:11:37.455 GMT ] [DatabaseImpl.getDBInstances:2356]  name=LOAD, value=1
[main] [ 2014-11-04 08:11:37.455 GMT ] [DatabaseImpl.getDBInstances:2356]  name=LOGGING_LEVEL, value=1
[main] [ 2014-11-04 08:11:37.456 GMT ] [DatabaseImpl.getDBInstances:2356]  name=MANAGEMENT_POLICY, value=MANUAL
[main] [ 2014-11-04 08:11:37.456 GMT ] [DatabaseImpl.getDBInstances:2356]  name=MODIFY_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.457 GMT ] [DatabaseImpl.getDBInstances:2356]  name=NAME, value=ora.db.db
[main] [ 2014-11-04 08:11:37.458 GMT ] [DatabaseImpl.getDBInstances:2356]  name=NLS_LANG, value=
[main] [ 2014-11-04 08:11:37.458 GMT ] [DatabaseImpl.getDBInstances:2356]  name=NOT_RESTARTING_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.459 GMT ] [DatabaseImpl.getDBInstances:2356]  name=OFFLINE_CHECK_INTERVAL, value=0
[main] [ 2014-11-04 08:11:37.459 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ONLINE_RELOCATION_TIMEOUT, value=0
[main] [ 2014-11-04 08:11:37.460 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ORACLE_HOME, value=/u01/app/oracle/product/12.1.0.2/dbhome_1
[main] [ 2014-11-04 08:11:37.461 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ORACLE_HOME_OLD, value=
[main] [ 2014-11-04 08:11:37.462 GMT ] [DatabaseImpl.getDBInstances:2356]  name=OS_CRASH_THRESHOLD, value=0
[main] [ 2014-11-04 08:11:37.462 GMT ] [DatabaseImpl.getDBInstances:2356]  name=OS_CRASH_UPTIME, value=0
[main] [ 2014-11-04 08:11:37.463 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PLACEMENT, value=balanced
[main] [ 2014-11-04 08:11:37.463 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PROFILE_CHANGE_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.464 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PWFILE, value=/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initDBN.ora
[main] [ 2014-11-04 08:11:37.465 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RANK, value=0
[main] [ 2014-11-04 08:11:37.466 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RELOCATE_ACTION, value=0
[main] [ 2014-11-04 08:11:37.466 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RELOCATE_BY_DEPENDENCY, value=1
[main] [ 2014-11-04 08:11:37.467 GMT ] [DatabaseImpl.getDBInstances:2356]  name=RESTART_ATTEMPTS, value=1
[main] [ 2014-11-04 08:11:37.468 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ROLE, value=PRIMARY
[main] [ 2014-11-04 08:11:37.469 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SCRIPT_TIMEOUT, value=60
[main] [ 2014-11-04 08:11:37.469 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SERVER_CATEGORY, value=
[main] [ 2014-11-04 08:11:37.470 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SERVER_POOLS, value=
[main] [ 2014-11-04 08:11:37.471 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SERVER_POOLS_PQ, value=
[main] [ 2014-11-04 08:11:37.472 GMT ] [DatabaseImpl.getDBInstances:2356]  name=SPFILE, value=
[main] [ 2014-11-04 08:11:37.473 GMT ] [DatabaseImpl.getDBInstances:2356]  name=START_CONCURRENCY, value=0
[main] [ 2014-11-04 08:11:37.474 GMT ] [DatabaseImpl.getDBInstances:2356]  name=START_DEPENDENCIES, value=hard(ora.DATA.dg,ora.RECO.dg) weak(type:ora.listener.type,uniform:ora.ons)
[main] [ 2014-11-04 08:11:37.475 GMT ] [DatabaseImpl.getDBInstances:2356]  name=START_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.476 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STATE_CHANGE_TEMPLATE, value=
[main] [ 2014-11-04 08:11:37.477 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STOP_CONCURRENCY, value=0
[main] [ 2014-11-04 08:11:37.478 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STOP_DEPENDENCIES, value=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.RECO.dg)
[main] [ 2014-11-04 08:11:37.478 GMT ] [DatabaseImpl.getDBInstances:2356]  name=STOP_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.480 GMT ] [DatabaseImpl.getDBInstances:2356]  name=TYPE, value=ora.database.type
[main] [ 2014-11-04 08:11:37.480 GMT ] [DatabaseImpl.getDBInstances:2356]  name=TYPE_VERSION, value=3.3
[main] [ 2014-11-04 08:11:37.481 GMT ] [DatabaseImpl.getDBInstances:2356]  name=UPTIME_THRESHOLD, value=1h
[main] [ 2014-11-04 08:11:37.481 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USER_WORKLOAD, value=yes
[main] [ 2014-11-04 08:11:37.483 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USE_STICKINESS, value=0
[main] [ 2014-11-04 08:11:37.483 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_DB_NAME, value=
[main] [ 2014-11-04 08:11:37.484 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_DOMAIN, value=
[main] [ 2014-11-04 08:11:37.485 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_ENV, value=
[main] [ 2014-11-04 08:11:37.486 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_FLAGS, value=
[main] [ 2014-11-04 08:11:37.487 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_INST_NAME, value=DBN
[main] [ 2014-11-04 08:11:37.487 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_OPEN_MODE, value=open
[main] [ 2014-11-04 08:11:37.488 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_OPI, value=false
[main] [ 2014-11-04 08:11:37.488 GMT ] [DatabaseImpl.getDBInstances:2356]  name=USR_ORA_STOP_MODE, value=immediate
[main] [ 2014-11-04 08:11:37.489 GMT ] [DatabaseImpl.getDBInstances:2356]  name=VERSION, value=12.1.0.2.0
[main] [ 2014-11-04 08:11:37.490 GMT ] [CRSCache.getAttributesFromCache:229]  CRS cache: ora.db.db [<DATABASE_TYPE:SINGLE>]
[main] [ 2014-11-04 08:11:37.494 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.495 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [HOSTING_MEMBERS]
[main] [ 2014-11-04 08:11:37.498 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<HOSTING_MEMBERS:>]
[main] [ 2014-11-04 08:11:37.499 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.500 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS]
[main] [ 2014-11-04 08:11:37.524 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS:>]
[main] [ 2014-11-04 08:11:37.527 GMT ] [DatabaseImpl.crsResource:5254]  calling parent's crsResource
[main] [ 2014-11-04 08:11:37.539 GMT ] [EntityOperations.isEntityRegisteredCRSD:353]  entity: ora.db.db, type: 1, registered: true
[main] [ 2014-11-04 08:11:37.562 GMT ] [CRSNative.searchEntities:658]  found 1 entities
[main] [ 2014-11-04 08:11:37.562 GMT ] [DatabaseImpl.getDBInstances:2421]  size of resultData = 1
[main] [ 2014-11-04 08:11:37.564 GMT ] [DatabaseImpl.getDBInstances:2454]  riID=ora.db.db 1 1 result={LAST_SERVER=oraclelinux}
[main] [ 2014-11-04 08:11:37.565 GMT ] [CRSCache.getAttributesFromCache:229]  CRS cache: ora.db.db [<DATABASE_TYPE:SINGLE>]
[main] [ 2014-11-04 08:11:37.566 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.572 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [HOSTING_MEMBERS]
[main] [ 2014-11-04 08:11:37.582 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<HOSTING_MEMBERS:>]
[main] [ 2014-11-04 08:11:37.583 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.584 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS]
[main] [ 2014-11-04 08:11:37.604 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS:>]
[main] [ 2014-11-04 08:11:37.606 GMT ] [Util.<init>:103]  Util Instance created.
[main] [ 2014-11-04 08:11:37.608 GMT ] [DatabaseImpl.getDBInstances:2543]  instName=DBN nodeName=oraclelinux
[main] [ 2014-11-04 08:11:37.611 GMT ] [CRSCache.getAttributesFromCRS:155]  CRS: name: ora.db.db, type 1, node: null
[main] [ 2014-11-04 08:11:37.614 GMT ] [CRSCache.getAttributesFromCRS:156]  attrs: [SERVER_POOLS_PQ]
[main] [ 2014-11-04 08:11:37.625 GMT ] [CRSCache.getAttributesFromCRS:163]  CRS: [<SERVER_POOLS_PQ:>]
[main] [ 2014-11-04 08:11:37.626 GMT ] [Util.<init>:103]  Util Instance created.
[main] [ 2014-11-04 08:11:37.627 GMT ] [CmdOptVals.isOptionSet:917]  Check isOptionSet for VERBOSE
[main] [ 2014-11-04 08:11:37.629 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for verbose
[main] [ 2014-11-04 08:11:37.629 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.631 GMT ] [CmdOptVals.isOptionSet:917]  Check isOptionSet for INNER
[main] [ 2014-11-04 08:11:37.632 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for inner
[main] [ 2014-11-04 08:11:37.634 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.637 GMT ] [CRSCache.getAttributesFromCache:229]  CRS cache: ora.db.db [<DATABASE_TYPE:SINGLE>]
[main] [ 2014-11-04 08:11:37.638 GMT ] [CmdOptVals.isOptionSet:917]  Check isOptionSet for FORCE
[main] [ 2014-11-04 08:11:37.639 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for force
[main] [ 2014-11-04 08:11:37.639 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.641 GMT ] [CRSNative.getResourceInstancesID:1070]  Looking for RIs using filter: ((NAME == ora.db.db) AND ((STATE == ONLINE) OR (STATE == INTERMEDIATE)))
[main] [ 2014-11-04 08:11:37.645 GMT ] [CRSNative.getResourceInstancesID:1078]  found 1 RIs
[main] [ 2014-11-04 08:11:37.646 GMT ] [CRSNative.getResourceInstancesID:1080]  	ora.db.db 1 1
[main] [ 2014-11-04 08:11:37.647 GMT ] [CmdOptVals.isOptionSet:917]  Check isOptionSet for VERBOSE
[main] [ 2014-11-04 08:11:37.648 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for verbose
[main] [ 2014-11-04 08:11:37.648 GMT ] [OptVal.isSet:150]  m_value is null
Database is running.
[main] [ 2014-11-04 08:11:37.682 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for thisversion
[main] [ 2014-11-04 08:11:37.683 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.683 GMT ] [CmdOptVals.isOptionSet:861]  Calling isOptionSet for thishome
[main] [ 2014-11-04 08:11:37.684 GMT ] [OptVal.isSet:150]  m_value is null
[main] [ 2014-11-04 08:11:37.684 GMT ] [InterruptHandler.unRegisterInterruptHandler:76]  UNRegistering shutdown hook.....
[main] [ 2014-11-04 08:11:37.685 GMT ] [InterruptHandler.unRegisterInterruptHandler:81]  UnRegistered shutdown hook.....
[main] [ 2014-11-04 08:11:37.686 GMT ] [OPSCTLDriver.main:233]  Unregistered OPSCTL driver's interrupt handler
[main] [ 2014-11-04 08:11:37.687 GMT ] [SRVMContext.term:151]  Performing SRVM Context Term. Term counter is 2
[oracle@oraclelinux ~]$
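The trace above is extremely verbose; the parts worth reading are the CRS resource attributes printed for ora.db.db. The following shell sketch is a helper of my own (not part of the ODA tooling) that distills the name/value pairs I find most interesting for a RAC One Node resource. The heredoc stands in for the real trace file and contains lines copied from the output above:

```shell
#!/bin/sh
# Distill selected CRS resource attributes from an oakcli/SRVM trace.
# The heredoc below stands in for the actual trace file on the ODA.
trace=$(mktemp)
cat > "$trace" <<'EOF'
[main] [ 2014-11-04 08:11:37.038 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ACTION_TIMEOUT, value=600
[main] [ 2014-11-04 08:11:37.057 GMT ] [DatabaseImpl.getDBInstances:2356]  name=DATABASE_TYPE, value=SINGLE
[main] [ 2014-11-04 08:11:37.089 GMT ] [DatabaseImpl.getDBInstances:2356]  name=CARDINALITY, value=1
[main] [ 2014-11-04 08:11:37.090 GMT ] [DatabaseImpl.getDBInstances:2356]  name=PLACEMENT, value=balanced
[main] [ 2014-11-04 08:11:37.091 GMT ] [DatabaseImpl.getDBInstances:2356]  name=ONLINE_RELOCATION_TIMEOUT, value=0
EOF
# Turn "name=X, value=Y" lines into "X=Y", then keep only the attributes
# that describe how Clusterware places and relocates the database.
attrs=$(sed -n 's/.*name=\([A-Z_]*\), value=\(.*\)/\1=\2/p' "$trace" |
  grep -E '^(DATABASE_TYPE|CARDINALITY|PLACEMENT|ONLINE_RELOCATION_TIMEOUT)=')
echo "$attrs"
rm -f "$trace"
```

On the ODA itself you would point sed at the actual trace file rather than a heredoc; attributes such as DATABASE_TYPE, CARDINALITY and PLACEMENT tell you how Clusterware will treat the resource during a relocation.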

Posted in 12c Release 1, Linux

Interesting observations executing SLOB2 with ext4 and xfs on SSD

Posted by Martin Bach on October 31, 2014

My lab server has two SSDs: one is connected via SATA 2, the other via SATA 3. I expected the SATA 3-attached device to do at least as well as, if not better than, the one on the “old” interface. I ran SLOB on both devices to find out whether there was a difference. To my great surprise the SATA 2-connected SSD performed a lot better than the SATA 3 device, as the AWR reports showed. Initially I was not entirely sure why, since the FIO results on both devices are roughly equal. You will see why when reading this post. In summary: use XFS for any concurrent writes. Or maybe ASM.

Let’s investigate

Let’s do a little I/O investigation because a) it’s cool and b) you can.

This is Oracle Linux 6.5 with UEK 3. A word of warning though: ext4 has been, let’s say, a little flaky. I first found out about that in 2011 (http://martincarstenbach.wordpress.com/2011/11/04/an-interesting-problem-with-ext4-on-oracle-linux-5-5/) and Oracle has since published a MOS note: ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1). I am on a reasonably recent version, so the observations in this post are probably not related to the above note. The device /dev/sda is attached via SATA 2, and the file system mounted on /u01/app/oracle/oradata is formatted with XFS. The other SSD, /dev/sde, uses ext4 and is connected via SATA 3 provided by a PCIe v2 x4 card. I wish I had PCIe v3, but my Opteron 6200 system doesn’t support native SATA 6G.
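Before benchmarking it is worth double-checking which filesystem actually backs each path. A one-liner like the following reports the filesystem type (using /tmp here as a stand-in for /u01/app/oracle/oradata, which is an assumption on my part; on the lab server you would query the actual mount point):

```shell
# Report the filesystem type backing a given path (GNU coreutils stat).
# Substitute the Oracle data file location for /tmp on a real system.
fstype=$(stat -f -c %T /tmp)
echo "filesystem type backing /tmp: $fstype"
```

Running this against both /u01/app/oracle/oradata and the ext4 mount point confirms the XFS/ext4 split described above before any SLOB results are attributed to the interface speed.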

SLOB

I created a 12c non-CDB database for SLOB with the IOPS tablespace on my SATA 2 SSD, which is mounted on XFS. Each test was repeated four times. Should you want to follow along, here is my slob.conf:

[/m/kvm/oracle]$ cat ~/SLOB2/SLOB/slob.conf 

UPDATE_PCT=20
RUN_TIME=120
WORK_LOOP=0
SCALE=10000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=8
SHARED_DATA_MODULUS=0

# Settings for SQL*Net connectivity:
#ADMIN_SQLNET_SERVICE=slob
#SQLNET_SERVICE_BASE=slob
#SQLNET_SERVICE_MAX=2
#SYSDBA_PASSWD="change_on_install"

export UPDATE_PCT RUN_TIME WORK_LOOP SCALE WORK_UNIT LOAD_PARALLEL_DEGREE REDO_STRESS SHARED_DATA_MODULUS

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]

Notice the UPDATE_PCT: that will become important later. The SLOB configuration has been created using a call to ./setup.sh with IOPS as the target tablespace and 128 users, which is the default. My database uses the initialisation file shown below, which is more or less the same as the default (and my buffer cache is indeed 16M):

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]
[/m/kvm/oracle]$ cat /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/initSLOB12.ora 

#set the following to paths that make sense on your system:
db_create_file_dest = '/u01/app/oracle/oradata'
control_files='/u01/app/oracle/oradata/SLOB12/controlfile/o1_mf_b52b4lvk_.ctl'

db_name = SLOB12
compatible = 12.1.0.2.0
undo_management = auto
db_block_size = 8192
db_files = 2000
processes = 500
shared_pool_size = 1G
db_cache_size = 10M     # Will force maximum physical I/O
#db_cache_size = 15G    # Sufficient for cached runs with reference scale (10,000 rows) and 128 sessions
filesystemio_options = setall
parallel_max_servers = 0
log_buffer = 134217728
cpu_count = 2
pga_aggregate_target = 1G
remote_login_passwordfile = exclusive
workarea_size_policy = auto
recyclebin = off

# The following are needed to force db file sequential read. Omit for direct path short scans.
# If you would like to investigate a blend of short scans mixed with single-block random I/O
# then comment out these three parameters :
_db_block_prefetch_limit = 0
_db_block_prefetch_quota = 0
_db_file_noncontig_mblock_read_count = 0

# Uncomment the following if suffering library cache contention (at larger numbers of sessions) or high log file sync waits
# at CPU-saturated levels.
#_cursor_obsolete_threshold=130
#_high_priority_processes=LGWR

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]

After the fourth execution of ./runit.sh 16 I moved the IOPS data file from /dev/sda to /dev/sde and ran another 4 tests.
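Relocating the data file between the two devices is pleasantly simple in 12c, which supports moving data files online. A hypothetical sketch (the file name below is made up; query DBA_DATA_FILES for the real one):

```sql
-- locate the IOPS data file first
SELECT file_name FROM dba_data_files WHERE tablespace_name = 'IOPS';

-- 12c online data file move; source and target paths are illustrative
ALTER DATABASE MOVE DATAFILE '/u01/app/oracle/oradata/SLOB12/datafile/iops01.dbf'
  TO '/m/kvm/oracle/SLOB12/iops01.dbf';
```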

Observations

I ran an AWR difference report to compare the performance of the XFS and EXT4 SSDs. The most interesting bit is right at the top. Apologies for the very wide output: that’s the way the text report lays it out.

WORKLOAD REPOSITORY COMPARE PERIOD REPORT

Snapshot Set    DB Id    Instance     Inst Num Release     Cluster Host          Std Block Size
------------ ----------- ------------ -------- ----------- ------- ------------ ---------------
First (1st)   1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192
Second (2nd)  1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192

Snapshot Set  Begin Snap Id Begin Snap Time            End Snap Id End Snap Time                  Avg Active Users           Elapsed Time (min)            DB time (min)
------------ -------------- ------------------------- ------------ ------------------------- -------------------------- -------------------------- --------------------------
1st                      41 31-Oct-14 12:18:14 (Fri)           42 31-Oct-14 12:20:17 (Fri)                      15.8                       2.0                      32.1
2nd                      49 31-Oct-14 12:38:18 (Fri)           50 31-Oct-14 12:40:20 (Fri)                      15.8                       2.0                      32.3
                                                                                      ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                      %Diff:                     0.2%                       0.0%                       0.5%

Host Configuration Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                     1st                  2nd                 Diff     %Diff
----------------------------------- -------------------- -------------------- -------------------- ---------
Number of CPUs:                                       24                   24                    0       0.0
Number of CPU Cores:                                  24                   24                    0       0.0
Number of CPU Sockets:                                 2                    2                    0       0.0
Physical Memory:                                64413.7M             64413.7M                   0M       0.0
Load at Start Snapshot:                             9.13                 9.57                  .44       4.8
Load at End Snapshot:                              15.38                 15.6                  .22       1.4
%User Time:                                         1.24                  .64                  -.6     -48.4
%System Time:                                        .75                  .74                 -.02      -1.3
%Idle Time:                                        97.89                 98.6                  .71       0.7
%IO Wait Time:                                     35.21                 9.06               -26.15     -74.3
Cache Sizes
~~~~~~~~~~~
                          1st (M)    2nd (M)   Diff (M)    %Diff
---------------------- ---------- ---------- ---------- --------
Memory Target
.....SGA Target
..........Buffer Cache       16.0       16.0        0.0      0.0
..........Shared Pool     1,024.0    1,024.0        0.0      0.0
..........Large Pool
..........Java Pool          16.0       16.0        0.0      0.0
..........Streams Pool
.....PGA Target           1,024.0    1,024.0        0.0      0.0
Log Buffer                  128.0      128.0        0.0      0.0
In-Memory Area

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                 15.8                 15.8        0.2                      2.5                  4.4       78.7
                      CPU time:                  0.5                  0.3      -29.8                      0.1                  0.1       28.6
           Background CPU time:                  0.1                  0.1      -40.0                      0.0                  0.0        0.0
             Redo size (bytes):          8,721,293.5          4,652,147.0      -46.7              1,379,498.8          1,311,306.6       -4.9
         Logical read (blocks):             10,403.9              5,231.1      -49.7                  1,645.6              1,474.5      -10.4
                 Block changes:              5,216.5              2,640.2      -49.4                    825.1                744.2       -9.8
        Physical read (blocks):              7,412.5              3,750.0      -49.4                  1,172.5              1,057.0       -9.8
       Physical write (blocks):              3,369.0              1,729.0      -48.7                    532.9                487.4       -8.5
              Read IO requests:              7,411.8              3,749.3      -49.4                  1,172.4              1,056.8       -9.9
             Write IO requests:              3,018.0              1,538.2      -49.0                    477.4                433.6       -9.2
                  Read IO (MB):                 57.9                 29.3      -49.4                      9.2                  8.3       -9.8
                 Write IO (MB):                 26.3                 13.5      -48.7                      4.2                  3.8       -8.4
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                  1.4                  1.4       -1.4                      0.2                  0.4       77.3
                  Parses (SQL):                  5.9                  6.3        7.5                      0.9                  1.8       91.4
             Hard parses (SQL):                  0.1                  0.3      107.1                      0.0                  0.1      300.0
            SQL Work Area (MB):                  0.6                  0.4      -25.9                      0.1                  0.1      -25.9
                        Logons:                  0.1                  0.1        0.0                      0.0                  0.0      100.0
                Executes (SQL):                 38.1                 23.7      -37.6                      6.0                  6.7       11.1
                  Transactions:                  6.3                  3.5      -43.8

                                               First               Second       Diff
                                     ---------------      ---------------     ------
     % Blocks changed per Read:                 50.1                 50.5        0.3
              Recursive Call %:                 98.1                 97.6       -0.6
    Rollback per transaction %:                  0.0                  0.0        0.0
                 Rows per Sort:                 68.3                 35.4      -32.9
     Avg DB time per Call (sec):                 11.3                 11.5        0.2
...

I copied the header from the report to show you that nothing really changed between the test executions, and that I’m going to do a lot of I/O since the buffer cache is only 16 MB.

What is striking is that the physical reads and writes are roughly half of what I saw during the first batch of executions. The culprit is quickly found: in the scenario I chose, SLOB is primarily performing single-block reads (“db file sequential read”):

Top Timed Events   First DB/Inst: SLOB12/SLOB12 Snaps: 41-42 (Elapsed time: 122.27 sec  DB time: 1925.83 sec),  Second DB/Inst: SLOB12/SLOB12 Snaps: 49-50 (Elapsed time: 122.614 sec  DB time: 1935.33 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 db file sequential read       User I/O           906,150      1,899.1           2.1        98.6    db file sequential read       User I/O           459,637      1,920.9           4.2        99.3
 db file parallel write        System I/O         116,318         60.0           0.5         3.1    db file async I/O submit      System I/O           2,975         97.2          32.7         5.0
 CPU time                                             N/A         57.7           N/A         3.0    CPU time                                             N/A         40.2           N/A         2.1
 log file parallel write       System I/O           3,485         47.4          13.6         2.5    log file parallel write       System I/O           1,881          3.2           1.7         0.2
 db file async I/O submit      System I/O           7,425          3.7           0.5         0.2    db file parallel write        System I/O           4,670          0.9           0.2         0.0
...

The I/O times for the second interval (the EXT4 case) are twice what they were in the first. But why? I thought at first that the PCIe card added extra latency, and I came up with a few other ideas that all remained just that: ideas. Ideas are nice, but proof is better. Maybe there is indeed something odd? Or maybe there is something in the way Oracle uses the device? To find out I used FIO, my favourite I/O benchmark when it comes to measuring raw performance. If the FIO benchmark results matched the Oracle results then there must indeed be something wrong with the SSD or the way it is addressed. I spent a bit of time compiling FIO and ran a set of tests.
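That factor of two is easy to verify by hand: dividing each run’s total “db file sequential read” time by its number of waits reproduces the Avg Time(ms) column of the report. A quick check with the figures copied from above:

```python
# (waits, total seconds) for 'db file sequential read', copied from the AWR report
runs = {"XFS run": (906_150, 1_899.1), "EXT4 run": (459_637, 1_920.9)}

for name, (waits, seconds) in runs.items():
    avg_ms = seconds / waits * 1000  # average latency per single-block read
    print(f"{name}: {avg_ms:.1f} ms per wait")
```

This prints roughly 2.1 ms for the XFS run and 4.2 ms for the EXT4 run, matching the report.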

FIO

I was really quite curious to see what was happening during the execution of FIO: was the SATA 3 SSD really slower? So I started FIO tests, using FIO 2.1.13. The same test harness was used for both devices; only the destination differs:

[~/fio-2.1.13]$ cat rand-read-sd*
[rand-read]
numjobs=4
rw=randread
directory=/u01/fio/data
size=5G
bs=8k
ioengine=libaio
buffered=0
iodepth=8

[rand-read]
numjobs=4
rw=randread
directory=/m/kvm/oracle
size=5G
bs=8k
ioengine=libaio
buffered=0
iodepth=8

[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]
[~/fio-2.1.13]$

The /u01 file system sits on /dev/sda, and uses XFS. /m/kvm/oracle is on /dev/sde, using EXT4. Here is the output from the tests:

[~/fio-2.1.13]$ ./fio rand-read-sde.fio
rand-read: (g=0): rw=randread, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=8
...
fio-2.1.13
Starting 4 processes
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 3 (f=3): [r(3),_(1)] [99.2% done] [192.8MB/0KB/0KB /s] [24.7K/0/0 iops] [eta 00m:01s]
rand-read: (groupid=0, jobs=1): err= 0: pid=6124: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=44174KB/s, iops=5521, runt=118686msec
    slat (usec): min=4, max=109, avg=13.06, stdev= 6.14
    clat (usec): min=128, max=43082, avg=1434.23, stdev=586.65
     lat (usec): min=140, max=43094, avg=1447.47, stdev=586.41
    clat percentiles (usec):
     |  1.00th=[  516],  5.00th=[  708], 10.00th=[  820], 20.00th=[  988],
     | 30.00th=[ 1112], 40.00th=[ 1240], 50.00th=[ 1352], 60.00th=[ 1480],
     | 70.00th=[ 1608], 80.00th=[ 1816], 90.00th=[ 2160], 95.00th=[ 2480],
     | 99.00th=[ 3184], 99.50th=[ 3472], 99.90th=[ 4192], 99.95th=[ 4576],
     | 99.99th=[ 6752]
    bw (KB  /s): min=27088, max=70928, per=24.99%, avg=44119.50, stdev=6314.88
    lat (usec) : 250=0.01%, 500=0.78%, 750=6.02%, 1000=14.12%
    lat (msec) : 2=65.54%, 4=13.37%, 10=0.15%, 20=0.01%, 50=0.01%
  cpu          : usr=1.89%, sys=11.05%, ctx=421189, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6125: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=44138KB/s, iops=5517, runt=118784msec
    slat (usec): min=4, max=108, avg=13.14, stdev= 6.14
    clat (usec): min=117, max=43252, avg=1435.31, stdev=587.66
     lat (usec): min=124, max=43264, avg=1448.64, stdev=587.40
    clat percentiles (usec):
     |  1.00th=[  502],  5.00th=[  700], 10.00th=[  820], 20.00th=[  988],
     | 30.00th=[ 1128], 40.00th=[ 1240], 50.00th=[ 1352], 60.00th=[ 1480],
     | 70.00th=[ 1624], 80.00th=[ 1816], 90.00th=[ 2160], 95.00th=[ 2480],
     | 99.00th=[ 3152], 99.50th=[ 3472], 99.90th=[ 4192], 99.95th=[ 4640],
     | 99.99th=[ 6688]
    bw (KB  /s): min=27504, max=69760, per=24.91%, avg=43972.22, stdev=6169.69
    lat (usec) : 250=0.07%, 500=0.92%, 750=5.84%, 1000=13.94%
    lat (msec) : 2=65.68%, 4=13.40%, 10=0.16%, 20=0.01%, 50=0.01%
  cpu          : usr=1.89%, sys=11.10%, ctx=421379, majf=0, minf=46
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6126: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=44230KB/s, iops=5528, runt=118538msec
    slat (usec): min=4, max=120, avg=12.59, stdev= 5.61
    clat (usec): min=185, max=42871, avg=1433.09, stdev=581.75
     lat (usec): min=196, max=42881, avg=1445.85, stdev=581.45
    clat percentiles (usec):
     |  1.00th=[  532],  5.00th=[  716], 10.00th=[  828], 20.00th=[  988],
     | 30.00th=[ 1128], 40.00th=[ 1240], 50.00th=[ 1352], 60.00th=[ 1464],
     | 70.00th=[ 1608], 80.00th=[ 1800], 90.00th=[ 2128], 95.00th=[ 2480],
     | 99.00th=[ 3152], 99.50th=[ 3440], 99.90th=[ 4192], 99.95th=[ 4576],
     | 99.99th=[ 7008]
    bw (KB  /s): min=27600, max=65952, per=25.01%, avg=44149.65, stdev=5945.98
    lat (usec) : 250=0.01%, 500=0.63%, 750=5.90%, 1000=14.17%
    lat (msec) : 2=65.89%, 4=13.25%, 10=0.15%, 20=0.01%, 50=0.01%
  cpu          : usr=1.74%, sys=11.20%, ctx=426219, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6127: Thu Oct 30 15:41:15 2014
  read : io=5120.0MB, bw=47359KB/s, iops=5919, runt=110704msec
    slat (usec): min=4, max=114, avg=12.83, stdev= 5.90
    clat (usec): min=188, max=42988, avg=1337.06, stdev=571.06
     lat (usec): min=198, max=43000, avg=1350.07, stdev=570.63
    clat percentiles (usec):
     |  1.00th=[  446],  5.00th=[  620], 10.00th=[  748], 20.00th=[  900],
     | 30.00th=[ 1020], 40.00th=[ 1144], 50.00th=[ 1256], 60.00th=[ 1368],
     | 70.00th=[ 1512], 80.00th=[ 1704], 90.00th=[ 2040], 95.00th=[ 2352],
     | 99.00th=[ 3024], 99.50th=[ 3280], 99.90th=[ 3984], 99.95th=[ 4384],
     | 99.99th=[ 6624]
    bw (KB  /s): min=33664, max=51008, per=26.83%, avg=47369.34, stdev=4051.34
    lat (usec) : 250=0.01%, 500=1.45%, 750=8.72%, 1000=17.95%
    lat (msec) : 2=61.12%, 4=10.65%, 10=0.09%, 20=0.01%, 50=0.01%
  cpu          : usr=1.98%, sys=11.60%, ctx=410433, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=20480MB, aggrb=176551KB/s, minb=44137KB/s, maxb=47359KB/s, mint=110704msec, maxt=118784msec

Disk stats (read/write):
  sde: ios=2619049/4, merge=0/1, ticks=3663514/6, in_queue=3662827, util=100.00%

The best tool to look at IO without tracing block devices is collectl (in my opinion). I ran the extended disk stats in parallel:

[root@ol62 queue]# collectl -sD -i 5 --iosize --dskfilt sde
waiting for 5 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sde              0      0    0    0  125340    129  123 1021    1020   219  1749      8   99
sde              0      0    0    0  119608     58  118 1017    1017   156  1666      7   88
sde         146968      0  18K    8       0      0    0    0       8    31     1      0   89
sde         165435      0  20K    8       2      0    0    6       7    31     1      0   99
sde         187097      0  23K    8       0      0    0    0       8    31     1      0  100
sde         187113      0  23K    8       0      0    0    0       8    31     1      0   99
sde         157684      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186542      0  23K    8       1      0    0    4       7    31     1      0  100
sde         186721      0  23K    8       0      0    0    0       8    31     1      0   99
sde         158416      0  19K    8       0      0    0    0       8    31     1      0  100
sde         187168      0  23K    8       0      0    0    0       8    31     1      0   99
sde         186027      0  23K    8       0      0    0    0       8    31     1      0  100
sde         158366      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186028      0  23K    8       0      0    0    0       8    31     1      0   99
sde         186416      0  23K    8       0      0    0    0       8    31     1      0  100
sde         159102      0  19K    8       0      0    0    0       8    31     1      0  100
sde         186348      0  23K    8       0      0    0    0       8    31     1      0  100
sde         186859      0  23K    8       0      0    0    0       8    31     1      0   99
sde         153996      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186043      0  23K    8       0      0    0    0       8    31     1      0  100
sde         185348      0  23K    8       0      0    0    0       8    31     1      0   99
sde         158150      0  19K    8       0      0    0    0       8    31     1      0   99
sde         186148      0  23K    8       0      0    0    0       8    31     1      0   99
sde         185900      0  23K    8       0      0    0    0       8    31     1      0   99
sde         159353      0  19K    8       0      0    0    0       8    25     1      0   99
sde         167065      0  20K    8       0      0    0    0       8    22     0      0   86
sde              0      0    0    0       0      0    0    0       0     0     0      0    0

This matches the FIO output: the device can indeed do around 23k IOPS at an 8k IO size. What about the XFS mount?
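The cross-check is simple arithmetic: at a fixed 8 KB transfer size, bandwidth divided by I/O size yields IOPS. Using the aggregate fio bandwidth and one of the collectl samples from above:

```python
BS_KB = 8  # 8 KB per I/O request

# aggregate bandwidth from the fio run status line, and one 5-second collectl sample
fio_aggrb_kbs = 176_551  # KB/s, fio READ aggrb for /dev/sde
collectl_kbs  = 186_028  # KB/s, one collectl sample for sde

print(f"fio:      ~{fio_aggrb_kbs // BS_KB:,} IOPS")
print(f"collectl: ~{collectl_kbs // BS_KB:,} IOPS")
```

Both land in the low-20k IOPS range, so collectl and fio agree on what the device delivers.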

[~/fio-2.1.13]$ ./fio rand-read-sda.fio
rand-read: (g=0): rw=randread, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=8
...
fio-2.1.13
Starting 4 processes
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
rand-read: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 4 (f=4): [r(4)] [100.0% done] [191.3MB/0KB/0KB /s] [24.5K/0/0 iops] [eta 00m:00s]
rand-read: (groupid=0, jobs=1): err= 0: pid=6329: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49087KB/s, iops=6135, runt=106809msec
    slat (usec): min=3, max=93, avg= 7.90, stdev= 6.40
    clat (usec): min=280, max=35113, avg=1294.26, stdev=152.91
     lat (usec): min=285, max=35130, avg=1302.35, stdev=152.60
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1192], 20.00th=[ 1240],
     | 30.00th=[ 1272], 40.00th=[ 1272], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 1672], 99.95th=[ 2224],
     | 99.99th=[ 3024]
    bw (KB  /s): min=46000, max=49536, per=25.02%, avg=49124.10, stdev=291.98
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.03%
    lat (msec) : 2=99.89%, 4=0.07%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.39%, sys=8.05%, ctx=505195, majf=0, minf=43
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6330: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49093KB/s, iops=6136, runt=106795msec
    slat (usec): min=3, max=114, avg= 7.55, stdev= 5.74
    clat (usec): min=263, max=35353, avg=1294.46, stdev=155.68
     lat (usec): min=271, max=35369, avg=1302.21, stdev=155.46
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1192], 20.00th=[ 1256],
     | 30.00th=[ 1272], 40.00th=[ 1288], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 1672], 99.95th=[ 2224],
     | 99.99th=[ 3184]
    bw (KB  /s): min=45728, max=49472, per=25.02%, avg=49123.20, stdev=300.30
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.02%
    lat (msec) : 2=99.89%, 4=0.07%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.42%, sys=7.88%, ctx=549312, majf=0, minf=46
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6331: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49080KB/s, iops=6134, runt=106824msec
    slat (usec): min=3, max=137, avg= 7.70, stdev= 5.87
    clat (usec): min=266, max=35102, avg=1294.64, stdev=147.64
     lat (usec): min=274, max=35121, avg=1302.52, stdev=147.39
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1208], 20.00th=[ 1256],
     | 30.00th=[ 1272], 40.00th=[ 1272], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 2064], 99.95th=[ 2256],
     | 99.99th=[ 3600]
    bw (KB  /s): min=45888, max=49536, per=25.02%, avg=49110.62, stdev=295.81
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.04%
    lat (msec) : 2=99.84%, 4=0.10%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.32%, sys=8.07%, ctx=536841, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8
rand-read: (groupid=0, jobs=1): err= 0: pid=6332: Thu Oct 30 15:52:03 2014
  read : io=5120.0MB, bw=49080KB/s, iops=6134, runt=106824msec
    slat (usec): min=3, max=106, avg= 7.66, stdev= 5.93
    clat (usec): min=274, max=35268, avg=1294.67, stdev=157.16
     lat (usec): min=281, max=35286, avg=1302.53, stdev=156.91
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1160], 10.00th=[ 1208], 20.00th=[ 1256],
     | 30.00th=[ 1272], 40.00th=[ 1272], 50.00th=[ 1288], 60.00th=[ 1304],
     | 70.00th=[ 1320], 80.00th=[ 1352], 90.00th=[ 1384], 95.00th=[ 1416],
     | 99.00th=[ 1480], 99.50th=[ 1512], 99.90th=[ 1944], 99.95th=[ 2224],
     | 99.99th=[ 3536]
    bw (KB  /s): min=45840, max=49536, per=25.02%, avg=49109.56, stdev=303.79
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.05%
    lat (msec) : 2=99.84%, 4=0.09%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=2.50%, sys=7.83%, ctx=533469, majf=0, minf=44
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=655360/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=20480MB, aggrb=196318KB/s, minb=49079KB/s, maxb=49092KB/s, mint=106795msec, maxt=106824msec

Disk stats (read/write):
  sda: ios=2620077/60, merge=0/2, ticks=3348979/94, in_queue=3348631, util=100.00%
[oracle@ol62 - /u01/app/oracle/product/12.1.0.2/dbhome_1: SLOB12]
[~/fio-2.1.13]$

Here is some of the corresponding collectl output:

[root@ol62 queue]# collectl -sD -i 5 --iosize --dskfilt sda
waiting for 5 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda         196313      0  24K    8       5      0    1    9       8    31     1      0   99
sda         196681      0  24K    8       6      0    0   16       8    31     1      0  100
sda         196369      0  24K    8       7      0    1   12       8    31     1      0   99
sda         196217      0  24K    8       4      0    0   10       8    31     1      0  100
sda         196300      0  24K    8      14      0    2    6       8    31     1      0  100
sda         196699      0  24K    8       6      0    0   16       8    31     1      0   99
sda         196340      0  24K    8       3      0    0   16       8    31     1      0   99
sda         196350      0  24K    8       6      0    0   16       8    31     1      0  100
sda         196379      0  24K    8       6      0    0   16       8    31     1      0   99
sda         196560      0  24K    8       3      0    0   16       8    31     1      0   99
sda         196259      0  24K    8       8      0    1    8       8    31     1      0   99

Summary Number 1

So it would appear that the SSDs are more or less equal from FIO’s point of view. If you followed closely (hurray if you did: I appreciate this is a long post!) you will have noticed the following:

  • The SLOB test used updates (20%), and there is just one data file for the IOPS tablespace (-> there is concurrency!)
  • The FIO test was based on random reads with no writes. Its 4 jobs also worked on 4 distinct files.
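For completeness, a fio job that mimics the SLOB pattern more closely would mix 20% random writes into the reads and point all jobs at a single shared file. A hypothetical job file along those lines (the file name is made up, and I have not run this variant):

```ini
[slob-like]
numjobs=4
rw=randrw
rwmixwrite=20
filename=/u01/fio/data/shared.dat
size=5G
bs=8k
ioengine=libaio
buffered=0
iodepth=8
```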

I have tried to avoid the effect of file system buffering by setting buffered to 0, and I’m also using asynchronous I/O as provided by libaio (just as Oracle does).

Theory Number 2

It would appear that XFS is better suited to concurrent writes. Let’s put that theory to the test: I reformatted the SATA 3 SSD with XFS and ran another round. Feast your eyes on the comparison of the AWR reports for the EXT4 run and XFS on /dev/sde:

WORKLOAD REPOSITORY COMPARE PERIOD REPORT

Snapshot Set    DB Id    Instance     Inst Num Release     Cluster Host          Std Block Size
------------ ----------- ------------ -------- ----------- ------- ------------ ---------------
First (1st)   1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192
Second (2nd)  1269760792 SLOB12              1 12.1.0.2.0  NO      ol62.localdo     8192

Snapshot Set  Begin Snap Id Begin Snap Time            End Snap Id End Snap Time                  Avg Active Users           Elapsed Time (min)            DB time (min)
------------ -------------- ------------------------- ------------ ------------------------- -------------------------- -------------------------- --------------------------
1st                      49 31-Oct-14 12:38:18 (Fri)           50 31-Oct-14 12:40:20 (Fri)                      15.8                       2.0                      32.3
2nd                      55 31-Oct-14 15:35:23 (Fri)           56 31-Oct-14 15:37:25 (Fri)                      15.7                       2.0                      32.1
                                                                                      ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                      %Diff:                     0.2%                      -0.5%                      -0.6%

Host Configuration Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                     1st                  2nd                 Diff     %Diff
----------------------------------- -------------------- -------------------- -------------------- ---------
Number of CPUs:                                       24                   24                    0       0.0
Number of CPU Cores:                                  24                   24                    0       0.0
Number of CPU Sockets:                                 2                    2                    0       0.0
Physical Memory:                                64413.7M             64413.7M                   0M       0.0
Load at Start Snapshot:                             9.57                 1.77                 -7.8     -81.5
Load at End Snapshot:                               15.6                14.15                -1.45      -9.3
%User Time:                                          .64                 2.07                 1.43     223.4
%System Time:                                        .74                 1.48                  .75     100.0
%Idle Time:                                         98.6                 96.4                 -2.2      -2.2
%IO Wait Time:                                      9.06                53.29                44.23     488.2
Cache Sizes
~~~~~~~~~~~
                          1st (M)    2nd (M)   Diff (M)    %Diff
---------------------- ---------- ---------- ---------- --------
Memory Target
.....SGA Target
..........Buffer Cache       16.0       16.0        0.0      0.0
..........Shared Pool     1,024.0    1,024.0        0.0      0.0
..........Large Pool
..........Java Pool          16.0       16.0        0.0      0.0
..........Streams Pool
.....PGA Target           1,024.0    1,024.0        0.0      0.0
Log Buffer                  128.0      128.0        0.0      0.0
In-Memory Area

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                 15.8                 15.8        0.2                      4.4                  1.5      -66.7
                      CPU time:                  0.3                  0.9      175.8                      0.1                  0.1        0.0
           Background CPU time:                  0.1                  0.1      133.3                      0.0                  0.0      -50.0
             Redo size (bytes):          4,652,147.0         15,186,494.1      226.4              1,311,306.6          1,426,136.0        8.8
         Logical read (blocks):              5,231.1             16,957.8      224.2                  1,474.5              1,592.5        8.0
                 Block changes:              2,640.2              8,922.1      237.9                    744.2                837.9       12.6
        Physical read (blocks):              3,750.0             12,560.9      235.0                  1,057.0              1,179.6       11.6
       Physical write (blocks):              1,729.0              5,855.0      238.6                    487.4                549.8       12.8
              Read IO requests:              3,749.3             12,560.8      235.0                  1,056.8              1,179.6       11.6
             Write IO requests:              1,538.2              5,219.5      239.3                    433.6                490.2       13.0
                  Read IO (MB):                 29.3                 98.1      234.9                      8.3                  9.2       11.6
                 Write IO (MB):                 13.5                 45.7      238.6                      3.8                  4.3       12.9
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                  1.4                  1.4        2.9                      0.4                  0.1      -66.7
                  Parses (SQL):                  6.3                 10.3       62.7                      1.8                  1.0      -45.5
             Hard parses (SQL):                  0.3                  0.0      -86.2                      0.1                  0.0     -100.0
            SQL Work Area (MB):                  0.4                  0.2      -48.8                      0.1                  0.0      -48.8
                        Logons:                  0.1                  0.1        0.0                      0.0                  0.0      -50.0
                Executes (SQL):                 23.7                 58.0      144.3                      6.7                  5.4      -18.5
                  Transactions:                  3.5                 10.6      200.0

                                               First               Second       Diff
                                     ---------------      ---------------     ------
     % Blocks changed per Read:                 50.5                 52.6        2.1
              Recursive Call %:                 97.6                 99.0        1.5
    Rollback per transaction %:                  0.0                  0.0        0.0
                 Rows per Sort:                 35.4                 49.6       14.2
    Avg DB time per Call (sec):                 11.5                 11.3       -0.3

Top Timed Events   First DB/Inst: SLOB12/SLOB12 Snaps: 49-50 (Elapsed time: 122.614 sec  DB time: 1935.33 sec),  Second DB/Inst: SLOB12/SLOB12 Snaps: 55-56 (Elapsed time: 121.705 sec  DB time: 1923.77 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 db file sequential read       User I/O           459,637      1,920.9           4.2        99.3    db file sequential read       User I/O         1,528,725      1,840.4           1.2        95.7
 db file async I/O submit      System I/O           2,975         97.2          32.7         5.0    CPU time                                             N/A        110.3           N/A         5.7
 CPU time                                             N/A         40.2           N/A         2.1    db file parallel write        System I/O         114,997         34.3           0.3         1.8
 log file parallel write       System I/O           1,881          3.2           1.7         0.2    free buffer waits             Configuration        2,318         28.0          12.1         1.5
 db file parallel write        System I/O           4,670          0.9           0.2         0.0    log file parallel write       System I/O           6,897         14.3           2.1         0.7
 control file sequential read  System I/O             499          0.2           0.3         0.0    db file async I/O submit      System I/O          12,819          5.2           0.4         0.3
 oracle thread bootstrap       Other                    7          0.1          18.1         0.0    buffer busy waits             Concurrency            407          2.7           6.7         0.1
 Disk file operations I/O      User I/O                77          0.1           0.9         0.0    Data file init write          User I/O               152          0.9           5.7         0.0
 library cache: mutex X        Concurrency             22          0.1           2.7         0.0    read by other session         User I/O                60          0.7          11.9         0.0
 Data file init write          User I/O                 5          0.0           5.6         0.0    enq: HW - contention          Configuration           18          0.7          39.3         0.0
-read by other session         User I/O                18          0.0           0.6         0.0   -control file sequential read  System I/O             841          0.5           0.5         0.0
-buffer busy waits             Concurrency             82          0.0           0.0         0.0   -oracle thread bootstrap       Other                    7          0.1          17.2         0.0
-                                                     N/A          N/A           N/A         N/A   -Disk file operations I/O      User I/O               136          0.1           0.8         0.0
-                                                     N/A          N/A           N/A         N/A   -library cache: mutex X        Concurrency             21          0.0           1.4         0.0
                          --------------------------------------------------------------------------------------------------------------------

Everything is just better: the IO times, the redo size, the number of IOPS, etc. That's what I would have expected from the outset. Good to know; I'll no longer use EXT4 for Oracle databases. After discussing this with @fritshoogland he showed me a post by @kevinclosson that hadn't come to mind at the time:

http://kevinclosson.net/2012/03/06/yes-file-systems-still-need-to-support-concurrent-writes-yet-another-look-at-xfs-versus-ext4/

Posted in 12c Release 1, Linux

Exadata and Virtual Private Database: will it offload my query?

Posted by Martin Bach on October 14, 2014

During one of the classes I taught about Exadata optimisations I had an interesting question:

If I am using VPD, will Exadata still offload the query?

The background is that we had discussed function offloading and the meta-view v$sqlfn_metadata. It turned out that SYS_CONTEXT() is not offloadable in 11.2.0.4.

SQL> select name,datatype,analytic,aggregate,offloadable,descr
  2  from v$sqlfn_metadata where name = 'SYS_CONTEXT';

NAME                                               DATATYPE ANA AGG OFF DESCR
-------------------------------------------------- -------- --- --- --- ------------------------------
SYS_CONTEXT                                        UNKNOWN  NO  NO  NO  SYS_CONTEXT

Since I’m a great fan of the Tom Kyte method (don’t say it is so, show it!) I needed a quick example. Of all the Oracle books I read “Effective Oracle by Design” was among the most inspiring.

Where to check?

My first idea was to check v$sqlfn_metadata to see if the ever present SYS_CONTEXT() was offloadable:

SQL> select name,offloadable
  2  from v$sqlfn_metadata
  3  where name = 'SYS_CONTEXT';

NAME                    OFF
----------------------- ---
SYS_CONTEXT             NO

SYS:dbm011> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE	11.2.0.4.0	Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

OK, so it does not appear to be offloadable. Does it matter? As you will see, it does not, although it could startle you at first.

Reproduce a test case: 11.2.0.4/11.2.3.3.1.140529.1

After finishing the example from the Oracle documentation I decided that I needed a few more rows in the table to get a more realistic data distribution for an Exadata system. You wouldn't see a smart scan on a table with fewer than 10 rows. This can be done quite easily, and the end result was:

SYS> select count(*) from scott.orders_tab;

   COUNT(*)
-----------
   12000003

1 row selected.

First I had to ensure that the query is offloaded at all - simplez! Kerry Osborne has a script for this:

SYS> select /*+ gather_plan_statistics rowcount_sys_stats */
  2  count(*) from scott.orders_tab;

   COUNT(*)
-----------
   12000003

1 row selected.

Elapsed: 00:00:00.81

SYS:OGGSRC> @scripts/fsx
Enter value for sql_text: %rowcount_sys_stats%
Enter value for sql_id:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
8bv5b04mjku08	   0  1093340548      1        .81	0 Yes	       38.47 select /*+ gather_plan_statistics rowcount_sys_stats */ count(*) from

1 row selected.

SYS:OGGSRC> @scripts/dplan
Enter value for sql_id: 8bv5b04mjku08
Enter value for child_no:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	8bv5b04mjku08, child number 0
-------------------------------------
select /*+ gather_plan_statistics rowcount_sys_stats */ count(*) from
scott.orders_tab

Plan hash value: 1093340548

---------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |  7521 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| ORDERS_TAB |    12M|  7521   (1)| 00:01:31 |
---------------------------------------------------------------------------------

15 rows selected.

So I guess it is.
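The fsx script itself is Kerry Osborne's; its core calculation can be sketched against the offload columns in v$sql. The following is a simplified, hypothetical version of that check, not the actual script:

```sql
-- hypothetical simplified fsx-style check (requires Exadata storage):
-- a cursor was offloaded if it has cell-eligible bytes, and the IO saving
-- compares eligible bytes to the bytes shipped over the interconnect
select sql_id,
       case when io_cell_offload_eligible_bytes > 0
            then 'Yes' else 'No' end                          as offload,
       round(100 * (io_cell_offload_eligible_bytes - io_interconnect_bytes)
                 / nullif(io_cell_offload_eligible_bytes, 0), 2) as io_saved_pct,
       substr(sql_text, 1, 70)                                as sql_text
from   v$sql
where  sql_text like '%rowcount_sys_stats%'
and    sql_text not like '%v$sql%';   -- exclude this query itself
```

On non-Exadata platforms io_cell_offload_eligible_bytes is always 0, so the same query would simply report "No" for every cursor.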

Enter VPD

With all the VPD bells and whistles turned on I connected as one of the users for which there is an explicit context set and ran my query:

TBROOKE> select /*+ gather_plan_statistics tbrooke_query_001 */
  2  count(*) from scott.orders_tab;

   COUNT(*)
-----------
          4

1 row selected.

Elapsed: 00:00:00.15


Which shows that the VPD policy works. What do my tools indicate?

SYS> @scripts/fsx
Enter value for sql_text: %tbrooke_query_001%
Enter value for sql_id:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
bs9gq0bdqazzu	   0  1093340548      1        .15	0 Yes	       99.98 select /*+ gather_plan_statistics tbrooke_query_001 */ count(*) from s

1 row selected.

SYS> @scripts/dplan
Enter value for sql_id: bs9gq0bdqazzu
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	bs9gq0bdqazzu, child number 0
-------------------------------------
select /*+ gather_plan_statistics tbrooke_query_001 */ count(*) from
scott.orders_tab

Plan hash value: 1093340548

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |       |  7651 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| ORDERS_TAB |     1 |     6 |  7651   (3)| 00:01:32 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))
       filter("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))


21 rows selected.

So using the script it is easily visible that a smart scan has happened, and in fact it saved 99.98% of the IO. This is not surprising given that only 4 rows out of the whole table have been returned. The Real Time SQL Monitor report confirmed the finding, by the way. I had to sneak in another hint (monitor), otherwise the statement wouldn't have been captured in SQL Monitor (only "long running" statements are captured by default):

SQL Monitoring Report

SQL Text
------------------------------
select /*+ gather_plan_statistics monitor tbrooke_query_002 */ count(*) from scott.orders_tab

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TBROOKE (33:239)
 SQL ID              :  8ydqam3fuwt2z
 SQL Execution ID    :  16777216
 Execution Started   :  10/14/2014 04:28:24
 First Refresh Time  :  10/14/2014 04:28:24
 Last Refresh Time   :  10/14/2014 04:28:24
 Duration            :  .100139s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus@enkdb01.enkitec.com (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
====================================================================================
| Elapsed |   Cpu   |    IO    | PL/SQL  | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) | Time(s) | Calls |  Gets  | Reqs | Bytes | Offload |
====================================================================================
|    0.10 |    0.01 |     0.09 |    0.00 |     1 |  27352 |  228 | 214MB |  99.98% |
====================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1093340548)
=========================================================================================================================================================================
| Id |          Operation           |    Name    |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity | Activity Detail |
|    |                              |            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |   (# samples)   |
=========================================================================================================================================================================
|  0 | SELECT STATEMENT             |            |         |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  1 |   SORT AGGREGATE             |            |       1 |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  2 |    TABLE ACCESS STORAGE FULL | ORDERS_TAB |       1 | 7651 |         1 |     +0 |     1 |        4 |  228 | 214MB |  99.98% |    2M |          |                 |
=========================================================================================================================================================================

This confirms that a full table scan happened, and it must have been executed as a direct path read. A DPR on Exadata most often transforms into a smart scan. As you can see I didn't specify any other predicate, and yet the VPD predicate was offloaded.
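Whether the statement really went through a smart scan can also be verified from the session's own statistics. A minimal sketch, assuming the standard v$mystat/v$statname views (the statistic names below exist on Exadata; on other platforms the cell statistics stay at zero):

```sql
-- quick sanity check from the session that ran the query:
-- non-zero values for these statistics indicate a direct path read
-- that was turned into a smart scan on the cells
select sn.name, ms.value
from   v$mystat  ms
join   v$statname sn on sn.statistic# = ms.statistic#
where  sn.name in ('cell scans',
                   'table scans (direct read)',
                   'cell physical IO interconnect bytes returned by smart scan')
and    ms.value > 0;
```

This is essentially a hand-rolled subset of what mystats reports further down.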

Reproduce a test case: 12.1.0.2.0/12.1.1.1.1.140712

Interestingly in 12.1.0.2.0 the function SYS_CONTEXT is offloadable:

SQL> select name, offloadable
  2  from v$sqlfn_metadata where name = 'SYS_CONTEXT';

NAME                           OFF
------------------------------ ---
SYS_CONTEXT                    YES

1 row selected.

For completeness' sake I have repeated my test here. I copied the table via a network link and created the same users and VPD. Not too much of a difference. The information is shown here without comments:

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE    12.1.0.2.0      Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production


TBROOKE> select /*+ gather_plan_statistics tbrooke_query_013 */ count(*) from scott.orders_tab;

  COUNT(*)
----------
         4

Elapsed: 00:00:00.02

SQL> @scripts/fsx
Enter value for sql_text: %tbrooke_query_013%
Enter value for sql_id:

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
bf0s5hzr7x9r5      0 1093340548      1        .02      0 Yes          99.97 select /*+ gather_plan_statistics tbrooke_query_013 */ count(*) from s

1 row selected.

SQL> @scripts/dplan
Enter value for sql_id: bf0s5hzr7x9r5
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bf0s5hzr7x9r5, child number 0
-------------------------------------
select /*+ gather_plan_statistics tbrooke_query_013 */ count(*) from
scott.orders_tab

Plan hash value: 1093340548

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| ORDERS_TAB |     1 |    13 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))
       filter("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))


21 rows selected.

SQL Monitoring Report

SQL Text
------------------------------
select /*+ gather_plan_statistics monitor tbrooke_query_014 */ count(*) from scott.orders_tab

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TBROOKE (1042:32766)
 SQL ID              :  55yy67scgw2sf
 SQL Execution ID    :  16777216
 Execution Started   :  10/14/2014 05:13:30
 First Refresh Time  :  10/14/2014 05:13:30
 Last Refresh Time   :  10/14/2014 05:13:30
 Duration            :  .016025s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus@enkdb03.enkitec.com (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
==================================================================================================
| Elapsed |   Cpu   |    IO    | Application | PL/SQL  | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) | Calls |  Gets  | Reqs | Bytes | Offload |
==================================================================================================
|    0.02 |    0.01 |     0.01 |        0.00 |    0.00 |     1 |  27441 |  223 | 214MB |  99.99% |
==================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1093340548)
=========================================================================================================================================================================
| Id |          Operation           |    Name    |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity | Activity Detail |
|    |                              |            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |   (# samples)   |
=========================================================================================================================================================================
|  0 | SELECT STATEMENT             |            |         |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  1 |   SORT AGGREGATE             |            |       1 |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  2 |    TABLE ACCESS STORAGE FULL | ORDERS_TAB |       1 |    2 |         1 |     +0 |     1 |        4 |  223 | 214MB |  99.99% |    3M |          |                 |
=========================================================================================================================================================================

In the last example flash cache and storage indexes contributed greatly to the quick execution time. Using Adrian Billington's mystats utility I can see more detail. I have removed what's not needed from the report.

SQL> @scripts/mystats stop t=1

==========================================================================================
MyStats report : 14-OCT-2014 05:27:44
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  13.59
TIMER   CPU time used (seconds)                                                       0.01


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                         7
STAT    CPU used when call started                                                       7
STAT    DB time                                                                          9
...
STAT    cell IO uncompressed bytes                                               3,031,040
STAT    cell blocks helped by minscn optimization                                      370
STAT    cell blocks processed by cache layer                                           370
STAT    cell blocks processed by data layer                                            370
STAT    cell blocks processed by txn layer                                             370
STAT    cell flash cache read hits                                                      10
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  224,763,904
STAT    cell physical IO bytes saved by storage index                          221,732,864
STAT    cell physical IO interconnect bytes                                          5,360
STAT    cell physical IO interconnect bytes returned by smart scan                   5,360
STAT    cell scans                                                                       1
...
STAT    consistent gets                                                             27,816
STAT    consistent gets direct                                                      27,437
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     379
STAT    consistent gets pin                                                            376
STAT    consistent gets pin (fastpath)                                                 376
STAT    db block changes                                                               787
STAT    db block gets                                                                2,879
STAT    db block gets from cache                                                     2,879
...
STAT    logical read bytes from cache                                           26,689,536
...
STAT    physical read bytes                                                    224,763,904
STAT    physical read requests optimized                                               223
STAT    physical read total IO requests                                                223
STAT    physical read total bytes                                              224,763,904
STAT    physical read total bytes optimized                                    224,763,904
STAT    physical read total multi block requests                                       215
STAT    physical reads                                                              27,437
STAT    physical reads direct                                                       27,437
...
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                         3
STAT    table scan disk non-IMC rows gotten                                          1,685
STAT    table scan rows gotten                                                       1,685
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1
...

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

In the case of VPD the fact that SYS_CONTEXT() is offloadable did not play a major role.

Posted in 11g Release 2, 12c Release 1, Exadata

Little things worth knowing-troubleshooting parallel statement queueing

Posted by Martin Bach on October 9, 2014

This is yet another one of these posts that hopefully help you as much as they are going to help me in the future. Recently I enjoyed troubleshooting a problem related to parallel execution. Since I have never really written down how to tackle such a problem I thought it might be nice to do that now.

This is 12.1.0.2.0 on Exadata, but the platform doesn’t really matter for the troubleshooting technique.

What is parallel statement queueing

Statement queueing is a pretty nifty feature, new in 11.2. Instead of downgrading the requested Degree of Parallelism (DOP), as was the case with parallel_adaptive_multi_user, your session will queue until the requested number of parallel servers becomes available. The rationale behind this feature is that you might queue for 1 minute until all the requested PX servers become available, but once they are, your session can finish in 2 minutes with a DOP of 16 rather than in 8 minutes with a downgraded (old behaviour) DOP of 2, for example. The 1 minute was well spent waiting. If you want to fine-tune statement queueing, Database Resource Manager allows you to.

The downside is that you don’t get statement queueing without enabling automatic DOP unless you play with underscore parameters. Automatic DOP is quite often the reason why you see statement queueing in the first place.

How does this statement queueing work?

With parallel_degree_policy set to AUTO (warning again: this can be dangerous!) the optimiser estimates the execution time of a SQL statement during the hard parse. If the estimated execution time exceeds parallel_min_time_threshold (the default is AUTO, which means 10 seconds), the optimiser tries to work out the “optimal” DOP, calculated as min(parallel_degree_limit, calculated ideal DOP). The default parallel_degree_limit is CPU, which is documented as follows: “The maximum degree of parallelism is limited by the number of CPUs in the system”. Ooops-that can be a lot. It might be better to set parallel_degree_limit to an integer, signifying the maximum DOP allowed for a statement executing in parallel. On Exadata you don’t normally need very high degrees of parallelism, so a more conservative setting might be in order.
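The decision logic just described can be sketched in a few lines of Python. This is a deliberate simplification of what the optimiser does, not its actual code; the parameter names mirror the init.ora parameters, and the figures in the example are the ones from the 10053 trace later in the post:

```python
def auto_dop(estimated_serial_secs, ideal_dop,
             parallel_min_time_threshold=10, parallel_degree_limit=48):
    """Simplified sketch of AutoDOP with parallel_degree_policy = AUTO.

    If the estimated serial execution time stays below the threshold the
    statement runs serially; otherwise the computed ideal DOP is capped
    by parallel_degree_limit.
    """
    if estimated_serial_secs <= parallel_min_time_threshold:
        return 1  # cheap statement: run serially
    return min(parallel_degree_limit, ideal_dop)

# The trace below computes an ideal DOP of 80 for a ~401s serial plan,
# capped by a degree limit of 48 ("dop_reason: degree limit"):
print(auto_dop(estimated_serial_secs=401, ideal_dop=80))  # 48
```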

Queueing

When does the queueing mechanism kick in? Let’s imagine that you have parallel_max_servers = parallel_servers_target = 48 defined in your system, in my case an Exadata X2 with 2s12c24t Westmere EP processors. This isn’t the default by the way. This setting allows you to execute queries 48-way parallel per instance. Let’s assume that 40 query slaves are in use already, and another user wants to execute a query 12-way parallel. The sum of currently executing parallel slaves + the requested number of slaves exceeds parallel_servers_target, triggering statement queueing for your session.
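The admission check from the previous paragraph can be expressed as a tiny sketch. This is my mental model rather than Oracle's actual implementation; note also that real statements with producer and consumer slave sets need 2 * DOP servers, which the example ignores for simplicity:

```python
def must_queue(active_px_servers, requested_dop,
               parallel_servers_target=48, slave_sets=1):
    """Return True if a new parallel statement would be queued.

    A statement queues when the PX servers it needs, added to those
    already busy, would push the instance past parallel_servers_target.
    """
    needed = requested_dop * slave_sets
    return active_px_servers + needed > parallel_servers_target

print(must_queue(active_px_servers=40, requested_dop=12))  # True: 52 > 48
print(must_queue(active_px_servers=36, requested_dop=12))  # False: 48 <= 48
```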

Wolfgang it!

To create a reproducible test case I set parallel_max_servers to 48 and parallel_servers_target to 24. I also enabled automatic DOP in my session. My test queries are directed at a non-partitioned table of 256 million rows, approximately 80 GB in size. Before executing the first SQL statement I enabled event 10053 to Wolfgang my query.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

Elapsed: 00:00:00.02
SQL> desc bigtab
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 V1                                                 VARCHAR2(40)
 N1                                                 NUMBER
 N2                                                 NUMBER
 N_256K                                             NUMBER
 N_128K                                             NUMBER
 N_8K                                               NUMBER
 PADDING                                            VARCHAR2(960)

SQL> select 1 + sum(n1) + sum (n2) - sum(n_8k) from bigtab where id > 1;

1+SUM(N1)+SUM(N2)-SUM(N_8K)
---------------------------
                 1.0471E+13

Elapsed: 00:00:35.59

The 10053 trace should give me some idea about the DOP. Let’s have a look at it:

     36 **************************
     37 Automatic degree of parallelism (AUTODOP)
     38 **************************
     39 Automatic degree of parallelism is enabled for this statement in auto mode.
     40 kkopqSetForceParallelProperties: Hint:no
     41 Query: compute:yes forced:no forceDop:0
     42 Calibration statistics is enabled.
     43 Start with a serial pass, cost the DOP to use

   1935 AUTO DOP PLANS EVALUATION
   1936 ***************************************
   1937 Compilation completed with Dop: 1.
   1938   Cost_io:  2776823.000000  Cost_cpu: 154935265907
   1939   Card:     1.000000  Bytes:    19.000000
   1940   Cost:     2781428.613003  Est_time:  401168ms
   1941 kkopqCombineDop: Dop:40 Hint:no
   1942 Query: compute:yes forced:no  scanDop:40 cpuDop:1 forceDop:0
   1943 Serial plan is expensive enough to be a candidate for parallelism (2781429)
   1944 AutoDOP: kkopqAdjustDop: dop is 40 after affinity normalization
   1945 Signal reparse with DOP 40.
   1946 *****************************
   1947 Number of Compilations tried: 1
   1948 *****************************

   3853 AUTO DOP PLANS EVALUATION
   3854 ***************************************
   3855 Compilation completed with Dop: 40.
   3856   Cost_io:  77133.972222  Cost_cpu: 2387701756
   3857   Card:     1.000000  Bytes:    19.000000
   3858   Cost:     77204.949157  Est_time:  11135ms
   3859 kkopqCombineDop: Dop:80 Hint:no
   3860 Query: compute:yes forced:no  scanDop:40 cpuDop:80 forceDop:0
   3861 AutoDOP: kkopqAdjustDop: computed dop before applying degree limit was 80 and now it is 48
   3862 AutoDOP: kkopqAdjustDop: dop is 48 after affinity normalization
   3863 Adopt new plan with dop=40
   3864 Signal reparse with DOP 48.
   3865 *****************************
   3866 Number of Compilations tried: 2
   3867 *****************************

   5772 AUTO DOP PLANS EVALUATION
   5773 ***************************************
   5774 Compilation completed with Dop: 48.
   5775   Cost_io:  64278.310185  Cost_cpu: 1989751463
   5776   Card:     1.000000  Bytes:    19.000000
   5777   Cost:     64337.457631  Est_time:  9279ms
   5778 kkopqCombineDop: Dop:80 Hint:no
   5779 Query: compute:yes forced:no  scanDop:40 cpuDop:80 forceDop:0
   5780 Adopt new plan with dop=48
   5781 kkopqSetDopReason: Reason why we chose this DOP is: degree limit.
   5782 Costing completed. DOP chosen: 48.
   5783 AutoDop: kkopqSetMaxDopInCursorEnv:In the Cursor Session Env, max DOP is set to 48
   5784 *****************************
   5785 Number of Compilations tried: 3
   5786 *****************************

   5811 ============
   5812 Plan Table
   5813 ============
   5814 --------------------------------------------------+-----------------------------------+-------------------------+
   5815 | Id  | Operation                       | Name    | Rows  | Bytes | Cost  | Time      |  TQ  |IN-OUT|PQ Distrib |
   5816 --------------------------------------------------+-----------------------------------+-------------------------+
   5817 | 0   | SELECT STATEMENT                |         |       |       |   63K |           |      |      |           |
   5818 | 1   |  SORT AGGREGATE                 |         |     1 |    19 |       |           |      |      |           |
   5819 | 2   |   PX COORDINATOR                |         |       |       |       |           |      |      |           |
   5820 | 3   |    PX SEND QC (RANDOM)          | :TQ10000|     1 |    19 |       |           |:Q1000| P->S |QC (RANDOM)|
   5821 | 4   |     SORT AGGREGATE              |         |     1 |    19 |       |           |:Q1000| PCWP |           |
   5822 | 5   |      PX BLOCK ITERATOR          |         |  276M | 5248M |   63K |  00:02:09 |:Q1000| PCWC |           |
   5823 | 6   |       TABLE ACCESS STORAGE FULL | BIGTAB  |  276M | 5248M |   63K |  00:02:09 |:Q1000| PCWP |           |
   5824 --------------------------------------------------+-----------------------------------+-------------------------+
   5825 Predicate Information:
   5826 ----------------------
   5827 6 - storage(:Z>=:Z AND :Z<=:Z AND "ID">1)
   5828 6 - filter("ID">1)
   5829
   5830 Content of other_xml column
   5831 ===========================
   5832   derived_cpu_dop: 80
   5833   derived_io_dop : 40
   5834   dop_reason     : degree limit
   5835   dop            : 48
   5836   px_in_memory_imc: no
   5837   px_in_memory   : no
   5838   io_rate        : 200
   5839   derived_io_dop : 40
   5840   cpu_rate       : 1000
   5841   derived_cpu_dop: 52
   5842   cpu_rate       : 1000
   5843   derived_cpu_dop: 29
   5844   cpu_rate       : 1000
   5845   derived_cpu_dop: 29
   5846   db_version     : 12.1.0.2
   5847   parse_schema   : MARTIN
   5848   dynamic_sampling: 11
   5849   plan_hash_full : 1216263238
   5850   plan_hash      : 4050997022
   5851   plan_hash_2    : 1216263238
   5852   Outline Data:
   5853   /*+
   5854     BEGIN_OUTLINE_DATA
   5855       IGNORE_OPTIM_EMBEDDED_HINTS
   5856       OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
   5857       DB_VERSION('12.1.0.2')
   5858       OPT_PARAM('optimizer_dynamic_sampling' 11)
   5859       ALL_ROWS
   5860       SHARED(48)
   5861       OUTLINE_LEAF(@"SEL$1")
   5862       FULL(@"SEL$1" "BIGTAB"@"SEL$1")
   5863     END_OUTLINE_DATA
   5864   */

There you have it! The v$sql_plan.other_xml column contains all the necessary information-no need for a 10053 trace :). The fanciest way I can come up with to get that information is to use this statement:

select t.*
 from v$sql_plan v,
  xmltable(
	'/other_xml/info'
	passing xmltype(v.other_xml)
	columns
		info_type varchar2(30) path '@type',
		info_value varchar2(30) path '/info'
  ) t
where v.sql_id = '&sql_id'
  and v.child_number = &child_number
  and other_xml is not null;

INFO_TYPE                      INFO_VALUE
------------------------------ ------------------------------
derived_cpu_dop                51
derived_io_dop                 40
dop_reason                     degree limit
dop                            48
px_in_memory_imc               no
px_in_memory                   no
io_rate                        200
derived_io_dop                 40
cpu_rate                       1000
derived_cpu_dop                26
cpu_rate                       1000
derived_cpu_dop                26
cpu_rate                       1000
derived_cpu_dop                26
db_version                     12.1.0.2
parse_schema                   "MARTIN"
plan_hash_full                 1216263238
plan_hash                      4050997022
plan_hash_2                    1216263238

19 rows selected.

But I digress… By the way-this statement used 48 PX server processes because it was the only one active at the time.
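If you have pulled the other_xml fragment out of the database and want to dissect it elsewhere, any XML library will do. A quick Python sketch; the sample fragment below is a cut-down, hand-written imitation of the v$sql_plan content shown above:

```python
import xml.etree.ElementTree as ET

# Cut-down fragment modelled on the other_xml output above (not real data)
other_xml = """
<other_xml>
  <info type="dop_reason">degree limit</info>
  <info type="dop">48</info>
  <info type="px_in_memory">no</info>
</other_xml>
"""

root = ET.fromstring(other_xml)
for info in root.findall("info"):
    # each <info> element carries a "type" attribute and a text value,
    # matching the info_type/info_value columns of the XMLTABLE query
    print(info.get("type"), info.text)
```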

Causing trouble

After hitting publish I noticed that I copied the wrong example…

A better example for demonstrating the effect of queueing uses a more realistic scenario in which statements request different DOPs. Two sessions are enough: one that will use 12 slaves, and another that requests 36. The statement with a DOP of 12 is started first, the one with a DOP of 36 a few seconds later. In a third session I gathered some diagnostic information:

SYS:dbm011> select px.sid, px.serial#,px.qcsid,px.qcserial#,px.qcinst_id,px.degree,px.req_degree,
  2  s.username, s.sql_id, s.sql_child_number, s.event, s.state
  3  from v$px_session px, v$session s
  4  where s.sid = px.sid
  5    and s.serial# = px.serial#
  6  order by px.qcsid;

        SID     SERIAL#       QCSID   QCSERIAL#   QCINST_ID      DEGREE  REQ_DEGREE USERNAME             SQL_ID        SQL_CHILD_NUMBER EVENT                                                            STATE
----------- ----------- ----------- ----------- ----------- ----------- ----------- -------------------- ------------- ---------------- ---------------------------------------------------------------- --------------------
        265       14828         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        330       21719         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        395       59827         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        461       17363         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        525        4955         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        592        2687         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        526       60805         526                                                 MARTIN               9md3ukhghgj58                0 PX Deq: Execute Reply                                            WAITING
        720       27626         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        786       47540         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED KNOWN TIME
        850       41110         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        918       46836         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        983       12699         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        657       22654         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING

13 rows selected.

SYS:dbm011> select count(qcsid),qcsid from v$px_session group by qcsid;

COUNT(QCSID)       QCSID
------------ -----------
          13         526

SYS:dbm011>
SYS:dbm011> select count(event),event from v$session where username = 'MARTIN' group by event;

COUNT(EVENT) EVENT
------------ ----------------------------------------------------------------
          12 cell smart table scan
           1 PX Deq: Execute Reply
           1 resmgr:pq queued

SYS:dbm011>

So my first session is well underway, 12 way parallel as requested. The other session had to wait but kicked into motion as soon as my first session finished.

SYS:dbm011> select px.sid, px.serial#,px.qcsid,px.qcserial#,px.qcinst_id,px.degree,px.req_degree,
  2  s.username, s.sql_id, s.sql_child_number, s.event, s.state
  3  from v$px_session px, v$session s
  4  where s.sid = px.sid
  5    and s.serial# = px.serial#
  6  order by px.qcsid;

        SID     SERIAL#       QCSID   QCSERIAL#   QCINST_ID      DEGREE  REQ_DEGREE USERNAME             SQL_ID        SQL_CHILD_NUMBER EVENT                                                            STATE
----------- ----------- ----------- ----------- ----------- ----------- ----------- -------------------- ------------- ---------------- ---------------------------------------------------------------- --------------------
        265       43575         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        525       17778         525                                                 MARTIN               4dz3dapm4uy8q                0 PX Deq: Execute Reply                                            WAITING
        395       10504         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        462       57813         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        526       12648         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        591       42643         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        657        5236         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        720       19243         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        786       61636         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        850       17734         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        918       60016         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        983       24489         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1044       41056         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1108        8936         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1178       52303         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
       1242       37008         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1307       28798         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1372       22172         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1437       59990         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1501        9307         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
          6       39400         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
         72       28569         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        137       18368         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        202       56832         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        333        6482         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        397       28334         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        461       17550         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        524         958         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        593       41235         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        656       51686         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        719       40346         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
        785        5157         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        851       35243         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        917       45635         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        980       18504         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1046       27779         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        330       10724         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING

37 rows selected.

SYS:dbm011>
SYS:dbm011> select count(qcsid),qcsid from v$px_session group by qcsid;

COUNT(QCSID)       QCSID
------------ -----------
          37         525

SYS:dbm011>
SYS:dbm011> select count(event),event from v$session where username = 'MARTIN' group by event;

COUNT(EVENT) EVENT
------------ ----------------------------------------------------------------
          36 cell smart table scan
           1 PX Deq: Execute Reply

The effect of queueing can be seen in the extended execution time:

  • 12 way parallel-no queueing: Elapsed: 00:00:07.16
  • 36 way parallel-with a bit of queueing: Elapsed: 00:00:12.92

V$PX_SESSION allows you to identify the culprit causing other sessions to queue-have a look at it and see if you can optimise the offending statement or run it later. This should unblock you. Also have a look at parallel_servers_target-you might have limited yourself too much. I personally don’t exceed the core/thread count on a box by much, to avoid scheduling problems.

Posted in 11g Release 2, 12c Release 1 | Leave a Comment »

Interesting observation about standby redo logs in Data Guard

Posted by Martin Bach on September 22, 2014

Some of you might have followed the discussion around the number of standby redo logs on twitter, but since 140 characters are woefully short for the complete story here’s the writeup that prompted the question. This is a test with 12.1.0.2 on virtualised Linux, repeated on a proper platform with physical hardware.

First of all here’s my setup. I have a dbca-based database (CDB, but doesn’t matter) that features 3 groups for its online redo logs. They are all 50 MB in size-important for this test, but not realistic :) Following the Oracle documentation I created n + 1 groups (per thread) on the standby to stop Data Guard broker from complaining about missing standby redo logs (SRL).

The end result was positive, here’s what the broker thinks:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Members:
  CDB1  - Primary database
    STDBY - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 23 seconds ago)

The broker likes to complain if SRLs are missing on either the primary and/or the standby. So I’m all good to go! But wait a second: when I use one of the cool new 12c features to check failover readiness, something is odd:

DGMGRL> validate database "STDBY"

  Database Role:     Physical standby database
  Primary Database:  CDB1

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    CDB1:   Off
    STDBY:  Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (CDB1)                  (STDBY)
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (STDBY)                 (CDB1)
    1         3                       0                       Insufficient SRLs
    Warning: standby redo logs not configured for thread 1 on CDB1

DGMGRL>

Pardon me? I _do_ have 4 groups of SRLs:

SQL> r
  1* select group#,sequence#,bytes,used,status from v$standby_log

    GROUP#  SEQUENCE#      BYTES       USED STATUS
---------- ---------- ---------- ---------- ----------
         4         27   52428800       2048 ACTIVE
         5         29   52428800      11264 ACTIVE
         6          0   52428800          0 UNASSIGNED
         7          0   52428800          0 UNASSIGNED

And 3 online redo logs:

SQL> select group#,thread#,sequence#,bytes,status from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------------
         1          1         31   52428800 CURRENT
         2          1         29   52428800 INACTIVE
         3          1         30   52428800 INACTIVE

But the fun fact remains that out of my 4 standby redo logs, only 2 are ever used. @pioro suggested that the other two are simply never needed because there is not enough redo to apply, which sounds plausible. To validate this I created a test environment on physical hardware with proper storage, because my tests in the lab turned out to suffer from IO problems.

The setup is the same.

I created a couple of sessions that start off by creating a 1,000,000-row table, then delete from it, only to roll back after the delete completes. That’s a lot of redo for 50 MB files (again, you wouldn’t use 50 MB online redo logs in production-this is just a test). My observation remains: only groups 4 and 5 are used. When redo generation on the primary gets too far ahead, there will be RFS transfers of the archived redo logs instead. Here are a few queries with their output:

PROCESS   STATUS       CLIENT_P GROUP#                                      THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS
--------- ------------ -------- ---------------------------------------- ---------- ---------- ---------- -------------
ARCH      CLOSING      ARCH     5                                                 1        149      83968             0
ARCH      CONNECTED    ARCH     N/A                                               0          0          0             0
ARCH      CLOSING      ARCH     5                                                 1        147      86016             0
ARCH      CLOSING      ARCH     4                                                 1        148      83968             0
MRP0      APPLYING_LOG N/A      N/A                                               1        147      85766            25
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       RECEIVING    LGWR     3                                                 1        150      85744             0
RFS       IDLE         ARCH     N/A                                               0          0          0             0

Is this a problem? Not as far as I can tell. And maybe I did something incorrectly on my side, too. There wasn’t a problem: my archiver process was quick enough to archive the SRLs, and as soon as it fell behind, the standby resorted to fetching archived redo logs from the primary. When it uses the FAL process you see lines like this in the alert.log:

2014-09-17 02:34:45.164000 -05:00
RFS[4]: Selected log 4 for thread 1 sequence 211 dbid 788205474 branch 857548261
Media Recovery Log +RECO/STDBY/ARCHIVELOG/2014_09_17/thread_1_seq_209.368.858479683
Archived Log entry 141 added for thread 1 sequence 210 ID 0x2efbcba0 dest 1:
2014-09-17 02:34:46.215000 -05:00

When Real-Time-Apply was active (whenever redo transfer rates dropped), I saw this, alternating between group 4 and 5:

2014-09-17 02:35:33.653000 -05:00
Media Recovery Waiting for thread 1 sequence 230 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 230 Reading mem 0
  Mem# 0: +DATA/STDBY/ONLINELOG/group_5.380.858076769
  Mem# 1: +RECO/STDBY/ONLINELOG/group_5.342.858076769

So for some reason yet unknown to me there are only ever 2 SRLs in use.

Posted in 12c Release 1, Linux | 4 Comments »
