Martins Blog

Trying to explain complex things in simple terms

Little things worth knowing: Data Guard Broker Setup changes in 12c

Posted by Martin Bach on July 27, 2015

One of the problems I have seen when deploying Data Guard for systems such as RAC One Node and policy managed databases was the static listener configuration you needed in 11.2. This has changed with 12c for the better if you are using Grid Infrastructure.

http://docs.oracle.com/database/121/DGBKR/install.htm

In the section about static listener registration a little addendum can be found (thanks to Patrick Hurley/@phurley for pointing this out to me!):

“A static service needs to be defined and registered only if Oracle Clusterware or Oracle Restart is not being used.”

This is good news, let’s put it to the test; I’m a great fan of Oracle Restart. If I ever find the time I’d like to repeat this test with clustered Grid Infrastructure. I think the quote mentioned earlier still stands true but I would like to see it with my own eyes.

The Setup

To start with I used a dbca-created database named “NCDB” on my server named “server1″. It uses the DATA and RECO disk groups and is a non-CDB (although I’m quite sure that doesn’t matter). The patch level is current at the time of writing, I’m on 12.1.0.2.4 (which is the OJVM and PSU bundle patch for July 2015):

[oracle@server1 ~]$  /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch/opatch lspatches
21068507;Database PSU 12.1.0.2.4, Oracle JavaVM Component (Jul2015)
20831113;OCW Patch Set Update : 12.1.0.2.4 (20831113)
20831110;Database Patch Set Update : 12.1.0.2.4 (20831110)

OPatch succeeded.

[oracle@server1 ~]$  /u01/app/oracle/product/12.1.0.2/grid/OPatch/opatch lspatches
20831113;OCW Patch Set Update : 12.1.0.2.4 (20831113)
20831110;Database Patch Set Update : 12.1.0.2.4 (20831110)
20299018;ACFS Patch Set Update : 12.1.0.2.3 (20299018)
19872484;WLM Patch Set Update: 12.1.0.2.2 (19872484)

OPatch succeeded.

The future standby database will be called STDBY. This is for demonstration purposes only, I’m not a fan of using db_unique_name implying roles. Imagine you switch over to STDBY and run it as primary database for extended periods of time-that would be potentially confusing.

With that in mind, I created/updated a common tnsnames.ora on server1 and server2:

[oracle@server1 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora
# tnsnames.ora Network Configuration File: 
# /u01/app/oracle/product/12.1.0.2/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

NCDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server1.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = NCDB)
    )
  )

STDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server2.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  )

The duplicate command was a simple “duplicate target database for standby” after creating a simple pfile and auxiliary structures (audit_dump_dest, large pages etc) were in place. For this to work you need a copy of the controlfile for the standby in place as well as a backup of the primary database).

RMAN> duplicate target database for standby;

Starting Duplicate Db at 27-JUL-15
using channel ORA_AUX_DISK_1
using channel ORA_AUX_DISK_2

contents of Memory Script:
{
   sql clone "alter system set  control_files =
  ''+DATA/STDBY/CONTROLFILE/current.258.886176617'', 
  ''+RECO/STDBY/CONTROLFILE/current.257.886176617'' comment=
 ''Set by RMAN'' scope=spfile";
   restore clone standby controlfile;
}
executing Memory Script

sql statement: alter system set  control_files =   ''+DATA/STDBY/CONTROLFILE/current.258.886176617'', 
 ''+RECO/STDBY/CONTROLFILE/current.257.886176617'' comment= ''Set by RMAN'' scope=spfile

Starting restore at 27-JUL-15
using channel ORA_AUX_DISK_1
using channel ORA_AUX_DISK_2

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: reading from backup piece +RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583
channel ORA_AUX_DISK_1: ORA-19870: error while restoring backup piece +RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583
ORA-19505: failed to identify file "+RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583"
ORA-17503: ksfdopn:2 Failed to open file +RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583
ORA-15012: ASM file '+RECO/NCDB/AUTOBACKUP/2015_07_27/s_886176583.273.886176583' does not exist

failover to previous backup

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: reading from backup piece /u01/oraback/NCDB/09qd3uq4_1_1
channel ORA_AUX_DISK_1: piece handle=/u01/oraback/NCDB/09qd3uq4_1_1 tag=TAG20150727T160940
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
output file name=+DATA/STDBY/CONTROLFILE/current.259.886176619
output file name=+RECO/STDBY/CONTROLFILE/current.258.886176619
Finished restore at 27-JUL-15
contents of Memory Script:
{
   sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database

contents of Memory Script:
{
   set newname for clone tempfile  1 to new;
   switch clone tempfile all;
   set newname for clone datafile  1 to new;
   set newname for clone datafile  3 to new;
   set newname for clone datafile  4 to new;
   set newname for clone datafile  5 to new;
   set newname for clone datafile  6 to new;
   restore
   clone database
   ;
}
executing Memory Script

executing command: SET NEWNAME

renamed tempfile 1 to +DATA in control file

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 27-JUL-15
using channel ORA_AUX_DISK_1
using channel ORA_AUX_DISK_2

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00004 to +DATA
channel ORA_AUX_DISK_1: restoring datafile 00005 to +DATA
channel ORA_AUX_DISK_1: restoring datafile 00006 to +DATA
channel ORA_AUX_DISK_1: reading from backup piece /u01/oraback/NCDB/05qd3uas_1_1
channel ORA_AUX_DISK_2: starting datafile backup set restore
channel ORA_AUX_DISK_2: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_2: restoring datafile 00001 to +DATA
channel ORA_AUX_DISK_2: restoring datafile 00003 to +DATA
channel ORA_AUX_DISK_2: reading from backup piece /u01/oraback/NCDB/06qd3uat_1_1
channel ORA_AUX_DISK_1: piece handle=/u01/oraback/NCDB/05qd3uas_1_1 tag=TAG20150727T160132
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:15
channel ORA_AUX_DISK_2: piece handle=/u01/oraback/NCDB/06qd3uat_1_1 tag=TAG20150727T160132
channel ORA_AUX_DISK_2: restored backup piece 1
channel ORA_AUX_DISK_2: restore complete, elapsed time: 00:00:15
Finished restore at 27-JUL-15

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script
datafile 1 switched to datafile copy
input datafile copy RECID=6 STAMP=886176640 file name=+DATA/STDBY/DATAFILE/system.261.886176625
datafile 3 switched to datafile copy
input datafile copy RECID=7 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/sysaux.263.886176625
datafile 4 switched to datafile copy
input datafile copy RECID=8 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/undotbs1.262.886176625
datafile 5 switched to datafile copy
input datafile copy RECID=9 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/example.260.886176625
datafile 6 switched to datafile copy
input datafile copy RECID=10 STAMP=886176641 file name=+DATA/STDBY/DATAFILE/users.264.886176625
Finished Duplicate Db at 27-JUL-15

RMAN>

That’s a working standby database. I will have to register it with Grid Infrastructure next.

[oracle@server2 ~]$ srvctl add database -db STDBY -oraclehome /u01/app/oracle/product/12.1.0.2/dbhome_1 \
> -role physical_standby -startoption MOUNT -policy automatic -diskgroup data,reco

Now all I need to do is set standby_file_management to auto, add standby redo logs to both databases and enable the broker. I’ll not show these here.

Broker Configuration

The next step in the deployment of my standby database is the creation of a Data Guard Broker configuration. I always create it using dgmgrl.

[oracle@server1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdb
Password:
Connected as SYSDBA.

DGMGRL> CREATE CONFIGURATION brokertest AS
>  PRIMARY DATABASE IS 'NCDB'
>  CONNECT IDENTIFIER IS 'NCDB';
Configuration "brokertest" created with primary database "NCDB"

DGMGRL>  add database 'STDBY' as connect identifier is 'STDBY';
Database "STDBY" added

DGMGRL> enable configuration;
Enabled.
DGMGRL> show configuration;

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  NCDB  - Primary database
    STDBY - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 5 seconds ago)

That’s looking good so far! I can see archived redo logs being exchanged, and I can see MRP0 applying data in real time on the standby.

Recap

This is a reference for the settings in the Data Guard Broker. I have not modified any listener.ora file on any host, all I did was to ensure that the tnsnames.ora file has identical contents on both sides. I have listed the database configuration for later reference here:

DGMGRL> show database verbose 'NCDB';

Database - NCDB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDB

  Properties:
    DGConnectIdentifier             = 'NCDB'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=server1.example.com)
           (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=NCDB_DGMGRL)
           (INSTANCE_NAME=NCDB)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

DGMGRL> show database verbose 'STDBY';

Database - STDBY

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 5.00 KByte/s
  Active Apply Rate:  0 Byte/s
  Maximum Apply Rate: 0 Byte/s
  Real Time Query:    OFF
  Instance(s):
    STDBY

  Properties:
    DGConnectIdentifier             = 'STDBY'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    RedoRoutes                      = ''
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.52)
          (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STDBY_DGMGRL)(INSTANCE_NAME=STDBY)
          (SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

DGMGRL> show configuration verbose;

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  NCDB  - Primary database
    STDBY - Physical standby database 

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Now the big question is: will this allow me to switch over? For the record I am still using my SSH connection (in screen!) on server1, but am now connected to the standby database in dgmgrl.

DGMGRL> validate database 'STDBY'

  Database Role:     Physical standby database
  Primary Database:  NCDB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    NCDB:   Off
    STDBY:  Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDB)                  (STDBY)
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (STDBY)                 (NCDB)
    1         3                       0                       Insufficient SRLs
    Warning: standby redo logs not configured for thread 1 on NCDB

DGMGRL> switchover to 'STDBY';
Performing switchover NOW, please wait...
New primary database "STDBY" is opening...
Oracle Clusterware is restarting database "NCDB" ...
Switchover succeeded, new primary is "STDBY"
DGMGRL> show configuration

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    NCDB  - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 13 seconds ago)

DGMGRL>

Well that seems to have worked!

It seems straight forward though: the SSH connection was made to the future standby host, which could have helped with Oracle Restart restarting the database. What if I try to switch back, but this time connect to server1 (currently in the standby role) and issue the switchover command? Here is the output:

[oracle@server1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    NCDB  - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)

DGMGRL> validate database 'NCDB';

  Database Role:     Physical standby database
  Primary Database:  STDBY

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    STDBY:  Off
    NCDB:   Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status       
              (STDBY)                 (NCDB)                               
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status       
              (NCDB)                  (STDBY)                              
    1         3                       2                       Insufficient SRLs

Ready to switch over:

DGMGRL> switchover to 'NCDB'
Performing switchover NOW, please wait...
New primary database "NCDB" is opening...
Oracle Clusterware is restarting database "STDBY" ...
Switchover succeeded, new primary is "NCDB"
DGMGRL> show configuration

Configuration - brokertest

  Protection Mode: MaxPerformance
  Members:
  NCDB  - Primary database
    STDBY - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 54 seconds ago)

DGMGRL> 

OK, that worked too-very nice. It’s safe to say that in my lab conditions restarting of databases works pretty well.

Posted in 12c Release 1, Data Guard, Linux | 1 Comment »

JSON support in Exadata 12.1.2.1.0 and later

Posted by Martin Bach on July 8, 2015

Some time ago Oracle announced that RDBMS 12.1.0.2 has built-in support for JSON processing. A little later it was also mentioned that you have support for JSON in the Exadata storage servers for offloading. This is probably a lot more exciting to users of JSON than it is to me as I’m not a developer. However, whenever an announcement such as the one I’m referring to is made I would like to see for myself how much of it is implemented in software. Like I said, I’m not a developer so apologies for a silly example: what I’m showing you here can probably done differently and is not the best use of an Exadata. But all I really wanted to test is if JSON support actually exists. I am using cellsrv 12.1.2.1.0 and RDBMS 12.1.0.2.2 for this test.

JSON

I have to say I struggled a little bit to understand the use case for JSON and therefore did what probably everyone does and consulted the official documentation and oracle-base.com for Tim’s views on JSON. Here’s a summary of links I found useful to get started:

The Test

Ok so that was enough to get me started. I needed data, and a table to store this in. It appeared to me that an apache log could be a useful source for JSON records, so I converted my webserver’s log file to JSON using libee0 on OpenSuSE (yes I know, but it’s a great virtualisation platform). The converted file was named httpd_access_log.json and had records such as these:

{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:05 +0100", "request": "HEAD /ol70 HTTP/1.1", "status": "404", "size": "", "f1": "", "useragent": "Python-urllib/2.7"}
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:25 +0100", "request": "GET / HTTP/1.1", "status": "403", "size": "989", "f1": "", "useragent": "Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0"}

Sorry for the wide output-it’s an Apache log…

I then created the table to store the data. JSON appears to be pretty unstructured, so this will do:

SQL> create table jsontest (id number,
  2   jdata clob,
  3   constraint jsontest check (jdata is json)
  4  )
  5  lob (jdata) store as securefile (
  6   enable storage in row
  7  );

SQL> create sequence s_jsontest;

Sequence created

If you look closely then you’ll see that the JSON data is stored in an inline CLOB-that’s one of the pre-requisites for offloading LOBs in 12c.

Loading JSON

Now I needed a way to get the data into the table. I think I could have used SQLLDR but since I have rusty perl scripting skills I gave DBD::Oracle on 12.1.0.2 a go. The following script inserts records slow-by-slow or row-by-row into the table and is probably not the most efficient way to do this. But one of the reasons I blog is so that I don’t have to remember everything. If you ever wondered how to write a DBI/DBD::Oracle script here’s a starting point. Note the emphasis on “starting point” since the script has been trimmed for readability-essential error checking is not shown. Whenever you work with data make sure that your error handling is top-notch!

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use DBD::Oracle;
use Getopt::Long;

# these will be set by GetOpt::Long
my $service;            # has to be in tnsnames.ora
my $username;
my $jsonfile;

GetOptions (
  "service=s"   => \$service,
  "user=s"      => \$username,
  "jsonfile=s"  => \$jsonfile
);
die "usage: load_json.pl --service <servicename> --jsonfile [--user username] " if (!defined ($service ) || !defined ($jsonfile));

die "$jsonfile is not a file" unless ( -f $jsonfile );

print "connecting to service $service as user $username to load file $jsonfile\n";

# about to start...
my $dbh = DBI->connect ("dbi:Oracle:$service", "$username", "someCleverPasswordOrCatName")
  or die ("Cannot connect to service $service: DBI:errstr!");

print "connection to the database established, trying to load data...\n";

# prepare a cursor to loop over all entries in the file
my $sth = $dbh->prepare(q{
 insert into jsontest (id, jdata) values(s_jsontest.nextval, :json)
});

if (! open JSON, "$jsonfile")  {
  print "cannot open $jsonfile: $!\n";
  $dbh->disconnect();
  die "Cannot continue\n";
}

while (<JSON>) {
  chomp;
  $sth->bind_param(":json", $_);
  $sth->execute();
}

$dbh->disconnect();

close JSON;

print "done\n";

This script read the file and inserted all the data into the table. Again, essential error checking must be added, the script is far from being complete. You also need to set the Perl environment variables to the perl installation in $ORACLE_HOME for it to find the DBI and DBD::Oracle drivers.

Offloading or not?

It turned out that the data I inserted was of course not enough to trigger a direct path read that could turn into a Smart Scan. A little inflation of the table was needed. Once that was done I started to get my feet wet with JSON queries:

SQL> select jdata from jsontest where rownum < 6;

JDATA
--------------------------------------------------------------------------------
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:26
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:

Interesting. Here are a few more examples with my data set. Again, refer to oracle-base.com and the official documentation set for more information about JSON and querying it in the database. It’s by no means an Exadata only feature.

SQL> select count(*) from jsontest where json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
   2195968

SQL> select count(*) from jsontest where not json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
         0

And finally, here is proof that you can offload JSON data in Exadata; at least for some of the operations it should  be possible judging by the information in v$sqlfn_metadata:

SQL> select name,offloadable from v$sqlfn_metadata where name like '%JSON%'
  2  order by offloadable,name;

NAME                                               OFF
-------------------------------------------------- ---
JSON_ARRAY                                         NO
JSON_ARRAYAGG                                      NO
JSON_EQUAL                                         NO
JSON_OBJECT                                        NO
JSON_OBJECTAGG                                     NO
JSON_QUERY                                         NO
JSON_SERIALIZE                                     NO
JSON_TEXTCONTAINS2                                 NO
JSON_VALUE                                         NO
JSON                                               YES
JSON                                               YES
JSON_EXISTS                                        YES
JSON_QUERY                                         YES
JSON_VALUE                                         YES

14 rows selected.

The two entries named “JSON” are most likely “is JSON” and “is not JSON”.

And now with real data volumes on a real system using JSON_EXISTS:

SQL> select /*+ monitor am_I_offloaded */ count(*)
  2  from jsontest where json_exists(jsontest.jdata, '$.host' false on error);                                                                                        

   COUNT(*)
-----------
    2195968

Elapsed: 00:00:04.96

SQL> select * from table(dbms_xplan.display_cursor);                                                                                                        

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID  6j73xcww7hmcw, child number 0
-------------------------------------
select /*+ monitor am_I_offloaded */ count(*) from jsontest where
json_exists(jsontest.jdata, '$.host' false on error)

Plan hash value: 568818393

---------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |       |       | 91078 (100)|          |
|   1 |  SORT AGGREGATE            |          |     1 |   610 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| JSONTEST | 21960 |    12M| 91078   (1)| 00:00:04 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)
       filter(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)

So the execution plan looks promising-I can see “table access storage full” and a storage() predicate. Looking at V$SQL I get:

SQL> select sql_id, child_number,
  2  case when io_cell_offload_eligible_bytes = 0 then 'NO' else 'YES' end offloaded,
  3  io_cell_offload_eligible_bytes/power(1024,2) offload_eligible_mb,
  4  io_interconnect_bytes/power(1024,2) interconnect_mb,
  5  io_cell_offload_returned_bytes/power(1024,2) returned_mb,
  6  io_cell_offload_returned_bytes/io_cell_offload_eligible_bytes*100 offload_pct
  7   from v$sql where sql_id = '6j73xcww7hmcw';                                                                                                                                                   

SQL_ID        CHILD_NUMBER OFF OFFLOAD_ELIGIBLE_MB INTERCONNECT_MB RETURNED_MB OFFLOAD_PCT
------------- ------------ --- ------------------- --------------- ----------- -----------
6j73xcww7hmcw            0 YES         2606.695313     1191.731941 1191.724129 45.71781455

And to avoid any doubt, I have the SQL Trace as well:

PARSING IN CURSOR #140370400430072 len=120 dep=0 uid=65 oct=3 lid=65 tim=1784977054418 hv=1750582781 ad='5bfcebed8' sqlid='bfwd4t5n5gjgx'
select /*+ monitor am_I_offloaded */ count(*)   from jsontest where json_exists(jsontest.jdata, '$.host' false on error)
END OF STMT
PARSE #140370400430072:c=103984,e=272239,p=909,cr=968,cu=0,mis=1,r=0,dep=0,og=1,plh=568818393,tim=1784977054417
EXEC #140370400430072:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568818393,tim=1784977054587
WAIT #140370400430072: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=96305 tim=1784977054666
WAIT #140370400430072: nam='reliable message' ela= 826 channel context=27059892880 channel handle=27196561216 broadcast message=26855409216 obj#=96305 tim=1784977055727
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 159 name|mode=1263468550 2=65629 0=1 obj#=96305 tim=1784977055942
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 229 name|mode=1263468545 2=65629 0=2 obj#=96305 tim=1784977056265
WAIT #140370400430072: nam='cell smart table scan' ela= 196 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977057370
WAIT #140370400430072: nam='cell smart table scan' ela= 171 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977057884
WAIT #140370400430072: nam='cell smart table scan' ela= 188 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977058461
WAIT #140370400430072: nam='cell smart table scan' ela= 321 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977061623
WAIT #140370400430072: nam='cell smart table scan' ela= 224 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977062053
WAIT #140370400430072: nam='cell smart table scan' ela= 254 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977062487
WAIT #140370400430072: nam='cell smart table scan' ela= 7 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977062969
WAIT #140370400430072: nam='cell smart table scan' ela= 25 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977063016
WAIT #140370400430072: nam='cell smart table scan' ela= 81 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977063115
WAIT #140370400430072: nam='cell smart table scan' ela= 1134 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977065442
WAIT #140370400430072: nam='cell smart table scan' ela= 6 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977065883
WAIT #140370400430072: nam='cell smart table scan' ela= 14 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977065917
WAIT #140370400430072: nam='cell smart table scan' ela= 105 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066037
WAIT #140370400430072: nam='cell smart table scan' ela= 12 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066207
WAIT #140370400430072: nam='cell smart table scan' ela= 6605 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977072866
WAIT #140370400430072: nam='cell smart table scan' ela= 27 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977073877
WAIT #140370400430072: nam='cell smart table scan' ela= 29 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977074903
WAIT #140370400430072: nam='cell smart table scan' ela= 907 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977077783
WAIT #140370400430072: nam='cell smart table scan' ela= 28 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977078753
WAIT #140370400430072: nam='cell smart table scan' ela= 24 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977080860
WAIT #140370400430072: nam='cell smart table scan' ela= 1077 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977082935
...

Summary

So yet, it would appear as if JSON is offloaded.

Posted in 12c Release 1, Exadata | Tagged: | Leave a Comment »

IO Resource Manager for Pluggable Databases in Exadata

Posted by Martin Bach on June 16, 2015

Another interesting topic that goes over and above the CDB Resource Manager Plans I described earlier this week is the implementation of IORM Plans for Pluggable Databases. Pluggable Databases are an interesting object for studies, and I like to research things. When 12c came out there was no support for offloading initially, it is required that you are on cell software 12.1.x.x.x for full 12c support on Exadata. One aspect I was particularly interested in were PDBs and IORM. Limiting CPU on a PDB basis is already a cool thing, but in a hugely consolidated environment it does make sense to limit I/O based on a consumer group. Another one of the perks of working on Exadata :)

According to what I found out PDBs are treated as entities within the database, and they probably fall into the category of intra-database IORM. I have previously written about how DBRM plans filter down to the cells and become intra-database resource plans. This seems to be happening here, too.

IORM metric definitions

When looking at IORM I’m old fashioned and like to rely on the command line. More GUI oriented people should consider the use of OEM 12 to get similar data but in pretty pictures. But since OEM taps into cellcli under the covers it is only fair trying to understand the underlying technology.

The cells provide a lot of performance information in the metriccurrent and metrichistory views. You are shown metrics based on a name and objectType. For IORM the following objectTypes are of interest in general:

[celladmin@enkcel04 ~]$ cellcli -e "list metricdefinition attributes objectType" | grep IORM | sort | uniq
	 IORM_CATEGORY
	 IORM_CONSUMER_GROUP
	 IORM_DATABASE
	 IORM_PLUGGABLE_DATABASE

This was executed on Exadata 12.1.2.1.1, if you try this on an 11.2.3.x Exadata release you won’t see the IORM_PLUGGABLE_DATABASE category. Question is: which metrics are gathered in 12.1.2.1.1?

CELLCLI> list metricdefinition attributes name,description where objectType = 'IORM_PLUGGABLE_DATABASE'
	 PDB_FC_BY_ALLOCATED	 "Number of megabytes allocated in flash cache for this pluggable database"
	 PDB_FC_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash cache"
	 PDB_FC_IO_RQ       	 "Number of IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SEC   	 "Number of IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash cache per second"
	 PDB_FD_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash disks"
	 PDB_FD_IO_LOAD     	 "Average I/O load from this pluggable database for flash disks"
	 PDB_FD_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_TM       	 "The cumulative latency of reading blocks by this pluggable database from flash disks"
	 PDB_FD_IO_TM_RQ    	 "The rate which is the average latency of reading or writing blocks per request by this pluggable database from flash disks"
	 PDB_FD_IO_UTIL     	 "Percentage of flash resources utilized by this pluggable database"
	 PDB_FD_IO_WT_LG    	 "IORM wait time for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_LG_RQ 	 "Average IORM wait time per request for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM    	 "IORM wait time for small IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM_RQ 	 "Average IORM wait time per request for small IO requests issued to flash disks by this pluggable database"
	 PDB_IO_BY_SEC      	 "Number of megabytes of I/O per second for this pluggable database to hard disks"
	 PDB_IO_LOAD        	 "Average I/O load from this pluggable database for hard disks"
	 PDB_IO_RQ_LG       	 "Number of large IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_LG_SEC   	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_RQ_SM       	 "Number of small IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_SM_SEC   	 "Number of small IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_TM_LG       	 "The cumulative latency of reading or writing large blocks by this pluggable database from hard disks"
	 PDB_IO_TM_LG_RQ    	 "The rate which is the average latency of reading or writing large blocks per request by this pluggable database from hard disks"
	 PDB_IO_TM_SM       	 "The cumulative latency of reading or writing small blocks by this pluggable database from hard disks"
	 PDB_IO_TM_SM_RQ    	 "The rate which is the average latency of reading or writing small blocks per request by this pluggable database from hard disks"
	 PDB_IO_UTIL_LG     	 "Percentage of disk resources utilized by large requests from this pluggable database"
	 PDB_IO_UTIL_SM     	 "Percentage of disk resources utilized by small requests from this pluggable database"
	 PDB_IO_WT_LG       	 "IORM wait time for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_LG_RQ    	 "Average IORM wait time per request for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM       	 "IORM wait time for small IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM_RQ    	 "Average IORM wait time per request for small IO requests issued to hard disks by this pluggable database"

IORM metrics for PDBs

You still can’t “join” metricdefintition to metriccurrent but it is possible to use the objectType in metriccurrent, too. The way that Oracle externalises information about PDBs is as shown here:

CellCLI> list metriccurrent where objectType = 'IORM_PLUGGABLE_DATABASE' and metricObjectName like 'MBACHMT.*'
	 PDB_FC_BY_ALLOCATED	 MBACHMT.CDB$ROOT   	 1,843 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.PDB$SEED   	 41.500 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,772 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,374 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,280 MB
...

PDB_FC_BY_ALLOCATED translates to “Number of megabytes allocated in flash cache for this pluggable database” as per the above translation. The Metric Object Name therefore is made up of the database name (I continue using MBACHMT as my CDB) and the container name. CDB$ROOT stands for the root, PDB$SEED for the seed database, and then the various PDB names you define. In my example I have “user PDBs” defined as swingbench{0,1,2}.

Stressing it

With the basics covered it is time to run some stress testing. I have created a 6 GB table named IOTEST in all my swingbench* PDBs and will use the same script to issue 80 sessions against that table in each PDB. My CDB resource manager plan is still the same, repeated here for your convenience:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;
 
 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);
 
 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

If I execute the scripts concurrently (80 sessions connecting against each PDB) and prefixing my scheduler with time command then I get the following results

  • swingbench0 – 143s
  • swingbench1 – 223s
  • swingbench2 – 288s

Interestingly, there is no event that would show I/O throttling in 12.1.0.2.2:

SYS:MBACHMT2> select count(*), con_id, event from v$session where username = 'SOE' group by con_id, event;

   COUNT(*)      CON_ID EVENT
----------- ----------- ----------------------------------------------------------------
         80           4 cell smart table scan
         80           5 cell smart table scan
         80           3 cell smart table scan

So all of them are Smart-Scanning. The fact that some of the sessions are throttled is not visible from the wait interface, or at least I haven’t seen a way to externalise I/O throttling. But it does happen. Using one of my favourite tools, metric_iorm.pl (available from MOS), I noticed the following:

Database: MBACHMT
Utilization:     Small=0%    Large=18%
Flash Cache:     IOPS=13087
Disk Throughput: MBPS=361
Small I/O's:     IOPS=4.0    Avg qtime=0.0ms
Large I/O's:     IOPS=346    Avg qtime=1026ms
	Consumer Group: SWINGBENCH1.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=4%
	Flash Cache:     IOPS=3482
	Disk Throughput: MBPS=91
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=87.7    Avg qtime=1336ms
	Consumer Group: SWINGBENCH0.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=13%
	Flash Cache:     IOPS=8886
	Disk Throughput: MBPS=254
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=244    Avg qtime=906ms
	Consumer Group: CDB$ROOT._ORACLE_LOWPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.8
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT.ORA$AUTOTASK
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=1.4
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.4    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT._ORACLE_BACKGROUND_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.7
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=3.6    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: SWINGBENCH2.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=717
	Disk Throughput: MBPS=15
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=14.5    Avg qtime=1152ms

These are statistics from a single cell-this X2-2 has 3 of them. I have also gathered some of the raw stats here in case you are interested, again from a single cell:

CellCLI> list metriccurrent where name like 'PDB.*' and metricObjectName like 'MBACHMT.SW.*' and metricValue not like '0.*';
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,779 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,467 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,461 MB
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH0	 4,807,060 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH1	 4,835,038 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH2	 4,833,804 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH0	 12,789 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH1	 9,721 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH2	 5,182 IO/sec
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH0	 9,724 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH1	 6,093 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH2	 6,298 IO requests
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH0	 51.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH1	 30.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH2	 23.9 IO/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH0	 30.3
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH1	 30.4
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH2	 28.1
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH0	 69,803,464 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH1	 45,061,357 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH2	 40,433,099 us
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH0	 67 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH1	 54 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH2	 29 %
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH0	 830,669 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH1	 717,211 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH2	 221,666 ms
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH0	 380 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH1	 305 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH2	 151 MB/sec
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH0	 2.4
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH1	 2.6
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH2	 2.3
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH0	 105,784 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH1	 88,549 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH2	 61,617 IO requests
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH0	 365 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH1	 292 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH2	 145 IO/sec
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH0	 3,822,888,945 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH1	 3,355,167,650 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH2	 2,004,747,904 us
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH0	 27 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH1	 20 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH2	 9 %
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH0	 108,668,272 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH1	 105,099,717 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH2	 132,192,319 ms
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH0	 1,655 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH1	 2,979 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH2	 4,921 ms/request

Some of the numbers don’t seem to make sense here, for example PDB_FD_IO_RQ_LG as the values are very similar. This is actually a feature (really!), because some metrics are cumulative, and some are instantaneous:

CELLCLI> list metricdefinition where name = 'PDB_FD_IO_RQ_LG' detail
	 name:              	 PDB_FD_IO_RQ_LG
	 description:       	 "Number of large IO requests issued by this pluggable database to flash disks"
	 metricType:        	 Cumulative
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 "IO requests"

So this is a cumulative metric. Others, like PDB_IO_RQ_LG_SEC measure the state “as is”:

CELLCLI> list metricdefinition where name = 'PDB_IO_RQ_LG_SEC' detail
	 name:              	 PDB_IO_RQ_LG_SEC
	 description:       	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 metricType:        	 Rate
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 IO/sec

Have fun!

Posted in 12c Release 1, Exadata | Tagged: , | 2 Comments »

Testing 12c CDB Resource Plans and a little bit about OEM Express

Posted by Martin Bach on June 15, 2015

Inspired by Sue Lee’s presentation at Enkitec’s E4 conference I decided to re-run my test suite to work out how you can use Database and I/O Resource Manager in Oracle 12.1.0.2.2 to help consolidating databases in the new Multi-Tenant architecture. I should point out briefly that the use of Multi-Tenant as demonstrated in this article requires you to have a license. Tuning tools shown will also require you to be appropriately licensed…

Setup

I have created a Container Database (CDB), named MBACHMT (MBACH – Multi-Tenant), in which I want to run Swingbench to experiment with I/O Resource Manager. The same Pluggable Databases (PDBs) are subject to the new CDB-(Database) Resource Manager testing. In order to simplify the task I’ll just create a single PDB for now, install Swingbench’s Order Entry schema, and clone the PDB twice. The CDB is created using dbca on an Exadata X2-2 quarter rack. Once the CDB was in place I could create the first PDB. I don’t think the steps need a lot of explanation, so here they go without much commenting. First the PDB must be created-based on the SEED database, then I create a tablespace and a user account to host the actual data.

SYS:MBACHMT1> create pluggable database swingbench0 admin user admin identified by secret roles=(DBA);

Pluggable database created.

SYS:MBACHMT1> alter pluggable database swingbench0 open instance=all;

Pluggable database altered.

SYS:MBACHMT1> alter session set container = swingbench0;

Session altered.

SYS:MBACHMT1> create tablespace soe datafile size 5g;

Tablespace created.

SYS:MBACHMT1> create user soe identified by soe default tablespace soe;

User created.

SYS:MBACHMT1> grant connect to soe;

Grant succeeded.

SYS:MBACHMT1> grant execute on dbms_lock to soe;

Grant succeeded.

One thing requires an explanation, and that’s the “alter session set container = swingbench0″ command. As you can see the SQLPROMPT is comprised of username – colon – container name. The container named does not change when using the SYS account to switch the context from CDB$ROOT to a PDB, hence you continue to see the MBACHMT1 prefix when in fact I am executing commands on the PDB level.

The next step is to create a basic Order Entry schema. I wrote about this before, and so has Dominic Giles on his blog. For your reference, here is the command I used:

oewizard -scale 1 -dbap supersecret -u soe -p soe -cl -cs //enkscan2/swingbench0 -ts SOE -create

SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.949

Running in Lights Out Mode using config file : oewizard.xml

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.005
Data Generation Time                   0:00:21.380
DDL Creation Time                      0:00:52.221
Total Run Time                         0:01:13.609
Rows Inserted per sec                      566,237
Data Generated (MB) per sec                   46.1
Actual Rows Generated                   13,009,500


Post Creation Validation Report
===============================
The schema appears to have been created successfully.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA','INVENTORIES','PRODUCT_INFORMATION',
'PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX','PROD_CATEGORY_IX','INVENTORY_PK',
'INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK','ORD_SALES_REP_IX','ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX',
'ORD_WAREHOUSE_IX','ORDER_ITEMS_PK','ITEM_ORDER_IX','ITEM_PRODUCT_IX','WAREHOUSES_PK','WHS_LOCATION_IX',
'CUSTOMERS_PK','CUST_EMAIL_IX','CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX','ADDRESS_PK','ADDRESS_CUST_IX',
'CARD_DETAILS_PK','CARDDETAILS_CUST_IX'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code : 'ORDERENTRY'
Schema Created

With the schema in place I need a PDB Resource Plan, or in other words, enable a resource manager plan on PDB-level. The PDB Resource Plan is almost identical to non-CDB DBRM plans, with a few restrictions mentioned in the Admin Guide chapter 44, just below figure 44-4. The one that affects me is the lack of multi-level resource plans. For this reason I’m going to use a simple plan based on the RATIO mgmt_mth of dbms_resource_manager.create_plan. Not having multi-level resource plans at your disposal might actually prevent incredibly complex plans from being created, that are beautifully architected but equally difficult to understand for me at least.

To keep it simple, my PDBs just have 1 purpose: execute Swingbench. As such there won’t be an additional application user, all I care about is the SOE account. I want it to be eligible for the lion share of CPU, so here’s the plan. You must make sure that your execution context is the new PDB (swingbench0). You can make sure by selecting “sys_context(‘userenv’,’con_name’) from dual.

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 -- create a new resource consumer group to which we will later on add plan directives
 -- a consumer group is a logical construct grouping sessions to a similar/identical workload
 dbms_resource_manager.create_consumer_group('SWINGBENCH_GROUP', 'for swingbench processing');

 dbms_resource_manager.validate_pending_area();
 dbms_resource_manager.submit_pending_area();
end;
/

begin
 dbms_resource_manager.create_pending_area();

 -- when logging in as oracle user "SOE", map this session to the SWINGBENCH_GROUP
 dbms_resource_manager.set_consumer_group_mapping(
		dbms_resource_manager.oracle_user, 'SOE', 'SWINGBENCH_GROUP');
 dbms_resource_manager.submit_pending_area();
end;
/

begin
 -- must allow the SOE user to switch from OTHERS_GROUP to SWINGBENCH_GROUP. Forgetting this step
 -- is a common reason for DBRM not to work as expected
 dbms_resource_manager_privs.grant_switch_consumer_group('SOE','SWINGBENCH_GROUP', true);
end;
/

BEGIN
 dbms_resource_manager.clear_pending_area();
 dbms_resource_manager.create_pending_area();
 
 -- new create the plan in the first step. Note the mgmt_mth which essentially requires you
 -- to think of CPU shares, not percentages. Also enforces the requirement not to use
 -- multi-level plans
 -- thanks for @fritshoogland for making this obvious to me
 dbms_resource_manager.create_plan(
 	plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
        mgmt_mth => 'RATIO',
 	comment => 'sample DBRM plan for swingbench'
 );

 -- now define what the plan is about. Give the SYS_GROUP 3 shares
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  comment => 'sys_group is level 1',
  group_or_subplan => 'SYS_GROUP',
  mgmt_p1 => 3);

 -- the SWINGBENCH user gets 7 shares
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  group_or_subplan => 'SWINGBENCH_GROUP',
  comment => 'us before anyone else',
  mgmt_p1 => 7
 );

 -- finally anyone not in a previous consumer group will be mapped to the
 -- OTHER_GROUPS and get 1 share. 
 dbms_resource_manager.create_plan_directive(
  plan => 'ENKITEC_SWINGBENCH_PDB_PLAN',
  group_or_subplan => 'OTHER_GROUPS',
  comment => 'all the rest',
  mgmt_p1 => 1
 );
 
 dbms_resource_manager.validate_pending_area();
 dbms_resource_manager.submit_pending_area();
end;
/

If you didn’t get any errors you can enable the plan in the PDB using the familiar “alter system set resource_manager_plan = ENKITEC_SWINGBENCH_PDB_PLAN;” command.

The PDB is now ready for cloning, which requires it to be open read only. Once swingbench0 is in the correct open mode, clone it using “create pluggable database swingbench1 from swingbench0;” and “create pluggable database swingbench2 from swingbench0;”. Nice-no “RMAN> duplicate database to swingbench1 from active database” and all this … just a one-liner. Once the cloning is done, open the PDBs.

The CDB plan

With the PDBs all registered I am now able to define a CDB resource plan in CDB$ROOT. Again, check using “select sys_context(‘userenv’,’con_name’) from dual” that you are in the root, not a PDB. Here is the plan:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);

 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);

 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

SQL> alter system set RESOURCE_MANAGER_PLAN = 'FORCE:ENKITC_CDB_PLAN' scope=both sid='*';

System altered.

With this CDB plan I defined CPU shares and utilisation limits. There are other items worth investigating, refer to the documentation for those. Please take note that except for swingbench0 I capped the maximum utilisation of 50 and 30 percent respectively. This will be interesting later.

Testing

No workload is 100% CPU-bound and I tried a few iterations before coming up with a suitable model to view the CDB Resource Plan in action. In the end all I needed was CPU burning nonsense, and I have found one way of burning CPU by calculating millions of square roots. I have written a small “launcher” script that can execute a SQL script against a (Pluggable) database x-many times. When exceeding the CPU capacity on the system I should be able to see the effect. So I launched 20 sessions of my CPU burning for-loops against each of the PDBs in instance two and connected against CDB$ROOT to see the effect:

SYS:MBACHMT2> select count(*), con_id, inst_id, event from gv$session where username = 'SOE' group by con_id, inst_id, event;

   COUNT(*)      CON_ID     INST_ID EVENT
----------- ----------- ----------- ----------------------------------------------------------------
         20           3           2 resmgr:cpu quantum
         20           4           2 resmgr:cpu quantum
         20           5           2 resmgr:cpu quantum

And yes, that looks like it ;)

The timings for the executions were:

  • 232 seconds for swingbench0
  • 318 seconds for swingbench1
  • 509 seconds for swingbench2

A little bit of OEM Express

OEM Express was running at the time and it allows you to see the utilisation of your PDBs:

CDB Resource Plan with all 3 PDBs active

CDB Resource Plan with all 3 PDBs active

You can see the 3 PDBs working along. On the CPU bars to the right you can see the number of running sessions (green) and those waiting (beige). You can also see the entitlement as per shares (black vertical bar in “CPU Resource Limits”) and the utilisation limit (red vertical bar) The output almost perfectly matched the configuration.

 

Posted in 12c Release 1, Exadata | Tagged: , , | 1 Comment »

Smart Scan enhancements in Oracle Exadata 12c-inline LOBs

Posted by Martin Bach on May 25, 2015

As part of the 12c release of the database and cellos Oracle introduced the ability to perform Smart Scans on inline LOBs. The enhancement is certainly for Secure Files only. And as always, if you would like to read up about Secure Files you can head over to Tim Hall’s website for the details.

To demonstrate the new behaviour I have used an 11.2.0.3 BP 22 database on Exadata 11.2.3.3.1 and 12.1.0.2.2 database on Exadata 12.1.2.1.0. The test case evolves around the following table:

CREATE TABLE loboffload (
 id number primary key,
 comments clob)
enable row movement 
tablespace users
lob (comments) store as securefile (
 enable storage in row
);

This is the short version, Oracle fills in the blanks and converts the table DDL to

SQL> select dbms_metadata.get_ddl('TABLE','LOBOFFLOAD') from dual;

DBMS_METADATA.GET_DDL('TABLE','LOBOFFLOAD')
--------------------------------------------------------------------------------

  CREATE TABLE "MARTIN"."LOBOFFLOAD"
   (    "ID" NUMBER,
        "COMMENTS" CLOB,
         PRIMARY KEY ("ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"
 LOB ("COMMENTS") STORE AS SECUREFILE (
  TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING  NOCOMPRESS  KEEP_DUPLICATES
  STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))  ENABLE ROW
 MOVEMENT

The idea is that I am using the comments lob column to simulate a common approach in application design where you define a free text field on a GUI screen for the user to provide additional information about the record to be stored in the database. The design is not very practical though.

The important information is in the section about the LOB. Here you find the request to store the LOB as a SECUREFILE in-line with the table. Out of line LOBs are not offloadable as of 12.1.2.1.0 as far as I know.

To feed some data into the table I copied added a bit of data and then used the insert … append technique to populate it:

alter session enable parallel dml;

-- with nods to Jonathan Lewis who published this in one of his presentations on generating data
insert /*+ append parallel(4) */ into loboffload
with v1 as (
 SELECT rownum n FROM dual CONNECT BY level <= 10000
)
SELECT  rownum id,
  CASE
    WHEN mod(rownum,100000) = 0     THEN 'THIS WAS EXCEPTIONAL' 
    WHEN mod(rownum,10000) = 0      THEN 'THIS WAS VERY GOOD'
    WHEN mod(rownum,1000) = 0       THEN 'THIS WAS GOOD'
    WHEN mod(rownum,100) = 0        THEN 'OK'
    ELSE 'NO COMPLAINTS'
  END 
FROM v1,
  v1
WHERE rownum <= 1e6;

create sequence loboffload_s cache 10000 start with 1000001;


insert /*+ append parallel(4) */ into loboffload
select /*+ parallel(4) */ loboffload_s.nextval, comments from loboffload;

I ended up with 16 million rows in the end. I used impdp over a network link to move the table to the other database.

Test with 11.2.0.3

The test in 11.2.0.3 is used to confirm that inline LOBs are not offloadable. Here is the query and result:

MARTIN@ora11203> select /*+ monitor noparallel full(t) */ count(*) from loboffload t where comments like '%GOOD%';

  COUNT(*)
----------
     15840

Elapsed: 00:01:33.48

MARTIN@ora11203> -- removing irrelevant statistics from the output
MARTIN@ora11203> @mystats stop t=1
==========================================================================================
MyStats report : 25-MAY-2015 02:33:24
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  95.28
TIMER   CPU time used (seconds)                                                      93.00


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                     9,303
...
STAT    DB time                                                                      9,353
...
STAT    cell flash cache read hits                                                   9,454
STAT    cell physical IO interconnect bytes                                  1,233,436,672
...
STAT    consistent gets                                                            150,794
STAT    consistent gets - examination                                                    3
STAT    consistent gets direct                                                     150,558
STAT    consistent gets from cache                                                     236
STAT    consistent gets from cache (fastpath)                                          231
STAT    cursor authentications                                                           3
...
STAT    execute count                                                                   13
...
STAT    logical read bytes from cache                                           14,393,344
STAT    no work - consistent read gets                                             150,558
...
STAT    parse count (hard)                                                               1
STAT    parse count (total)                                                             13
STAT    parse time cpu                                                                   1
STAT    parse time elapsed                                                               1
STAT    physical read IO requests                                                    9,459
STAT    physical read bytes                                                  1,233,436,672
STAT    physical read requests optimized                                             9,454
STAT    physical read total IO requests                                              9,459
STAT    physical read total bytes                                            1,233,436,672
STAT    physical read total bytes optimized                                  1,233,395,712
STAT    physical read total multi block requests                                     9,255
STAT    physical reads                                                             150,566
STAT    physical reads cache                                                             8
STAT    physical reads direct                                                      150,558
...
STAT    table scan blocks gotten                                                   150,558
STAT    table scan rows gotten                                                  16,000,000
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1
...

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

So none of the essential cell% events recorded except for Flash Cache read hits. Without the cell scans statistics incremented by 1 (for the table segment) you can conclude that no Smart Scan happened here.

Test on 12.1.0.2/12.1.2.1.0

The first result was not too surprising, as it is the documented behaviour. The second test should hopefully reveal some offloading.

MARTIN@ORA12c> select /*+ monitor noparallel full(t) */ count(*) from loboffload t where comments like '%GOOD%';

  COUNT(*)
----------
     15840

Elapsed: 00:00:01.65

MARTIN@ORA12c> @mystats stop t=1
==========================================================================================
MyStats report : 25-MAY-2015 02:29:46
==========================================================================================


------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  14.61
TIMER   CPU time used (seconds)                                                       0.03


------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                         9
...
STAT    cell IO uncompressed bytes                                           1,234,296,832
STAT    cell blocks helped by minscn optimization                                  150,666
STAT    cell blocks processed by cache layer                                       150,671
STAT    cell blocks processed by data layer                                        150,671
STAT    cell blocks processed by txn layer                                         150,671
STAT    cell flash cache read hits                                                   1,143
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                1,234,296,832
STAT    cell physical IO interconnect bytes                                      6,273,368
STAT    cell physical IO interconnect bytes returned by smart scan               6,273,368
STAT    cell scans                                                                       1
...
STAT    consistent gets                                                            151,053
STAT    consistent gets direct                                                     150,671
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     382
STAT    consistent gets pin                                                            379
STAT    consistent gets pin (fastpath)                                                 379
...
STAT    parse count (total)                                                             13
STAT    physical read IO requests                                                    1,205
STAT    physical read bytes                                                  1,234,296,832
STAT    physical read requests optimized                                             1,143
STAT    physical read total IO requests                                              1,205
STAT    physical read total bytes                                            1,234,296,832
STAT    physical read total bytes optimized                                  1,171,423,232
STAT    physical read total multi block requests                                     1,189
STAT    physical reads                                                             150,671
STAT    physical reads direct                                                      150,671
...
STAT    table scan blocks gotten                                                    15,744
STAT    table scan disk non-IMC rows gotten                                      1,670,692
STAT    table scan rows gotten                                                   1,670,692
STAT    table scans (direct read)                                                        1
STAT    table scans (short tables)                                                       1
...

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

Good news, what Oracle said is true. The same query that did not offload in 11.2.0.3 is now offloaded in 12.1.0.2 with Exadata 12.1.2.1.0. You find proof in the existence of all the cell% statistics, especially cell scans plus the … processed by … layer statistic.

Posted in 12c Release 1, Exadata | 1 Comment »

If you use swingbench take note of sbutil

Posted by Martin Bach on May 19, 2015

This is going to be a very short post for a change. I have used Swingbench extensively and really love the tool. Many thanks to Dominic Giles!

Recently he announced a new tool on his blog that you can use to inflate your data volume. So instead of using the “-scale” argument when executing oewizard you can just keep the defaults and later on create as much data as you like. Here is an example, the reason for this post.

Setting the Scene

If you have not yet downloaded the latest Swingbench version and JDK 8 do this first. You can get Swingbench from the tool’s landing page and the Java software from java.sun.com. I used Swingbench 2.5.791 (stable). With both deployed, I created the Order Entry schema in lights-out mode:

[oracle@lab bin]$ ./oewizard -dbap secret -u soe -p soe -cl -cs //lab/martin -ts soe_tbs -create -dba system
SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.971

Running in Lights Out Mode using config file : oewizard.xml
The following statement failed : GRANT EXECUTE ON dbms_lock TO soe : Due to : ORA-01031: insufficient privileges

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.006
Data Generation Time                   0:01:06.110
DDL Creation Time                      0:00:30.346
Total Run Time                         0:01:36.465
Rows Inserted per sec                      183,121
Data Generated (MB) per sec                   14.9
Actual Rows Generated                   13,002,138

Post Creation Validation Report
===============================
The schema appears to have been created successfully.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA','INVENTORIES',
'PRODUCT_INFORMATION','PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX','PROD_CATEGORY_IX',
'INVENTORY_PK','INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK','ORD_SALES_REP_IX',
'ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX','ORD_WAREHOUSE_IX','ORDER_ITEMS_PK','ITEM_ORDER_IX',
'ITEM_PRODUCT_IX','WAREHOUSES_PK','WHS_LOCATION_IX','CUSTOMERS_PK','CUST_EMAIL_IX',
'CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX','ADDRESS_PK','ADDRESS_CUST_IX',
'CARD_DETAILS_PK','CARDDETAILS_CUST_IX'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code : 'ORDERENTRY'
Schema Created

More Data!

The next step is to invoke sbutil:

[oracle@lab bin]$ ./sbutil -u soe -p soe  -cs //lab/martin -soe parallel 12 -dup 4
Getting table Info
Got table information. Completed in : 0:00:01.558
Dropping Indexes
Dropped Indexes. Completed in : 0:00:03.252
Creating copies of tables
Created copies of tables. Completed in : 0:00:00.042
Begining data duplication
Completed Iteration 4. Completed in : 0:00:07.942
Creating  Constraints
Created  Constraints. Completed in : 0:00:28.737
Creating  Indexes
Created  Indexes. Completed in : 0:00:54.541
Updating Metadata and Recompiling Code
Updated Metadata. Completed in : 0:00:00.754
Updating Sequences
Updated Sequences. Completed in : 0:00:00.792
Determining New Row Counts
Got New Row Counts. Completed in : 0:00:03.132
Completed Data Duplication in 0 hour(s) 2 minute(s) 8 second(s) 454 millisecond(s)
----------------------------------------------------------------------------------------------------------
|Table Name          |  Original Row Count|       Original Size|       New Row Count|            New Size|
----------------------------------------------------------------------------------------------------------
|ORDER_ITEMS         |           4,288,582|              280 MB|          17,154,328|              1.1 GB|
|CUSTOMERS           |           1,000,000|              136 MB|           4,000,000|            552.4 MB|
|CARD_DETAILS        |           1,500,000|               80 MB|           6,000,000|            337.8 MB|
|ORDERS              |           1,429,790|              160 MB|           5,719,160|              664 MB|
|ADDRESSES           |           1,500,000|              184 MB|           6,000,000|            571.8 MB|
----------------------------------------------------------------------------------------------------------
|Total               |                    |              840 MB|                    |              3.2 GB|
----------------------------------------------------------------------------------------------------------

How Cool is That?

Really, it is. And look at the timings. This didn’t take too long, even on my poor old AMD 6200 series lab server.

Posted in Performance | Tagged: , , | 1 Comment »

Additional information on Oracle 12c big table caching

Posted by Martin Bach on May 13, 2015

Teaching is on the things I like doing, and currently I am investigating the Oracle 12c features around caching data in the various memory areas. Since the In-Memory (cost) option has been discussed by other far more knowledgeable people I would like to share some findings about the big table caching here.

Some Background

In Oracle 12c you have two additional options to cache information: full database caching and big table caching. The first is great if you have a massively big machine with lots and lots of DRAM plus a clever OS that can deal with the inevitable ccNUMA setup you will have to tackle. And maybe don’t want to pay for the In-Memory option. This post is not about full database caching, but rather about the other possibility to cache blocks.

This other option is to have just a few tables in an area within the buffer cache. That’s right-after the keep and recycle pools you can now tweak the default buffer cache. In fact you tell Oracle by means of an initialisation parameter how much of the default buffer cache can be used to cache full scans. You can use zero percent (default) for caching full scans, up to 90%. You must leave 10% to OLTP workloads. I haven’t really investigated the pre-12c caching mechanism in detail (hey it works!) but I read that full scans are not cached by default to avoid thrashing the cache.

Big Table Caching

Now what? We can cache full scans too. Interesting, let’s try it. I am using 12.1.0.2.2 on an ODA, but that shouldn’t really matter. The SGA is 24 GB in size, and my segment to be scanned (a non-partitioned table without indexes) is about 20GB in size. I want 50% of the buffer cache allocated for the big table caching tool.

SQL> select component, current_size/power(1024,2) size_mb
  2  from v$sga_dynamic_components where current_size <> 0;

COMPONENT                                                           SIZE_MB
---------------------------------------------------------------- ----------
shared pool                                                            2688
large pool                                                              192
java pool                                                               128
DEFAULT buffer cache                                                  20992
Shared IO Pool                                                          512

SQL> show parameter db_big_table_cache_percent_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_big_table_cache_percent_target    string      50

SQL> select * from v$bt_scan_cache;

BT_CACHE_ALLOC BT_CACHE_TARGET OBJECT_COUNT MEMORY_BUF_ALLOC MIN_CACHED_TEMP     CON_ID
-------------- --------------- ------------ ---------------- --------------- ----------
    .500334108              50            0                0            1000          0

SQL> select bytes/power(1024,2) m from dba_segments
  2  where owner = 'MARTIN' and segment_name = 'T1';

         M
----------
     20864

Now let’s see if we can make use of this. 50% of 20GB are about 10GB useable for the scan cache. If I start a query against T1 in another session I can see the object count increase.

SQL> r
  1* select * from v$bt_scan_cache

BT_CACHE_ALLOC BT_CACHE_TARGET OBJECT_COUNT MEMORY_BUF_ALLOC MIN_CACHED_TEMP     CON_ID
-------------- --------------- ------------ ---------------- --------------- ----------
    .500334108              50            1          1292363            1000          0

Subsequent queries against T1 will increase the temperature and potentially the cached information. The “object temperature” is a method Oracle uses to determine the suitability of an object to be cached. In my very basic example there is only one table which has been full-scanned so far. Later on I’ll add T2 to the mix. The temperature and other object information are reflected in the second view, v$bt_scan_obj_temps:

SQL> select * from v$bt_scan_obj_temps;

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39461      2669763        4000 MEM_PART         1292711          0

Session Counters!

I have developed a habit of looking into session counters when running queries to see if there is anything of interest. Using snapper with the before/after snapshot I got this:

SQL> @snapper4 all,end 5 1 163
Sampling SID 163 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
    163, MARTIN    , STAT, user calls                                                ,             3,        .06,         ,             ,          ,           ,          3 per execution
...
    163, MARTIN    , STAT, physical read total IO requests                           ,         10752,     224.51,         ,             ,          ,           ,     10.75k per execution
    163, MARTIN    , STAT, physical read total multi block requests                  ,         10752,     224.51,         ,             ,          ,           ,     10.75k per execution
    163, MARTIN    , STAT, physical read total bytes                                 ,   11262763008,    235.17M,         ,             ,          ,           ,     11.26G per execution
...
    163, MARTIN    , STAT, consistent gets                                           ,       2666722,     55.68k,         ,             ,          ,           ,      2.67M per execution
    163, MARTIN    , STAT, consistent gets from cache                                ,       1292769,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets pin                                       ,       1292769,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets pin (fastpath)                            ,       1292768,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, consistent gets direct                                    ,       1373953,     28.69k,         ,             ,          ,           ,      1.37M per execution
...
    163, MARTIN    , STAT, physical reads                                            ,       1374849,     28.71k,         ,             ,          ,           ,      1.37M per execution
    163, MARTIN    , STAT, physical reads cache                                      ,           128,       2.67,         ,             ,          ,           ,        128 per execution
    163, MARTIN    , STAT, physical reads direct                                     ,       1374721,     28.71k,         ,             ,          ,           ,      1.37M per execution
    163, MARTIN    , STAT, physical read IO requests                                 ,         10752,     224.51,         ,             ,          ,           ,      1.05M bytes per request
    163, MARTIN    , STAT, physical read bytes                                       ,   11262763008,    235.17M,         ,             ,          ,           ,     11.26G per execution
...
    163, MARTIN    , STAT, data warehousing scanned objects                          ,             1,        .02,         ,             ,          ,           ,          1 per execution
    163, MARTIN    , STAT, data warehousing scanned blocks                           ,       2666668,     55.68k,         ,             ,          ,           ,      2.67M per execution
    163, MARTIN    , STAT, data warehousing scanned blocks - memory                  ,       1292715,     26.99k,         ,             ,          ,           ,      1.29M per execution
    163, MARTIN    , STAT, data warehousing scanned blocks - disk                    ,       1373953,     28.69k,         ,             ,          ,           ,      1.37M per execution
...
    163, MARTIN    , STAT, table scans (short tables)                                ,             1,        .02,         ,             ,          ,           ,          1 per execution
    163, MARTIN    , STAT, table scan rows gotten                                    ,      16000006,    334.09k,         ,             ,          ,           ,        16M per execution
    163, MARTIN    , STAT, table scan disk non-IMC rows gotten                       ,      16000006,    334.09k,         ,             ,          ,           ,        16M per execution
    163, MARTIN    , STAT, table scan blocks gotten                                  ,       2666668,     55.68k,         ,             ,          ,           ,      2.67M per execution
...
    163, MARTIN    , STAT, execute count                                             ,             1,        .02,         ,             ,          ,           ,          1 executions per parse
...
    163, MARTIN    , TIME, parse time elapsed                                        ,            26,      .54us,      .0%, [          ],          ,           ,
    163, MARTIN    , TIME, DB CPU                                                    ,       5682136,   118.65ms,    11.9%, [@@        ],          ,           ,
    163, MARTIN    , TIME, sql execute elapsed time                                  ,      33220099,   693.66ms,    69.4%, [#######   ],          ,           ,
    163, MARTIN    , TIME, DB time                                                   ,      33220306,   693.66ms,    69.4%, [#######   ],          ,           ,       -.32 % unaccounted time
    163, MARTIN    , WAIT, db file scattered read                                    ,         24636,   514.41us,      .1%, [          ],         1,        .02,    24.64ms average wait
    163, MARTIN    , WAIT, direct path read                                          ,      27536664,   574.98ms,    57.5%, [WWWWWW    ],     10694,      223.3,     2.57ms average wait
    163, MARTIN    , WAIT, SQL*Net message to client                                 ,             3,      .06us,      .0%, [          ],         2,        .04,      1.5us average wait
    163, MARTIN    , WAIT, SQL*Net message from client                               ,      14826292,   309.58ms,    31.0%, [WWWW      ],         2,        .04,      7.41s average wait

--  End of Stats snap 1, end=2015-05-11 10:13:13, seconds=47.9

Don’t worry about the wide output-all you need in the output is the STATISTIC and DELTA columns. This can all be a bit overwhelming at first, so let me guide you through-statistics that are not important for the discussion have already been removed.

First of all you see the phyical reads. 10752 IO requests were issued, all of them multi block requests (physical read total IO requests and physical read total multi block requests, repeated also in physical read IO requests). Hmm – 10GB – isn’t that 50% of my buffer cache? It might actually be possible that 10GB of the 20GB table were read from disk using direct path reads, and another 10GB came from memory.

What is interesting is the next set of counters: “data warehousing scanned %” which I haven’t seen until now. In fact, if I run @TanelPoder’s statn.sql script in my session I can see there are more than the ones I have in the snapper output:

SQL> @statn.sql warehousing

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       237 ED          1896 data warehousing scanned objects                                          5
       238 EE          1904 data warehousing scanned blocks                                    13333340
       239 EF          1912 data warehousing scanned blocks - memory                            6462815
       240 F0          1920 data warehousing scanned blocks - flash                                   0
       241 F1          1928 data warehousing scanned blocks - disk                              6870525
       242 F2          1936 data warehousing scanned blocks - offload                                 0
       243 F3          1944 data warehousing evicted objects                                          0
       244 F4          1952 data warehousing evicted objects - cooling                                0
       245 F5          1960 data warehousing evicted objects - replace                                0
       246 F6          1968 data warehousing cooling action                                           0

10 rows selected.

Interesting! I’m quite glad Oracle gave us so many statistics that describe the way the big table caching works. I have queried T1 5 times in my session, and scanned 13333340 blocks during these. The distribution between memory and disk is 6462815 to 6870525, almost 50:50. It looks like you can use flash for this and offloading (note to self: repeat the test on Exadata). That makes sense as in the example just shown: the segment to be scanned is 20GB out of which 10GB are in the buffer cache. If the rest of the segment can be scanned using direct path reads as in the above example then it is more than likely that you can offload the scan as well.

I can even see if there was space pressure on the big table cache, the data warehousing evicted% statistics hint at space management.

Purely from memory?

Looking at the session counters above my take is too big. I would like to see more scans entirely from memory :) So I created a table with 25% of the size of T1 and called it T2 using the sample clause. Initial scans against T2 showed disk scans only (policy = DISK):

SQL> r
  1* select * from v$bt_scan_obj_temps

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39461      2669763        8000 MEM_PART         1293081          0
         4      39465       668107        5000 DISK                   0          0

But after the 9th scan (each scan increased the temperature by 1000) the situation changed:

SQL> r
  1* select * from v$bt_scan_obj_temps

       TS#   DATAOBJ# SIZE_IN_BLKS TEMPERATURE POLICY     CACHED_IN_MEM     CON_ID
---------- ---------- ------------ ----------- ---------- ------------- ----------
         4      39465       668107        9000 MEM_ONLY          668107          0
         4      39461      2669763        8000 MEM_PART          624974          0

The smaller table now fits into memory and took over! The result is visible in the execution time:

SQL> r
  1* select min(date_created) from t2

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:05.39

...

SQL> r
  1* select min(date_created) from t2

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:01.25

So to prove the point I reconnected to the database and ran my test again:


SQL> select min(date_created) from t2;

MIN(DATE_CREATED)
-----------------
20140513 00:00:00

Elapsed: 00:00:01.36

SQL> @statn warehousing

     STAT# HEX#      OFFSET NAME                                                                  VALUE
---------- ----- ---------- ---------------------------------------------------------------- ----------
       237 ED          1896 data warehousing scanned objects                                          1
       238 EE          1904 data warehousing scanned blocks                                      666966
       239 EF          1912 data warehousing scanned blocks - memory                             666966
       240 F0          1920 data warehousing scanned blocks - flash                                   0
       241 F1          1928 data warehousing scanned blocks - disk                                    0
       242 F2          1936 data warehousing scanned blocks - offload                                 0
       243 F3          1944 data warehousing evicted objects                                          0
       244 F4          1952 data warehousing evicted objects - cooling                                0
       245 F5          1960 data warehousing evicted objects - replace                                0
       246 F6          1968 data warehousing cooling action                                           0

10 rows selected.

So the popularity of the object plays a big role working out which segment to cache as well. This is an interesting feature worth testing in your development environments. I have no idea if it’s licensable or not so please make sure you check with Oracle before using it to avoid surprises.

Posted in 12c Release 1, Performance | Tagged: | 9 Comments »

Exadata Fast Data File Creation

Posted by Martin Bach on May 11, 2015

This post is the result of some testing I performed on Exadata data file creation. You may know that Exadata offloading incorporates SQL optimisations as well as some infrastructure work. For quite some time Exadata allowed the DBA to create data files a lot quicker than on traditional systems. This has been documented before, for example by @mpnsh here

The final comment on his blog entry was a remark that data file creation is quite fast, but that is not true for online redo logs. Especially in environments where you duplicate production to a lower tier environment you have to wait for the online redo logs (including all members across all threads) to be zeroed. This is no longer an issue with Fast Data File Creation. If your system is configured to use Write-Back Flash Cache (WBFC from now on) and you are on Exadata 11.2.3.3 then you can benefit from super-fast file creation, including online redo logs. Here is an example, taken from a SQL trace:

TKPROF: Release 12.1.0.2.0 
...
Trace file: /u01/app/oracle/diag/rdbms/mbach/MBACH1/trace/MBACH1_ora_124411.trc
Sort options: default

********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

SQL ID: fvt6psf2t3cdz Plan Hash: 0

alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.37       5.09          0          0          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.38       5.09          0          0          2           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file Mirror Read                           2        0.00          0.00
  control file sequential read                   23        0.00          0.01
  KSV master wait                                19        0.04          0.09
  ASM file metadata operation                     6        0.00          0.00
  CSS initialization                              1        0.00          0.00
  CSS operation: query                            6        0.00          0.00
  CSS operation: action                           2        0.00          0.00
  kfk: async disk IO                              2        0.00          0.00
  Disk file operations I/O                        3        0.00          0.00
  cell smart file creation                     3258        0.13          4.64
  log file single write                           2        0.00          0.00
  control file parallel write                    10        0.07          0.07
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1       17.61         17.61
********************************************************************************

I have tried to replicate the command issued in Martin Nash’s blog post. If memory serves me right he was waiting for 37 seconds for this command to complete. The new method uses about 5 seconds-not bad! Interestingly there is no new wait event-the one we know (cell smart file creation) is used to indicate the work that has been done. In the case of Fast Data File Creation only the metadata about the new file is persisted in the WBFC, the actual formatting has not happened when the prompt returned.

Session Counters?

I have spent _a lot_ of time on Exadata related session counters in 12.1.0.2 and earlier releases. I think they are fascinating, and since the wait events do not really show me what happened during the execution of the statement I used the session counters instead. I am using Adrian Billington’s mystats tool for this:

SQL> @scripts/mystats start

SQL> alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m;

Database altered.

Elapsed: 00:00:04.87
SQL> @scripts/mystats stop t=1
...
STAT    cell flash cache read hits                                                      20
STAT    cell logical write IO requests                                                  14
STAT    cell overwrites in flash cache                                                  30
STAT    cell physical IO bytes eligible for predicate offload               17,179,869,184
STAT    cell physical IO bytes saved during optimized file creation         17,179,869,184
STAT    cell physical IO interconnect bytes                                      1,361,920
STAT    cell writes to flash cache                                                      30
...
STAT    physical write requests optimized                                               10
STAT    physical write total IO requests                                             4,126
STAT    physical write total bytes                                          17,180,197,888
STAT    physical write total bytes optimized                                       163,840
STAT    physical write total multi block requests                                    4,096

Let’s begin with the cell%-statistics. These indicate what happened on the storage layer. The command I executed was to create an online redo log group in thread 2, in disk groups DATA and RECO. Each file is approximately 4 GB in size.

SQL> select f.member, l.bytes/power(1024,2) m 
  2  from v$log l, v$logfile f
  3  where l.group# = f.group#
  4  and l.group# = 5;

MEMBER                                                                M
------------------------------------------------------------ ----------
+DATA/MBACH/ONLINELOG/group_5.454.879407723                        4096
+RECO/MBACH/ONLINELOG/group_5.1229.879407725                       4096

2 rows selected.

So why does the database report 17,180,197,888 “bytes saved during optimized file creation” and later again in “physical write total bytes”? The answer is ASM mirroring. In this system DATA and RECO are protected using ASM normal redundancy, doubling the writes.

What I found amusing is that the exact number of bytes eventually written is “eligible for predicate offload”. Until quite recently I only associated Smart Scans with this statistic counter.

You can also see a few writes (and overwrites) to (Write Back) Flash Cache.

Posted in 12c Release 1, Exadata, Performance | Leave a Comment »

Little things worth knowing: Is there a penalty in establishing a connection to Oracle using the MAA connection string? Part 2

Posted by Martin Bach on April 23, 2015

In the very lengthy previous post about the MAA connect string I wanted to explain the use of the MAA connection string as promoted by Oracle. I deliberately kept the first part simple: both primary and standby cluster were up, and although the database was operating in the primary role on what I called standby cluster (again it’s probably not a good idea to include the intended role in the infrastructure names) there was no penalty establishing a connection.

As pointed out by readers of the blog entry that is of course only one part of the story (you may have guessed by the TNS alias MAA_TEST1 … there are more to come). When you define the connection string this way, you cater for the situation where something goes wrong. Let’s try a few more scenarios. Here is the current Data Guard situation:

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  CDB   - Primary database
    Error: ORA-16810: multiple errors or warnings detected for the database

    STDBY - Physical standby database 
      Error: ORA-12543: TNS:destination host unreachable

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 71 seconds ago)
DGMGRL> show database 'CDB'

Database - CDB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    CDB1
      Error: ORA-16737: the redo transport service for standby database "STDBY" has an error

    CDB2
      Error: ORA-16737: the redo transport service for standby database "STDBY" has an error

Database Status:
ERROR

As you can see CDB is operating in primary role, and redo transport is broken. This simple reason for broken redo transport is the fact that the standby cluster in its entirety is down. For the sake of completeness I have repeated the MAA connection string here:

MAA_TEST1 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
       (DESCRIPTION=
         (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Scenario 1: shut down of the standby cluster, database in primary role on primary cluster

This does not seem to be much of a problem. Using the same test as in part 1 I get no noteworthy penalty at all:

[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 23 10:11:53 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 09:59:48 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.246s
user	0m0.020s
sys	0m0.025s
[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 23 10:12:12 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 10:11:52 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB1

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.171s
user	0m0.018s
sys	0m0.018s
[oracle@lab tns]$ 

Trying from a 12.1.0.1.0 client on a different machine:

[oracle@oracledev ~]$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit
> EOF

SQL*Plus: Release 12.1.0.1.0 Production on Thu Apr 23 10:13:50 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 10:12:11 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB1

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.427s
user	0m0.012s
sys	0m0.027s
[oracle@oracledev ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.1.0 Production on Thu Apr 23 10:13:52 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 10:13:50 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
CDB2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.158s
user	0m0.012s
sys	0m0.016s

That seems ok, but can be easily explained by the (LOAD_BALANCE=off)(FAILOVER=on) in the DESCRIPTION_LIST. But what if we do it the other way around? Let’s start by reversing the roles:

DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> switchover to 'STDBY'
Performing switchover NOW, please wait...
New primary database "STDBY" is opening...
Oracle Clusterware is restarting database "CDB" ...
Switchover succeeded, new primary is "STDBY"

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    CDB   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 48 seconds ago)


Scenario 2: black-out of the primary cluster, database in primary role on standby cluster

The next is the worst case: a date centre has gone down, and none of the hosts in the other cluster (rac12pri) is reachable. To simulate this I simply shut down rac12pri1 and rac12pri2, that’s as dead as it gets. The SCAN will still resolve in DNS, but there is neither a SCAN VIP nor a SCAN LISTENER anywhere to answer on the primary cluster.

As you saw in the output leading to this paragraph the primary database is started on rac12sby, nodes rac12sby1 and rac12sby2. In the current format, there is – for the first time – a connection penalty:

preventde
[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit;
> EOF

SQL*Plus: Release 12.1.0.2.0 Production on Tue Apr 21 07:17:36 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Mon Apr 20 2015 10:39:29 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m35.770s
user    0m0.024s
sys     0m0.043s

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit;
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Tue Apr 21 07:18:39 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Tue Apr 21 2015 07:18:09 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m36.219s
user    0m0.021s
sys     0m0.022s

During testing with 12c these ranged between 30 seconds and 38. Here is proof that I truly cannot connect to the primary cluster:

[oracle@lab tns]$ ping rac12pri-scan
PING rac12pri-scan.example.com (192.168.100.111) 56(84) bytes of data.
From lab.example.com (192.168.100.1) icmp_seq=1 Destination Host Unreachable
From lab.example.com (192.168.100.1) icmp_seq=2 Destination Host Unreachable
From lab.example.com (192.168.100.1) icmp_seq=3 Destination Host Unreachable
From lab.example.com (192.168.100.1) icmp_seq=4 Destination Host Unreachable
^C
--- rac12pri-scan.example.com ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4001ms

[oracle@rac12sby1 ~]$ time telnet rac12pri-scan 1521
Trying 192.168.100.112...
telnet: connect to address 192.168.100.112: No route to host
Trying 192.168.100.113...
telnet: connect to address 192.168.100.113: No route to host
Trying 192.168.100.111...
telnet: connect to address 192.168.100.111: No route to host

real	0m9.022s
user	0m0.001s
sys	0m0.003s

I found it interesting that the telnet command took almost no time at all to complete, yet trying to connect using SQL_Net it seems to take forever. Why could that be? Trace!

Getting to the bottom of this

Tracing can help sometimes, so I tried that. I try and keep my working TNS configuration separate from the production entries in $ORACLE_HOME. Using ~/tns I defined sqlnet.ora to read:

  • diag_adr_enabled = off
  • trace_level_client = support
  • trace_directory_client = /home/oracle/tns
  • log_directory_client = /home/oracle/tns
  • trace_unique_client = true

And connected again. I noticed time increases whenever nsc2addr was invoked:

[oracle@lab tns]$ grep "nsc2addr.*DESC" cli_6630.trc | nl
     1  (1947452928) [23-APR-2015 11:07:45:376] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     2  (1947452928) [23-APR-2015 11:07:48:383] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     3  (1947452928) [23-APR-2015 11:07:51:390] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     4  (1947452928) [23-APR-2015 11:07:54:396] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     5  (1947452928) [23-APR-2015 11:07:57:402] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     6  (1947452928) [23-APR-2015 11:08:00:408] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     7  (1947452928) [23-APR-2015 11:08:03:414] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     8  (1947452928) [23-APR-2015 11:08:06:421] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     9  (1947452928) [23-APR-2015 11:08:09:427] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    10  (1947452928) [23-APR-2015 11:08:12:433] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    11  (1947452928) [23-APR-2015 11:08:15:439] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    12  (1947452928) [23-APR-2015 11:08:18:445] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
    13  (1947452928) [23-APR-2015 11:08:21:452] nsc2addr: (DESCRIPTION=(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.120)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))

Sorry for the lengthy listing but that way it shows best! The primary SCAN is set to 111,112 and 113 with the standby SCAN on 119,120, and 121.

What you can see clearly by looking at the timestamps is that the transport_connect_timout works perfectly well-a new address is tried every 3 seconds. What surprises me is that each SCAN VIP is tried 4 times before the second SCAN is tried (lines 1-12). I thought I had that limited to 3 retries … Line 13 is the first reference to the second SCAN, and as soon as that is referenced a connection is established.

That can’t be it, can it?

Hmm maybe Oracle went a bit too far by adding 3 retries to every single SCAN VIP. The whole process can be sped up by trying every SCAN VIP only once before failing over to the second SCAN. This can be done with the following connection string:

MAA_TEST2 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
     (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
       (DESCRIPTION=
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= 
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Trying this works a lot better from a timing perspective:

[oracle@lab tns]$ time sqlplus system/secret@maa_test2<<EOF
exit
EOF


SQL*Plus: Release 12.1.0.2.0 Production on Thu Apr 23 11:35:17 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 23 2015 11:08:21 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m9.829s
user	0m0.041s
sys	0m0.022s
[oracle@lab tns]$ 

Visible in the SQL*Net trace:

[oracle@lab tns]$ grep "nsc2addr.*DESC" cli_7305.trc | nl 
     1	(3474875904) [23-APR-2015 11:35:17:935] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.113)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     2	(3474875904) [23-APR-2015 11:35:20:940] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.112)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     3	(3474875904) [23-APR-2015 11:35:23:948] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.111)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))
     4	(3474875904) [23-APR-2015 11:35:26:956] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.119)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=rootsrv)(CID=(PROGRAM=sqlplus)(HOST=lab.example.com)(USER=oracle))))

That was the solution to the “slow” connectivity that the person who asked me the question in Paris shared, but I haven’t been able to find proof as to why it works better: this is it. Now the question remains why the MAA string in the Application Continuity white paper is more complex? I think what’s failing here is the CONNECT_TIMEOUT. According to tnsnames.ora reference in 12c the purpose of the parameter is “To specify the timeout duration in seconds for a client to establish an Oracle Net connection to an Oracle database”. You could read this so that after CONNECT_TIMEOUT is reached the failover to the next SCAN occurs. However it appears – by looking at the trace – that the connection timeout is similar to transport timeout and the lower of both is used.

There is a high probability that the syntax is in the document there for a reason, although it escapes me right now. I guess the penalty when creating new connections with connection pools doesn’t really matter that much, because you create the pool (where you “pay” the penalty) only once, and subsequently pull connections from there, which should be very fast.

Posted in 12c Release 1, Linux | Tagged: | 2 Comments »

Little things worth knowing: Is there a penalty in establishing a connection to Oracle using the MAA connection string?

Posted by Martin Bach on April 20, 2015

Sorry for the long title!

I had a question during my session about “advanced RAC programming features” during the last Paris Oracle Meetup about the MAA connection string. I showed an example taken from the Appication Continuity White Paper (http://www.oracle.com/technetwork/database/options/clustering/application-continuity-wp-12c-1966213.pdf). Someone from the audience asked me if I had experienced any problems with it, such as very slow connection timeouts. I haven’t, but wanted to double-check anyway. This is a simplified test using a sqlplus connection since it is easier to time than a call to a connection pool creation. If you know of a way to reliably do so in Java/UCP let me know and I’ll test it.

My system is 12.1.0.2 on Oracle Linux 7.1 with UEK (3.8.13-55.1.6.el7uek.x86_64) and the following patches on the RDBMS and Grid Infrastructure side:

[oracle@rac12sby1 ~]$ opatch lspatches -oh /u01/app/12.1.0.2/grid
19872484;WLM Patch Set Update: 12.1.0.2.2 (19872484)
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)
19769479;OCW Patch Set Update : 12.1.0.2.2 (19769479)
19769473;ACFS Patch Set Update : 12.1.0.2.2 (19769473)

[oracle@rac12sby1 ~]$ opatch lspatches -oh /u01/app/oracle/product/12.1.0.2/dbhome_1
19877336;Database PSU 12.1.0.2.2, Oracle JavaVM Component (Jan2015)
19769480;Database Patch Set Update : 12.1.0.2.2 (19769480)
19769479;OCW Patch Set Update : 12.1.0.2.2 (19769479)

This is the January System Patch by the way, 20132450. At first I wanted to blog about the patch application but it was so uneventful I decided against it.

I have two clusters, rac12pri and rac12sby. Both consist of 2 nodes each. Database CDB is located on rac12pri, STDBY on rac12sby. Normally I am against naming databases by function as you might end up using STDBY in primary role after a data centre migration, and people tend to find the use of STDBY as primary database odd. In this case I hope it helps understanding the concept better, and it’s a lab environment anyway …

Creating the broker configuration

There is nothing special about creating the Broker Configuration, just remember to define the broker configuration files on shared storage and enabling automatic standby file management. I also recommend standby redo logs on both the primary and standby databases. Once you have the configuration in place, check the database(s) in verbose mode to get the broker connection string. You can copy/paste the connection string to sqlplus to ensure that every instance can be started thanks to a static listener registration (Data Guard will restart databases during switchover and failover operations, which is bound to fail unless the databases are statically registered with the listener). Here is what I mean:

DGMGRL> show instance verbose 'CDB1';

Instance 'CDB1' of database 'CDB'

  Host Name: rac12pri1
  PFILE:     
  Properties:
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.108)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB_DGMGRL)(INSTANCE_NAME=CDB1)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Instance Status:
SUCCESS

Now I can take the static connection identifier and use it (be sure to specify the “as sysdba” at the end):

[oracle@rac12pri1 ~]$ sqlplus 

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:50:09 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter user-name: sys@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.100.108)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=CDB_DGMGRL)(INSTANCE_NAME=CDB1)(SERVER=DEDICATED))) as sysdba
Enter password: 

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>

When this worked for all instances in the configuration you are a big step further in your Data Guard setup!

Service setup

I have created a new service, named rootsrv on both databases.

[oracle@rac12sby1 ~]$ srvctl add service -d STDBY -s rootsrv -role primary -policy automatic -preferred STDBY1,STDBY2

[oracle@rac12pri1 ~]$ srvctl add service -d CDB -s rootsrv -role primary -policy automatic -preferred CDB1,CDB2

To better follow along here is the current status of the configuration (In real life you might want to use a different protection mode):

DGMGRL> show configuration

Configuration - martin

  Protection Mode: MaxPerformance
  Members:
  STDBY - Primary database
    CDB   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 28 seconds ago)

As per its definition, the service is started on the primary but not on the standby. In fact, I can’t start it on the standby:

[oracle@rac12sby1 ~]$ srvctl status service -d STDBY -s rootsrv
Service rootsrv is running on instance(s) STDBY1,STDBY2

[oracle@rac12pri1 ~]$ srvctl status service -d CDB -s rootsrv
Service rootsrv is not running.
[oracle@rac12pri1 ~]$ srvctl start service -d CDB -s rootsrv
PRCD-1084 : Failed to start service rootsrv
PRCR-1079 : Failed to start resource ora.cdb.rootsrv.svc
CRS-2800: Cannot start resource 'ora.cdb.db' as it is already in the INTERMEDIATE state on server 'rac12pri1'
CRS-2632: There are no more servers to try to place resource 'ora.cdb.rootsrv.svc' on that would satisfy its placement policy
CRS-2800: Cannot start resource 'ora.cdb.db' as it is already in the INTERMEDIATE state on server 'rac12pri2'
[oracle@rac12pri1 ~]$

So no more need for database triggers to start and stop services (this was another question I had during my talk) depending on the database’s role.

The MAA connection string

The MAA connection string as taken from the white paper and slightly adapted is this one:

MAA_TEST1 =
  (DESCRIPTION_LIST=
     (LOAD_BALANCE=off)(FAILOVER=on)
       (DESCRIPTION=
         (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=rac12pri-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
       (DESCRIPTION= (CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
           (ADDRESS_LIST= (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST= rac12sby-scan)(PORT=1521)))
           (CONNECT_DATA=(SERVICE_NAME=rootsrv)))
  )

Please refer to the white paper for an explanation of the various parameters.

Notice that both the primary and standby SCAN are referenced in there, both connecting to “rootsrv” which is currently active on STDBY. To get some proper timing about the connection delay I use the following little snippet:


$ time sqlplus system/secret@maa_test1 <<EOF
> select sys_context('userenv','instance_name') from dual;
> exit
> EOF

Testing on the “primary cluster” first, the local database is operating in the standby role:

[oracle@rac12pri1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:41:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:53 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.432s
user    0m0.026s
sys     0m0.028s

[oracle@rac12pri1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:31:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:31:24 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.415s
user    0m0.023s
sys     0m0.025s

That was quick-not even a second. And as you can see the connection is set up against database STDBY on the other cluster.

Next on the standby cluster:

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:33:26 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:22 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.613s
user    0m0.025s
sys     0m0.019s

[oracle@rac12sby1 ~]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv','instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 10:33:52 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 10:33:50 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL>
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real    0m0.199s
user    0m0.024s
sys     0m0.021s

So this was quick too. And finally on a different machine altogether (my lab server)

[oracle@lab tns]$ time sqlplus system/secret@maa_test1 <<EOF
select sys_context('userenv', 'instance_name') from dual;
exit
EOF

SQL*Plus: Release 12.1.0.2.0 Production on Sat Apr 18 08:13:50 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Apr 18 2015 08:13:47 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> 
SYS_CONTEXT('USERENV','INSTANCE_NAME')
--------------------------------------------------------------------------------
STDBY2

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

real	0m0.387s
user	0m0.020s
sys	0m0.018s

Does it change after a role reversal? I think I’ll cover that in a different post …

Appendix: setting up Data Guard for RAC

This is something I always forget so it’s here for reference. I use an identical tnsnames.ora file across all nodes, here it is:

CDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CDB)
    )
  ) 

STDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  ) 

# used for DG and RMAN duplicate
STDBY_NON_SCAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12sby2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STDBY)
    )
  )

CDB_NON_SCAN =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac12pri2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = CDB)
    )
  )

The non-scan TNS entries are a quick workaround to the problem that I registered the SIDs statically with the node listeners only, not the SCAN listeners. If there is a more elegant way of statically registering a database, please let me know! I couldn’t find any documentation on how to do this. I guess adding a SID_LIST_LISTENER_SCANx would do, too. Anyway, here is an example for the local node listener configuration (not the complete file contents). The ORACLE_SID and host names must be adapted for each node in the cluster.

LISTENER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
    )
  )

# for broker
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=STDBY)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME=STDBY1))
    (SID_DESC=
      (GLOBAL_DBNAME=STDBY_DGMGRL)
      (ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1)
      (SID_NAME=STDBY1))
  )

After changing the listener configuration you need to reload the local node listener.

Posted in 12c Release 1, Data Guard, Linux | Tagged: , | 7 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 2,785 other followers