
A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to 12.2.0.1. It comes fully equipped with the matching cell software to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen with any other system, and it does so out of the box. Granted, some of that requires an investment in extra-cost options, but the value one gets in the form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Oracle Insights: Tales of the Oak Table” (Apress) a long time ago, I started to realise the inherent beauty of having insight into code. This code can be your own code, if you instrumented it properly, or the Oracle code path externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting code (or maybe don’t even know about the possibility?) and therefore complicate troubleshooting unnecessarily. The latter is not so much an issue on platforms where you don’t have an equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it’s a real wasted opportunity, as others have pointed out before me.
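To illustrate what instrumentation can look like, here is a minimal sketch using the standard DBMS_APPLICATION_INFO package; the module and action names are made up for this example:

-- label a unit of work so it becomes visible in V$SESSION
-- (module/action) and consequently in ASH and AWR
begin
  dbms_application_info.set_module(
    module_name => 'ORDER_ENTRY',
    action_name => 'load_orders');

  -- ... the actual application work happens here ...

  -- reset the labels once the unit of work is complete
  dbms_application_info.set_module(module_name => null, action_name => null);
end;
/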

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still come to some conclusions by using the Wait Interface and session counters. In most scenarios I am involved in, the first step is to use Tanel Poder’s session snapper script, which gives me insight into both.
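To give you an idea of the kind of information snapper externalises, here is a minimal sketch of the underlying views; SID 123 is just a placeholder for the session under investigation:

-- wait interface: what has session 123 been waiting on?
select event, total_waits, time_waited_micro
from v$session_event
where sid = 123
order by time_waited_micro desc;

-- session counters: which statistics has session 123 incremented?
select sn.name, st.value
from v$sesstat st
join v$statname sn on (sn.statistic# = st.statistic#)
where st.sid = 123
and st.value > 0
order by st.value desc;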

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious whether new events had been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents them from being listed in full in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names, such as db file scattered read and db file sequential read, have been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.
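If you want to see the clarified names for yourself, a query along these lines should do on 12.2:

select name, display_name
from v$event_name
where name <> display_name
order by name;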

Wait events in 11.2.0.3

I like to start comparisons with a benchmark, and 11.2.0.3 seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O
       1152

14 rows selected.

So there are 1152 events in total in 11.2.0.3; keep that number in mind.

Wait events in 12.1.0.2

In my opinion 12.1 is a major step ahead, and I have said it many times: I haven’t seen so many fundamental changes to the Oracle database in a long time. For various reasons though, 12.1 hasn’t seen the traction in the field it probably deserved. Note how diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O
       1650

14 rows selected.

A quick calculation reveals that 12.1 features 498 additional events compared to 11.2.0.3. Not too bad in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked on this before when blogging about the new Data Guard wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.

Wait events in 12.2.0.1

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

  COUNT(*) WAIT_CLASS
---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O
      1811

14 rows selected.

There are indeed some new events: 161 more than in 12.1.0.2, and most of them can be found in the Other wait class. Again, this is quite unfortunate as it prevents the performance architect from identifying unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day to investigate the exact difference between 12.1.0.2 and 12.2.0.1. Most likely after E4 this year.
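If you want a head start on that comparison, a sketch like the following works; db121 is a hypothetical database link pointing at a 12.1.0.2 database:

-- events present in 12.2.0.1 but not in 12.1.0.2
select name, wait_class
from v$event_name
minus
select name, wait_class
from v$event_name@db121;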

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events, I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script to group counters by class. Most of them fall into just one class, but there are others where more than one class is applicable. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in 11.2.0.3

Before investigating 12.1 and 12.2 I’ll look at 11.2.0.3 first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an 11.2.0.3 database this reveals the following numbers:

   COUNT(*) CLASS_NAME
----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC
        638

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.

Session counters in 12.1.0.2

There was a major increase in the number of counters between 11.2.0.3 and 12.1.0.2. Consider the numbers:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC
      1178

14 rows selected.

That nearly doubles the number of counters available. Note that quite a few of the new counters fall into the DEBUG class. Looking a bit more closely you can see they seem to be used by the In-Memory (IM) option:

SQL>  SELECT
  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  8  GROUP BY
  9      regexp_substr(name,'\w+ \w+')
 10  ORDER BY
 11      2 DESC
 12  FETCH FIRST 5 ROWS ONLY;

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …
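That count is easy to verify yourself:

select count(*)
from v$statname
where name like 'IM %';   -- returns 198 on this 12.1.0.2 system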

Session counters in 12.2.0.1

Finally, here is the list of statistics in 12.2.0.1:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC
      1804

Another 626 counters; that’s almost as many as were available in total on the 11.2.0.3 system! Running my previous query again it seems that IM-related statistics still dominate, but there are lots of others of interest.

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.

Summary

Oracle has been one of the best-instrumented database engines around, and with 12.2.0.1 instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into those stats relevant for Exadata to a later point.

Happy troubleshooting!


Example of Full Transportable Export to create a 12c PDB

The Cool Stuff

Oracle has introduced a new way of transporting data from one platform to another in 12c. The new feature is called “Full Transportable Export”. This enhancement works with source databases from 11.2.0.3 onwards and is a great way to move data, as it allows for an easy-to-use combination of Export Data Pump and Transportable Tablespaces. The documentation specifically praises it as a means to move from an 11g Release 2 database into a 12c Pluggable Database. And Pluggable Databases I like :)

Unfortunately the Full Transportable Export/Import process does not perform endianness conversion for you. And in fact, if you have encrypted data you want to move, you cannot do so unless both platforms share the same endianness.

Limitations

Before we begin, here is a list of restrictions pertaining to the new Full Transportable Export, taken from the 12c Utilities Guide:

  • You must use a privileged account; at least DATAPUMP_EXP_FULL_DATABASE must be granted to the user. That user’s default tablespace must not be part of the transport set
  • Encrypted data can be transported only between platforms of the same endianness
  • Endianness conversion is not transparently taken care of: you must perform it yourself using dbms_file_transfer or RMAN convert (see the sketch after this list)
  • The export is not restartable
  • Objects to be transported must reside in user-created tablespaces
  • LONG and LONG RAW objects in administrative tablespaces (SYSTEM, SYSAUX) cannot be transported across the network
  • All user-defined tablespaces must be placed in read-only mode: this is an outage!

There might be more, check the documentation for an updated version.
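As a sketch of the conversion options mentioned in the list above, this is what the dbms_file_transfer route could look like. Note that the directory objects SRC_DIR and DST_DIR and the database link src_db are made up for illustration:

-- pull a data file from the source over a database link; per the
-- documentation dbms_file_transfer converts the endianness en route
begin
  dbms_file_transfer.get_file(
    source_directory_object      => 'SRC_DIR',
    source_file_name             => 'users01.dbf',
    source_database              => 'src_db',
    destination_directory_object => 'DST_DIR',
    destination_file_name        => 'users01.dbf');
end;
/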

Let’s test

In order to assess the suitability of the concept, two users have been created in an Oracle 11.2.0.3 database on Solaris/Intel. I would have liked to test the procedure with a big-endian source, but my last SPARC box was a Sun Blade 100 from 10 years ago that finally went to the skip; in other words, I don’t have a SPARC server available in my lab right now…

To make it more interesting I am going to migrate these accounts into a 12.1.0.2 PDB on Linux. The source database is 11.2.0.3 with the April 2015 database PSU. Here is the data creation part in case you would like to follow the demo. First the Order Entry schema, part of the Swingbench suite, is created.

oracle@solaris:~/swingbench/bin$ ./oewizard -allindexes -cl -create -cs //solaris/sol11203 -dba system \
-dbap secretpwd -ts soe_tbs -part -p soe -scale 1 -tc 2 -u soe
SwingBench Wizard
Author  :        Dominic Giles
Version :        2.5.0.971

Running in Lights Out Mode using config file : oewizard.xml
The following statement failed : GRANT EXECUTE ON dbms_lock TO soe :
   Due to : ORA-01031: insufficient privileges

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.002
Data Generation Time                   0:06:53.268
DDL Creation Time                      0:03:17.259
Total Run Time                         0:10:10.533
Rows Inserted per sec                       29,294
Data Generated (MB) per sec                    2.4
Actual Rows Generated                   13,004,989

Post Creation Validation Report
===============================
The schema appears to have been created successfully.

Valid Objects
=============
Valid Tables : 'ORDERS','ORDER_ITEMS','CUSTOMERS','WAREHOUSES','ORDERENTRY_METADATA',
'INVENTORIES','PRODUCT_INFORMATION','PRODUCT_DESCRIPTIONS','ADDRESSES','CARD_DETAILS'
Valid Indexes : 'PRD_DESC_PK','PROD_NAME_IX','PRODUCT_INFORMATION_PK','PROD_SUPPLIER_IX',
'PROD_CATEGORY_IX','INVENTORY_PK','INV_PRODUCT_IX','INV_WAREHOUSE_IX','ORDER_PK',
'ORD_SALES_REP_IX','ORD_CUSTOMER_IX','ORD_ORDER_DATE_IX','ORD_WAREHOUSE_IX',
'ORDER_ITEMS_PK','ITEM_ORDER_IX','ITEM_PRODUCT_IX','WAREHOUSES_PK','WHS_LOCATION_IX',
'CUSTOMERS_PK','CUST_EMAIL_IX','CUST_ACCOUNT_MANAGER_IX','CUST_FUNC_LOWER_NAME_IX',
'ADDRESS_PK','ADDRESS_CUST_IX','CARD_DETAILS_PK','CARDDETAILS_CUST_IX'
Valid Views : 'PRODUCTS','PRODUCT_PRICES'
Valid Sequences : 'CUSTOMER_SEQ','ORDERS_SEQ','ADDRESS_SEQ','LOGON_SEQ','CARD_DETAILS_SEQ'
Valid Code : 'ORDERENTRY'
Schema Created

The Swingbench benchmark suite does not need introducing any more. I have used it many times and am truly grateful to @dominic_giles for having written this for the Oracle community.

About to cause grief, deliberately

A second user is created to test all sorts of segment types to see how these are handled. All data is bundled under the MARTIN account. The idea is to test against (some of) the limitations documented for Transportable Tablespaces, which is invoked under the covers. It would be nice if segments that cannot be transported using TTS were part of the export dump; let’s see if that is going to be the case. And although the transport might work without problems for me, it will do so because I tested it the way I did. It does not mean that, for example, objects using XML DB will always be migrated successfully! As always, test, test, test … with your systems. There is nothing like a few dummy runs to home in on the procedure.

Back to the user creation:

SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
compatible                           string      11.2.0.3

SQL> create user martin identified by secretpwd quota unlimited on users
  2  /

User created.

SQL> begin
  2   for i in 1..10 loop
  3    execute immediate 'create or replace procedure martin.p' || i || ' as begin null; end;';
  4   end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select owner, object_name from dba_objects
  2  where owner = 'MARTIN'
  3* and object_name like 'P%'

OWNER                          OBJECT_NAME
------------------------------ ------------------------------
MARTIN                         P9
MARTIN                         P8
MARTIN                         P7
MARTIN                         P6
MARTIN                         P5
MARTIN                         P4
MARTIN                         P3
MARTIN                         P2
MARTIN                         P10
MARTIN                         P1

10 rows selected.

SQL> create table martin.xmltest of xmltype tablespace users;

Table created.

SQL> insert into martin.xmltest values ('<toto/>');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from martin.xmltest;

SYS_NC_ROWINFO$
--------------------------------------------------------------------------------
<toto/>

SQL> create table martin.iot_test (
  2    object_id,
  3    owner,
  4    object_name,
  5    subobject_name,
  6    object_type,
  7    constraint pk_iot_test primary key (object_id))
  8  organization index tablespace users as
  9  select object_id,owner,object_name,subobject_name,
 10  object_type from dba_objects;

Table created.

SQL> create table martin.lobtest (
  2   id number primary key,
  3   l_data clob)
  4  lob (l_data)
  5  store as securefile;

Table created.

SQL> insert into martin.lobtest values (1, 'firstlob');

1 row created.

SQL> commit;

Commit complete.

SQL> select table_name, column_name,in_row, tablespace_name
  2  from dba_lobs
  3* where owner = 'MARTIN'

TABLE_NAME           COLUMN_NAME                    IN_ TABLESPACE_NAME
-------------------- ------------------------------ --- --------------------
XMLTEST              XMLDATA                        YES USERS
LOBTEST              L_DATA                         YES USERS

SQL> create table martin.tztest (id number primary key, d timestamp with local time zone)
  2   tablespace users;

Table created.

SQL> insert into martin.tztest values (1, systimestamp);

1 row created.

SQL> commit;

Commit complete.

That’s all I could think of. I might have added nested tables, but I am not aware of their use in an Oracle application outside of PL/SQL. At this point I am deliberately causing trouble, working against the limitations and recommendations taken from the documentation. The following is documented behaviour; my interest was to see how it actually works (and also to give you a clue when you search the Internet for error messages).

The Migration

Before migrating the database you need to check if the data is self-contained. Since there are two tablespaces to be migrated, I need to check them both.


SQL> select name,PLATFORM_NAME from v$database

NAME      PLATFORM_NAME
--------- --------------------------------------------------
SOL11203  Solaris Operating System (x86-64)

SQL> exec sys.dbms_tts.transport_set_check('users,soe_tbs',true)

PL/SQL procedure successfully completed.

SQL> select * from transport_set_violations;

no rows selected

Just as with “regular” transportable tablespaces, you need to create a directory to store the export dump, or use the default DATA_PUMP_DIR.
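If you want to follow along, a minimal sketch of that setup looks as follows; the path matches the dump file location shown in the export output further down:

-- run as a suitably privileged user on the source database
create directory full_tts_dir as '/export/home/oracle/full_tts';
grant read, write on directory full_tts_dir to system;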

You then need to place the user-defined tablespaces into read-only mode. Essentially all non-Oracle managed tablespaces are affected, causing an outage at this point.


SQL> select tablespace_name from dba_tablespaces;

TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
SOE_TBS

6 rows selected.

SQL> alter tablespace users read only;

Tablespace altered.

SQL> c.users.soe_tbs
  1* alter tablespace soe_tbs read only
SQL> r
  1* alter tablespace soe_tbs read only

Tablespace altered.

The exciting bit is when you start the export. Here is a sample command:

oracle@solaris:~/full_tts$ expdp system full=y transportable=always version=12 directory=full_tts_dir \
> dumpfile=exp_full_transportable.dmp logfile=exp_full_transportable.log 

Export: Release 11.2.0.3.0 - Production on Wed Oct 7 21:04:10 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
Password: 

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "SYSTEM"."SYS_EXPORT_FULL_01":  system/******** full=y  
 transportable=always version=12 directory=full_tts_dir 
 dumpfile=exp_full_transportable.dmp 
 logfile=exp_full_transportable.log 
Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/PLUGTS_FULL/FULL/PLUGTS_TABLESPACE
Processing object type DATABASE_EXPORT/PLUGTS_FULL/PLUGTS_BLK
Processing object type DATABASE_EXPORT/EARLY_OPTIONS/TABLE_DATA
Processing object type DATABASE_EXPORT/EARLY_OPTIONS/VIEWS_AS_TABLES/TABLE_DATA
Processing object type DATABASE_EXPORT/NORMAL_OPTIONS/TABLE_DATA
Processing object type DATABASE_EXPORT/NORMAL_OPTIONS/VIEWS_AS_TABLES/TABLE_DATA
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 80.43 MB
Processing object type DATABASE_EXPORT/PRE_SYSTEM_IMPCALLOUT/MARKER
...
. . exported "SYSTEM"."REPCAT$_REPCATLOG"                    0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_REPCOLUMN"                    0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_REPGROUP_PRIVS"               0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_REPOBJECT"                    0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_REPPROP"                      0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_REPSCHEMA"                    0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_RESOLUTION"                   0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_RESOLUTION_STATISTICS"        0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_RESOL_STATS_CONTROL"          0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_RUNTIME_PARMS"                0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_SITES_NEW"                    0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_SITE_OBJECTS"                 0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_SNAPGROUP"                    0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_TEMPLATE_OBJECTS"             0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_TEMPLATE_PARMS"               0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_TEMPLATE_REFGROUPS"           0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_TEMPLATE_SITES"               0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_TEMPLATE_TARGETS"             0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_USER_AUTHORIZATIONS"          0 KB       0 rows
. . exported "SYSTEM"."REPCAT$_USER_PARM_VALUES"             0 KB       0 rows
. . exported "SYSTEM"."SQLPLUS_PRODUCT_PROFILE"              0 KB       0 rows
Master table "SYSTEM"."SYS_EXPORT_FULL_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_FULL_01 is:
  /export/home/oracle/full_tts/exp_full_transportable.dmp
******************************************************************************
Datafiles required for transportable tablespace SOE_TBS:
  /u01/oradata/SOL11203/datafile/o1_mf_soe_tbs_bomhd219_.dbf
Datafiles required for transportable tablespace USERS:
  /u01/oradata/SOL11203/datafile/o1_mf_users_bolbn3sn_.dbf
Job "SYSTEM"."SYS_EXPORT_FULL_01" successfully completed at 21:09:33

A quick check of the logfile revealed that objects owned by XDB, SYS, WMSYS, SYSTEM, APEX, SYSMAN, OLAPSYS and ORDDATA were exported. I did not see any non-Oracle-provided objects in the list of exported tables. So the import will be interesting!

The next step is to move the dump and data files across to the target system. In this example NFS is used, but any other efficient means of transportation will do.

Importing into a PDB

In preparation for the metadata import you need to either create a directory or point impdp to an existing one. In this example, FULL_TTS_DIR is used. The directory points to the NFS mount. The files stored in the directory are the two data files (/u01/oradata/SOL11203/datafile/o1_mf_soe_tbs_bomhd219_.dbf and /u01/oradata/SOL11203/datafile/o1_mf_users_bolbn3sn_.dbf) plus the dump file exp_full_transportable.dmp. Since both systems are little-endian, no conversion is necessary at this stage. Before starting, here is some information about the destination. The PDB created for the import is named SOE_PDB.

SQL> select name,cdb, platform_name from v$database;

NAME      CDB PLATFORM_NAME
--------- --- ----------------------------------------------------------------------------------
CDB       YES Linux x86 64-bit

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MASTER                         MOUNTED

SQL> create pluggable database soe_pdb admin user pdb_admin
  2  identified by secretpwd;

Pluggable database created.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MASTER                         MOUNTED
         4 SOE_PDB                        MOUNTED

SQL> alter pluggable database SOE_PDB open ;

Pluggable database altered.

Since the database is using ASM, the data files first have to be moved into ASM before they can be made known to the destination. The files originate from little-endian Solaris/Intel so no conversion is needed. There are many ways to move data files into ASM, and asmcmd is one of them. The file naming convention on ASM for PDBs is diskgroupName/CDBName/PDBGUID/datafile/fileName.dbf. The GUID can be found in v$pdbs, for example.

SQL> select guid, name from v$pdbs where name = 'SOE_PDB';

GUID                             NAME
-------------------------------- ------------------------------
218B3FD3B43B146EE0531438A8C06EE5 SOE_PDB

[oracle@oraclelinux7 full_tts_dir]$ asmcmd cp /m/oracle/full_tts_dir/o1_mf_soe_tbs_bomhd219_.dbf  +data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/soe_tbs.dbf
copying /m/oracle/full_tts_dir/o1_mf_soe_tbs_bomhd219_.dbf -> +data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/soe_tbs.dbf
[oracle@oraclelinux7 full_tts_dir]$ asmcmd cp /m/oracle/full_tts_dir/o1_mf_users_bolbn3sn_.dbf +data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/users.dbf
copying /m/oracle/full_tts_dir/o1_mf_users_bolbn3sn_.dbf -> +data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/users.dbf
[oracle@oraclelinux7 full_tts_dir]$ 

Although the data files are now in ASM, they are not part of the database yet. This is done in the next step.

[oracle@oraclelinux7 full_tts_dir]$ impdp system/secretpwd@localhost/SOE_PDB full=y \
> dumpfile=exp_full_transportable.dmp directory=FULL_TTS_DIR \
> transport_datafiles= \
> '+data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/soe_tbs.dbf', \
> '+data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/users.dbf' \
> logfile=imp_full_transportable.log

Import: Release 12.1.0.2.0 - Production on Wed Oct 7 22:40:55 2015

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Source time zone is +00:00 and target time zone is -07:00.
Starting "SYSTEM"."SYS_IMPORT_FULL_01":  system/********@localhost/SOE_PDB 
  full=y dumpfile=exp_full_transportable.dmp directory=FULL_TTS_DIR 
  transport_datafiles=+data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/soe_tbs.dbf, 
  +data/cdb/218B3FD3B43B146EE0531438A8C06EE5/datafile/users.dbf  
  logfile=imp_full_transportable.log 
Processing object type DATABASE_EXPORT/PRE_SYSTEM_IMPCALLOUT/MARKER
Processing object type DATABASE_EXPORT/PRE_INSTANCE_IMPCALLOUT/MARKER
Processing object type DATABASE_EXPORT/PLUGTS_FULL/PLUGTS_BLK
Processing object type DATABASE_EXPORT/TABLESPACE
ORA-39083: Object type TABLESPACE:"UNDOTBS1" failed to create with error:
ORA-01516: nonexistent log file, data file, or temporary file "/u01/oradata/SOL11203/datafile/o1_mf_undotbs1_bolbn3ro_.dbf"
Failing sql is:
 ALTER DATABASE DATAFILE '/u01/oradata/SOL11203/datafile/o1_mf_undotbs1_bolbn3ro_.dbf' RESIZE 844103680
ORA-31684: Object type TABLESPACE:"TEMP" already exists
Processing object type DATABASE_EXPORT/PROFILE
Processing object type DATABASE_EXPORT/SYS_USER/USER
...
Processing object type DATABASE_EXPORT/NORMAL_POST_INSTANCE_IMPCALLOU/MARKER
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE
ORA-39360: Table "SOE"."ORDERS" was skipped due to transportable import and TSLTZ issues resulting from time zone mismatch.
ORA-39151: Table "ORDDATA"."ORDDCM_INTERNAL_TAGS" exists. All dependent metadata and data will be skipped due to table_exists_action of skip
ORA-39151: Table "ORDDATA"."ORDDCM_DOC_TYPES" exists. All dependent metadata and data will be skipped due to table_exists_action of skip

ORA-39151: Table "OLAPSYS"."XML_LOAD_LOG" exists. All dependent metadata and data will be skipped due to table_exists_action of skip
ORA-39360: Table "MARTIN"."TZTEST" was skipped due to transportable import and TSLTZ issues resulting from time zone mismatch.
ORA-39151: Table "ORDDATA"."ORDDCM_MAPPING_DOCS" exists. All dependent metadata and data will be skipped due to table_exists_action of skip

ORA-39083: Object type TABLE:"MARTIN"."XMLTEST" failed to create with error:
ORA-39945: Token conflicting with existing tokens.
Failing sql is:
CREATE TABLE "MARTIN"."XMLTEST" OF XMLTYPE  OID '165A17E90A5B0857E054080027F26450'  XMLTYPE STORE AS SECUREFILE BINARY XML ( TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 NOCACHE LOGGING  NOCOMPRESS  KEEP_DUPLICATES  STORAGE(SEG_FILE 4 SEG_BLOCK 177 OBJNO_REUSE 76699 INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 BUFFER_POOL DEFAU
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
. . imported "APEX_030200"."WWV_FLOW_PAGE_PLUGS"         3.834 MB    7416 rows


Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
ORA-39112: Dependent object type INDEX:"SOE"."ORDER_PK" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_SALES_REP_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_CUSTOMER_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_ORDER_DATE_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_WAREHOUSE_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/FUNCTIONAL_INDEX/INDEX

The import command took a fair amount of time to complete, but that is at least partially because of the underlying hardware, which could do with 2 more cores. Or 4, actually … As expected, some errors were thrown by the process. As stated in the documentation, tables with timestamp with local time zone (TSLTZ) columns are not transported using TTS. This is visible from the log; the subsequent errors are a result of it: you can’t create an index on a non-existent table.

[oracle@oraclelinux7 full_tts_dir]$ grep SOE imp_full_transportable.log
ORA-39360: Table "SOE"."ORDERS" was skipped due to transportable import and TSLTZ issues resulting from time zone mismatch.
ORA-39082: Object type PACKAGE:"SOE"."ORDERENTRY" created with compilation warnings
ORA-39112: Dependent object type INDEX:"SOE"."ORDER_PK" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_SALES_REP_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_CUSTOMER_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_ORDER_DATE_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type INDEX:"SOE"."ORD_WAREHOUSE_IX" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type CONSTRAINT:"SOE"."ORDER_MODE_LOV" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type CONSTRAINT:"SOE"."ORDER_PK" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39112: Dependent object type CONSTRAINT:"SOE"."ORDER_TOTAL_MIN" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39083: Object type REF_CONSTRAINT:"SOE"."ORDER_ITEMS_ORDER_ID_FK" failed to create with error:
ALTER TABLE "SOE"."ORDER_ITEMS" ADD CONSTRAINT "ORDER_ITEMS_ORDER_ID_FK" FOREIGN KEY ("ORDER_ID") REFERENCES "SOE"."ORDERS" ("ORDER_ID") ON DELETE CASCADE ENABLE NOVALIDATE
ORA-39112: Dependent object type REF_CONSTRAINT:"SOE"."ORDERS_CUSTOMER_ID_FK" skipped, base object type TABLE:"SOE"."ORDERS" creation failed
ORA-39082: Object type PACKAGE BODY:"SOE"."ORDERENTRY" created with compilation warnings

In addition to that some of the objects created in the MARTIN schema did not survive the operation either:

ORA-31684: Object type USER:"MARTIN" already exists
ORA-39360: Table "MARTIN"."TZTEST" was skipped due to transportable import and TSLTZ issues resulting from time zone mismatch.
ORA-39083: Object type TABLE:"MARTIN"."XMLTEST" failed to create with error:
CREATE TABLE "MARTIN"."XMLTEST" OF XMLTYPE  OID '165A17E90A5B0857E054080027F26450'  XMLTYPE
 STORE AS SECUREFILE BINARY XML ( TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 NOCACHE
 LOGGING  NOCOMPRESS  KEEP_DUPLICATES  STORAGE(SEG_FILE 4 SEG_BLOCK 177 OBJNO_REUSE 76699
 INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 BUFFER_POOL
 DEFAU
ORA-39112: Dependent object type CONSTRAINT:"MARTIN"."SYS_C0011162" skipped, base object
 type TABLE:"MARTIN"."TZTEST" creation failed

You can’t blame Oracle for that; this is expected behaviour! On the other hand it also implies that the process is indeed a wrapper around (cross-platform) transportable tablespaces. Two out of four tables from the MARTIN schema made it across: LOBTEST and IOT_TEST. The XML table did not, and neither did the one with the timestamp with local time zone column. When using the new Full Transportable Export/Import, be aware of what the tool can and cannot do!
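Should you need the skipped tables on the target anyway, one hypothetical fallback is to pull them over a database link once the transportable import has finished. This is only a sketch; the link src_11g does not exist in my demo:

-- a CTAS over a database link moves the TSLTZ data as a conventional
-- insert, converting the time zone information in the process
create table martin.tztest as
select * from martin.tztest@src_11g;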

The white paper at http://www.oracle.com/technetwork/database/enterprise-edition/full-transportable-wp-12c-1973971.pdf also states that beginning with 12c, administrative information stored in SYSTEM and SYSAUX is neither exported nor imported. This probably applies to a 12c-to-12c migration. In my log file I can see a number of account creation commands, or rather, how they failed:

[oracle@oraclelinux7 ~]$ egrep -i "object type user.*already exists" imp_full_transportable.log
ORA-31684: Object type USER:"OUTLN" already exists
ORA-31684: Object type USER:"ORDDATA" already exists
ORA-31684: Object type USER:"OLAPSYS" already exists
ORA-31684: Object type USER:"MDDATA" already exists
ORA-31684: Object type USER:"SPATIAL_WFS_ADMIN_USR" already exists
ORA-31684: Object type USER:"SPATIAL_CSW_ADMIN_USR" already exists
ORA-31684: Object type USER:"FLOWS_FILES" already exists
ORA-31684: Object type USER:"APEX_PUBLIC_USER" already exists

Time permitting I will try again with a 12c non-CDB and import into a 12c PDB.

Little things worth knowing: exp/imp vs expdp and impdp for HCC in Exadata

Do you know the difference between exp/imp and expdp/impdp when it comes to importing HCC compressed data in Exadata?

If not, then follow me through two examples. This is on 11.2.0.3/11.2.3.3.1 but applies to all database releases you can have on Exadata. The task at hand is to export a table (which happens to be non-partitioned and HCC compressed for QUERY HIGH) and import it into a different user’s schema. This is quite a common approach when migrating data from a non-Exadata system into an Exadata system. You could for example pre-create the DDL for the tables and implement HCC before even importing a single row. When importing the data, the segments’ HCC attributes will be honoured and data will be inserted compressed. Or won’t it?
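The pre-creation idea from the previous paragraph, shown as a sketch; the column list is made up, the point is the compression clause on the empty table:

create table imptest.t1_qh (
  id   number,
  pad  varchar2(100)
)
compress for query high;
-- direct-path loads into this table will honour the HCC attribute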

The table

The table I want to export resides on our V2 system. Since I am (trying to be) a good citizen I want to use DBFS for the dump files. Beginning with 12.1.0.2 Grid Infrastructure you can also use ACFS, by the way. Let’s start by creating the directory needed and collecting some meta-information about the source table:

SQL> create or replace directory DBM01 as '/dbfs_direct/FS1/mbach/';

Directory created.

SQL> select owner, table_name, partitioned, num_rows, compress_for from dba_tables where table_name = 'T1_QH';

OWNER                          TABLE_NAME                     PAR   NUM_ROWS COMPRESS_FOR
------------------------------ ------------------------------ --- ---------- ------------
MARTIN                         T1_QH                          NO             QUERY HIGH

SQL> select bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

         M     BLOCKS
---------- ----------
        72       9216

The table is 72 MB in size, and HCC compressed (I actually ensured that it was by issuing an “alter table t1_qh move;” before the export started).
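If you prefer to verify the compression of the rows themselves rather than trusting the dictionary, dbms_compression can help. A sketch, sampling 1,000 rows; in 11.2 a return value of 4 corresponds to COMP_FOR_QUERY_HIGH:

select dbms_compression.get_compression_type('MARTIN', 'T1_QH', rowid) comp_type,
       count(*)
from martin.t1_qh
where rownum <= 1000
group by dbms_compression.get_compression_type('MARTIN', 'T1_QH', rowid);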

Data Pump

The first export uses expdp, followed by impdp to get the data back. I am remapping the schema so I don’t have to risk overwriting my source data.

[enkdb01:oracle:DBM011] /home/oracle/mbach
> expdp martin/xxxxxx directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=exp_t1_qh.log tables=t1_qh

Export: Release 11.2.0.3.0 - Production on Tue Mar 24 06:15:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "MARTIN"."SYS_EXPORT_TABLE_02":  martin/******** directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=exp_t1_qh.log tables=t1_qh
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 72 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "MARTIN"."T1_QH"                            9.675 GB 10000000 rows
Master table "MARTIN"."SYS_EXPORT_TABLE_02" successfully loaded/unloaded
******************************************************************************
Dump file set for MARTIN.SYS_EXPORT_TABLE_02 is:
  /dbfs_direct/FS1/mbach/exp_t1_qh.dmp
Job "MARTIN"."SYS_EXPORT_TABLE_02" successfully completed at 06:17:16

The interesting bit here is that the table on disk occupies around 72 MB, and yet expdp tells me the 10000000 rows occupy 9.7 GB. Can anyone guess why?

[enkdb01:oracle:DBM011] /home/oracle/mbach
> ls -lh /dbfs_direct/FS1/mbach/exp_t1_qh.dmp
-rw-r----- 1 oracle dba 9.7G Mar 24 06:17 /dbfs_direct/FS1/mbach/exp_t1_qh.dmp

Yes, 10 GB. Data apparently is not exported in its compressed form. Now this table is going to be imported:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> impdp imptest/xxxxxx directory=DBM01 dumpfile=exp_t1_qh.dmp logfile=imp_t1_qh.log \
> tables=martin.t1_qh remap_schema=martin:imptest 

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 06:23:44 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Master table "IMPTEST"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "IMPTEST"."SYS_IMPORT_TABLE_01":  imptest/******** directory=DBM01
  dumpfile=exp_t1_qh.dmp logfile=imp_t1_qh.log tables=martin.t1_qh remap_schema=martin:imptest
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "IMPTEST"."T1_QH"                           9.675 GB 10000000 rows
Job "IMPTEST"."SYS_IMPORT_TABLE_01" successfully completed at 06:29:53

This looks all right: all the data is back (the number of rows exported matches those imported). What about the segment size?

SQL> select owner, table_name, partitioned, num_rows, compress_for from dba_tables where table_name = 'T1_QH';

OWNER                          TABLE_NAME                     PAR   NUM_ROWS COMPRESS_FOR
------------------------------ ------------------------------ --- ---------- ------------
MARTIN                         T1_QH                          NO             QUERY HIGH
IMPTEST                        T1_QH                          NO             QUERY HIGH

2 rows selected.

SQL> select bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

         M     BLOCKS
---------- ----------
        72       9216
        72       9216

2 rows selected.

Identical down to the block.

Traditional Export/Import

Despite the fact that exp and imp are deprecated, they are still included with 12.1.0.2, the current release at the time of writing. What if you did the same process with these instead? After all, many DBAs “grew up” with those tools and can use them in their sleep. This, plus some initial deficits of Data Pump in 10g, keeps exp/imp high up in the list of tools we like.

Let’s export the table:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> exp martin/xxxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.log tables=t1_qh

Export: Release 11.2.0.3.0 - Production on Tue Mar 24 06:42:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Tes
Export done in US7ASCII character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                          T1_QH   10000000 rows exported
Export terminated successfully without warnings.

The files are more or less identical in size to the ones created before by Data Pump:

[enkdb01:oracle:DBM011] /home/oracle/mbach
> ls -lh /dbfs_direct/FS1/mbach/*classic*
-rw-r--r-- 1 oracle dba 9.7G Mar 24 07:01 /dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp
-rw-r--r-- 1 oracle dba  537 Mar 24 07:01 /dbfs_direct/FS1/mbach/exp_t1_qh_classic.log

What about the import?

[enkdb01:oracle:DBM011] /home/oracle/mbach
> imp imptest/xxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/imp_t1_qh_classic.log \
> FROMUSER=martin TOUSER=imptest tables=t1_qh

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 07:27:42 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Tes

Export file created by EXPORT:V11.02.00 via conventional path

Warning: the objects were exported by MARTIN, not by you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
. importing MARTIN's objects into IMPTEST
. . importing table                        "T1_QH"

And this takes a loooong time. What’s happening in the background? I first checked what the session was doing. That’s simple: have a look at the output.

SQL> @scripts/as

   INST_ID   SID    SERIAL# USERNAME      PROG       SQL_ID         CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
---------- ----- ---------- ------------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
         1   594         23 IMPTEST       imp@enkdb0 fgft1tcrr12ga      0               0      22723         .00 No  INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "

The EXECS figure was interesting, and indeed, after a couple of minutes the counter had gone up a lot:

   INST_ID   SID    SERIAL# USERNAME      PROG       SQL_ID         CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
---------- ----- ---------- ------------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
         1   594         23 IMPTEST       imp@enkdb0 fgft1tcrr12ga      0               0     639818         .00 Yes INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "

You can probably guess what’s coming next. After quite a while the import finished:

> imp imptest/xxxxxxx file=/dbfs_direct/FS1/mbach/exp_t1_qh_classic.dmp log=/dbfs_direct/FS1/mbach/imp_t1_qh_classic.log \
> FROMUSER=martin TOUSER=imptest tables=t1_qh

Import: Release 11.2.0.3.0 - Production on Tue Mar 24 07:27:42 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Tes

Export file created by EXPORT:V11.02.00 via conventional path

Warning: the objects were exported by MARTIN, not by you

import done in US7ASCII character set and AL16UTF16 NCHAR character set
. importing MARTIN's objects into IMPTEST
. . importing table                        "T1_QH"   10000000 rows imported
Import terminated successfully without warnings.

And now for the reason of this blog post:

SQL> select owner, bytes/power(1024,2) m, blocks from dba_segments where segment_name = 'T1_QH';

OWNER                                   M     BLOCKS
------------------------------ ---------- ----------
MARTIN                                 72       9216
IMPTEST                             11227    1437056

The newly imported table was not compressed at all when imported using the traditional path: classic imp performs conventional inserts, and HCC compression requires direct-path operations.
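If you find yourself in this situation, a segment reorganisation re-applies HCC because a move is a direct-path operation. A sketch:

alter table imptest.t1_qh move compress for query high;
-- remember that indexes become UNUSABLE after a move and need a rebuild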

Concurrent statistics gathering in 12.1.0.2

Prompted by an actual task at hand I spent some time investigating an 11.2.0.2 feature: concurrent statistics gathering. It has been on my to-do list for quite some time but so far I didn’t have a use case, and use cases make it so much easier. The question was: how can I gather statistics on a really large, partitioned table? Previously, you could resort to the degree argument of dbms_stats.gather_table_stats to ensure that statistics were gathered in parallel. This is all good, but sometimes you need more oomph. Some DBAs wrote scripts to execute individual statistics gathering jobs against partitions in parallel, using the tabname and partname arguments of dbms_stats.gather_table_stats() (a sketch follows this paragraph). But that requires manual effort, and the not-quite-so-new concurrent option is so much nicer. Let me take you along for the ride… Actually I have to tell the story starting with the happy ending, as I hit a few snags along the way. This is 12.1.0.2.1 on Oracle Linux 6.5.
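For reference, the manual per-partition approach mentioned above might look like this; the partition name is taken from the listing further down:

-- one of n concurrent sessions, each gathering a single partition
begin
  dbms_stats.gather_table_stats(
    ownname     => user,
    tabname     => 'BIGTAB_PART',
    partname    => 'SYS_P1245',
    granularity => 'PARTITION',
    degree      => 4);
end;
/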

Enabling concurrent statistics gathering

A flag needs to be set to enable the new concurrent mechanism for gathering statistics. The initial blog post on the subject set the flag globally:

https://blogs.oracle.com/optimizer/entry/gathering_optimizer_statistics_is_one

Scroll down to “configuration and settings” for the details. I personally would rather set this at the table level but, despite the Oracle documentation suggesting that this is possible, it is not. You can check the header of DBMS_STATS to see the different values that can be passed as PNAME to set_table_prefs().
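For completeness, this is what the global switch looks like. Note that the 12c documentation also lists MANUAL, AUTOMATIC, ALL and OFF as valid values for this preference:

exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE')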

After enabling the concurrent flag at the database level, you can query it:

SQL> select dbms_stats.get_prefs('CONCURRENT')
  2  from dual;

DBMS_STATS.GET_PREFS('CONCURRENT')
------------------------------------------------------------------------------------
TRUE

SQL> select dbms_stats.get_prefs('CONCURRENT',user,'BIGTAB_PART') flag from dual;

FLAG
-------------------------------------------------------------------------------------
TRUE

We should be ready for using the concurrent method now. Here is some background information about the table in question:

SQL>  select partition_name, num_rows, sample_size, last_analyzed 
  2   from user_tab_partitions where table_name = 'BIGTAB_PART'
  3  ;

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254
SYS_P1253
SYS_P1252
SYS_P1251
SYS_P1250
SYS_P1249
SYS_P1248
SYS_P1247
SYS_P1246
SYS_P1245
P_MAN

11 rows selected.

SQL> select partition_name, bytes/power(1024,2) m from user_segments where segment_name = 'BIGTAB_PART';

PARTITION_NAME                          M
------------------------------ ----------
SYS_P1245                            8064
SYS_P1246                            8064
SYS_P1247                            8064
SYS_P1248                            8064
P_MAN                                8064
SYS_P1249                            8064
SYS_P1250                            8064
SYS_P1251                            8064
SYS_P1252                            8064
SYS_P1253                            8064
SYS_P1254                               8

11 rows selected.

So each partition is about 8 GB in size. A little helper script prints all the table preferences:

getting table preferences for table BIGTAB_PART
value for CASCADE is DBMS_STATS.AUTO_CASCADE
value for DEGREE is NULL
value for ESTIMATE_PERCENT is DBMS_STATS.AUTO_SAMPLE_SIZE
value for METHOD_OPT is FOR ALL COLUMNS SIZE AUTO
value for NO_INVALIDATE is DBMS_STATS.AUTO_INVALIDATE
value for GRANULARITY is AUTO
value for PUBLISH is TRUE
value for INCREMENTAL is FALSE
value for STALE_PERCENT is 10
value for CONCURRENT is TRUE

Now you can simply execute the call to dbms_stats.gather_table_stats(user, ‘BIGTAB_PART’, degree=>4) and, thanks to all the defaults, the gathering of statistics will be performed by multiple jobs. In order for this to work you need job_queue_processes set to > 4 (the default is 1000 in 12.1.0.2) and a resource manager plan must be active, as sketched below. Refer to the Oracle blog post referenced earlier for more information about statement queueing and the execution of these jobs. The post also recommends setting parallel_adaptive_multi_user to false (its default is true in 11.2 and 12.1). Just be careful when changing PX parameters; these can have undesired side effects.
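Put together, the prerequisites and the call could look like this sketch; DEFAULT_PLAN is just one example of a resource manager plan:

-- prerequisites for concurrent statistics gathering
alter system set job_queue_processes = 1000;
alter system set resource_manager_plan = 'DEFAULT_PLAN';

-- the actual call; the concurrent preference does the rest
exec dbms_stats.gather_table_stats(user, 'BIGTAB_PART', degree => 4)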

Using the diagnostic queries taken from the Oracle blog post you can see what’s going on. In my case, 6 jobs were created:

select job_name, state, comments from dba_scheduler_jobs where job_class like 'CONC%'

JOB_NAME             STATE           COMMENTS
-------------------- --------------- ----------------------------------------------------------------------------------------------------
ST$T3606_6_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1254";"MARTIN"."BIGTAB_PART"
ST$T3606_5_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1252";"MARTIN"."BIGTAB_PART"."SYS_P1253"
ST$T3606_4_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1250";"MARTIN"."BIGTAB_PART"."SYS_P1251"
ST$T3606_3_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1248";"MARTIN"."BIGTAB_PART"."SYS_P1249"
ST$T3606_2_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1246";"MARTIN"."BIGTAB_PART"."SYS_P1247"
ST$T3606_1_B2        RUNNING         "MARTIN"."BIGTAB_PART"."P_MAN";"MARTIN"."BIGTAB_PART"."SYS_P1245"

6 rows selected.

By the way if you are interested in the payload, you can use DBMS_METADATA:

SQL> get get_job
  1* SELECT DBMS_METADATA.GET_DDL('PROCOBJ', '&1') from dual;
SQL>

You pass it the job name and get some pretty cryptic-looking output :)

Interlude

Actually, you have to see this…

DBMS_METADATA.GET_DDL('PROCOBJ','ST$T3982_1')
-----------------------------------------------------------------------------------------


BEGIN
dbms_scheduler.create_job('"ST$T3982_1"',
job_type=>'PLSQL_BLOCK', job_action=>
'declare c dbms_stats.CContext := dbms_stats.CContext(); begin c.extend(30); c(1):=''ST$T3982_1;1;855;1;FALSE;FALSE;''; c(2):=q''#"MARTIN";#''; c(3):=q''#"BIGTAB_PART";#''; c(4):=q''#"SYS_P1250";#''; c(5):=''0;''; c(6):=''FALSE;''; c(7):=q''#FOR ALL
COLUMNS SIZE AUTO;#''; c(8):=''NULL;''; c(9):=''PARTITION;''; c(10):=''TRUE;''; c(11):=q''#NULL;#''; c(12):=q''#NULL;#''; c(13):=q''#NULL;#''; c(14):=''NULL;''; c(15):=''DATA;''; c(16):=''FALSE;''; c(17):=''GLOBAL AND PARTITION;''; c(18):=''TRUE;'';
c(19):=''0;''; c(20):='';''; c(21):=''FALSE;''; c(22):=''FALSE;''; c(23):=''4;''; c(24):=''ST$T3982;''; c(25):=''TRUE;''; c(26):=''FALSE;''; c(27):=''4.562199;''; c(28):=''228.109937;''; c(29):=''0;''; c(30):=''0;''; dbms_stats.gather_table_stats('''
','''',context=>c); end;'
, number_of_arguments=>0,
start_date=>NULL, repeat_interval=>
NULL
, end_date=>NULL,
job_class=>'"CONC_ST$T3982"', enabled=>FALSE, auto_drop=>TRUE,comments=>
'"MARTIN"."BIGTAB_PART"."SYS_P1250"'
);
dbms_scheduler.set_attribute('"ST$T3982_1"','raise_events',38);
dbms_scheduler.enable('"ST$T3982_1"');
COMMIT;
END;

This looks better when you format it a bit, so that it reads:

DECLARE
  c dbms_stats.CContext := dbms_stats.CContext();
BEGIN
  c.extend(30);
  c(1) :='ST$T3982_1;1;855;1;FALSE;FALSE;';
  c(2) :=q'#"MARTIN";#';
  c(3) :=q'#"BIGTAB_PART";#';
  c(4) :=q'#"SYS_P1250";#';
  c(5) :='0;';
  c(6) :='FALSE;';
  c(7) :=q'#FOR ALL COLUMNS SIZE
AUTO;#';
  c(8) :='NULL;';
  c(9) :='PARTITION;';
  c(10):='TRUE;';
  c(11):=q'#NULL;#';
  c(12):=q'#NULL;#';
  c(13):=q'#NULL;#';
  c(14):='NULL;';
  c(15):='DATA;';
  c(16):='FALSE;';
  c(17):='GLOBAL AND PARTITION;';
  c(18):='TRUE;';
  c(19):='0;';
  c(20):=';';
  c(21):='FALSE;';
  c(22):='FALSE;';
  c(23):='4;';
  c(24):='ST$T3982;';
  c(25):='TRUE;';
  c(26):='FALSE;';
  c(27):='4.562199;';
  c(28):='228.109937;';
  c(29):='0;';
  c(30):='0;';
  dbms_stats.gather_table_stats('','',context=>c);
END;

Interesting!

Concurrent statistics gathering works for incremental statistics as well, and is a good way to get the initial set of statistics incremental gathering needs anyway (see Doc ID 1541543.1 for background on incremental statistics gathering).
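A sketch of switching the table to incremental statistics, should you want to combine the two:

exec dbms_stats.set_table_prefs(user, 'BIGTAB_PART', 'INCREMENTAL', 'TRUE')

-- subsequent gathers then only need to scan partitions whose data changed
exec dbms_stats.gather_table_stats(user, 'BIGTAB_PART')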

The end result are statistics!

SQL> select partition_name, num_rows, sample_size, last_analyzed
  2   from user_tab_partitions where table_name = 'BIGTAB_PART';

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254                              32          32 07-NOV-14
SYS_P1253                        25600000    25600000 07-NOV-14
SYS_P1252                        25600000    25600000 07-NOV-14
SYS_P1251                        25600000    25600000 07-NOV-14
SYS_P1250                        25600000    25600000 07-NOV-14
SYS_P1249                        25600000    25600000 07-NOV-14
SYS_P1248                        25600000    25600000 07-NOV-14
SYS_P1247                        25600000    25600000 07-NOV-14
SYS_P1246                        25600000    25600000 07-NOV-14
SYS_P1245                        25600000    25600000 07-NOV-14
P_MAN                            25599968    25599968 07-NOV-14

11 rows selected.

Stats not gathered concurrently if partitions are small

This all looks pretty straightforward, except that it wasn’t when I first tried. I set all the parameters as documented but still had no jobs running in the background. The first table I tried to gather statistics on concurrently was partitioned exactly like BIGTAB_PART, with the difference that all partitions were compressed for QUERY HIGH. This means each takes up a lot less space. I had an idea that segment size might be related, so I compressed BIGTAB_PART as well, and here is the result:

SQL> select partition_name, bytes/power(1024,2) m from user_segments 
  2   where segment_name = 'BIGTAB_PART';

PARTITION_NAME                                   M
------------------------------ -------------------
P_MAN                                       321.75
SYS_P1245                                   321.00
SYS_P1246                                   322.50
SYS_P1247                                   321.50
SYS_P1248                                   320.75
SYS_P1249                                   322.25
SYS_P1250                                   322.25
SYS_P1251                                   322.25
SYS_P1252                                   322.25
SYS_P1253                                   322.00
SYS_P1254                                     8.25

11 rows selected.

When I then tried to gather statistics concurrently on the compressed table (without changing the table preferences at all), no concurrent jobs were started. So if you gather statistics concurrently and don't see multiple jobs for intra-table statistics gathering, remember that your partitions might simply be too small.
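
If you want to check or change the relevant preference yourself, this is roughly what it looks like. A sketch only - note that the valid values for CONCURRENT differ between versions (TRUE/FALSE in 11.2, MANUAL/AUTOMATIC/ALL/OFF in 12.1):

SQL> select dbms_stats.get_prefs('CONCURRENT') concurrent from dual;

SQL> exec dbms_stats.set_global_prefs('CONCURRENT','TRUE')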

Exadata and Virtual Private Database: will it offload my query?

During one of the classes I taught about Exadata optimisations I had an interesting question:

If I am using VPD, will Exadata still offload the query?

Background is that we discussed function offloading, and the meta-view v$sqlfn_metadata. It turned out that SYS_CONTEXT() is not offloadable in 11.2.0.4.

SQL> select name,datatype,analytic,aggregate,offloadable,descr
  2  from v$sqlfn_metadata where name = 'SYS_CONTEXT';

NAME                                               DATATYPE ANA AGG OFF DESCR
-------------------------------------------------- -------- --- --- --- ------------------------------
SYS_CONTEXT                                        UNKNOWN  NO  NO  NO  SYS_CONTEXT

Since I’m a great fan of the Tom Kyte method (don’t say it is so, show it!) I needed a quick example. Of all the Oracle books I read “Effective Oracle by Design” was among the most inspiring.

Where to check?

My first idea was to check v$sqlfn_metadata to see if the ever-present SYS_CONTEXT() was offloadable:

SQL> select name,offloadable
  2  from v$sqlfn_metadata
  3  where name = 'SYS_CONTEXT';

NAME                    OFF
----------------------- ---
SYS_CONTEXT             NO

SYS:dbm011> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE	11.2.0.4.0	Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

OK, so it does not appear to be offloadable. Does it matter? As you will see, it does not, although it could startle you at first.

Reproduce a test case: 11.2.0.4/11.2.3.3.1.140529.1

After finishing the example from the Oracle documentation I decided that I needed a few more rows in the table to get to a more realistic data distribution for an Exadata system - you wouldn't see a smart scan on a table with fewer than 10 rows. Adding rows can be done quite easily, and the end result was:

SYS> select count(*) from scott.orders_tab;

   COUNT(*)
-----------
   12000003

1 row selected.

I had to ensure that the query was offloaded in the first place - simplez! Kerry Osborne has a script for this:

SYS> select /*+ gather_plan_statistics rowcount_sys_stats */
  2  count(*) from scott.orders_tab;

   COUNT(*)
-----------
   12000003

1 row selected.

Elapsed: 00:00:00.81

SYS:OGGSRC> @scripts/fsx
Enter value for sql_text: %rowcount_sys_stats%
Enter value for sql_id:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
8bv5b04mjku08	   0  1093340548      1        .81	0 Yes	       38.47 select /*+ gather_plan_statistics rowcount_sys_stats */ count(*) from

1 row selected.

SYS:OGGSRC> @scripts/dplan
Enter value for sql_id: 8bv5b04mjku08
Enter value for child_no:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	8bv5b04mjku08, child number 0
-------------------------------------
select /*+ gather_plan_statistics rowcount_sys_stats */ count(*) from
scott.orders_tab

Plan hash value: 1093340548

---------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |  7521 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| ORDERS_TAB |    12M|  7521   (1)| 00:01:31 |
---------------------------------------------------------------------------------

15 rows selected.

So I guess it is.
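
For anyone who wants to reproduce this: the VPD setup follows the standard pattern from the documentation. The context name and attribute match the predicate you will see in the execution plans below; everything else (package, function and policy names) is my assumption, a sketch rather than a copy of the original test:

create or replace context orders_ctx using orders_ctx_pkg;

create or replace package orders_ctx_pkg as
  procedure set_cust_no(p_cust_no in number);
end;
/

create or replace package body orders_ctx_pkg as
  procedure set_cust_no(p_cust_no in number) as
  begin
    -- populate the application context the policy function relies on
    dbms_session.set_context('orders_ctx', 'cust_no', p_cust_no);
  end;
end;
/

create or replace function orders_policy(
  p_schema in varchar2, p_object in varchar2) return varchar2 as
begin
  -- this predicate is appended to every query against the protected table
  return 'cust_no = sys_context(''orders_ctx'',''cust_no'')';
end;
/

begin
  dbms_rls.add_policy(
    object_schema   => 'SCOTT',
    object_name     => 'ORDERS_TAB',
    policy_name     => 'ORDERS_POLICY',
    function_schema => 'SCOTT',
    policy_function => 'ORDERS_POLICY',
    statement_types => 'SELECT');
end;
/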

Enter VPD

With all the VPD bells and whistles turned on I connected as one of the users for which there is an explicit context set, and ran my query:

TBROOKE> select /*+ gather_plan_statistics tbrooke_query_001 */
  2  count(*) from scott.orders_tab;

   COUNT(*)
-----------
          4

1 row selected.

Elapsed: 00:00:00.15

Which shows that the VPD policy works. What do my tools indicate?

SYS> @scripts/fsx
Enter value for sql_text: %tbrooke_query_001%
Enter value for sql_id:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
bs9gq0bdqazzu	   0  1093340548      1        .15	0 Yes	       99.98 select /*+ gather_plan_statistics tbrooke_query_001 */ count(*) from s

1 row selected.

SYS> @scripts/dplan
Enter value for sql_id: bs9gq0bdqazzu
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	bs9gq0bdqazzu, child number 0
-------------------------------------
select /*+ gather_plan_statistics tbrooke_query_001 */ count(*) from
scott.orders_tab

Plan hash value: 1093340548

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |       |  7651 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| ORDERS_TAB |     1 |     6 |  7651   (3)| 00:01:32 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))
       filter("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))

21 rows selected.

So using the script it is easily visible that a smart scan has happened, and in fact it saved 99.98% of I/O - not surprising, given that only 4 rows out of the whole table have been returned. The Real Time SQL Monitoring report confirmed the finding, by the way. I had to sneak in another hint (monitor), otherwise the statement wouldn't have been captured in SQL Monitor (only "long running" statements are captured by default).

SQL Monitoring Report

SQL Text
------------------------------
select /*+ gather_plan_statistics monitor tbrooke_query_002 */ count(*) from scott.orders_tab

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TBROOKE (33:239)
 SQL ID              :  8ydqam3fuwt2z
 SQL Execution ID    :  16777216
 Execution Started   :  10/14/2014 04:28:24
 First Refresh Time  :  10/14/2014 04:28:24
 Last Refresh Time   :  10/14/2014 04:28:24
 Duration            :  .100139s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus@enkdb01.enkitec.com (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
====================================================================================
| Elapsed |   Cpu   |    IO    | PL/SQL  | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) | Time(s) | Calls |  Gets  | Reqs | Bytes | Offload |
====================================================================================
|    0.10 |    0.01 |     0.09 |    0.00 |     1 |  27352 |  228 | 214MB |  99.98% |
====================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1093340548)
=========================================================================================================================================================================
| Id |          Operation           |    Name    |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity | Activity Detail |
|    |                              |            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |   (# samples)   |
=========================================================================================================================================================================
|  0 | SELECT STATEMENT             |            |         |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  1 |   SORT AGGREGATE             |            |       1 |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  2 |    TABLE ACCESS STORAGE FULL | ORDERS_TAB |       1 | 7651 |         1 |     +0 |     1 |        4 |  228 | 214MB |  99.98% |    2M |          |                 |
=========================================================================================================================================================================

This confirms that a full table scan happened, and that it must have been executed as a direct path read. A direct path read on Exadata most often transforms into a smart scan. As you can see I didn't specify any other predicate, and yet the VPD predicate was offloaded.
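
You can double-check this from the session itself: the session counters tell the same story. A quick sketch (run from the session that executed the query; the statistic names are the same ones that appear in the mystats report further down):

select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where ms.statistic# = sn.statistic#
   and sn.name in ('physical reads direct', 'cell scans', 'table scans (direct read)');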

Reproduce a test case: 12.1.0.2.0/12.1.1.1.1.140712

Interestingly in 12.1.0.2.0 the function SYS_CONTEXT is offloadable:

SQL> select name, offloadable
  2  from v$sqlfn_metadata where name = 'SYS_CONTEXT';

NAME                           OFF
------------------------------ ---
SYS_CONTEXT                    YES

1 row selected.

For completeness' sake I have repeated my test here. I copied the table via network link and created the same users and VPD setup. Not too much of a difference. The information is shown here without comments:

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE    12.1.0.2.0      Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

TBROOKE> select /*+ gather_plan_statistics tbrooke_query_013 */ count(*) from scott.orders_tab;

  COUNT(*)
----------
         4

Elapsed: 00:00:00.02

SQL> @scripts/fsx
Enter value for sql_text: %tbrooke_query_013%
Enter value for sql_id:

SQL_ID         CHILD  PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD IO_SAVED_% SQL_TEXT
------------- ------ ---------- ------ ---------- ------ ------- ---------- ----------------------------------------------------------------------
bf0s5hzr7x9r5      0 1093340548      1        .02      0 Yes          99.97 select /*+ gather_plan_statistics tbrooke_query_013 */ count(*) from s

1 row selected.

SQL> @scripts/dplan
Enter value for sql_id: bf0s5hzr7x9r5
Enter value for child_no: 0

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bf0s5hzr7x9r5, child number 0
-------------------------------------
select /*+ gather_plan_statistics tbrooke_query_013 */ count(*) from
scott.orders_tab

Plan hash value: 1093340548

-----------------------------------------------------------------------------------------
| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |            |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE            |            |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| ORDERS_TAB |     1 |    13 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))
       filter("CUST_NO"=TO_NUMBER(SYS_CONTEXT('orders_ctx','cust_no')))

21 rows selected.

SQL Monitoring Report

SQL Text
------------------------------
select /*+ gather_plan_statistics monitor tbrooke_query_014 */ count(*) from scott.orders_tab

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TBROOKE (1042:32766)
 SQL ID              :  55yy67scgw2sf
 SQL Execution ID    :  16777216
 Execution Started   :  10/14/2014 05:13:30
 First Refresh Time  :  10/14/2014 05:13:30
 Last Refresh Time   :  10/14/2014 05:13:30
 Duration            :  .016025s
 Module/Action       :  SQL*Plus/-
 Service             :  SYS$USERS
 Program             :  sqlplus@enkdb03.enkitec.com (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
==================================================================================================
| Elapsed |   Cpu   |    IO    | Application | PL/SQL  | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) | Calls |  Gets  | Reqs | Bytes | Offload |
==================================================================================================
|    0.02 |    0.01 |     0.01 |        0.00 |    0.00 |     1 |  27441 |  223 | 214MB |  99.99% |
==================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=1093340548)
=========================================================================================================================================================================
| Id |          Operation           |    Name    |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity | Activity Detail |
|    |                              |            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |   (# samples)   |
=========================================================================================================================================================================
|  0 | SELECT STATEMENT             |            |         |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  1 |   SORT AGGREGATE             |            |       1 |      |         1 |     +0 |     1 |        1 |      |       |         |       |          |                 |
|  2 |    TABLE ACCESS STORAGE FULL | ORDERS_TAB |       1 |    2 |         1 |     +0 |     1 |        4 |  223 | 214MB |  99.99% |    3M |          |                 |
=========================================================================================================================================================================

In the last example flash cache and storage indexes contributed greatly to the quick execution time. Using Adrian Billington's mystats I can see more detail. I have removed what's not needed from the report.

SQL> @scripts/mystats stop t=1

==========================================================================================
MyStats report : 14-OCT-2014 05:27:44
==========================================================================================

------------------------------------------------------------------------------------------
1. Summary Timings
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
TIMER   snapshot interval (seconds)                                                  13.59
TIMER   CPU time used (seconds)                                                       0.01

------------------------------------------------------------------------------------------
2. Statistics Report
------------------------------------------------------------------------------------------

Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CPU used by this session                                                         7
STAT    CPU used when call started                                                       7
STAT    DB time                                                                          9
...
STAT    cell IO uncompressed bytes                                               3,031,040
STAT    cell blocks helped by minscn optimization                                      370
STAT    cell blocks processed by cache layer                                           370
STAT    cell blocks processed by data layer                                            370
STAT    cell blocks processed by txn layer                                             370
STAT    cell flash cache read hits                                                      10
STAT    cell num smartio automem buffer allocation attempts                              1
STAT    cell physical IO bytes eligible for predicate offload                  224,763,904
STAT    cell physical IO bytes saved by storage index                          221,732,864
STAT    cell physical IO interconnect bytes                                          5,360
STAT    cell physical IO interconnect bytes returned by smart scan                   5,360
STAT    cell scans                                                                       1
...
STAT    consistent gets                                                             27,816
STAT    consistent gets direct                                                      27,437
STAT    consistent gets examination                                                      3
STAT    consistent gets examination (fastpath)                                           3
STAT    consistent gets from cache                                                     379
STAT    consistent gets pin                                                            376
STAT    consistent gets pin (fastpath)                                                 376
STAT    db block changes                                                               787
STAT    db block gets                                                                2,879
STAT    db block gets from cache                                                     2,879
...
STAT    logical read bytes from cache                                           26,689,536
...
STAT    physical read bytes                                                    224,763,904
STAT    physical read requests optimized                                               223
STAT    physical read total IO requests                                                223
STAT    physical read total bytes                                              224,763,904
STAT    physical read total bytes optimized                                    224,763,904
STAT    physical read total multi block requests                                       215
STAT    physical reads                                                              27,437
STAT    physical reads direct                                                       27,437
...
STAT    table fetch by rowid                                                             1
STAT    table scan blocks gotten                                                         3
STAT    table scan disk non-IMC rows gotten                                          1,685
STAT    table scan rows gotten                                                       1,685
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1
...

------------------------------------------------------------------------------------------
3. About
------------------------------------------------------------------------------------------
- MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
- Based on the SNAP_MY_STATS utility by Jonathan Lewis

==========================================================================================
End of report
==========================================================================================

In the case of VPD, then, the fact that SYS_CONTEXT() is offloadable in 12.1 did not play a major role - the query was offloaded either way.
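
If you are curious how the list of offloadable functions evolves between releases, a simple aggregate over v$sqlfn_metadata gives a quick overview. A sketch:

select offloadable, count(*)
  from v$sqlfn_metadata
 group by offloadable;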

Little things worth knowing-troubleshooting parallel statement queueing

This is yet another one of these posts that hopefully help you as much as they are going to help me in the future. Recently I enjoyed troubleshooting a problem related to parallel execution. Since I have never really written down how to tackle such a problem I thought it might be nice to do that now.

This is 12.1.0.2.0 on Exadata, but the platform doesn’t really matter for the troubleshooting technique.

What is parallel statement queueing

Statement queueing is a pretty nifty tool, new with 11.2. Instead of downgrading the requested Degree of Parallelism (DOP), as was the case with parallel_adaptive_multi_user, your session will queue until the requested number of parallel servers becomes available. The rationale behind this feature is that you might queue for 1 minute until all the requested PX servers become available, but once they are available your session can finish in 2 minutes with a DOP of 16, rather than in 8 minutes with a DOP downgraded to 2 (the old behaviour). The 1 minute was well spent waiting. If you want to fine-tune statement queueing, Database Resource Manager allows you to.

The downside is that you don’t get statement queueing without enabling automatic DOP unless you play with underscore parameters. Automatic DOP is quite often the reason why you see statement queueing in the first place.

How does this statement queueing work?

With parallel_degree_policy set to AUTO (warning again - this can be dangerous!) the optimiser estimates the time it takes to execute a SQL statement during the hard parse. If the estimated execution time exceeds parallel_min_time_threshold (the default is AUTO, which translates to 10 seconds), the optimiser will try to work out the "optimal" DOP, calculated as min(parallel_degree_limit, calculated ideal DOP). The default parallel_degree_limit is set to CPU, which is documented as follows: "The maximum degree of parallelism is limited by the number of CPUs in the system". Oops - that can be a lot. It might be better to set parallel_degree_limit to an integer, signifying the maximum allowed DOP for a statement executing in parallel. On Exadata you don't normally need very high degrees of parallelism, so a more conservative setting might be in order.
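
Translated into parameters, a setup along these lines enables AutoDOP with a sane upper bound. A sketch only - the values are examples, and as mentioned parallel_degree_policy=AUTO needs careful testing:

-- enable automatic DOP (and with it statement queueing)
alter system set parallel_degree_policy = AUTO;
-- cap the DOP the optimiser is allowed to compute
alter system set parallel_degree_limit = 16;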

Queueing

When does the queueing mechanism kick in? Let's imagine that you have parallel_max_servers = parallel_servers_target = 48 defined in your system, in my case an Exadata X2 with 2s12c24t Westmere EP processors. This isn't the default by the way. This setting allows you to execute queries 48-way parallel per instance. Let's assume that 40 query slaves are in use already, and another user wants to execute a query 12-way parallel. The sum of currently executing parallel slaves plus the requested number of slaves (40 + 12 = 52) exceeds parallel_servers_target, triggering statement queueing for your session.
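
To see how close an instance is to the queueing threshold you can compare the number of busy PX servers with parallel_servers_target. A quick sketch:

select (select count(*) from v$px_process where status = 'IN USE') px_in_use,
       (select to_number(value) from v$parameter
         where name = 'parallel_servers_target') px_target
  from dual;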

Wolfgang it!

To create a reproducible test case I set parallel_max_servers to 48 and parallel_servers_target to 24. I also enabled automatic DOP in my session. My test queries are directed at a non-partitioned table with 256 million rows, approximately 80 GB in size. Before executing the first SQL statement I enabled event 10053 to Wolfgang my query.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

Elapsed: 00:00:00.02
SQL> desc bigtab
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 V1                                                 VARCHAR2(40)
 N1                                                 NUMBER
 N2                                                 NUMBER
 N_256K                                             NUMBER
 N_128K                                             NUMBER
 N_8K                                               NUMBER
 PADDING                                            VARCHAR2(960)

SQL> select 1 + sum(n1) + sum (n2) - sum(n_8k) from bigtab where id > 1;

1+SUM(N1)+SUM(N2)-SUM(N_8K)
---------------------------
                 1.0471E+13

Elapsed: 00:00:35.59

The 10053 trace should give me some idea about the DOP. Let’s have a look at it:

     36 **************************
     37 Automatic degree of parallelism (AUTODOP)
     38 **************************
     39 Automatic degree of parallelism is enabled for this statement in auto mode.
     40 kkopqSetForceParallelProperties: Hint:no
     41 Query: compute:yes forced:no forceDop:0
     42 Calibration statistics is enabled.
     43 Start with a serial pass, cost the DOP to use

   1935 AUTO DOP PLANS EVALUATION
   1936 ***************************************
   1937 Compilation completed with Dop: 1.
   1938   Cost_io:  2776823.000000  Cost_cpu: 154935265907
   1939   Card:     1.000000  Bytes:    19.000000
   1940   Cost:     2781428.613003  Est_time:  401168ms
   1941 kkopqCombineDop: Dop:40 Hint:no
   1942 Query: compute:yes forced:no  scanDop:40 cpuDop:1 forceDop:0
   1943 Serial plan is expensive enough to be a candidate for parallelism (2781429)
   1944 AutoDOP: kkopqAdjustDop: dop is 40 after affinity normalization
   1945 Signal reparse with DOP 40.
   1946 *****************************
   1947 Number of Compilations tried: 1
   1948 *****************************

   3853 AUTO DOP PLANS EVALUATION
   3854 ***************************************
   3855 Compilation completed with Dop: 40.
   3856   Cost_io:  77133.972222  Cost_cpu: 2387701756
   3857   Card:     1.000000  Bytes:    19.000000
   3858   Cost:     77204.949157  Est_time:  11135ms
   3859 kkopqCombineDop: Dop:80 Hint:no
   3860 Query: compute:yes forced:no  scanDop:40 cpuDop:80 forceDop:0
   3861 AutoDOP: kkopqAdjustDop: computed dop before applying degree limit was 80 and now it is 48
   3862 AutoDOP: kkopqAdjustDop: dop is 48 after affinity normalization
   3863 Adopt new plan with dop=40
   3864 Signal reparse with DOP 48.
   3865 *****************************
   3866 Number of Compilations tried: 2
   3867 *****************************

   5772 AUTO DOP PLANS EVALUATION
   5773 ***************************************
   5774 Compilation completed with Dop: 48.
   5775   Cost_io:  64278.310185  Cost_cpu: 1989751463
   5776   Card:     1.000000  Bytes:    19.000000
   5777   Cost:     64337.457631  Est_time:  9279ms
   5778 kkopqCombineDop: Dop:80 Hint:no
   5779 Query: compute:yes forced:no  scanDop:40 cpuDop:80 forceDop:0
   5780 Adopt new plan with dop=48
   5781 kkopqSetDopReason: Reason why we chose this DOP is: degree limit.
   5782 Costing completed. DOP chosen: 48.
   5783 AutoDop: kkopqSetMaxDopInCursorEnv:In the Cursor Session Env, max DOP is set to 48
   5784 *****************************
   5785 Number of Compilations tried: 3
   5786 *****************************

   5811 ============
   5812 Plan Table
   5813 ============
   5814 --------------------------------------------------+-----------------------------------+-------------------------+
   5815 | Id  | Operation                       | Name    | Rows  | Bytes | Cost  | Time      |  TQ  |IN-OUT|PQ Distrib |
   5816 --------------------------------------------------+-----------------------------------+-------------------------+
   5817 | 0   | SELECT STATEMENT                |         |       |       |   63K |           |      |      |           |
   5818 | 1   |  SORT AGGREGATE                 |         |     1 |    19 |       |           |      |      |           |
   5819 | 2   |   PX COORDINATOR                |         |       |       |       |           |      |      |           |
   5820 | 3   |    PX SEND QC (RANDOM)          | :TQ10000|     1 |    19 |       |           |:Q1000| P->S |QC (RANDOM)|
   5821 | 4   |     SORT AGGREGATE              |         |     1 |    19 |       |           |:Q1000| PCWP |           |
   5822 | 5   |      PX BLOCK ITERATOR          |         |  276M | 5248M |   63K |  00:02:09 |:Q1000| PCWC |           |
   5823 | 6   |       TABLE ACCESS STORAGE FULL | BIGTAB  |  276M | 5248M |   63K |  00:02:09 |:Q1000| PCWP |           |
   5824 --------------------------------------------------+-----------------------------------+-------------------------+
   5825 Predicate Information:
   5826 ----------------------
   5827 6 - storage(:Z>=:Z AND :Z<=:Z AND "ID">1)
   5828 6 - filter("ID">1)
   5829
   5830 Content of other_xml column
   5831 ===========================
   5832   derived_cpu_dop: 80
   5833   derived_io_dop : 40
   5834   dop_reason     : degree limit
   5835   dop            : 48
   5836   px_in_memory_imc: no
   5837   px_in_memory   : no
   5838   io_rate        : 200
   5839   derived_io_dop : 40
   5840   cpu_rate       : 1000
   5841   derived_cpu_dop: 52
   5842   cpu_rate       : 1000
   5843   derived_cpu_dop: 29
   5844   cpu_rate       : 1000
   5845   derived_cpu_dop: 29
   5846   db_version     : 12.1.0.2
   5847   parse_schema   : MARTIN
   5848   dynamic_sampling: 11
   5849   plan_hash_full : 1216263238
   5850   plan_hash      : 4050997022
   5851   plan_hash_2    : 1216263238
   5852   Outline Data:
   5853   /*+
   5854     BEGIN_OUTLINE_DATA
   5855       IGNORE_OPTIM_EMBEDDED_HINTS
   5856       OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
   5857       DB_VERSION('12.1.0.2')
   5858       OPT_PARAM('optimizer_dynamic_sampling' 11)
   5859       ALL_ROWS
   5860       SHARED(48)
   5861       OUTLINE_LEAF(@"SEL$1")
   5862       FULL(@"SEL$1" "BIGTAB"@"SEL$1")
   5863     END_OUTLINE_DATA
   5864   */

There you have it! The v$sql_plan.other_xml column contains all the necessary information-no need for a 10053 trace :). The fanciest way I can come up with to get that information is to use this statement:

select t.*
 from v$sql_plan v,
  xmltable(
	'/other_xml/info'
	passing xmltype(v.other_xml)
	columns
		info_type varchar2(30) path '@type',
		info_value varchar2(30) path '/info'
  ) t
where v.sql_id = '&sql_id'
  and v.child_number = &child_number
  and other_xml is not null;

INFO_TYPE                      INFO_VALUE
------------------------------ ------------------------------
derived_cpu_dop                51
derived_io_dop                 40
dop_reason                     degree limit
dop                            48
px_in_memory_imc               no
px_in_memory                   no
io_rate                        200
derived_io_dop                 40
cpu_rate                       1000
derived_cpu_dop                26
cpu_rate                       1000
derived_cpu_dop                26
cpu_rate                       1000
derived_cpu_dop                26
db_version                     12.1.0.2
parse_schema                   "MARTIN"
plan_hash_full                 1216263238
plan_hash                      4050997022
plan_hash_2                    1216263238

19 rows selected.

But I digress… By the way-this statement used 48 PX server processes because it was the only one active at the time.

Causing trouble

After hitting publish I noticed that I copied the wrong example…

A better example for demonstrating the effect of queueing uses a more realistic case where statements request different DOPs. Two sessions are enough: one that will use 12 slaves, and another that requests 36. The statement with a DOP of 12 is started first, the one with a DOP of 36 a few seconds later. In a third session I gathered some diagnostic information:

SYS:dbm011> select px.sid, px.serial#,px.qcsid,px.qcserial#,px.qcinst_id,px.degree,px.req_degree,
  2  s.username, s.sql_id, s.sql_child_number, s.event, s.state
  3  from v$px_session px, v$session s
  4  where s.sid = px.sid
  5    and s.serial# = px.serial#
  6  order by px.qcsid;

        SID     SERIAL#       QCSID   QCSERIAL#   QCINST_ID      DEGREE  REQ_DEGREE USERNAME             SQL_ID        SQL_CHILD_NUMBER EVENT                                                            STATE
----------- ----------- ----------- ----------- ----------- ----------- ----------- -------------------- ------------- ---------------- ---------------------------------------------------------------- --------------------
        265       14828         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        330       21719         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        395       59827         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        461       17363         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        525        4955         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        592        2687         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        526       60805         526                                                 MARTIN               9md3ukhghgj58                0 PX Deq: Execute Reply                                            WAITING
        720       27626         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        786       47540         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED KNOWN TIME
        850       41110         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        918       46836         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        983       12699         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        657       22654         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING

13 rows selected.

SYS:dbm011> select count(qcsid),qcsid from v$px_session group by qcsid;

COUNT(QCSID)       QCSID
------------ -----------
          13         526

SYS:dbm011>
SYS:dbm011> select count(event),event from v$session where username = 'MARTIN' group by event;

COUNT(EVENT) EVENT
------------ ----------------------------------------------------------------
          12 cell smart table scan
           1 PX Deq: Execute Reply
           1 resmgr:pq queued

SYS:dbm011>

So my first session is well underway, 12-way parallel as requested. The other session had to wait, but kicked into motion as soon as my first session finished.

SYS:dbm011> select px.sid, px.serial#,px.qcsid,px.qcserial#,px.qcinst_id,px.degree,px.req_degree,
  2  s.username, s.sql_id, s.sql_child_number, s.event, s.state
  3  from v$px_session px, v$session s
  4  where s.sid = px.sid
  5    and s.serial# = px.serial#
  6  order by px.qcsid;

        SID     SERIAL#       QCSID   QCSERIAL#   QCINST_ID      DEGREE  REQ_DEGREE USERNAME             SQL_ID        SQL_CHILD_NUMBER EVENT                                                            STATE
----------- ----------- ----------- ----------- ----------- ----------- ----------- -------------------- ------------- ---------------- ---------------------------------------------------------------- --------------------
        265       43575         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        525       17778         525                                                 MARTIN               4dz3dapm4uy8q                0 PX Deq: Execute Reply                                            WAITING
        395       10504         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        462       57813         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        526       12648         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        591       42643         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        657        5236         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        720       19243         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        786       61636         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        850       17734         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        918       60016         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        983       24489         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1044       41056         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1108        8936         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1178       52303         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
       1242       37008         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1307       28798         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1372       22172         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1437       59990         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1501        9307         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
          6       39400         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
         72       28569         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        137       18368         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        202       56832         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        333        6482         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        397       28334         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        461       17550         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        524         958         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        593       41235         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        656       51686         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        719       40346         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
        785        5157         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        851       35243         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        917       45635         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        980       18504         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1046       27779         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        330       10724         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING

37 rows selected.

SYS:dbm011>
SYS:dbm011> select count(qcsid),qcsid from v$px_session group by qcsid;

COUNT(QCSID)       QCSID
------------ -----------
          37         525

SYS:dbm011>
SYS:dbm011> select count(event),event from v$session where username = 'MARTIN' group by event;

COUNT(EVENT) EVENT
------------ ----------------------------------------------------------------
          36 cell smart table scan
           1 PX Deq: Execute Reply

The effect of queueing can be seen in the extended execution time:

  • 12 way parallel-no queueing: Elapsed: 00:00:07.16
  • 36 way parallel-with a bit of queueing: Elapsed: 00:00:12.92

V$PX_SESSION allows you to identify the culprit causing other sessions to queue - have a look at it and see if you can optimise the statement or run it later. This should unblock you. Also have a look at parallel_servers_target: you might have limited yourself too much. I personally don't exceed the core/thread count on a box by much, to avoid scheduling problems.
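
Real Time SQL Monitoring also shows queued statements directly, which is handy when you need to find out who is waiting and who is hogging the PX servers. A sketch:

select sql_id, username, px_servers_requested, px_servers_allocated, status
  from v$sql_monitor
 where status = 'QUEUED';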

A brief history of time^H^H Oracle session statistics

I didn’t intend to write another blog post yesterday evening at all, but found something that was worth sharing and got me excited… And when I started writing I intended it to be a short post, too.

If you have been digging around Oracle session performance counters a little you undoubtedly noticed how their number has increased with every release, and even with every patch set. Unfortunately I don't have an 11.1 system (or earlier) at my disposal to test, but here is a comparison of how Oracle has instrumented the database. I have already ditched my 12.1.0.1 system as well, so no comparison there either :( This is Oracle on Linux.

The script

In the following examples I am going to use a simple query to list the session statistics by their class. The decode statement is based on the official documentation set, where you find the definition of v$statname plus an explanation of the meaning of the class column. Here is the script:

with stats as (
        select name, decode(class,
                1, 'USER',
                2, 'REDO',
                4, 'ENQUEUE',
                8, 'CACHE',
                16, 'OS',
                32, 'RAC',
                64, 'SQL',
                128, 'DEBUG',
                'NA'
        ) as decoded_class from v$statname
)
select count(decoded_class), decoded_class
 from stats
 group by rollup(decoded_class)
 order by 1
/

Oracle 11.2.0.3

11.2.0.3 is probably the most common 11g Release 2 version currently out there in the field - or at least that's my observation. According to MOS Doc ID 742060.1, 11.2.0.3 was released on 23 September 2011 (is that really that long ago?) and is already out of error correction support, by the way.

Executing the above-mentioned script gives me the following result:

COUNT(DECODED_CLASS) DECODED
-------------------- -------
                   9 ENQUEUE
                  16 OS
                  25 RAC
                  32 REDO
                  47 NA
                  93 SQL
                 107 USER
                 121 CACHE
                 188 DEBUG
                 638

So there are 638 of these counters. Let’s move on to 11.2.0.4

Oracle 11.2.0.4

Oracle 11.2.0.4 is interesting as it was released after 12.1.0.1. It is the terminal release for Oracle 11.2, and you should consider migrating to it as it is still in error correction support. The patch set came out on 28 August 2013. What about the session statistics?

COUNT(DECODED_CLASS) DECODED
-------------------- -------
                   9 ENQUEUE
                  16 OS
                  25 RAC
                  34 REDO
                  48 NA
                  96 SQL
                 117 USER
                 127 CACHE
                 207 DEBUG
                 679

A few more, all within what can be expected.

Oracle 12.1.0.2

Oracle 12.1.0.2 is fresh off the press, released just a few weeks ago. Unsurprisingly the number of session statistics has been increased again. What did surprise me was the number of statistics now available for every session! Have a look at this:

COUNT(DECODED_CLASS) DECODED
-------------------- -------
                   9 ENQUEUE
                  16 OS
                  35 RAC
                  68 REDO
                  74 NA
                 130 SQL
                 130 USER
                 151 CACHE
                 565 DEBUG
                1178

That’s nearly double what you found for 11.2.0.3. Incredible, and hence this post. Comparing 11.2.0.4 with 12.1.0.2 you will notice the:

  • same number of enqueue stats
  • same number of OS stats
  • 10 additional RAC stats
  • twice the number of REDO related statistics
  • quite a few more not classified (26)
  • 34 more sql related
  • 13 more in the user-class
  • 24 additional stats in the cache-class
  • and a whopping 358 (!) more in the debug class
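
If you want the names of the new statistics rather than just the counts, a set operation across database links does the trick. A sketch, assuming database links called db11204 and db12102 pointing to the respective systems:

select name from v$statname@db12102
minus
select name from v$statname@db11204
order by name;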

The debug class (128) shows lots of statistics (including spare ones) for the in-memory option (IM):

SQL> select count(1), class from v$statname where name like 'IM%' group by class;

  COUNT(1)      CLASS
---------- ----------
       211        128

Happy troubleshooting! This reminds me to look into the IM option in more detail.

Why is P1 the only parameter populated in cell smart table scan?

Anyone who has looked at Exadata might ask the question, and I did so too. After all, cell smart table scan is in wait class User I/O, so there should be more, right? This is what you find for a smart scan:
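
A query along these lines produces the listing (a sketch - column formatting omitted):

select name, parameter1, parameter2, parameter3, wait_class
  from v$event_name
 where name in ('cell smart table scan', 'cell smart index scan');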

NAME                           PARAMETER1           PARAMETER2           PARAMETER3                     WAIT_CLASS
------------------------------ -------------------- -------------------- ------------------------------ ---------------
cell smart table scan          cellhash#                                                                User I/O
cell smart index scan          cellhash#                                                                User I/O

Compare this to the traditional IO request:

NAME                           PARAMETER1           PARAMETER2           PARAMETER3                     WAIT_CLASS
------------------------------ -------------------- -------------------- ------------------------------ ---------------
db file sequential read        file#                block#               blocks                         User I/O
db file scattered read         file#                block#               blocks                         User I/O
direct path read               file number          first dba            block cnt                      User I/O

At first I didn't understand why that was the case - until I traced a lot more. Using the method described by my colleague Frits Hoogland I traced Exadata calls as well as the IB interconnect, in addition to a SQL trace (level 8). Please don't do this outside the lab, the trace will get huge!

Now that produces an interesting file, in which the SQL trace information is just the odd intermittent line drowning in a sea of other information. So I used a sed one-liner to make those lines stand out a bit:

$ sed  "/WAIT/{x;p;x;G;}" db12c1_ora_11916.trc > kxd_libcell_sqltace.trc

This introduces a newline before and after each line containing WAIT. The SQL Trace information is familiar:

WAIT #139970814085680: nam='cell smart table scan' ela= 251 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911250866

By the way the obj# is the data_object_ID, not the object_id in DBA_OBJECTS.
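
Translating the obj# from the trace back to a segment is straightforward. A sketch using the data object ID from the trace line above:

select owner, object_name, subobject_name, object_type
  from dba_objects
 where data_object_id = 61471;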

Stepping through the trace

The trace information reflects what is happening during the various smart scan processing steps. This first example shows some of the processing during the smart scan initialisation phase, when the cells are being set up. I initially struggled to map the "appliance" hexadecimal number to an actual cell, but it's easy if you grep for kcfis_open_appliance_fd and match the cellhash to v$cell.cell_hashval. Continuing the example:

kcfis_open_appliance:
throttle: initialized for appliance 0x7f4d7e9bc4c0
kcfis_find_appliance_fd: appliance fd not found. appliance 192.168.12.10/ Cached appliance fd count 0
kcfis_open_appliance_fd: 0x7f4d7e9bc4c0
oss_open called for o/192.168.12.10//box/predicate
Sending message RQ_Tag_2827385864_11: 0xf9b3f08 with frag # 0, to box my_box_refid: 2 (box inc: 12)
Command with band 1, payload length 116, payload 0xf9b4110
RQ_Tag_2827385864_11: Command name OPEN, refid 11
Open flags: readwrite - 1, sync - 0, char - 1.
 Device name - /box/predicate
Vrpc completion - RQ_Tag_2827385864_11: cmd 0xf9b3f48, req 0xf9b3a40, frag # 1
Vrpc reply 0xf9b4000 with refid 11 for existing message: 0xf9b3f08
Reply with band 0, payload length 0, payload (nil)
RQ_Tag_2827385864_11: Command name REPLY OPEN, refid 11
Returned open file descriptor  1
opening file number 11

WAIT #139970814085680: nam='cell smart table scan' ela= 259 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911167788

So here the cell with IP 192.168.12.10 maps to 0x7f4d7e9bc4c0 in the trace. As you can see, immediately before the wait event is written to the trace file a file is opened on the cell. This causes the line to be emitted in the SQL trace file. So far there hasn’t been any significant scanning activity against the data. But the trace line makes sense: the IO request was issued against the cell with hash 822451848. We don’t know what it did though, and which file it accessed.

SYS:db12c1> select cell_path from v$cell where cell_hashval = '822451848';

CELL_PATH
----------------------------
192.168.12.10

I was quite pleased with myself when I found this out.
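
To collect all fd-to-cell mappings from the trace in one go, something like the following grep works (a sketch; the trace file name is the one from this example, and as shown above the oss_open line with the cell's IP immediately follows the file descriptor line):

$ grep -A1 kcfis_open_appliance_fd db12c1_ora_11916.trc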

More trace information. A lot more.

Lesson learned trawling through the trace: just because there is a single line in the trace file doesn’t mean there aren’t other requests either in flight or under preparation.

The next few thousand (!) lines show the mapping between RDBMS and cell addressing. This part is well explained in Frits' post I referenced earlier. Further down in the trace it gets more interesting from a SQL trace perspective again: payload maps are generated and issued against the cells. Remember that these IO requests to the cells are performed asynchronously. I found quite a few calls to the code generating payloads for the cells between individual SQL Trace WAIT lines. This makes reading the trace a bit confusing; it almost appears as if SQL Trace couldn't keep up with the speed at which requests are created and sent.

Looking at the file I found the following pattern preceding a smart scan line in the first half of the process (before data comes back from the cells). Beginning immediately after the last WAIT line, in abbreviated format:

Ioctl completed.
appliance 0x7f4d7e9bc4c0 : cellsrv pid: 9563: predicate: /box/predicate37843
...
kcfis_create_maps_payload. appliance 0x7f4d7ea65570 num_disks 12
throttle: allowing map push: appliance 0x7f4d7ea65570, maps_to_push 7
...
Pushing request : disknumber = 0, offset = 936606973952, len = 1032192 rdba: 43778306 version 0 reqid=10 cver=1
disk=0x7f4d7e9c0aa8 state=1
trans_req_element = 0x7f4d7e9cba00

The appliance name, cellsrv (thread) ID, and the box predicate appear to remain constant for the lifetime of the smart scan. You can find them in other occurrences of the trace as well (but not used in the same context).

The preceding lines are then followed by many more lines beginning with "Pushing request", iterating over all 12 disks (-> num_disks) for a range of Data Block Addresses. After the map payload has been created it is time to push it to the cell. Here it gets a bit murky: the KXD.* traces show requests being pushed, but not necessarily to the cell mentioned in the event line. Consider this:

kcfis_create_maps_payload. alloc_len 272 num maps 2
throttle: mappayload: maps_to_push 4 iosize_being_pushed 8388608
kcfis_push: pushing maps to appliance 0x7f4d7e9bc4c0
kcfis_issue_ioctl: payload_type 2
oss_ioctl open file: 11, opcode: 99
Sending message RQ_Tag_2827385864_54: 0xf9c5728 with frag # 0, to box my_box_refid: 2 (box inc: 12)
Command with band 1, payload length 272, payload 0x7f4d7e9b9ef8
RQ_Tag_2827385864_54: Command name IOCTL, refid 54
Ioctl arguments fd 1 opcode 99 size 272
oss_ioctl done
oss_wait called for request: 0xf9b3a40
Vrpc completion - RQ_Tag_2341095748_52: cmd 0xf9c6ec8, req 0xf9b3a40, frag # 1
Vrpc reply 0xf9c6f80 with refid 52 for existing message: 0xf9c6e88
Reply with band 1, payload length 40, payload 0x7f4d7ea65a90
RQ_Tag_2341095748_52: Command name REPLY IOCTL, refid 52
oss_wait done, request 0xf9b3a40, result 0x7f4d837e4c48

WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172

The kcfis_push call is addressed to appliance 0x7f4d7e9bc4c0, which maps to 192.168.12.10. The cell hash from the event (3249924569) maps to a different cell though. You can find the corresponding kcfis_push earlier in the trace. In fact, it is the first occurrence after the last WAIT event line.

trans_req_element = 0x7f4d7e870780
Pushing request : disknumber = 5, offset = 937011707904, len = 4194304 rdba: 44570624 version 0 reqid=1021 cver=1
disk=0x7f4d7e9b5980 state=1
disk=0x7f4d7e9b4968 state=1
disk=0x7f4d7e9b3460 state=1
disk=0x7f4d7e9b2f70 state=1
disk=0x7f4d7e9df5a0 state=1
disk=0x7f4d7e9df0b0 state=1
disk=0x7f4d7e9dea78 state=1
kcfis_create_maps_payload. alloc_len 2648 num maps 35
throttle: mappayload: maps_to_push 10 iosize_being_pushed 102039552
kcfis_push: pushing maps to appliance 0x7f4d7ea65570
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.8 OSS OS Pid 4541
oss_ioctl open file: 2, opcode: 99
Sending message RQ_Tag_2341095748_52: 0xf9c6e88 with frag # 0, to box my_box_refid: 0 (box inc: 16)
Command with band 3, payload length 2648, payload 0x7f4d7ea62e98
RQ_Tag_2341095748_52: Command name IOCTL, refid 52
Ioctl arguments fd 1 opcode 99 size 2648
oss_ioctl done

This is getting too complex, can you say it again?

Now let’s put this together again, simplified:

WAIT #139970814085680: nam='cell smart table scan' ela= 24 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911267234

kcfis_push: num-appliances 3. payload_size 0x7f4d837e4cf0 ioctl_issued 0x7f4d837e4cc8 results 0x7f4d837e4c48 payload_type 0x7f4d837e4d20
kcfis_create_maps_payload. appliance 0x7f4d7ea65570 num_disks 12
...
kcfis_create_maps_payload. alloc_len 2648 num maps 35
throttle: mappayload: maps_to_push 10 iosize_being_pushed 102039552
kcfis_push: pushing maps to appliance 0x7f4d7ea65570
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.8 OSS OS Pid 4541
oss_ioctl open file: 2, opcode: 99
Sending message RQ_Tag_2341095748_52: 0xf9c6e88 with frag # 0, to box my_box_refid: 0 (box inc: 16)
Command with band 3, payload length 2648, payload 0x7f4d7ea62e98
RQ_Tag_2341095748_52: Command name IOCTL, refid 52
Ioctl arguments fd 1 opcode 99 size 2648
oss_ioctl done
...
kcfis_create_maps_payload. appliance 0x7f4d7e9bf518 num_disks 12
...
kcfis_create_maps_payload. alloc_len 1208 num maps 15
throttle: mappayload: maps_to_push 10 iosize_being_pushed 34471936
kcfis_push: pushing maps to appliance 0x7f4d7e9bf518
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.9 OSS OS Pid 9547
...
oss_ioctl done
...
kcfis_create_maps_payload. appliance 0x7f4d7e9bc4c0 num_disks 12
...
kcfis_create_maps_payload. alloc_len 272 num maps 2
throttle: mappayload: maps_to_push 4 iosize_being_pushed 8388608
kcfis_push: pushing maps to appliance 0x7f4d7e9bc4c0
...
oss_ioctl done
...
oss_wait done, request 0xf9b3a40, result 0x7f4d837e4c48

WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172

Interestingly, the results for the requests come back in that order as well. Judging by the rest of the trace this seems to be coincidence. Compare the appliance values with the summary you just read:

WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172

Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7ea65570
appliance 0x7f4d7ea65570 : cellsrv pid: 4541: predicate: /box/predicate114
oss_wait called for request: 0xf9c69c0
Vrpc completion - RQ_Tag_2825762783_53: cmd 0xf9c6768, req 0xf9c69c0, frag # 1
Vrpc reply 0xf9c6820 with refid 53 for existing message: 0xf9c6728
Reply with band 1, payload length 40, payload 0x7f4d7e9bfa38
RQ_Tag_2825762783_53: Command name REPLY IOCTL, refid 53
oss_wait done, request 0xf9c69c0, result 0x7f4d837e4c68

WAIT #139970814085680: nam='cell smart table scan' ela= 64 cellhash#=674246789 p2=0 p3=0 obj#=61471 tim=3470911270269

Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7e9bf518
appliance 0x7f4d7e9bf518 : cellsrv pid: 9547: predicate: /box/predicate37211
oss_wait called for request: 0xf9c5260
Vrpc completion - RQ_Tag_2827385864_54: cmd 0xf9c5768, req 0xf9c5260, frag # 1
Vrpc reply 0xf9c5820 with refid 54 for existing message: 0xf9c5728
Reply with band 1, payload length 40, payload 0x7f4d7e9bc9e0
RQ_Tag_2827385864_54: Command name REPLY IOCTL, refid 54
oss_wait done, request 0xf9c5260, result 0x7f4d837e4c88

WAIT #139970814085680: nam='cell smart table scan' ela= 54 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911270351

Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7e9bc4c0

SQL trace wait events frequently seem to be written after an IOCTL (a standard Linux call) or oss_wait (an Oracle call) has completed. So it appears as if requests are created and treated on a cell-by-cell basis, which explains why you get so many cell smart table scan events in general. It’s not because they are necessarily small in size; the payloads always seem to include more than one disk to scan. Thinking about it now it makes sense, but as an analyst I would have appreciated more insight into what happens under the covers. But we can trace, right?
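If you would rather not wade through raw trace files to see this cell-by-cell distribution, here is a minimal sketch using ASH, assuming you are licensed for the Diagnostics Pack. For this event the first wait parameter carries the cellhash#, just like in the WAIT lines above:

-- count smart scan samples per cell; p1 = cellhash# for this event
select c.cell_path, count(*) samples
from   v$active_session_history ash,
       v$cell c
where  ash.event = 'cell smart table scan'
and    c.cell_hashval = ash.p1
group  by c.cell_path
order  by samples desc;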

Data Guard transport lag in OEM 12c

I have come across this phenomenon a couple of times now so I thought it was worth writing up.

Consider a scenario where you get an alert because your standby database has an apply lag. The alert is generated by OEM, and when you log in and check, there is indeed an apply lag. Even worse, the apply lag increases with every refresh of the page! I tagged this as an 11.2 problem but it’s definitely not specific to that version.

Here is a screenshot of this misery:

 Lag in OEM

Now there are of course a number of possible causes:

  • There is a lag
  • You are not using Real Time Apply

The first one is easy to check: look at the redo generation rate on the primary database to see if it’s any different. Maybe you are currently loading lots of data? Maybe a batch job has been initiated that goes over a lot of data… the possibilities are nearly endless.
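One quick way to eyeball the redo generation rate is to sum up the archived log volume per hour. A sketch, assuming dest_id 1 is your local archiving destination:

-- MB of redo archived per hour over the last day
select trunc(completion_time, 'HH24') hour,
       round(sum(blocks * block_size)/1024/1024) mb_redo
from   v$archived_log
where  dest_id = 1
and    completion_time > sysdate - 1
group  by trunc(completion_time, 'HH24')
order  by 1;

A sudden jump compared to the usual baseline points at a genuine lag caused by load rather than a configuration issue.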

Another, more subtle interpretation could be that you are not using Real Time Apply. How can you check? In the broker command line interface for example:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Databases:
    pri - Primary database
      Warning: ORA-16789: standby redo logs not configured

    sby - Physical standby database
      Warning: ORA-16789: standby redo logs not configured

Fast-Start Failover: DISABLED

Configuration Status:
WARNING

The warnings about missing standby redo logs show that you cannot possibly use Real Time Apply (it needs standby redo logs). The other option is in the database itself:

SQL> select dest_id,status,database_mode,recovery_mode
  2  from v$archive_dest_status
  3  where status <> 'INACTIVE';

   DEST_ID STATUS    DATABASE_MODE   RECOVERY_MODE
---------- --------- --------------- -----------------------
         1 VALID     MOUNTED-STANDBY MANAGED
        32 VALID     UNKNOWN         IDLE

Did you notice the dest_id of 32? That’s a bit of an unusual one, more on that later (you can only set log_archive_dest_x where x is between 1 and 31).

So indeed we have managed recovery active, but not using Real Time Apply. This is expressed in the database status:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   28 seconds
  Apply Lag:       28 seconds
  Real Time Query: OFF
  Instance(s):
    sby

A few moments later when you query the database again the lag has increased:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   3 minutes 22 seconds
  Apply Lag:       3 minutes 22 seconds
  Real Time Query: OFF
  Instance(s):
    sby

This is to be expected: the primary is still happily processing user requests. The cure is to add standby redo logs, as suggested in so many places and described in the Data Guard documentation; a sketch follows below.
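As a minimal sketch (the group numbers and the 512 MB size are assumptions; size standby redo logs exactly like your online redo logs and create one more group per thread than you have online log groups):

-- group numbers and size are assumed: match the size of your online redo logs
alter database add standby logfile thread 1 group 5 size 512m;
alter database add standby logfile thread 1 group 6 size 512m;
alter database add standby logfile thread 1 group 7 size 512m;
alter database add standby logfile thread 1 group 8 size 512m;

Remember to create them on the primary as well so they are in place after a role change. After the successful addition of SRLs the lag should disappear. A restart of managed recovery using the broker will show something along these lines on the standby: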

2014-01-30 14:35:18.353000 +00:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (sby)
MRP0 started with pid=24, OS id=4854
MRP0: Background Managed Standby Recovery process started (sby)
2014-01-30 14:35:23.406000 +00:00
 started logmerger process
Managed Standby Recovery starting Real Time Apply
...
2014-01-30 14:37:12.595000 +00:00
Media Recovery Waiting for thread 1 sequence 20 (in transit)
2014-01-30 14:37:13.691000 +00:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 20 Reading mem 0
  Mem# 0: +DATA/sby/onlinelog/group_5.266.838218819

Two important bits of information are shown here: Managed Standby Recovery starting Real Time Apply and the fact that it is using the standby redo log. Sure enough, after the database is in sync with its primary and uses the log, the lag is gone:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    sby

And also in the OEM view:

OEM-lag-02

Slight Variation

I have also seen a variation of this problem in OEM where the transport lag was near 0 and therefore hardly visible due to the scale of the graph. The apply lag nevertheless resulted from the primary doing work while the current log had not yet shipped to the standby (obviously before the implementation of standby redo logs). You saw a spike mounting in the OEM view until the next log switch on the primary, when the apply lag dropped to 0 for a brief moment before increasing again.

Summary

Real Time Apply is a very useful feature, especially when used together with the maximum availability protection mode. The real risk of not using standby redo logs (and implicitly not using Real Time Apply) is data loss: the current online redo log on the primary has not been copied across, so if you need to activate your standby you will be some transactions short of the primary. The larger the online redo log, the larger the potential gap.
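If you want to keep an eye on both figures outside of OEM and the broker, the standby can be queried directly; v$dataguard_stats reports the same values the broker displays:

select name, value, time_computed
from   v$dataguard_stats
where  name in ('transport lag', 'apply lag');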

Massive tablespace fragmentation on LMT with ASSM

I have been asked to investigate another interesting problem worth writing about. It caught my attention because it deals with a problem I believed to have long since been solved: tablespace fragmentation. However, in this case it was a locally managed tablespace (LMT) with Automatic Segment Space Management (ASSM) enabled. It should be difficult to have fragmentation on one of these, but as you will see it is not impossible.

So the story started innocently enough with an ORA-01653 while shuffling around subpartitions to another tablespace:

01653, 00000, "unable to extend table %s.%s by %s in tablespace %s"
// *Cause:  Failed to allocate an extent of the required number of blocks for
//          a table segment in the tablespace indicated.
// *Action: Use ALTER TABLESPACE ADD DATAFILE statement to add one or more
//          files to the tablespace indicated.

Well, except that there was plenty of free space available: the sum(bytes) in DBA_FREE_SPACE for the tablespace showed in excess of 20 GB. So it could only be one thing: fragmentation. And indeed, I quickly found out that the largest contiguous free block was 8 MB, which should really have been enough for most use cases… After a little digging around I noticed that all subpartitions on the tablespace had at least 9 extents, some more. The sum of bytes per subpartition ever so slightly exceeded 8 MB, which was the cause of the fragmentation. Here is an example:

SQL> select segment_name,partition_name,bytes
  2  from dba_extents
  3  where segment_name = 'T1_SUBPART'
  4  and partition_name = 'SYS_SUBP26737';

SEGMENT_NAME                   PARTITION_NAME                      BYTES
------------------------------ ------------------------------ ----------
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      196608
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
                                                              ----------
sum                                                              9109504

13 rows selected.

Some of the extents are nicely aligned along 1 MB boundaries, but there are a few that are only 128 KB in size. Multiply this by a few thousand and you get the picture: it’s not good if you have 20,000 entries in DBA_FREE_SPACE for a given tablespace! The sketch below shows a quick way to gauge the damage.
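A minimal sketch against DBA_FREE_SPACE (the tablespace name is taken from this example):

select count(*)   free_chunks,
       max(bytes) largest_chunk,
       sum(bytes) total_free
from   dba_free_space
where  tablespace_name = 'DESTINATION';

A freshly created tablespace shows one large chunk per data file, as you will see further down; tens of thousands of small ones spell fragmentation.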

The error message was thrown during the execution of an anonymous PL/SQL block that moved lots of subpartitions from one tablespace to another: the already-mentioned few thousand subpartitions plus a few non-partitioned tables. The process aborted halfway through. The following is more or less a transcript of my troubleshooting efforts.

The test case

To reproduce the problem in our lab I created a table with lots of subpartitions, using the following SQL statement. It doesn’t matter that the subpartitions are more or less empty in this case.

CREATE TABLE t1_subpart
  (
    id, t_pad, date_Created, date_completed, state, spcol
  )
  partition BY range (id) interval (1000)
  subpartition BY list (spcol) subpartition template
  (
    subpartition sp1 VALUES ('0'),
    subpartition sp2 VALUES ('1'),
    subpartition sp3 VALUES ('2'),
    subpartition sp4 VALUES ('3'),
    subpartition spdef VALUES (DEFAULT)
  )
  (
    partition p_man VALUES less than (10000)
  ) AS
WITH v1 AS
  (SELECT rownum n FROM dual CONNECT BY level <= 10000
  )
SELECT mod(rownum,1000000) id,
  rpad(rownum,100) t_pad,
  TRUNC(sysdate) - 180 + TRUNC((rownum-1)/3)/86400 date_created,
  TRUNC(sysdate) - 180 + TRUNC((rownum-1)/3)/86400 + dbms_random.value(1800, 86400)/86400 date_completed,
  CASE
    WHEN mod(rownum,100000) = 0
    THEN CAST('RARE' AS VARCHAR2(12))
    WHEN mod(rownum,10000) = 0
    THEN CAST('FAIRLY RARE' AS VARCHAR2(12))
    WHEN mod(rownum,1000) = 0
    THEN CAST('NOT RARE' AS VARCHAR2(12))
    WHEN mod(rownum,100) = 0
    THEN CAST('COMMON' AS   VARCHAR2(12))
    ELSE CAST('THE REST' AS VARCHAR2(12))
  END state,
  TO_CHAR(mod(rownum, 4)) spcol
FROM v1,
  v1
WHERE rownum <= 1e6; 

(If this looks familiar to you then you probably know this presentation by Jonathan Lewis)

A couple of things to remember about this 11.2.0.3 database: Oracle introduced large extents for partitions, as explained in “Initial Extent Size of a Partition Changed to 8MB from 64KB After Upgrade to 11.2.0.2 or Later (Doc ID 1295484.1)”. So having empty partitions is now a huge waste of space! A quick check follows below.
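As a quick sanity check (a sketch, using the segment from this test case) you can confirm the 8 MB initial extent is in effect:

select distinct initial_extent
from   user_segments
where  segment_name = 'T1_SUBPART';

On a database affected by the change described in the MOS note this should return 8388608, i.e. 8 MB.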

But back to the problem: is there anything that would lead you to suspect fragmentation?

SQL> select max(blocks) from dba_segments where segment_name = 'T1_SUBPART';

MAX(BLOCKS)
-----------
       1024

Elapsed: 00:00:00.15

No: all subpartitions sum up to 8 MB (1,024 blocks of 8 KB). Most consist of a single 8 MB extent; fewer than 15 are different:

select s.segment_name,s.partition_name, s.blocks, s.extents
from user_segments s
where s.segment_name = 'T1_SUBPART'
and extents <> 1
order by extents desc;

SEGMENT_NAME                   PARTITION_NAME                     BLOCKS    EXTENTS
------------------------------ ------------------------------ ---------- ----------
T1_SUBPART                     SYS_SUBP29210                        1024          8
T1_SUBPART                     SYS_SUBP32032                        1024          5
T1_SUBPART                     SYS_SUBP29180                        1024          5
T1_SUBPART                     SYS_SUBP28738                        1024          4
T1_SUBPART                     SYS_SUBP32624                        1024          4
T1_SUBPART                     SYS_SUBP26953                        1024          4
T1_SUBPART                     SYS_SUBP30400                        1024          4
T1_SUBPART                     SYS_SUBP30996                        1024          4
T1_SUBPART                     SYS_SUBP31591                        1024          4
T1_SUBPART                     SYS_SUBP28143                        1024          4
T1_SUBPART                     SYS_SUBP27548                        1024          4
T1_SUBPART                     SYS_SUBP29805                        1024          4

Most of the subpartitions consist of a single 8 MB extent; very few have more, but all sum up to 8 MB:

SQL> select count(extents),extents from user_segments s
   where s.segment_name = 'T1_SUBPART'
   group by extents

COUNT(EXTENTS)    EXTENTS
-------------- ----------
          4942          1
             9          4
             2          5
             1          8

Elapsed: 00:00:00.66
SQL>

SQL> select distinct bytes from user_segments where segment_name = 'T1_SUBPART';

     BYTES
----------
   8388608

Next I tried to reproduce the problem as closely as possible. In the original case HCC compression had been enabled by the customer on the table, after which the partitions were moved to the destination tablespace. Note that this problem has nothing to do with HCC: I have since reproduced the case outside of Exadata with the same result.

SQL> alter table t1_subpart compress for query high;

Table altered.

Next I’ll show you DBA_FREE_SPACE for the tablespace:

SQL> select * from dba_free_space
  2* where tablespace_name = 'DESTINATION';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
DESTINATION                            10        128 1072693248     130944           10
DESTINATION                            11        128 1072693248     130944           11

Elapsed: 00:00:00.01

So now I’m going to move all of the subpartitions. At first I couldn’t reproduce the fragmentation at all when I moved them serially: everything stayed just as it was, and some of the subpartitions that consisted of more than one extent were actually coalesced into just one. I hadn’t paid attention to the “parallel” keyword, thinking it was just a means of speeding things up. When after the third attempt I had no luck reproducing the fragmentation, I tried with parallel, and here is how it went. First the anonymous PL/SQL block:

SQL> !cat movemove.sql
begin
 for i in (select subpartition_name from user_tab_subpartitions
           where table_name = 'T1_SUBPART' and tablespace_name <> 'DESTINATION')
 loop
  begin
   execute immediate 'alter table t1_subpart move subpartition ' || i.subpartition_name || ' tablespace DESTINATION parallel';
  exception
   when others then
    dbms_output.put_line('could not move ' || i.subpartition_name || ' due to ' || sqlerrm(sqlcode));
  end;
 end loop;
end;
/

This takes a long time, and the quality of the script is not production-worthy! In this case the anonymous block completed without issues, but that was just 1 table and not the same volume as the real system where this happened. Look at the extent distribution now:

SQL> select count(extents),extents from user_segments s
  2  where s.segment_name = 'T1_SUBPART'
  3  group by extents;

COUNT(EXTENTS)    EXTENTS
-------------- ----------
           634          1
          3959         13
           356          5
             4         48
             1          8

Also consider the impact on DBA_FREE_SPACE:

SQL> select count(1) from dba_free_Space where tablespace_name = 'DESTINATION';

  COUNT(1)
----------
     19796

Now looking at a random subpartition shows the dilemma:

SQL> select PARTITION_NAME, SEGMENT_TYPE, TABLESPACE_NAME, EXTENT_ID, BYTES, BLOCKS
  2  from dba_extents where segment_name = 'T1_SUBPART'
  3  and partition_name = 'SYS_SUBP26737'
  4  order by extent_id;

PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME                 EXTENT_ID      BYTES     BLOCKS
------------------------------ ------------------ ------------------------------ ---------- ---------- ----------
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             0     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             1     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             2     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             3     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             4     196608         24
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             5    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             6    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             7    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             8    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             9    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                            10    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                            11    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                            12    1048576        128
                                                                                            ---------- ----------
sum                                                                                            9109504       1112

So there you go: I don’t really know what to say other than “d’oh”. It probably doesn’t hurt you too much, but it’s still worth knowing that if you are consolidating thousands of segments in a previously empty tablespace you can get into trouble. If I work out why it does this I’ll post a follow-up. For now the workaround is to move segments serially, as sketched below. Yes, I know.
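For completeness, a sketch of the serial workaround: the same loop as before, minus the parallel keyword (I have also dropped the when-others handler, so treat it purely as an illustration):

begin
 for i in (select subpartition_name from user_tab_subpartitions
           where table_name = 'T1_SUBPART' and tablespace_name <> 'DESTINATION')
 loop
  -- no PARALLEL here: serial moves did not fragment the tablespace in my tests
  execute immediate 'alter table t1_subpart move subpartition '
    || i.subpartition_name || ' tablespace DESTINATION';
 end loop;
end;
/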