Martins Blog

Trying to explain complex things in simple terms

Little things worth knowing: automatic generation of extended statistics in 12c Part II

Posted by Martin Bach on November 6, 2015

In my last post I demonstrated an example how to use a PL/SQL package to monitor a workload in order to identify correlated columns. Such columns are good candidates for the creation of extended statistics since the optimiser does not assume that information stored in different columns may be related. Before starting my tests I invoked DBMS_STATS.SEED_COL_USAGE, which in turn increased Oracle’s attention level trying to find correlated columns. Eventually these have been identified (as shown in the output of DBMS_STATS.REPORT_COL_USAGE) and a subsequent call to DBMS_STATS.GATHER_TABLE_STATS caused extended statistics to be created, including histograms. This is one way you can get extended statistics automatically, but it requires you to enable monitoring of a workload by invoking a PL/SQL API. In other words, it is a conscious effort.

More Ways to Extended Statistics

The other way of ending up with extended statistics is more subtle as it does not require user intervention. As described in the documentation (Database SQL Tuning Guide, chapter 13 “Managing Optimizer Statistics: Advanced Topics”) there is another way to gather extended statistics automatically. It is based on statistics feedback and SQL Plan Directives, both new to 12c (actually statistics feedback is not quite so new, it seems to be an evolution of cardinality feedback).

Demo

To start with a clean sheet I dropped the table I had used previously, which got rid of all the extended stats and SQL Plan Directives (SPD) from the previous example. To be really thorough I also flushed the shared pool.

SQL> drop table martin.customers purge;

Table dropped.

SQL> select * from dba_stat_extensions where owner = user;

no rows selected

SQL> select count(*) from DBA_SQL_PLAN_DIR_OBJECTS where owner = user;

  COUNT(*)
----------
         0

SQL> alter system flush shared_pool;

System altered.

BTW I have switched environments to an Oracle Restart 12.1.0.2 database with the JUL 2015 PSU applied. The second patch you see here is the OJVM patch.

SQL> select patch_id, version, action from dba_registry_sqlpatch;

  PATCH_ID VERSION              ACTION
---------- -------------------- ---------------
  20831110 12.1.0.2             APPLY
  21068507 12.1.0.2             APPLY

Back to creating the table… I am using the same technique as before, but this time without the call to DBMS_STATS.SEED_COL_USAGE:

SQL> create table martin.customers as select * from sh.customers;

Table created.

SQL> select num_rows, last_analyzed from user_tables where table_name = 'CUSTOMERS';

  NUM_ROWS LAST_ANALYZED
---------- -------------------
     55500 15.10.2015 16:57:42

I love the 12c feature where stats are gathered during a CTAS operation …

And now a few queries to tickle the optimiser. Apologies for the wide output but that makes it so much easier to use RLWRAP and command line history. Credit again to
the blog post by Maria Colgan
for the query examples.

SQL> select /*+ gather_plan_statistics */ count(*) from customers where cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id = 52790;

  COUNT(*)
----------
       932

SQL>
SQL> select * from table(dbms_xplan.display_cursor(format => 'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
SQL_ID  34zmr3acgz06g, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from customers where
cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id
= 52790

Plan hash value: 296924608

---------------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.03 |    1521 |   1516 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.03 |    1521 |   1516 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |      1 |    932 |00:00:00.04 |    1521 |   1516 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_CITY"='Los Angeles' AND "CUST_STATE_PROVINCE"='CA' AND
              "COUNTRY_ID"=52790))


22 rows selected.

There is nothing too new here-the optimiser’s cardinality estimate is not great. Following the example from the SQL Tuning Guide-chapter 13 (btw country_id is a number, not a varchar2 field) we can now check if there is anything fancy going to happen next. And sure enough there is:

SQL> select sql_id, child_number, is_obsolete, is_reoptimizable, is_resolved_adaptive_plan from v$sql where sql_id = '34zmr3acgz06g';

SQL_ID        CHILD_NUMBER I I I
------------- ------------ - - -
34zmr3acgz06g            0 N Y

The second flag, is_reoptimisable, is interesting. It indicates the optimiser’s intention to apply information found at run-time to the next execution of the cursor. Let’s run the original query again.

SQL> select /*+ gather_plan_statistics */ count(*) from customers where cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id = 52790;

  COUNT(*)
----------
       932

So what did that do? Let’s have a look at the diagnostics:


SQL> select sql_id, child_number, is_obsolete, is_reoptimizable, is_resolved_adaptive_plan from v$sql where sql_id = '34zmr3acgz06g';

SQL_ID        CHILD_NUMBER I I I
------------- ------------ - - -
34zmr3acgz06g            0 N Y
34zmr3acgz06g            1 N N

SQL> select * from table(dbms_xplan.display_cursor(sql_id=>'34zmr3acgz06g',cursor_child_no=>1,format => 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  34zmr3acgz06g, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from customers where
cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id
= 52790

Plan hash value: 296924608

------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.01 |    1521 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.01 |    1521 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |    932 |    932 |00:00:00.01 |    1521 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_CITY"='Los Angeles' AND "CUST_STATE_PROVINCE"='CA' AND
              "COUNTRY_ID"=52790))

Note
-----
   - statistics feedback used for this statement


26 rows selected.

As you can see another child cursor has been created. Why?

SQL> select xmltype(reason) from v$sql_shared_cursor where dbms_lob.getlength(reason) > 1 and sql_id = '34zmr3acgz06g';

XMLTYPE(REASON)
-----------------------------------------------------------------------------------------------------------------------------
<ChildNode>
  <ChildNumber>0</ChildNumber>
  <ID>49</ID>
  <reason>Auto Reoptimization Mismatch(1)</reason>
  <size>3x4</size>
  <kxscflg>32</kxscflg>
  <kxscfl4>4194560</kxscfl4>
  <dnum_kksfcxe>262144</dnum_kksfcxe>
</ChildNode>

Ah – the cursor was created because of a reoptimisation mismatch. Checking against v$sql you can see that the optimiser is now happy with the cardinality estimate (not anticipating further reoptimisation for this statement). Note that depending on the query’s complexity many more child cursors can be created as part of the reoptimisation. Also note that for child cursor 1 the A and E rows are perfect matches. Our work here is done. But wait- what about that column usage?


SQL> select dbms_stats.report_col_usage(user, 'CUSTOMERS') from dual;

DBMS_STATS.REPORT_COL_USAGE(USER,'CUSTOMERS')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR MARTIN.CUSTOMERS
........................................

1. COUNTRY_ID                          : EQ
2. CUST_CITY                           : EQ
3. CUST_STATE_PROVINCE                 : EQ
###############################################################################

SQL> select * from dba_stat_extensions where owner = user;

no rows selected

Nothing here except some equality predicates! This is the big difference to part I of this article where the filter on the 3 columns was detected thanks to the call to DBMS_STATS.SEED_COL_USAGE.

It took me a while to connect the dots and understand what needed to be done next. In the meantime, hidden from view, and in the background, Oracle created a SQL Plan Directive for that table which I failed to notice for a while. SQL Plan Directives are not persisted immediately, the documentation states they are written to SYSAUX every 15 minutes. I must have thought about this for more than 15 minutes, as you will see shortly. If you are a bit more impatient then force the write of the SPD to disk using DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE now.

To verify if there were any SPDs I used a query taken more or less literally from the documentation, and yes, there are:

SQL> SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OBJECT_NAME,
  2         o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
  3  FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
  4  WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
  5  AND    o.OWNER = user
  6  ORDER BY 1,2,3,4,5;

DIR_ID                         OBJECT_NAME          COL_NAME             OBJECT TYPE             STATE    REASON
------------------------------ -------------------- -------------------- ------ ---------------- -------- ------------------------------------
15273172249382976180           CUSTOMERS            COUNTRY_ID           COLUMN DYNAMIC_SAMPLING USABLE   SINGLE TABLE CARDINALITY MISESTIMATE
15273172249382976180           CUSTOMERS            CUST_CITY            COLUMN DYNAMIC_SAMPLING USABLE   SINGLE TABLE CARDINALITY MISESTIMATE
15273172249382976180           CUSTOMERS            CUST_STATE_PROVINCE  COLUMN DYNAMIC_SAMPLING USABLE   SINGLE TABLE CARDINALITY MISESTIMATE
15273172249382976180           CUSTOMERS                                 TABLE  DYNAMIC_SAMPLING USABLE   SINGLE TABLE CARDINALITY MISESTIMATE

Once you can see the SPD in the dictionary, you can also see them in action when you run a statement with the same predicates but different SQL_ID, as in this example.

SQL> select /*+ gather_plan_statistics new_parse_please */ count(*) from customers where cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id = 52790;

  COUNT(*)
----------
       932

SQL> select * from table(dbms_xplan.display_cursor(format => 'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  85qvryzgzj57q, child number 0
-------------------------------------
select /*+ gather_plan_statistics new_parse_please */ count(*) from
customers where cust_city = 'Los Angeles' and cust_state_province =
'CA' and country_id = 52790

Plan hash value: 296924608

------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.01 |    1521 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.01 |    1521 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |    829 |    932 |00:00:00.01 |    1521 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_CITY"='Los Angeles' AND "CUST_STATE_PROVINCE"='CA' AND
              "COUNTRY_ID"=52790))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


27 rows selected.

The SQL Plan directive stays even if you flush the cursor with SQL ID 34zmr3acgz06g from the cursor cache or supply a different set of predicates. I used Kerry Osborne’s flush_sql.sql script for this.

SQL> @flush_sql 
Enter value for sql_id: 34zmr3acgz06g
old  14:   where sql_id like '&sql_id';
new  14:   where sql_id like '34zmr3acgz06g';

PL/SQL procedure successfully completed.

SQL> select sql_id, child_number, is_obsolete, is_reoptimizable, is_resolved_adaptive_plan from v$sql where sql_id = '34zmr3acgz06g';

no rows selected

SQL> select /*+ gather_plan_statistics */ count(*) from customers where cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id = 52790;

  COUNT(*)
----------
       932

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  34zmr3acgz06g, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from customers where
cust_city = 'Los Angeles' and cust_state_province = 'CA' and country_id
= 52790

Plan hash value: 296924608

------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.01 |    1521 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.01 |    1521 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |    953 |    932 |00:00:00.01 |    1521 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_CITY"='Los Angeles' AND "CUST_STATE_PROVINCE"='CA' AND
              "COUNTRY_ID"=52790))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


27 rows selected.


SQL> select /*+ gather_plan_statistics */ count(*) from customers where cust_city = 'Palmdale' and cust_state_province = 'FL' and country_id = 52790;

  COUNT(*)
----------
       250

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dbkfpchpfwap3, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from customers where
cust_city = 'Palmdale' and cust_state_province = 'FL' and country_id =
52790

Plan hash value: 296924608

------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.01 |    1521 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.01 |    1521 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |    287 |    250 |00:00:00.02 |    1521 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_CITY"='Palmdale' AND "CUST_STATE_PROVINCE"='FL' AND
              "COUNTRY_ID"=52790))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


27 rows selected.

And NOW you get the update to REPORT_COL_USAGE:

SQL> select dbms_stats.report_col_usage(user, 'CUSTOMERS') from dual;

DBMS_STATS.REPORT_COL_USAGE(USER,'CUSTOMERS')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR MARTIN.CUSTOMERS
........................................

1. COUNTRY_ID                          : EQ
2. CUST_CITY                           : EQ
3. CUST_STATE_PROVINCE                 : EQ
4. (CUST_CITY, CUST_STATE_PROVINCE,
    COUNTRY_ID)                        : FILTER
###############################################################################

From here on it’s the same as in part 1 of this article. You gather stats, either manually like me or automatically like Oracle would, and the end result are extended statistics on the “filter” shown in line 4.


SQL> exec dbms_stats.gather_table_stats(user, 'CUSTOMERS')

PL/SQL procedure successfully completed.

SQL> select table_name, extension_name, extension from dba_stat_extensions where owner = user;

TABLE_NAME                     EXTENSION_NAME                           EXTENSION
------------------------------ ---------------------------------------- --------------------------------------------------------------------------------
CUSTOMERS                      SYS_STSMZ$C3AIHLPBROI#SKA58H_N           ("CUST_CITY","CUST_STATE_PROVINCE","COUNTRY_ID")

Voila! Extended statistics.

There are many more interesting implications to this whole concept, which is something I’ll write about in another post. It’s incredibly interesting, I can’t believe I’m writing optimiser posts …

PS: thanks to @Mautro for casting an eye over this article!

One Response to “Little things worth knowing: automatic generation of extended statistics in 12c Part II”

  1. leborchuk said

    We’ve faced with massive invalidations because of creating extended statistics. I found out 15000 invalid objects next morning after migration to 12.1.0.2 and should restart my EBS system. The causer is Bug 19450314 “UNNECESSRAY INVALIDATIONS IN 12C”. It hasn’t been included in PSU Oct 2015 and still should be installed separately.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: