Why is my Exadata smart scan not offloading?

After receiving the excellent “Expert Oracle Exadata” book I decided to spend some time looking into Exadata performance, having previously spent most of my time on infrastructure-related questions such as “how can I prevent someone from overwriting my DR system with UAT data”, patching, and so on.

Now there is one thing to keep in mind with Exadata: you need lots of data before it breaks into a little sweat. Luckily for me, one of my colleagues had performed some testing on this environment. For reasons unknown, the Swingbench Order Entry benchmark (version 2.3) had been chosen. For those who don’t know the OE benchmark: it’s a heavy OLTP-style workload simulating a web shop where users browse products, place orders and so on. OE is optimised for single-block I/O, and despite what you may have heard, Exadata doesn’t provide a noticeable benefit for these queries.

Anyway, what I liked was the fact that the order_items table had about 350 million rows organised in about 8GB. From discussions with Frits Hoogland I know that a full scan of such a table takes between 40 and 60 seconds depending on system load.

SOE.ORDER_ITEMS

Here are some of the interesting facts around the table:

SQL> select partition_name,num_rows,blocks,avg_space,chain_cnt,global_stats,user_stats,stale_stats
2  from dba_tab_statistics
3  where table_name = 'ORDER_ITEMS'
4  /

PARTITION_NAME                   NUM_ROWS     BLOCKS  AVG_SPACE  CHAIN_CNT GLO USE STA
------------------------------ ---------- ---------- ---------- ---------- --- --- ---
                                349990815    1144832          0          0 YES NO  NO
SYS_P341                         21866814      71297          0          0 YES NO  NO
SYS_P342                         21889112      72317          0          0 YES NO  NO
SYS_P343                         21877726      71297          0          0 YES NO  NO
SYS_P344                         21866053      71297          0          0 YES NO  NO
SYS_P345                         21870127      71297          0          0 YES NO  NO
SYS_P346                         21887971      72317          0          0 YES NO  NO
SYS_P347                         21875056      71297          0          0 YES NO  NO
SYS_P348                         21891454      72317          0          0 YES NO  NO
SYS_P349                         21883576      72317          0          0 YES NO  NO
SYS_P350                         21859704      71297          0          0 YES NO  NO
SYS_P351                         21866820      71297          0          0 YES NO  NO
SYS_P352                         21865681      71297          0          0 YES NO  NO
SYS_P353                         21865239      71297          0          0 YES NO  NO
SYS_P354                         21870373      71297          0          0 YES NO  NO
SYS_P355                         21882656      71297          0          0 YES NO  NO
SYS_P356                         21872453      71297          0          0 YES NO  NO

17 rows selected.

It has 4 indexes, out of which some are reverse key indexes:

SQL> select index_name,index_type from user_indexes where table_name = 'ORDER_ITEMS';

INDEX_NAME                     INDEX_TYPE
------------------------------ ---------------------------
ITEM_ORDER_IX                  NORMAL/REV
ORDER_ITEMS_UK                 NORMAL
ORDER_ITEMS_PK                 NORMAL
ITEM_PRODUCT_IX                NORMAL/REV

SQL>

This is going to be interesting (I have to admit I didn’t initially check for reverse key indexes). All I wanted was a huge table to see if a smart scan really is so mind-blowingly fast. These examples can easily be reproduced by generating the SOE schema with the oewizard; just make sure you select the use of partitioning (you should have a license for it).

Performance Testing

My plan was to start off with serial execution, then use parallel query, and compare execution times. As with all performance tuning of this kind you should have a copy of Session Snapper from Tanel Poder available. At the time of this writing, the latest version was 3.52, available from Tanel’s blog.

I also wanted to see when a smart scan kicked in. Here’s the first test with serial execution:

10:44:37 SQL> select count(*) from order_items
10:44:40   2  /

COUNT(*)
----------
350749016

Elapsed: 00:00:47.54

OK, that doesn’t look like a smart scan happened; 47 seconds is a little too slow. As always, check using snapper to confirm:

SQL> @snapper all 5 1 243
Sampling SID 243 with interval 5 seconds, taking 1 snapshots...
setting stats to all due option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
243, SOE       , STAT, session logical reads                                     ,     81.07k,     16.21k,
243, SOE       , STAT, user I/O wait time                                        ,        299,       59.8,
243, SOE       , STAT, non-idle wait time                                        ,        300,         60,
243, SOE       , STAT, non-idle wait count                                       ,      1.26k,        251,
243, SOE       , STAT, physical read total IO requests                           ,        634,      126.8,
243, SOE       , STAT, physical read total multi block requests                  ,        634,      126.8,
243, SOE       , STAT, physical read total bytes                                 ,    664.14M,    132.83M,
243, SOE       , STAT, cell physical IO interconnect bytes                       ,    664.14M,    132.83M,
243, SOE       , STAT, consistent gets                                           ,     81.07k,     16.21k,
243, SOE       , STAT, consistent gets from cache                                ,          1,         .2,
243, SOE       , STAT, consistent gets from cache (fastpath)                     ,          1,         .2,
243, SOE       , STAT, consistent gets direct                                    ,     81.07k,     16.21k,
243, SOE       , STAT, physical reads                                            ,     81.07k,     16.21k,
243, SOE       , STAT, physical reads direct                                     ,     81.07k,     16.21k,
243, SOE       , STAT, physical read IO requests                                 ,        634,      126.8,
243, SOE       , STAT, physical read bytes                                       ,    664.14M,    132.83M,
243, SOE       , STAT, calls to kcmgcs                                           ,          1,         .2,
243, SOE       , STAT, file io wait time                                         ,        395,         79,
243, SOE       , STAT, Number of read IOs issued                                 ,        635,        127,
243, SOE       , STAT, no work - consistent read gets                            ,      81.1k,     16.22k,
243, SOE       , TIME, DB CPU                                                    ,      1.19s,   237.96ms,    23.8%, |@@@       |
243, SOE       , TIME, sql execute elapsed time                                  ,      4.01s,   801.37ms,    80.1%, |@@@@@@@@  |
243, SOE       , TIME, DB time                                                   ,      4.01s,   801.37ms,    80.1%, |@@@@@@@@  |
243, SOE       , WAIT, direct path read                                          ,      2.99s,   598.63ms,    59.9%, |@@@@@@    |
243, SOE       , WAIT, kfk: async disk IO                                        ,     4.25ms,    849.4us,      .1%, |          |
--  End of Stats snap 1, end=2011-08-17 10:30:47, seconds=5

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
62% | b0hcgjs21yrq9   | direct path read          | User I/O
38% | b0hcgjs21yrq9   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-17 10:30:47, seconds=5, samples_taken=42

PL/SQL procedure successfully completed.

Right, no smart scan; this puzzled me. To recap, smart scans happen only if:

  • Direct path reads are used
  • The row source can be offloaded
  • The parameter cell_offload_processing is set to true (a boolean; true is the default)
  • There are no chained or migrated rows

Now let’s check these.

I can clearly see from the snapper output that direct path reads have happened: check. It’s also worth remembering that the decision to perform direct path reads is made per segment, and each partition is its own segment: keep that in mind!

You can check the execution plan to find out if the row source can be offloaded as in this example:

SQL> select * from table(dbms_xplan.display(format=>'+projection'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2209137760

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   249K  (1)| 00:49:52 |
|   1 |  SORT AGGREGATE               |                 |     1 |            |          |
|   2 |   INDEX STORAGE FAST FULL SCAN| ITEM_PRODUCT_IX |   349M|   249K  (1)| 00:49:52 |
-----------------------------------------------------------------------------------------

Column Projection Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------

1 - (#keys=0) COUNT(*)[22]

14 rows selected.

You can see that the INDEX STORAGE FAST FULL SCAN operation is used; in other words the row source can (theoretically) be offloaded. The parameter cell_offload_processing was set to true on my system. Do you remember that the index is a reverse key index? I’m wondering if that has anything to do with it (spoiler: it does, see the comments. I didn’t know when posting this).
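Checking the parameter is a one-liner in SQL*Plus by the way; with the default in place you should see something like this:

SQL> show parameter cell_offload_processing

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
cell_offload_processing              boolean     TRUE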

UPDATE: I should have updated this post a long time ago, but thought the comments were sufficient. Reverse Key indexes still cannot be offloaded (at least not with 12.1.2.3.0), a fact that is documented in the storage server software user’s guide, chapter 7.
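As an aside, the data dictionary exposes which individual SQL functions are considered offloadable via v$sqlfn_metadata. Note that this covers functions, not row sources such as an index fast full scan, so it would not have caught this particular case:

SQL> select count(*) from v$sqlfn_metadata where offloadable = 'YES';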

To rule out a problem with the direct path read decision I set “_serial_direct_read”=true and tried again, but it didn’t make a difference.
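For reference, this is a session-level change; underscore parameters should only be set for testing or at the direction of Oracle Support (later releases also accept the values always, auto and never):

SQL> alter session set "_serial_direct_read" = true;

Session altered.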

Another way to check for smart scans on a live system is to use v$sql: the columns IO_CELL_UNCOMPRESSED_BYTES and IO_CELL_OFFLOAD_RETURNED_BYTES are cumulative counters for smart scan activity. If they remain 0, as in my case, that indicates some sort of issue.
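A query along these lines shows the counters; the sql_id is the one reported by snapper above:

SQL> select sql_id, io_cell_offload_eligible_bytes,
  2         io_cell_uncompressed_bytes, io_cell_offload_returned_bytes
  3    from v$sql
  4   where sql_id = 'b0hcgjs21yrq9';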

Testing continued with parallel query: 8 and then 64 slaves, and still no smart scan. I even traced the execution, but there was not a single PX slave trace file with the word “smart” in it (and I made sure I captured the waits).

What was going on?

SQL_ID  4vkkk105tny60, child number 0
-------------------------------------
select /*+parallel(64)*/ count(*) from order_items

Plan hash value: 544438321

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name            | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                 |       |  4306 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE                   |                 |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR                  |                 |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)            | :TQ10000        |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                |                 |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR            |                 |   349M|  4306   (1)| 00:00:52 |  Q1,00 | PCWC |            |
|*  6 |       INDEX STORAGE FAST FULL SCAN| ITEM_PRODUCT_IX |   349M|  4306   (1)| 00:00:52 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
6 - SEL$1 / ORDER_ITEMS@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

6 - storage(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) COUNT()[22]
2 - SYS_OP_MSR()[10]
3 - (#keys=0) SYS_OP_MSR()[10]
4 - (#keys=0) SYS_OP_MSR()[10]

Note
-----
- automatic DOP: Computed Degree of Parallelism is 64

OK, at that point I got fed up with ITEM_PRODUCT_IX, forced a full table scan, and voilà: a smart scan happened.

SQL> select * from table(dbms_xplan.display_cursor('5a1x1v72ujf8s', format=>'ALL'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5a1x1v72ujf8s, child number 0
-------------------------------------
select /*+ full(t) parallel(t, 8) */ count(*) from order_items t

Plan hash value: 661298821

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name        | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |       | 43358 (100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE                |             |     1 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR               |             |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)         | :TQ10000    |     1 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE             |             |     1 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR         |             |   349M| 43358   (1)| 00:08:41 |     1 |    16 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS STORAGE FULL| ORDER_ITEMS |   349M| 43358   (1)| 00:08:41 |     1 |    16 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1
6 - SEL$1 / T@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

6 - storage(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - (#keys=0) COUNT()[22]
2 - SYS_OP_MSR()[10]
3 - (#keys=0) SYS_OP_MSR()[10]
4 - (#keys=0) SYS_OP_MSR()[10]

37 rows selected.

To be sure that was correct, I retried with serial execution (note the elapsed time):

11:44:10 SQL> select /*+ full(t) single */ count(*) from order_items t;

COUNT(*)
----------
350749016

Elapsed: 00:00:12.02

@snapper all 5 1 243

Sampling SID 243 with interval 5 seconds, taking 1 snapshots...
setting stats to all due option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
243, SOE       , STAT, session logical reads                                     ,    482.88k,     96.58k,
243, SOE       , STAT, application wait time                                     ,          1,         .2,
243, SOE       , STAT, user I/O wait time                                        ,        158,       31.6,
243, SOE       , STAT, non-idle wait time                                        ,        159,       31.8,
243, SOE       , STAT, non-idle wait count                                       ,      3.72k,      744.8,
243, SOE       , STAT, enqueue waits                                             ,         15,          3,
243, SOE       , STAT, enqueue requests                                          ,         14,        2.8,
243, SOE       , STAT, enqueue conversions                                       ,         21,        4.2,
243, SOE       , STAT, enqueue releases                                          ,         14,        2.8,
243, SOE       , STAT, global enqueue gets sync                                  ,         35,          7,
243, SOE       , STAT, global enqueue releases                                   ,         14,        2.8,
243, SOE       , STAT, physical read total IO requests                           ,      5.23k,      1.05k,
243, SOE       , STAT, physical read total multi block requests                  ,      4.91k,        982,
243, SOE       , STAT, physical read total bytes                                 ,      3.95G,    790.99M,
243, SOE       , STAT, cell physical IO interconnect bytes                       ,      1.72G,    343.34M,
243, SOE       , STAT, ges messages sent                                         ,         21,        4.2,
243, SOE       , STAT, consistent gets                                           ,    482.88k,     96.58k,
243, SOE       , STAT, consistent gets from cache                                ,         97,       19.4,
243, SOE       , STAT, consistent gets from cache (fastpath)                     ,         97,       19.4,
243, SOE       , STAT, consistent gets direct                                    ,    482.78k,     96.56k,
243, SOE       , STAT, physical reads                                            ,    482.78k,     96.56k,
243, SOE       , STAT, physical reads direct                                     ,    482.78k,     96.56k,
243, SOE       , STAT, physical read IO requests                                 ,      5.23k,      1.05k,
243, SOE       , STAT, physical read bytes                                       ,      3.95G,    790.99M,
243, SOE       , STAT, calls to kcmgcs                                           ,         97,       19.4,
243, SOE       , STAT, file io wait time                                         ,      19.7k,      3.94k,
243, SOE       , STAT, cell physical IO bytes eligible for predicate offload     ,      3.95G,    790.89M,
243, SOE       , STAT, cell smart IO session cache lookups                       ,          7,        1.4,
243, SOE       , STAT, cell smart IO session cache hits                          ,          7,        1.4,
243, SOE       , STAT, cell physical IO interconnect bytes returned by smart scan,      1.72G,    343.38M,
243, SOE       , STAT, table scans (long tables)                                 ,          7,        1.4,
243, SOE       , STAT, table scans (direct read)                                 ,          7,        1.4,
243, SOE       , STAT, table scan rows gotten                                    ,    147.19M,     29.44M,
243, SOE       , STAT, table scan blocks gotten                                  ,    480.07k,     96.01k,
243, SOE       , STAT, cell scans                                                ,          7,        1.4,
243, SOE       , STAT, cell blocks processed by cache layer                      ,    576.41k,    115.28k,
243, SOE       , STAT, cell blocks processed by txn layer                        ,    576.41k,    115.28k,
243, SOE       , STAT, cell blocks processed by data layer                       ,    485.06k,     97.01k,
243, SOE       , STAT, cell blocks helped by minscn optimization                 ,    576.28k,    115.26k,
243, SOE       , STAT, cell simulated session smart scan efficiency              ,      3.97G,     794.6M,
243, SOE       , STAT, cell IO uncompressed bytes                                ,      3.97G,    794.93M,
243, SOE       , TIME, DB CPU                                                    ,      2.63s,   526.72ms,    52.7%, |@@@@@@    |
243, SOE       , TIME, sql execute elapsed time                                  ,      4.01s,   801.13ms,    80.1%, |@@@@@@@@  |
243, SOE       , TIME, DB time                                                   ,      4.01s,   801.13ms,    80.1%, |@@@@@@@@  |
243, SOE       , WAIT, enq: KO - fast object checkpoint                          ,     2.19ms,    438.8us,      .0%, |          |
243, SOE       , WAIT, cell smart table scan                                     ,      1.58s,   316.49ms,    31.6%, |@@@@      |
243, SOE       , WAIT, events in waitclass Other                                 ,     8.54ms,     1.71ms,      .2%, |          |
--  End of Stats snap 1, end=2011-08-17 12:53:04, seconds=5

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
66% | cfhpsq29gb49m   | ON CPU                    | ON CPU
34% | cfhpsq29gb49m   | cell smart table scan     | User I/O

--  End of ASH snap 1, end=2011-08-17 12:53:04, seconds=5, samples_taken=47

PL/SQL procedure successfully completed.

Summary

UPDATE: I went over this post 5 years after having written it and although it is now understood that reverse key indexes cannot be Smart Scanned, I decided to keep it here. I have learned from the masters that troubleshooting is a science and requires a systematic approach. I tried to update the post with what I know now, without removing my train of thought. Have a look at the comments for more information!

Responses

  1. Hi Martin,

    Just to clarify a little bit, it appears to me that reverse key indexes are just not enabled in the code for smart scans (even though the operation says “INDEX STORAGE FAST FULL SCAN”). But it’s hard to prove a negative; there may be some circumstance where they would work. I can’t think of a technical reason why they shouldn’t be able to work with smart scans. The situation where a fast full scan on a normal B-tree index does not offload on 11.2.0.1 appears to be a bug to me, possibly related to fairly old database software (11.2.0.1) running on fairly new storage software (cellsrv 11.2.2.2.0 and later). As I mentioned, I intend to open an SR on this issue.

    1. My reasoning here would be that when smart scanning reverse key indexes, you would also have to offload the reversing function. And as that’s probably done somewhere deeper in the data layer code than regular SQL functions, it was not made offloadable. Definitely an interesting case though…

      1. Thanks Tanel! I’m curious as to what Kerry finds out when he raises the SR…

  2. Index FFS on reverse key indexes and compressed indexes is not offloaded.

    1. Greg,

      There is a difference between not offloadable and not offloaded. Some of these things are just not implemented (or not implemented yet) and some of these things cannot be implemented. If your life depended on what you think *should* be happening I guess the distinction is not all that important. In the end, this wrangling over how to offload work from the RAC grid to storage will keep all of us guessing. Well, I don’t do all that much guessing I guess. :-)

      There seems to be a large faction of folks who think Oracle is working vigorously to implement offload processing for everything. A total-offload situation is not possible. Between now and the time folks figure that out there will be an unending series of these “mysteries” I think.

      1. True: as of today, index FFS on reverse key indexes and compressed indexes is not offloadable.
        The problem with the reverse key or compressed index FFS is that it reports incorrectly as offload[ed|able] in the execution plan. That’s a bug (bug 12909225). The type of index is known at compile time, so it should be possible to correct this. The plan should display INDEX FAST FULL SCAN, not INDEX STORAGE FAST FULL SCAN.

        Should reverse key or compressed index FFS be made offloadable? That seems like a discussion for the Exadata dev teams to have, based on customer requirements and programming effort.

  3. But I thought you didn’t need indexes with Exadata ;)

    1. The test was done against an OLTP schema. That is the start of many evils in DW/BI.

  4. I just found your post, because I saw the same behavior (my index is NOT a reverse key index). Using the full hint caused the scan to offload; without the full hint it did an “INDEX FAST FULL SCAN” and did not offload.
    Here are my notes on it..
    http://bryangrenn.blogspot.com/2011/12/fts-vs-index-scan-on-exadata.html
