The performance story that didn’t turn out as one: a case of or-expansion and how the database mitigated the problem on its own

Recently I had a bit of time to study the effect of an optimizer query transformation, the so-called or-expansion in Oracle 19c. I thought there might be performance implications with statements using bind variables transformed in this way. My limited testing suggests that isn’t necessarily the case as the optimiser is remarkably resilient.

Still I thought my investigation was worth writing down, I hope you might agree and better still, find the read enjoyable.

Excuse me, what exactly is an Or-Expansion?

Quoting from the SQL Tuning Guide, during an “or expansion the optimiser transforms a query block containing top-level disjunctions into the form of a UNION ALL query the contains 2 or more branches”.

The remainder of this article sheds some light on the query transformation. By the way, there are many, many other blog articles out there covering or-expansion. Some even include an analysis of a 10053 trace! The subject is also covered in Troubleshooting Oracle Performance by Christian Antognini, a great read.

Example setup

This is one of the few times I deviated from my trusted Swingbench environment. Instead I created the following segments in my 19.12.0 database running on Oracle Linux 7.9/UEK 6.

CREATE TABLE skewed_data_distribution
    AS
        WITH generator AS (
            SELECT
                ROWNUM id
            FROM
                dual
            CONNECT BY
                ROWNUM <= 4000
        )
        SELECT
            ROWNUM                    AS id,
            CASE
                WHEN mod(ROWNUM, 100000) = 0     THEN 1
                WHEN mod(ROWNUM, 1000) = 0       THEN 2
                WHEN mod(ROWNUM, 100) = 0        THEN 3
                ELSE 4
            END                       AS skew,
            lpad('*', 150, '*')       AS pad,
            sysdate + dbms_random.value(-1000,0) as datecol
        FROM
            generator,
            generator
        WHERE
            ROWNUM <= 10e6;

CREATE INDEX i_skew_1 ON
    skewed_data_distribution (
        id
    );

CREATE INDEX i_skew_2 ON
    skewed_data_distribution (
        skew
    );

BEGIN
 dbms_stats.gather_table_stats(
  ownname => 'MARTIN', 
  tabname => 'SKEWED_DATA_DISTRIBUTION',
  method_opt => 'for all columns size auto, for columns size 254 skew',
  degree => 4
 );
END;
/

Thanks to Jonathan Lewis for elaborating on how to create sample data safely.

By the way I didn’t enable any of the fix_controls that come with the 19.12 Release Update (RU). If you just raised an eyebrow, please have look at Mike Dietrich’s blog for details about a potential call to DBMS_OPTIM_BUNDLE after applying a RU.

The query I’ll use for this article is this (it’s designed to trigger an or-expansion).

var the_id number
var the_skew number

WITH q AS (
    SELECT id,
        skew
    FROM
        skewed_data_distribution
    WHERE
        id = :the_id
        OR skew = :the_skew
)
SELECT /*+ gather_plan_statistics opt_param('_b_tree_bitmap_plans','false') */
    COUNT(*)
FROM
    q;

Please ignore the fact that it is a rather useless SQL statement on its own, but it helped me create a test case. All I needed was a table, some indexes and a suitable data distribution as well as a histogram on SKEW, otherwise the optimiser probably wouldn’t have considered the use of the index, but I’m getting ahead of myself.

When I first tested the query I didn’t get the or-expansion I wanted, but rather this:

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3063879156

---------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          |     1 |     9 |  4872   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE                  |          |     1 |     9 |            |          |
|   2 |   BITMAP CONVERSION COUNT        |          |  2500K|    21M|  4872   (1)| 00:00:01 |
|   3 |    BITMAP OR                     |          |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS|          |       |       |            |          |
|*  5 |      INDEX RANGE SCAN            | I_SKEW_2 |       |       |  4860   (1)| 00:00:01 |
|   6 |     BITMAP CONVERSION FROM ROWIDS|          |       |       |            |          |
|*  7 |      INDEX RANGE SCAN            | I_SKEW_1 |       |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("SKEW"=TO_NUMBER(:THE_SKEW))
   7 - access("ID"=TO_NUMBER(:THE_ID))

20 rows selected.

That bitmap conversion wasn’t quite what I had in mind, hence the opt_param hint in the query text.

Data distribution

The following detail about data distribution might help understand the article better:

REM data distribution: SKEW

SELECT /*+ parallel */
    COUNT(*),
    skew
FROM
    skewed_data_distribution
GROUP BY
    skew;

  COUNT(*)       SKEW
---------- ----------
       100          1
      9900          2
     90000          3
   9900000          4

4 rows selected.

REM Histograms  

SELECT
    column_name,
    histogram,
    num_buckets,
    column_id
FROM
    user_tab_columns
WHERE
    table_name = 'SKEWED_DATA_DISTRIBUTION';

COLUMN_NAM HISTOGRAM           NUM_BUCKETS       COLUMN_ID
---------- --------------- --------------- ---------------
ID         NONE                          1               1
SKEW       FREQUENCY                     4               2
PAD        NONE                          1               3
DATECOL    NONE                          1               4

4 rows selected.

REM Histogram on SKEW

SELECT
    endpoint_number,
    endpoint_value
FROM
    user_tab_histograms
WHERE
        table_name = 'SKEWED_DATA_DISTRIBUTION'
    AND column_name = 'SKEW';

ENDPOINT_NUMBER  ENDPOINT_VALUE
--------------- ---------------
            100               1
          10000               2
         100000               3
       10000000               4

4 rows selected.

Workload

I am going to run 2 scripts a few times to simulate a query workload. The scripts differ in their bind variable values:

$ diff -y skew_unselective.sql skew_selective.sql
var the_id number						var the_id number
var the_skew number						var the_skew number

exec :the_id := 100						exec :the_id := 100
exec :the_skew := 4					      |	exec :the_skew := 1

WITH q AS (							WITH q AS (
    SELECT id,							    SELECT id,
        skew							        skew
    FROM							    FROM
        skewed_data_distribution				        skewed_data_distribution
    WHERE							    WHERE
        id = :the_id						        id = :the_id
        OR skew = :the_skew					        OR skew = :the_skew
)								)
SELECT /*+ gather_plan_statistics opt_param('_b_tree_bitmap_p	SELECT /*+ gather_plan_statistics opt_param('_b_tree_bitmap_p
    COUNT(*)							    COUNT(*)
FROM								FROM
    q;								    q;

Selective bind variables

Using my runMany.sh script I launched 20 instances of the more selective query first. The expected execution plan is as follows:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  8tjz2tqn9gtck, child number 0
-------------------------------------
WITH q AS (     SELECT id,         skew     FROM
skewed_data_distribution     WHERE         id = :the_id         OR skew
= :the_skew ) SELECT /*+ gather_plan_statistics
opt_param('_b_tree_bitmap_plans','false') */     COUNT(*) FROM     q

Plan hash value: 414689775

---------------------------------------------------------------------------...
| Id  | Operation                              | Name                     |...
---------------------------------------------------------------------------...
|   0 | SELECT STATEMENT                       |                          |...
|   1 |  SORT AGGREGATE                        |                          |...
|   2 |   VIEW                                 | VW_ORE_9774CF0C          |...
|   3 |    UNION-ALL                           |                          |...
|*  4 |     INDEX RANGE SCAN                   | I_SKEW_1                 |...
|*  5 |     TABLE ACCESS BY INDEX ROWID BATCHED| SKEWED_DATA_DISTRIBUTION |...
|*  6 |      INDEX RANGE SCAN                  | I_SKEW_2                 |...
---------------------------------------------------------------------------...

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (NUMBER): 100
   2 - :2 (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ID"=:THE_ID)
   5 - filter(LNNVL("ID"=:THE_ID))
   6 - access("SKEW"=:THE_SKEW)


34 rows selected.

In other words, you can witness the result of the query transformation. Also note the VIEW VW_ORE%: it doesn’t exist in the database, it only exists thanks to the query transformation. I think this is a prime use case for the or-expansion: by splitting the or condition in the where clause into separate statements Oracle can make use of the indexes.

Really unselective predicate

So here I thought the trouble might arise: what if instead of a small-ish subset of the rows Oracle has to return the majority of the table instead? It can be as simple as replacing the value for SKEW from a selective-ish predicate to a highly unselective one. Which doesn’t trigger a hard parse due to a change the query text.

Let’s recap. So far, I have 1 execution plan for my statement in the shared pool, covering all executions up to now (29 to be precise):

SELECT
    sql_id,
    plan_hash_value,
    child_number,
    executions,
    is_bind_aware,
    is_bind_sensitive
FROM
    v$sql
WHERE
    sql_id = '8tjz2tqn9gtck';

SQL_ID        PLAN_HASH_VALUE    CHILD_NUMBER      EXECUTIONS I I
------------- --------------- --------------- --------------- - -
8tjz2tqn9gtck       414689775               0              29 N Y

Let’s try the second query. Using runMany.sh I launched skew_unselective.sql 10 times against the database. After they completed, I checked the shared pool again:

SELECT
    sql_id,
    plan_hash_value,
    child_number,
    executions,
    is_bind_aware,
    is_bind_sensitive
FROM
    v$sql
WHERE
    sql_id = '8tjz2tqn9gtck';

SQL_ID        PLAN_HASH_VALUE    CHILD_NUMBER      EXECUTIONS I I
------------- --------------- --------------- --------------- - -
8tjz2tqn9gtck       414689775               0              39 N Y

So here is a problem: The next batch of my queries used the “wrong”, or rather suboptimal plan. This eventually results and longer elapsed time/query. However, during my tests-and I appreciate it’s a bit limited in scope-I noticed that the next time I ran the un-selective query, another child cursor appeared:

SELECT
    sql_id,
    plan_hash_value,
    child_number,
    executions,
    is_bind_aware,
    is_bind_sensitive
FROM
    v$sql
WHERE
    sql_id = '8tjz2tqn9gtck';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS I I
------------- --------------- ------------ ---------- - -
8tjz2tqn9gtck       414689775            0         39 N Y
8tjz2tqn9gtck      1662074091            1          1 N Y

SQL> select * from dbms_xplan.display_cursor('8tjz2tqn9gtck',1);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
SQL_ID  8tjz2tqn9gtck, child number 1
-------------------------------------
WITH q AS (     SELECT id,         skew     FROM
skewed_data_distribution     WHERE         id = :the_id         OR skew
= :the_skew ) SELECT /*+ gather_plan_statistics
opt_param('_b_tree_bitmap_plans','false') */     COUNT(*) FROM     q

Plan hash value: 1662074091

-----------------------------------------------------------------------------------------------
| Id  | Operation          | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                          |       |       | 66204 (100)|          |
|   1 |  SORT AGGREGATE    |                          |     1 |     9 |            |          |
|*  2 |   TABLE ACCESS FULL| SKEWED_DATA_DISTRIBUTION |  9900K|    84M| 66204   (1)| 00:00:03 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("SKEW"=:THE_SKEW OR "ID"=:THE_ID))


22 rows selected.

Now that’s better! With the totally un-selective predicate it doesn’t make sense to use the index. The full scan is a far better choice. What happens when I run the selective query again?

SQL> @skew_selective

PL/SQL procedure successfully completed.

PL/SQL procedure successfully completed.

  COUNT(*)
----------
       101

Display execution plan for last statement for this session from library cache...

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  8tjz2tqn9gtck, child number 2
-------------------------------------
WITH q AS (     SELECT id,         skew     FROM
skewed_data_distribution     WHERE         id = :the_id         OR skew
= :the_skew ) SELECT /*+ gather_plan_statistics
opt_param('_b_tree_bitmap_plans','false') */     COUNT(*) FROM     q

Plan hash value: 414689775

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                     | Starts | E-Rows |E-Bytes| Cost (%CPU)|
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                          |      1 |        |       |    10 (100)|
|   1 |  SORT AGGREGATE                        |                          |      1 |      1 |       |            |
|   2 |   VIEW                                 | VW_ORE_9774CF0C          |      1 |    101 |       |    10   (0)|
|   3 |    UNION-ALL                           |                          |      1 |        |       |            |
|*  4 |     INDEX RANGE SCAN                   | I_SKEW_1                 |      1 |      1 |     6 |     3   (0)|
|*  5 |     TABLE ACCESS BY INDEX ROWID BATCHED| SKEWED_DATA_DISTRIBUTION |      1 |    100 |   900 |     7   (0)|
|*  6 |      INDEX RANGE SCAN                  | I_SKEW_2                 |      1 |    100 |       |     3   (0)|
------------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (NUMBER): 100
   2 - :2 (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("ID"=:THE_ID)
   5 - filter(LNNVL("ID"=:THE_ID))
   6 - access("SKEW"=:THE_SKEW)

SELECT
    sql_id,
    plan_hash_value,
    child_number,
    executions,
    is_bind_aware,
    is_bind_sensitive
FROM
    v$sql
WHERE
    sql_id = '8tjz2tqn9gtck';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS I I
------------- --------------- ------------ ---------- - -
8tjz2tqn9gtck       414689775            0         39 N Y
8tjz2tqn9gtck      1662074091            1          1 N Y
8tjz2tqn9gtck       414689775            2          1 Y Y

After a few more tries the number of child cursors settled down at 4:

SQL> SELECT
    sql_id,
    plan_hash_value,
    child_number,
    executions,
    is_bind_aware,
    is_bind_sensitive
FROM
    v$sql
WHERE
    sql_id = '8tjz2tqn9gtck';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS I I
------------- --------------- ------------ ---------- - -
8tjz2tqn9gtck       414689775            0         39 N Y
8tjz2tqn9gtck      1662074091            1          1 N Y
8tjz2tqn9gtck       414689775            2          5 Y Y
8tjz2tqn9gtck      1662074091            3          7 Y Y

Child cursors 0 and 1 haven’t seen further executions while 2 and 3 do.

Summary

My-admittedly limited-amount of testing suggests that it is indeed possible to use or-expansion successfully even with huge data skew and bind variables in 19.12.0. All of my database’s parameters remained at their default with the exception of _b_tree_bitmap_plans to prevent these from appearing.

And many thanks, as always, to Tanel Poder for sharing his scripts with the community. They have been taken to good use writing this post.