Martins Blog

Trying to explain complex things in simple terms

Little things worth knowing-troubleshooting parallel statement queueing

Posted by Martin Bach on October 9, 2014

This is yet another one of these posts that hopefully help you as much as they are going to help me in the future. Recently I enjoyed troubleshooting a problem related to parallel execution. Since I have never really written down how to tackle such a problem I thought it might be nice to do that now.

This is 12.1.0.2.0 on Exadata, but the platform doesn’t really matter for the troubleshooting technique.

What is parallel statement queueing

Statement queueing is a pretty nifty tool new with 11.2. Instead of downgrading the requested Degree of Parallelism (DOP) as it was the case with parallel_adaptive_multi_user your session will queue until the requested number of parallel servers becomes available. The rationale behind this feature is that you might end up queueing for 1 minute until all the requested PX servers become available, but once they are available your session can finish in 2 minutes with a DOP of 16 rather than 8 minutes with a (old behaviour) downgraded DOP of 2 for example. The 1 minute was well spent waiting. If you want to fine tune statement queueing, database resource manager allows you to.

The downside is that you don’t get statement queueing without enabling automatic DOP unless you play with underscore parameters. Automatic DOP is quite often the reason why you see statement queueing in the first place.

How does this statement queueing work?

With parallel_degree_policy set to AUTO (warning again-this can be dangerous!) the optimiser estimates the time it takes to execute a SQL statement during the hard parse. If the estimated execution time exceeds parallel_min_time_threshold (default is AUTO which is 10 seconds) then the optimiser will try to work out the “optimal” DOP which is calculated as min(parallel_degree_limit,  calculated ideal DOP). The default parallel_degree_limit is set to CPU, which is documented as follows: “The maximum degree of parallelism is limited by the number of CPUs in the system”. Ooops-that can be a lot. It might be better to set the parallel_degree_limit to an integer, signifying the maximum allowed DOP for a statement executing in parallel. On Exadata you don’t normally need very high degrees of parallelism, so a more conservative settings might be in order.

Queueing

When does the queueing mechanism kick in? Let’s imagine that you have parallel_max_servers = parallel_servers_target = 48 defined in your system, in my case an Exadata X2 with 2s12c24t Westmere EP processors. This isn’t the default by the way. This setting allows you to execute queries 48-way parallel per instance. Let’s assume that 40 query slaves are in use already, and another user wants to execute a query 12-way parallel. The sum of currently executing parallel slaves + the requested number of slaves exceeds parallel_servers_target, triggering statement queueing for your session.

Wolfgang it!

To create a reproducible test case I set parallel_max_servers to 48 and parallel_servers_target to 24. I also enabled automatic DOP in my session. My test queries are directed at a 256 million rows table of approximately 80GB in size, non partitioned. Before executing the first SQL statement I enabled event 10053 to Wolfgang my query.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

Elapsed: 00:00:00.02
SQL> desc bigtab
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 V1                                                 VARCHAR2(40)
 N1                                                 NUMBER
 N2                                                 NUMBER
 N_256K                                             NUMBER
 N_128K                                             NUMBER
 N_8K                                               NUMBER
 PADDING                                            VARCHAR2(960)

SQL> select 1 + sum(n1) + sum (n2) - sum(n_8k) from bigtab where id > 1;

1+SUM(N1)+SUM(N2)-SUM(N_8K)
---------------------------
                 1.0471E+13

Elapsed: 00:00:35.59

The 10053 trace should give me some idea about the DOP. Let’s have a look at it:

     36 **************************
     37 Automatic degree of parallelism (AUTODOP)
     38 **************************
     39 Automatic degree of parallelism is enabled for this statement in auto mode.
     40 kkopqSetForceParallelProperties: Hint:no
     41 Query: compute:yes forced:no forceDop:0
     42 Calibration statistics is enabled.
     43 Start with a serial pass, cost the DOP to use

   1935 AUTO DOP PLANS EVALUATION
   1936 ***************************************
   1937 Compilation completed with Dop: 1.
   1938   Cost_io:  2776823.000000  Cost_cpu: 154935265907
   1939   Card:     1.000000  Bytes:    19.000000
   1940   Cost:     2781428.613003  Est_time:  401168ms
   1941 kkopqCombineDop: Dop:40 Hint:no
   1942 Query: compute:yes forced:no  scanDop:40 cpuDop:1 forceDop:0
   1943 Serial plan is expensive enough to be a candidate for parallelism (2781429)
   1944 AutoDOP: kkopqAdjustDop: dop is 40 after affinity normalization
   1945 Signal reparse with DOP 40.
   1946 *****************************
   1947 Number of Compilations tried: 1
   1948 *****************************

   3853 AUTO DOP PLANS EVALUATION
   3854 ***************************************
   3855 Compilation completed with Dop: 40.
   3856   Cost_io:  77133.972222  Cost_cpu: 2387701756
   3857   Card:     1.000000  Bytes:    19.000000
   3858   Cost:     77204.949157  Est_time:  11135ms
   3859 kkopqCombineDop: Dop:80 Hint:no
   3860 Query: compute:yes forced:no  scanDop:40 cpuDop:80 forceDop:0
   3861 AutoDOP: kkopqAdjustDop: computed dop before applying degree limit was 80 and now it is 48
   3862 AutoDOP: kkopqAdjustDop: dop is 48 after affinity normalization
   3863 Adopt new plan with dop=40
   3864 Signal reparse with DOP 48.
   3865 *****************************
   3866 Number of Compilations tried: 2
   3867 *****************************

   5772 AUTO DOP PLANS EVALUATION
   5773 ***************************************
   5774 Compilation completed with Dop: 48.
   5775   Cost_io:  64278.310185  Cost_cpu: 1989751463
   5776   Card:     1.000000  Bytes:    19.000000
   5777   Cost:     64337.457631  Est_time:  9279ms
   5778 kkopqCombineDop: Dop:80 Hint:no
   5779 Query: compute:yes forced:no  scanDop:40 cpuDop:80 forceDop:0
   5780 Adopt new plan with dop=48
   5781 kkopqSetDopReason: Reason why we chose this DOP is: degree limit.
   5782 Costing completed. DOP chosen: 48.
   5783 AutoDop: kkopqSetMaxDopInCursorEnv:In the Cursor Session Env, max DOP is set to 48
   5784 *****************************
   5785 Number of Compilations tried: 3
   5786 *****************************

   5811 ============
   5812 Plan Table
   5813 ============
   5814 --------------------------------------------------+-----------------------------------+-------------------------+
   5815 | Id  | Operation                       | Name    | Rows  | Bytes | Cost  | Time      |  TQ  |IN-OUT|PQ Distrib |
   5816 --------------------------------------------------+-----------------------------------+-------------------------+
   5817 | 0   | SELECT STATEMENT                |         |       |       |   63K |           |      |      |           |
   5818 | 1   |  SORT AGGREGATE                 |         |     1 |    19 |       |           |      |      |           |
   5819 | 2   |   PX COORDINATOR                |         |       |       |       |           |      |      |           |
   5820 | 3   |    PX SEND QC (RANDOM)          | :TQ10000|     1 |    19 |       |           |:Q1000| P->S |QC (RANDOM)|
   5821 | 4   |     SORT AGGREGATE              |         |     1 |    19 |       |           |:Q1000| PCWP |           |
   5822 | 5   |      PX BLOCK ITERATOR          |         |  276M | 5248M |   63K |  00:02:09 |:Q1000| PCWC |           |
   5823 | 6   |       TABLE ACCESS STORAGE FULL | BIGTAB  |  276M | 5248M |   63K |  00:02:09 |:Q1000| PCWP |           |
   5824 --------------------------------------------------+-----------------------------------+-------------------------+
   5825 Predicate Information:
   5826 ----------------------
   5827 6 - storage(:Z>=:Z AND :Z<=:Z AND "ID">1)
   5828 6 - filter("ID">1)
   5829
   5830 Content of other_xml column
   5831 ===========================
   5832   derived_cpu_dop: 80
   5833   derived_io_dop : 40
   5834   dop_reason     : degree limit
   5835   dop            : 48
   5836   px_in_memory_imc: no
   5837   px_in_memory   : no
   5838   io_rate        : 200
   5839   derived_io_dop : 40
   5840   cpu_rate       : 1000
   5841   derived_cpu_dop: 52
   5842   cpu_rate       : 1000
   5843   derived_cpu_dop: 29
   5844   cpu_rate       : 1000
   5845   derived_cpu_dop: 29
   5846   db_version     : 12.1.0.2
   5847   parse_schema   : MARTIN
   5848   dynamic_sampling: 11
   5849   plan_hash_full : 1216263238
   5850   plan_hash      : 4050997022
   5851   plan_hash_2    : 1216263238
   5852   Outline Data:
   5853   /*+
   5854     BEGIN_OUTLINE_DATA
   5855       IGNORE_OPTIM_EMBEDDED_HINTS
   5856       OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
   5857       DB_VERSION('12.1.0.2')
   5858       OPT_PARAM('optimizer_dynamic_sampling' 11)
   5859       ALL_ROWS
   5860       SHARED(48)
   5861       OUTLINE_LEAF(@"SEL$1")
   5862       FULL(@"SEL$1" "BIGTAB"@"SEL$1")
   5863     END_OUTLINE_DATA
   5864   */

There you have it! The v$sql_plan.other_xml column contains all the necessary information-no need for a 10053 trace :). The fanciest way I can come up with to get that information is to use this statement:

select t.*
 from v$sql_plan v,
  xmltable(
	'/other_xml/info'
	passing xmltype(v.other_xml)
	columns
		info_type varchar2(30) path '@type',
		info_value varchar2(30) path '/info'
  ) t
where v.sql_id = '&sql_id'
  and v.child_number = &child_number
  and other_xml is not null;

INFO_TYPE                      INFO_VALUE
------------------------------ ------------------------------
derived_cpu_dop                51
derived_io_dop                 40
dop_reason                     degree limit
dop                            48
px_in_memory_imc               no
px_in_memory                   no
io_rate                        200
derived_io_dop                 40
cpu_rate                       1000
derived_cpu_dop                26
cpu_rate                       1000
derived_cpu_dop                26
cpu_rate                       1000
derived_cpu_dop                26
db_version                     12.1.0.2
parse_schema                   "MARTIN"
plan_hash_full                 1216263238
plan_hash                      4050997022
plan_hash_2                    1216263238

19 rows selected.

But I digress… By the way-this statement used 48 PX server processes because it was the only one active at the time.

Causing trouble

After hitting publish I noticed that I copied the wrong example…

An example working better for the purpose of demonstrating the effect of queueing uses a more realistic case where statements use different DOPs. Two sessions are enough, one that will use 12 slaves, and another that requests 36. The statement with a DOP of 12 is started first, the one with a DOP of 36 a few seconds later. In a third session I gathered some diagnostic information:

SYS:dbm011> select px.sid, px.serial#,px.qcsid,px.qcserial#,px.qcinst_id,px.degree,px.req_degree,
  2  s.username, s.sql_id, s.sql_child_number, s.event, s.state
  3  from v$px_session px, v$session s
  4  where s.sid = px.sid
  5    and s.serial# = px.serial#
  6  order by px.qcsid;

        SID     SERIAL#       QCSID   QCSERIAL#   QCINST_ID      DEGREE  REQ_DEGREE USERNAME             SQL_ID        SQL_CHILD_NUMBER EVENT                                                            STATE
----------- ----------- ----------- ----------- ----------- ----------- ----------- -------------------- ------------- ---------------- ---------------------------------------------------------------- --------------------
        265       14828         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        330       21719         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        395       59827         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        461       17363         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        525        4955         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        592        2687         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        526       60805         526                                                 MARTIN               9md3ukhghgj58                0 PX Deq: Execute Reply                                            WAITING
        720       27626         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        786       47540         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED KNOWN TIME
        850       41110         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITED SHORT TIME
        918       46836         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        983       12699         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING
        657       22654         526       60805           1          12          12 MARTIN               9md3ukhghgj58                0 cell smart table scan                                            WAITING

13 rows selected.

SYS:dbm011> select count(qcsid),qcsid from v$px_session group by qcsid;

COUNT(QCSID)       QCSID
------------ -----------
          13         526

SYS:dbm011>
SYS:dbm011> select count(event),event from v$session where username = 'MARTIN' group by event;

COUNT(EVENT) EVENT
------------ ----------------------------------------------------------------
          12 cell smart table scan
           1 PX Deq: Execute Reply
           1 resmgr:pq queued

SYS:dbm011>

So my first session is well underway, 12 way parallel as requested. The other session had to wait but kicked into motion as soon as my first session finished.

SYS:dbm011> select px.sid, px.serial#,px.qcsid,px.qcserial#,px.qcinst_id,px.degree,px.req_degree,
  2  s.username, s.sql_id, s.sql_child_number, s.event, s.state
  3  from v$px_session px, v$session s
  4  where s.sid = px.sid
  5    and s.serial# = px.serial#
  6  order by px.qcsid;

        SID     SERIAL#       QCSID   QCSERIAL#   QCINST_ID      DEGREE  REQ_DEGREE USERNAME             SQL_ID        SQL_CHILD_NUMBER EVENT                                                            STATE
----------- ----------- ----------- ----------- ----------- ----------- ----------- -------------------- ------------- ---------------- ---------------------------------------------------------------- --------------------
        265       43575         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        525       17778         525                                                 MARTIN               4dz3dapm4uy8q                0 PX Deq: Execute Reply                                            WAITING
        395       10504         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        462       57813         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        526       12648         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        591       42643         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        657        5236         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        720       19243         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        786       61636         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        850       17734         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        918       60016         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        983       24489         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1044       41056         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1108        8936         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1178       52303         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
       1242       37008         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1307       28798         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1372       22172         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1437       59990         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
       1501        9307         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
          6       39400         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
         72       28569         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        137       18368         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        202       56832         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        333        6482         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        397       28334         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        461       17550         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        524         958         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        593       41235         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        656       51686         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        719       40346         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED SHORT TIME
        785        5157         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        851       35243         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITED KNOWN TIME
        917       45635         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        980       18504         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
       1046       27779         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING
        330       10724         525       17778           1          36          36 MARTIN               4dz3dapm4uy8q                0 cell smart table scan                                            WAITING

37 rows selected.

SYS:dbm011>
SYS:dbm011> select count(qcsid),qcsid from v$px_session group by qcsid;

COUNT(QCSID)       QCSID
------------ -----------
          37         525

SYS:dbm011>
SYS:dbm011> select count(event),event from v$session where username = 'MARTIN' group by event;

COUNT(EVENT) EVENT
------------ ----------------------------------------------------------------
          36 cell smart table scan
           1 PX Deq: Execute Reply

The effect of queueing can be seen in the extended execution time:

  • 12 way parallel-no queueing: Elapsed: 00:00:07.16
  • 36 way parallel-with a bit of queueing: Elapsed: 00:00:12.92

V$PX_SESSION allows you to identify the culprit causing other sessions to queue-have a look at it and see if you can optimise the statement or run it later. This should unblock you. Also have a look at parallel_servers_target- you might have limited yourself too much. I personally don’t exceed the core/thread count on a box by too much to avoid scheduling problems.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: