Prompted by an actual task at hand I spent some time investigating an 11.2.0.2 feature – concurrent statistics gathering. It has been on my to-do list for quite some time, but so far I didn’t have a use case, and use cases make it so much easier. The question was: how can I gather statistics on a really large, partitioned table? Previously you could resort to the degree argument in dbms_stats.gather_table_stats to ensure that statistics were gathered in parallel. This is all good, but sometimes you need more oomph. Some DBAs wrote scripts to execute individual statistics gathering jobs against partitions in parallel, using the tabname and partname arguments in dbms_stats.gather_table_stats(). But that requires manual effort – and the not-quite-so-new concurrent option is so much nicer. Let me take you along for the ride… Actually I have to tell the story starting with the happy ending, as I hit a few snags along the way. This is 12.1.0.2.1 on Oracle Linux 6.5.
Enabling concurrent statistics gathering
A flag needs to be set to enable the new concurrent mechanism for gathering statistics. The initial blog post on the subject set the flag globally:
https://blogs.oracle.com/optimizer/entry/gathering_optimizer_statistics_is_one
Scroll down to “configuration and settings” for the details. I personally would prefer to set this at table level, but despite the Oracle documentation suggesting that it is possible, it is not. You can check the header of DBMS_STATS to see the different values that can be passed as PNAME to set_table_prefs().
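For reference, enabling the feature globally is a single call to set_global_prefs – a minimal sketch, using the value TRUE that shows up in my queries below (the 12c documentation lists MANUAL, AUTOMATIC, ALL and OFF as the supported values, see the comments at the end of this post):

begin
  -- enable concurrent statistics gathering database-wide
  dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');
end;
/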
After enabling the concurrent flag at the database level, you can query it:
SQL> select dbms_stats.get_prefs('CONCURRENT')
  2  from dual;

DBMS_STATS.GET_PREFS('CONCURRENT')
------------------------------------------------------------------------------------
TRUE

SQL> select dbms_stats.get_prefs('CONCURRENT',user,'BIGTAB_PART') flag from dual;

FLAG
-------------------------------------------------------------------------------------
TRUE
We should be ready to use the concurrent method now. Here is some background information about the table in question:
SQL> select partition_name, num_rows, sample_size, last_analyzed
  2  from user_tab_partitions where table_name = 'BIGTAB_PART'
  3  ;

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254
SYS_P1253
SYS_P1252
SYS_P1251
SYS_P1250
SYS_P1249
SYS_P1248
SYS_P1247
SYS_P1246
SYS_P1245
P_MAN

11 rows selected.

SQL> select partition_name, bytes/power(1024,2) m from user_segments where segment_name = 'BIGTAB_PART';

PARTITION_NAME                          M
------------------------------ ----------
SYS_P1245                            8064
SYS_P1246                            8064
SYS_P1247                            8064
SYS_P1248                            8064
P_MAN                                8064
SYS_P1249                            8064
SYS_P1250                            8064
SYS_P1251                            8064
SYS_P1252                            8064
SYS_P1253                            8064
SYS_P1254                               8

11 rows selected.
So each partition is about 8 GB in size. A little helper script prints all the table preferences:
getting table preferences for table BIGTAB_PART
value for CASCADE is DBMS_STATS.AUTO_CASCADE
value for DEGREE is NULL
value for ESTIMATE_PERCENT is DBMS_STATS.AUTO_SAMPLE_SIZE
value for METHOD_OPT is FOR ALL COLUMNS SIZE AUTO
value for NO_INVALIDATE is DBMS_STATS.AUTO_INVALIDATE
value for GRANULARITY is AUTO
value for PUBLISH is TRUE
value for INCREMENTAL is FALSE
value for STALE_PERCENT is 10
value for CONCURRENT is TRUE
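The helper script itself is not shown here; a minimal sketch that produces similar output simply loops over a list of preference names and calls dbms_stats.get_prefs for each – the list below is an assumption based on the output above:

set serveroutput on
declare
  -- preference names to report; extend as needed
  type t_prefs is table of varchar2(30);
  l_prefs t_prefs := t_prefs('CASCADE','DEGREE','ESTIMATE_PERCENT','METHOD_OPT',
                             'NO_INVALIDATE','GRANULARITY','PUBLISH','INCREMENTAL',
                             'STALE_PERCENT','CONCURRENT');
begin
  dbms_output.put_line('getting table preferences for table BIGTAB_PART');
  for i in 1 .. l_prefs.count loop
    dbms_output.put_line('value for ' || l_prefs(i) || ' is ' ||
      dbms_stats.get_prefs(l_prefs(i), user, 'BIGTAB_PART'));
  end loop;
end;
/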
Now you can simply execute the call to dbms_stats.gather_table_stats(user, ‘BIGTAB_PART’, degree=>4) and, thanks to all the defaults, the gathering of statistics will be performed by multiple jobs. In order for this to work you need job_queue_processes set to at least 4 (the default is 1000 in 12.1.0.2) and you need a resource manager plan to be active. Refer to the Oracle blog post referenced earlier for more information about statement queueing and the execution of these jobs. The post also recommends setting parallel_adaptive_multi_user to false, which is not the default in 11.2. Just be careful changing PX parameters – these can have undesired side effects.
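Put together, the session looks something like this – a minimal sketch using the table from this post; the two show commands are only there to illustrate the prerequisites:

-- prerequisites: job_queue_processes >= 4 and an active resource manager plan
show parameter job_queue_processes
show parameter resource_manager_plan

begin
  -- defaults plus the CONCURRENT preference do the rest
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'BIGTAB_PART',
    degree  => 4);
end;
/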
Using the diagnostic queries taken from the Oracle blog post you can see what’s going on. In my case, 6 jobs were created:
select job_name, state, comments
  from dba_scheduler_jobs
 where job_class like 'CONC%';

JOB_NAME             STATE           COMMENTS
-------------------- --------------- ----------------------------------------------------------------------------------------------------
ST$T3606_6_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1254";"MARTIN"."BIGTAB_PART"
ST$T3606_5_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1252";"MARTIN"."BIGTAB_PART"."SYS_P1253"
ST$T3606_4_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1250";"MARTIN"."BIGTAB_PART"."SYS_P1251"
ST$T3606_3_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1248";"MARTIN"."BIGTAB_PART"."SYS_P1249"
ST$T3606_2_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1246";"MARTIN"."BIGTAB_PART"."SYS_P1247"
ST$T3606_1_B2        RUNNING         "MARTIN"."BIGTAB_PART"."P_MAN";"MARTIN"."BIGTAB_PART"."SYS_P1245"

6 rows selected.
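In 12c you can also follow the progress of the individual tasks in DBA_OPTSTAT_OPERATION_TASKS. A sketch of such a query could look like the following – the column list is from memory of the 12.1 data dictionary, so double-check it against your version:

select target, target_type, status, job_name, start_time
  from dba_optstat_operation_tasks
 order by start_time desc
 fetch first 20 rows only;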
By the way, if you are interested in the payload, you can use DBMS_METADATA:
SQL> get get_job
  1* SELECT DBMS_METADATA.GET_DDL('PROCOBJ', '&1') from dual
SQL>
You pass it the job name and you get some pretty cryptic looking output :)
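As a usage sketch, running it against one of the jobs (the job name below is simply the one from my run) looks like this:

SQL> @get_job ST$T3982_1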
Interlude
Actually, you have to see this…
DBMS_METADATA.GET_DDL('PROCOBJ','ST$T3982_1')
-----------------------------------------------------------------------------------------
BEGIN
dbms_scheduler.create_job('"ST$T3982_1"',
job_type=>'PLSQL_BLOCK', job_action=>
'declare
c dbms_stats.CContext := dbms_stats.CContext();
begin
c.extend(30);
c(1):=''ST$T3982_1;1;855;1;FALSE;FALSE;'';
c(2):=q''#"MARTIN";#'';
c(3):=q''#"BIGTAB_PART";#'';
c(4):=q''#"SYS_P1250";#'';
c(5):=''0;'';
c(6):=''FALSE;'';
c(7):=q''#FOR ALL COLUMNS SIZE AUTO;#'';
c(8):=''NULL;'';
c(9):=''PARTITION;'';
c(10):=''TRUE;'';
c(11):=q''#NULL;#'';
c(12):=q''#NULL;#'';
c(13):=q''#NULL;#'';
c(14):=''NULL;'';
c(15):=''DATA;'';
c(16):=''FALSE;'';
c(17):=''GLOBAL AND PARTITION;'';
c(18):=''TRUE;'';
c(19):=''0;'';
c(20):='';'';
c(21):=''FALSE;'';
c(22):=''FALSE;'';
c(23):=''4;'';
c(24):=''ST$T3982;'';
c(25):=''TRUE;'';
c(26):=''FALSE;'';
c(27):=''4.562199;'';
c(28):=''228.109937;'';
c(29):=''0;'';
c(30):=''0;'';
dbms_stats.gather_table_stats('''','''',context=>c);
end;'
, number_of_arguments=>0,
start_date=>NULL, repeat_interval=> NULL
, end_date=>NULL,
job_class=>'"CONC_ST$T3982"', enabled=>FALSE, auto_drop=>TRUE, comments=>
'"MARTIN"."BIGTAB_PART"."SYS_P1250"'
);
dbms_scheduler.set_attribute('"ST$T3982_1"','raise_events',38);
dbms_scheduler.enable('"ST$T3982_1"');
COMMIT;
END;
This looks better when you format it a bit, so that it reads:
DECLARE
    c dbms_stats.CContext := dbms_stats.CContext();
BEGIN
    c.extend(30);
    c(1) :='ST$T3982_1;1;855;1;FALSE;FALSE;';
    c(2) :=q'#"MARTIN";#';
    c(3) :=q'#"BIGTAB_PART";#';
    c(4) :=q'#"SYS_P1250";#';
    c(5) :='0;';
    c(6) :='FALSE;';
    c(7) :=q'#FOR ALL COLUMNS SIZE AUTO;#';
    c(8) :='NULL;';
    c(9) :='PARTITION;';
    c(10):='TRUE;';
    c(11):=q'#NULL;#';
    c(12):=q'#NULL;#';
    c(13):=q'#NULL;#';
    c(14):='NULL;';
    c(15):='DATA;';
    c(16):='FALSE;';
    c(17):='GLOBAL AND PARTITION;';
    c(18):='TRUE;';
    c(19):='0;';
    c(20):=';';
    c(21):='FALSE;';
    c(22):='FALSE;';
    c(23):='4;';
    c(24):='ST$T3982;';
    c(25):='TRUE;';
    c(26):='FALSE;';
    c(27):='4.562199;';
    c(28):='228.109937;';
    c(29):='0;';
    c(30):='0;';
    dbms_stats.gather_table_stats('','',context=>c);
END;
Interesting!
Concurrent statistics gathering works with incremental statistics as well, and it is a good way to get the initial set of statistics that incremental statistics need anyway (see Doc ID 1541543.1 for background on incremental statistics gathering).
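If you want to try that combination, incremental statistics are switched on per table via a preference – a minimal sketch for the table used in this post:

begin
  -- enable incremental (synopsis-based) statistics for this table
  dbms_stats.set_table_prefs(user, 'BIGTAB_PART', 'INCREMENTAL', 'TRUE');
end;
/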
The end result is statistics!
SQL> select partition_name, num_rows, sample_size, last_analyzed
  2  from user_tab_partitions where table_name = 'BIGTAB_PART';

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254                              32          32 07-NOV-14
SYS_P1253                        25600000    25600000 07-NOV-14
SYS_P1252                        25600000    25600000 07-NOV-14
SYS_P1251                        25600000    25600000 07-NOV-14
SYS_P1250                        25600000    25600000 07-NOV-14
SYS_P1249                        25600000    25600000 07-NOV-14
SYS_P1248                        25600000    25600000 07-NOV-14
SYS_P1247                        25600000    25600000 07-NOV-14
SYS_P1246                        25600000    25600000 07-NOV-14
SYS_P1245                        25600000    25600000 07-NOV-14
P_MAN                            25599968    25599968 07-NOV-14

11 rows selected.
Stats not gathered concurrently if partitions are small
This all looks pretty straightforward, except that it wasn’t when I first tried. I had set all the parameters as documented but still had no jobs running in the background. The first table I tried to gather statistics on concurrently was partitioned exactly like BIGTAB_PART, with the difference that all partitions were compressed for QUERY HIGH. This means that each partition takes up a lot less space. I suspected that the segment size might be related, so I compressed BIGTAB_PART, and here is the result:
SQL> select partition_name, bytes/power(1024,2) m from user_segments
  2  where segment_name = 'BIGTAB_PART';

PARTITION_NAME                                  M
------------------------------ ------------------
P_MAN                                      321.75
SYS_P1245                                  321.00
SYS_P1246                                  322.50
SYS_P1247                                  321.50
SYS_P1248                                  320.75
SYS_P1249                                  322.25
SYS_P1250                                  322.25
SYS_P1251                                  322.25
SYS_P1252                                  322.25
SYS_P1253                                  322.00
SYS_P1254                                    8.25

11 rows selected.
When I then tried to gather statistics concurrently (without changing the table preferences at all), no concurrent jobs were created. So remember: the next time you gather statistics concurrently and don’t see multiple jobs for intra-table statistics gathering, your partitions might simply be too small.
Hi Martin,
I have not used concurrent statistics in 12c yet, but according to the Oracle documentation the parameter “concurrent” has only 4 valid values (MANUAL, AUTOMATIC, ALL, OFF). I assume that TRUE is still possible for backward compatibility, but I don’t know the equivalent/translated behavior in 12c for that.
The Oracle documentation about table prefs is wrong. Christian Antognini mentioned this in his book “Troubleshooting Oracle Performance, 2nd Edition” on page 247 as well (“Note that the parameters autostats_target and concurrent can only be modified with the set_global_prefs procedure.”)
Regards
Stefan
Hi Stefan,
always good to see you passing by. The concurrent setting is a bit odd. It’s not documented in 11.2 dbms_stats (https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/d_stats.htm), but it is documented in the 12c packages and types reference (although incorrectly).
Now that you mentioned it, I had a second look and found it in the Database SQL Tuning Guide, table 12-1:
You can only set the CONCURRENT preference at the global level (see “About Concurrent Statistics Gathering”).