
IO Resource Manager for Pluggable Databases in Exadata

Another interesting topic that goes over and above the CDB Resource Manager Plans I described earlier this week is the implementation of IORM Plans for Pluggable Databases. Pluggable Databases are an interesting subject for study, and I like to research things. When 12c came out there was initially no support for offloading; full 12c support on Exadata requires cell software 12.1.x.x.x. One aspect I was particularly interested in was the combination of PDBs and IORM. Limiting CPU on a per-PDB basis is already a cool thing, but in a heavily consolidated environment it also makes sense to limit I/O based on a consumer group. Another one of the perks of working on Exadata :)

From what I found out, PDBs are treated as entities within the database, and they appear to fall into the category of intra-database IORM. I have previously written about how DBRM plans filter down to the cells and become intra-database resource plans. The same seems to be happening here, too.
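
A quick way to convince yourself that the cells know about a database's consumer groups is to list the consumer-group IORM metrics on a cell. The following is a sketch rather than output captured on the system above; adjust the database name to your environment:

[celladmin@enkcel04 ~]$ cellcli -e "list metriccurrent where objectType = 'IORM_CONSUMER_GROUP' and metricObjectName like 'MBACHMT.*'"

If your DBRM consumer groups show up in the metricObjectName column, the plan has made it down to the storage layer.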

IORM metric definitions

When looking at IORM I’m old fashioned and like to rely on the command line. More GUI-oriented people should consider using OEM 12c to get similar data in pretty pictures. But since OEM taps into cellcli under the covers, it is only fair to try to understand the underlying technology.

The cells provide a lot of performance information in the metriccurrent and metrichistory views. You are shown metrics based on a name and objectType. For IORM the following objectTypes are of interest in general:

[celladmin@enkcel04 ~]$ cellcli -e "list metricdefinition attributes objectType" | grep IORM | sort | uniq
	 IORM_CATEGORY
	 IORM_CONSUMER_GROUP
	 IORM_DATABASE
	 IORM_PLUGGABLE_DATABASE

This was executed on Exadata 12.1.2.1.1; if you try this on an 11.2.3.x Exadata release you won’t see the IORM_PLUGGABLE_DATABASE objectType. The question is: which metrics are gathered in 12.1.2.1.1?

CELLCLI> list metricdefinition attributes name,description where objectType = 'IORM_PLUGGABLE_DATABASE'
	 PDB_FC_BY_ALLOCATED	 "Number of megabytes allocated in flash cache for this pluggable database"
	 PDB_FC_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash cache"
	 PDB_FC_IO_RQ       	 "Number of IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SEC   	 "Number of IO requests issued by this pluggable database to flash cache per second"
	 PDB_FC_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash cache"
	 PDB_FC_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash cache per second"
	 PDB_FD_IO_BY_SEC   	 "Number of megabytes of I/O per second for this pluggable database to flash disks"
	 PDB_FD_IO_LOAD     	 "Average I/O load from this pluggable database for flash disks"
	 PDB_FD_IO_RQ_LG    	 "Number of large IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_LG_SEC	 "Number of large IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_RQ_SM    	 "Number of small IO requests issued by this pluggable database to flash disks"
	 PDB_FD_IO_RQ_SM_SEC	 "Number of small IO requests issued by this pluggable database to flash disks per second"
	 PDB_FD_IO_TM       	 "The cumulative latency of reading blocks by this pluggable database from flash disks"
	 PDB_FD_IO_TM_RQ    	 "The rate which is the average latency of reading or writing blocks per request by this pluggable database from flash disks"
	 PDB_FD_IO_UTIL     	 "Percentage of flash resources utilized by this pluggable database"
	 PDB_FD_IO_WT_LG    	 "IORM wait time for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_LG_RQ 	 "Average IORM wait time per request for large IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM    	 "IORM wait time for small IO requests issued to flash disks by this pluggable database"
	 PDB_FD_IO_WT_SM_RQ 	 "Average IORM wait time per request for small IO requests issued to flash disks by this pluggable database"
	 PDB_IO_BY_SEC      	 "Number of megabytes of I/O per second for this pluggable database to hard disks"
	 PDB_IO_LOAD        	 "Average I/O load from this pluggable database for hard disks"
	 PDB_IO_RQ_LG       	 "Number of large IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_LG_SEC   	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_RQ_SM       	 "Number of small IO requests issued by this pluggable database to hard disks"
	 PDB_IO_RQ_SM_SEC   	 "Number of small IO requests issued by this pluggable database to hard disks per second"
	 PDB_IO_TM_LG       	 "The cumulative latency of reading or writing large blocks by this pluggable database from hard disks"
	 PDB_IO_TM_LG_RQ    	 "The rate which is the average latency of reading or writing large blocks per request by this pluggable database from hard disks"
	 PDB_IO_TM_SM       	 "The cumulative latency of reading or writing small blocks by this pluggable database from hard disks"
	 PDB_IO_TM_SM_RQ    	 "The rate which is the average latency of reading or writing small blocks per request by this pluggable database from hard disks"
	 PDB_IO_UTIL_LG     	 "Percentage of disk resources utilized by large requests from this pluggable database"
	 PDB_IO_UTIL_SM     	 "Percentage of disk resources utilized by small requests from this pluggable database"
	 PDB_IO_WT_LG       	 "IORM wait time for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_LG_RQ    	 "Average IORM wait time per request for large IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM       	 "IORM wait time for small IO requests issued to hard disks by this pluggable database"
	 PDB_IO_WT_SM_RQ    	 "Average IORM wait time per request for small IO requests issued to hard disks by this pluggable database"

IORM metrics for PDBs

You still can’t “join” metricdefinition to metriccurrent, but it is possible to use the objectType in metriccurrent, too. The way Oracle externalises information about PDBs is shown here:

CellCLI> list metriccurrent where objectType = 'IORM_PLUGGABLE_DATABASE' and metricObjectName like 'MBACHMT.*'
	 PDB_FC_BY_ALLOCATED	 MBACHMT.CDB$ROOT   	 1,843 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.PDB$SEED   	 41.500 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,772 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,374 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,280 MB
...

PDB_FC_BY_ALLOCATED translates to “Number of megabytes allocated in flash cache for this pluggable database” as per the metric definitions above. The metric object name therefore is made up of the database name (I continue using MBACHMT as my CDB) and the container name. CDB$ROOT stands for the root, PDB$SEED for the seed database, and then there are the various PDBs you define. In my example I have “user PDBs” named swingbench{0,1,2}.
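
Because the metric object name embeds both the CDB and the container, you can also pull the history for a single PDB. The following is a sketch with the attribute list trimmed to the essentials:

CellCLI> list metrichistory attributes name, metricObjectName, metricValue, collectionTime where objectType = 'IORM_PLUGGABLE_DATABASE' and metricObjectName = 'MBACHMT.SWINGBENCH0' and name = 'PDB_IO_BY_SEC'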

Stressing it

With the basics covered it is time to run some stress testing. I have created a 6 GB table named IOTEST in all my swingbench* PDBs and will use the same script to issue 80 sessions against that table in each PDB. My CDB resource manager plan is still the same, repeated here for your convenience:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;
 
 dbms_resource_manager.create_cdb_plan(
  plan => 'ENKITC_CDB_PLAN',
  comment => 'A CDB plan for 12c'
 );
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench0',
  shares => 5,
  utilization_limit => 100);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench1',
  shares => 3,
  utilization_limit => 50);
 
 dbms_resource_manager.create_cdb_plan_directive(
  plan => 'ENKITC_CDB_PLAN',
  pluggable_database => 'swingbench2',
  shares => 1,
  utilization_limit => 30);
 
 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/
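
For completeness: creating and submitting the plan does not activate it. In my case it was already in force from the earlier post; otherwise it would still need to be set in the root container, along these lines:

SQL> alter system set resource_manager_plan = 'ENKITC_CDB_PLAN';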

If I execute the scripts concurrently (80 sessions connecting to each PDB) and prefix my scheduler script with the time command, I get the following results (a sketch of how the runs were started follows the list):

  • swingbench0 – 143s
  • swingbench1 – 223s
  • swingbench2 – 288s
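
For illustration, the runs were kicked off roughly as shown below; run_pdb_load.sh is a made-up name for the scheduler script that spawns 80 sessions against the PDB named in its first argument:

( time ./run_pdb_load.sh swingbench0 80 ) &
( time ./run_pdb_load.sh swingbench1 80 ) &
( time ./run_pdb_load.sh swingbench2 80 ) &
wait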

Interestingly, there is no event that would show I/O throttling in 12.1.0.2.2:

SYS:MBACHMT2> select count(*), con_id, event from v$session where username = 'SOE' group by con_id, event;

   COUNT(*)      CON_ID EVENT
----------- ----------- ----------------------------------------------------------------
         80           4 cell smart table scan
         80           5 cell smart table scan
         80           3 cell smart table scan

So all of them are smart scanning. The fact that some of the sessions are throttled is not visible from the wait interface, or at least I haven’t found a way to externalise I/O throttling there. But it does happen. Using one of my favourite tools, metric_iorm.pl (available from MOS), I noticed the following:

Database: MBACHMT
Utilization:     Small=0%    Large=18%
Flash Cache:     IOPS=13087
Disk Throughput: MBPS=361
Small I/O's:     IOPS=4.0    Avg qtime=0.0ms
Large I/O's:     IOPS=346    Avg qtime=1026ms
	Consumer Group: SWINGBENCH1.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=4%
	Flash Cache:     IOPS=3482
	Disk Throughput: MBPS=91
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=87.7    Avg qtime=1336ms
	Consumer Group: SWINGBENCH0.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=13%
	Flash Cache:     IOPS=8886
	Disk Throughput: MBPS=254
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=244    Avg qtime=906ms
	Consumer Group: CDB$ROOT._ORACLE_LOWPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.8
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT.ORA$AUTOTASK
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=1.4
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.4    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: CDB$ROOT._ORACLE_BACKGROUND_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.7
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=3.6    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: SWINGBENCH2.SWINGBENCH_GROUP
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=717
	Disk Throughput: MBPS=15
	Small I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Large I/O's:     IOPS=14.5    Avg qtime=1152ms

These are statistics from a single cell; this X2-2 has three of them. I have also gathered some of the raw stats here in case you are interested, again from a single cell:

CellCLI> list metriccurrent where name like 'PDB.*' and metricObjectName like 'MBACHMT.SW.*' and metricValue not like '0.*';
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH0	 3,779 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH1	 3,467 MB
	 PDB_FC_BY_ALLOCATED	 MBACHMT.SWINGBENCH2	 3,461 MB
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FC_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH0	 4,807,060 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH1	 4,835,038 IO requests
	 PDB_FC_IO_RQ       	 MBACHMT.SWINGBENCH2	 4,833,804 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FC_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FC_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH0	 12,789 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH1	 9,721 IO/sec
	 PDB_FC_IO_RQ_SEC   	 MBACHMT.SWINGBENCH2	 5,182 IO/sec
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH0	 9,724 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH1	 6,093 IO requests
	 PDB_FC_IO_RQ_SM    	 MBACHMT.SWINGBENCH2	 6,298 IO requests
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH0	 51.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH1	 30.2 IO/sec
	 PDB_FC_IO_RQ_SM_SEC	 MBACHMT.SWINGBENCH2	 23.9 IO/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH0	 835 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH1	 635 MB/sec
	 PDB_FD_IO_BY_SEC   	 MBACHMT.SWINGBENCH2	 338 MB/sec
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH0	 30.3
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH1	 30.4
	 PDB_FD_IO_LOAD     	 MBACHMT.SWINGBENCH2	 28.1
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH0	 4,797,336 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH1	 4,828,945 IO requests
	 PDB_FD_IO_RQ_LG    	 MBACHMT.SWINGBENCH2	 4,827,506 IO requests
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH0	 12,738 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH1	 9,691 IO/sec
	 PDB_FD_IO_RQ_LG_SEC	 MBACHMT.SWINGBENCH2	 5,158 IO/sec
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH0	 69,803,464 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH1	 45,061,357 us
	 PDB_FD_IO_TM       	 MBACHMT.SWINGBENCH2	 40,433,099 us
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_FD_IO_TM_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH0	 67 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH1	 54 %
	 PDB_FD_IO_UTIL     	 MBACHMT.SWINGBENCH2	 29 %
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH0	 830,669 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH1	 717,211 ms
	 PDB_FD_IO_WT_LG    	 MBACHMT.SWINGBENCH2	 221,666 ms
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH0	 380 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH1	 305 MB/sec
	 PDB_IO_BY_SEC      	 MBACHMT.SWINGBENCH2	 151 MB/sec
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH0	 2.4
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH1	 2.6
	 PDB_IO_LOAD        	 MBACHMT.SWINGBENCH2	 2.3
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH0	 105,784 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH1	 88,549 IO requests
	 PDB_IO_RQ_LG       	 MBACHMT.SWINGBENCH2	 61,617 IO requests
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH0	 365 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH1	 292 IO/sec
	 PDB_IO_RQ_LG_SEC   	 MBACHMT.SWINGBENCH2	 145 IO/sec
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH0	 3,822,888,945 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH1	 3,355,167,650 us
	 PDB_IO_TM_LG       	 MBACHMT.SWINGBENCH2	 2,004,747,904 us
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH0	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH1	 1.0 us/request
	 PDB_IO_TM_LG_RQ    	 MBACHMT.SWINGBENCH2	 1.0 us/request
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH0	 27 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH1	 20 %
	 PDB_IO_UTIL_LG     	 MBACHMT.SWINGBENCH2	 9 %
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH0	 108,668,272 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH1	 105,099,717 ms
	 PDB_IO_WT_LG       	 MBACHMT.SWINGBENCH2	 132,192,319 ms
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH0	 1,655 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH1	 2,979 ms/request
	 PDB_IO_WT_LG_RQ    	 MBACHMT.SWINGBENCH2	 4,921 ms/request

Some of the numbers don’t seem to make sense here, for example PDB_FD_IO_RQ_LG, where the values are very similar for all three PDBs. This is actually a feature (really!), because some metrics are cumulative and some are instantaneous:

CELLCLI> list metricdefinition where name = 'PDB_FD_IO_RQ_LG' detail
	 name:              	 PDB_FD_IO_RQ_LG
	 description:       	 "Number of large IO requests issued by this pluggable database to flash disks"
	 metricType:        	 Cumulative
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 "IO requests"

So this is a cumulative metric. Others, like PDB_IO_RQ_LG_SEC, measure the current state as a rate:

CELLCLI> list metricdefinition where name = 'PDB_IO_RQ_LG_SEC' detail
	 name:              	 PDB_IO_RQ_LG_SEC
	 description:       	 "Number of large IO requests issued by this pluggable database to hard disks per second"
	 metricType:        	 Rate
	 objectType:        	 IORM_PLUGGABLE_DATABASE
	 unit:              	 IO/sec
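
A quick way to see at a glance which of the PDB metrics are cumulative counters and which are rates is to include metricType in the attribute list, for example:

CellCLI> list metricdefinition attributes name, metricType where objectType = 'IORM_PLUGGABLE_DATABASE'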

Have fun!

Intra-Database IORM in action

I have been teaching the Enkitec Exadata Administration Class this week and made an interesting observation with regard to IO Resource Management on Exadata that I thought was worth sharing.

I have created a Database Resource Manager (DBRM) plan that specifically puts one resource consumer group at a disadvantage. Quite severely so, actually, but this isn’t meant to be a realistic example in the first place: I wanted to prove a point. Hang on, I hear you say: you created a DBRM plan, yet the post has IORM in the subject. What gives? Please allow me to explain.

Exadata offers the keen engineer three different ways to implement IORM:

  • Intra-Database IORM
  • Inter-Database IORM
  • Category IORM

The latter two need to be implemented on the cells using the cellcli “alter IORMPLAN” directive, documented in chapter 6 of the Storage Server User’s Guide. The first one, though, is pushed down to the cells when it is activated on the database, an activity that is visible in the cell alert.log. So when you create and enable a DBRM plan, it automatically becomes an IORM plan as well.
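
For reference, an inter-database plan set directly on a cell could look like the sketch below; the database names, levels and allocations are purely illustrative, and a category plan follows the same pattern using the catplan attribute:

CellCLI> alter iormplan dbplan=((name=prod, level=1, allocation=75), (name=other, level=2, allocation=100))
CellCLI> list iormplan detail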

Setup

The code to create the DBRM plan is quite simple. I create a plan named ENKITEC_DBRM and two new consumer groups:

  • LOWPRIO_GROUP
  • HIGHPRIO_GROUP

HIGHPRIO_GROUP gets 100% of mgmt_p2, while LOWPRIO_GROUP gets 25% of the remaining resources on level mgmt_p3 and has to share them with OTHER_GROUPS as well. Not that you would do that in real life… There was no change to the cpu_count, which is 24 on an X2 compute node (2s12c24t Westmere Xeon processors).
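
What follows is a minimal sketch of how such a plan could be put together, not the exact script I used; in particular the 75/25 split between OTHER_GROUPS and LOWPRIO_GROUP on level 3 is an assumption, and the consumer group mappings and switch privileges are omitted for brevity:

begin
 dbms_resource_manager.clear_pending_area;
 dbms_resource_manager.create_pending_area;

 dbms_resource_manager.create_consumer_group('HIGHPRIO_GROUP', 'high priority sessions');
 dbms_resource_manager.create_consumer_group('LOWPRIO_GROUP',  'low priority sessions');

 dbms_resource_manager.create_plan(
  plan    => 'ENKITEC_DBRM',
  comment => 'deliberately lopsided plan for the IORM demo');

 -- all of management level 2 goes to the high priority group
 dbms_resource_manager.create_plan_directive(
  plan             => 'ENKITEC_DBRM',
  group_or_subplan => 'HIGHPRIO_GROUP',
  comment          => '100 percent of mgmt_p2',
  mgmt_p2          => 100);

 -- low priority sessions get a quarter of what is left on level 3
 dbms_resource_manager.create_plan_directive(
  plan             => 'ENKITEC_DBRM',
  group_or_subplan => 'LOWPRIO_GROUP',
  comment          => '25 percent of mgmt_p3',
  mgmt_p3          => 25);

 -- a directive for OTHER_GROUPS is mandatory; the 75 is an assumption
 dbms_resource_manager.create_plan_directive(
  plan             => 'ENKITEC_DBRM',
  group_or_subplan => 'OTHER_GROUPS',
  comment          => 'everything else on level 3',
  mgmt_p3          => 75);

 dbms_resource_manager.validate_pending_area;
 dbms_resource_manager.submit_pending_area;
end;
/

The plan then needs to be activated (alter system set resource_manager_plan = 'ENKITEC_DBRM') and the LOWPRIO and HIGHPRIO users mapped to their respective consumer groups before the test makes sense.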

After the setup was complete I created a SQL script that I planned to execute in 50 sessions: 25 low-priority and 25 high-priority sessions. The script is a simple “select count(*) from reallybigtable”; the table is 80 GB in size with 256 million rows, uncompressed.

Running SQLPLUS concurrently

I wrote a small shell script capable of launching a user-configurable number of SQLPLUS sessions against the database, with some timing information included to measure the time-to-completion for all sessions. With that at hand I started 25 sessions for the low priority group and 25 for the high priority consumer group. In a third SSH session I connected to the first cell in the quarter rack (it’s an X2 by the way) and repeatedly executed metric_iorm.pl. This little gem was written for cellsrv 11.2.x and lists IORM metrics for each database in the system as well as a summary of IO activity against the cell. I explicitly mention cellsrv 11.2.x because the new IORM instrumentation for PDBs is missing from its output; so far you can only realistically monitor non-CDBs with it when using RDBMS 12c. The Perl script is available from “Tool for Gathering I/O Resource Manager Metrics: metric_iorm.pl (Doc ID 1337265.1)”. Deployed to celladmin’s home directory, I was ready to start the test.
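
The launcher itself is nothing fancy. A stripped-down sketch is shown below; the connect string (dbm01) and the name of the SQL script are placeholders:

#!/bin/bash
# usage: ./launch_sqlplus.sh <username> <password> <number of sessions>
# spawns the requested number of sqlplus sessions running the same scan script
# and reports the time until the last one has finished
USERNAME=$1
PASSWORD=$2
SESSIONS=$3

START=$SECONDS
for ((i = 1; i <= SESSIONS; i++)); do
  sqlplus -s ${USERNAME}/${PASSWORD}@dbm01 @scan_reallybigtable.sql > /dev/null &
done
wait
echo "${SESSIONS} sessions for ${USERNAME} finished after $(( SECONDS - START )) seconds"

Started once for LOWPRIO and once for HIGHPRIO, this reports the per-group elapsed times quoted further down.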

Test result

Executing metric_iorm.pl immediately after initiating the heavy IO load against the cells does not report a lot of useful data. It appears as if the contents of “metriccurrent” aren’t updated straight away but with a slight delay. Note that there is no typical resource manager wait event such as “resmgr:cpu quantum” here:

SQL> select username,event,state,sql_id from v$session where username like '%PRIO';

USERNAME                       EVENT                          STATE               SQL_ID
------------------------------ ------------------------------ ------------------- -------------
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITED KNOWN TIME   6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITED SHORT TIME   6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz

50 rows selected.

Almost all of the sessions were happily scanning the table. I am not aware of a way of finding out about IORM throttling at the RDBMS level, but that doesn’t mean there isn’t one! In the end the sessions of LOWPRIO_GROUP took 320 seconds to scan the table, while the sessions in HIGHPRIO_GROUP only needed 199 seconds to finish their scans. So clearly IORM was at work here.
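
One place where throttling does become visible is the cell side: both the database-level and the consumer-group-level IORM metrics include wait-time counters. A sketch, to be run on a cell with the database name adjusted to your environment:

CellCLI> list metriccurrent where objectType = 'IORM_DATABASE' and metricObjectName = 'DBM01' and name like 'DB_IO_WT_.*'
CellCLI> list metriccurrent where objectType = 'IORM_CONSUMER_GROUP' and metricObjectName like 'DBM01.*PRIO_GROUP' and name like 'CG_IO_WT_LG.*'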

IO Performance

The end result was to be expected, but I wanted to know more. In this case the interesting bit is found on the cell. Please note that I only looked at the first cell in this X2 quarter rack; I didn’t check the combined throughput across all cells. The query I executed wasn’t taking advantage of any Exadata features either: it was a brute-force scan across all rows of the table in the absence of a where clause. I just needed a lot of IO to demonstrate the use of IORM.

When all 50 sessions were active I could observe the following information from metric_iorm.pl for my database:

Database: DBM01
Utilization:     Small=0%    Large=97%
Flash Cache:     IOPS=45702
Disk Throughput: MBPS=0
Small I/O's:     IOPS=0.7    Avg qtime=8.8ms
Large I/O's:     IOPS=1115    Avg qtime=1941ms
        Consumer Group: HIGHPRIO_GROUP
        Utilization:     Small=0%    Large=94%
        Flash Cache:     IOPS=43474
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.0 
        Large I/O's:     IOPS=1064 
        Consumer Group: _ORACLE_BACKGROUND_GROUP_
        Utilization:     Small=0%    Large=0%
        Flash Cache:     IOPS=6.6
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.7
        Large I/O's:     IOPS=0.0
        Consumer Group: LOWPRIO_GROUP
        Utilization:     Small=0%    Large=3%
        Flash Cache:     IOPS=2222
        Disk Throughput: MBPS=52
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=50.4

You can see that IORM is in action: the low priority group is all but starved out, which is correct if you consider the plan I implemented. I have previously put my reallybigtable to good use and ended up with a lot of it in flash cache (thanks to cellsrv 11.2.3.3.x+ you benefit from flash cache during smart scans without having to pin the segment to it). Interestingly the disk throughput is reported as 0 MB per second for the high priority group, which looks quite wrong to me; I need to debug the script. The low priority consumer group apparently makes use of spinning disk, although its combined throughput is lower.

After the 25 sessions from the high priority resource consumer group finished, it looked better for the sessions in the low priority group:

Database: DBM01
Utilization:     Small=0%    Large=88%
Flash Cache:     IOPS=44351
Disk Throughput: MBPS=0
Small I/O's:     IOPS=3.2    Avg qtime=0.0ms
Large I/O's:     IOPS=1080    Avg qtime=830ms
        Consumer Group: _ORACLE_BACKGROUND_GROUP_
        Utilization:     Small=0%    Large=0%
        Flash Cache:     IOPS=8.4
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=3.2  
        Large I/O's:     IOPS=0.0
        Consumer Group: LOWPRIO_GROUP
        Utilization:     Small=0%    Large=88%
        Flash Cache:     IOPS=44342
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=1080

You can see that there are no longer any sessions active other than the background processes and LOWPRIO_GROUP. The number of IOPS from Flash Cache for the low priority group went up significantly, to the extent that disk isn’t reported as being used anymore. I need to verify that; it doesn’t look quite right to me either.

Summary

IO Resource Manager sadly is a much underused tool, and few Exadata users seem to know that it is active as soon as you implement a DBRM plan (the objective parameter notwithstanding). I hope this post shows that DBRM/IORM on Exadata is worth investing time and effort in, as it is a key enabler for database consolidation and fancier topics such as DBaaS. And you don’t get anything comparable on any other platform that runs the Oracle software.