I have been teaching the Enkitec Exadata Administration Class this week and made an interesting observation with regard to IO Resource Management on Exadata that I thought was worth sharing.
I created a Database Resource Manager (DBRM) plan that deliberately puts one resource consumer group at a disadvantage. Quite a severe one, actually, but then this isn't meant to be a realistic example in the first place: I wanted to prove a point. Hang on, I hear you say: you created a DBRM plan, yet the post has IORM in the subject. What gives? Please allow me to explain.
Exadata offers the keen engineer three different ways to implement IORM:
- Intra-Database IORM
- Inter-Database IORM
- Category IORM
The latter two need to be implemented on the cells using the cellcli "alter iormplan" command, documented in chapter 6 of the Storage Server User's Guide. The first one, though, is pushed down to the cells when it is activated on the database, an activity that is visible in the cell alert.log. So when you create and enable a DBRM plan, it automatically becomes an IORM plan as well.
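For illustration, an inter-database and category IORM plan set from cellcli might look roughly like the sketch below. The database names, category names, levels and allocations are made up for this example; the exact syntax is in the Storage Server User's Guide.

CellCLI> alter iormplan                                           -
         dbplan=((name=dbm01, level=1, allocation=75),            -
                 (name=other, level=2, allocation=100)),          -
         catplan=((name=interactive, level=1, allocation=70),     -
                  (name=other, level=2, allocation=100))
CellCLI> alter iormplan active
CellCLI> list iormplan detail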
Setup
The code to create the DBRM plan is quite simple. I create a plan named ENKITEC_DBRM and two new consumer groups:
- LOWPRIO_GROUP
- HIGHPRIO_GROUP
HIGHPRIO_GROUP gets 100% at level mgmt_p2, while LOWPRIO_GROUP gets 25% of the remaining resources at level mgmt_p3 and has to share them with OTHER_GROUPS as well. Not that you would do that in real life… There was no change to cpu_count, which is 24 on an X2 compute node (2s12c24t Westmere Xeon processors).
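For reference, a minimal sketch of how such a plan could be put together with DBMS_RESOURCE_MANAGER is shown below. The percentages for the two new groups follow the description above; the comments, the OTHER_GROUPS allocation and the consumer group mappings (not shown) are assumptions for the sake of the example.

BEGIN
  DBMS_RESOURCE_MANAGER.CLEAR_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;

  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'ENKITEC_DBRM',
    comment => 'demo plan deliberately penalising LOWPRIO_GROUP');

  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'HIGHPRIO_GROUP', comment => 'high priority sessions');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'LOWPRIO_GROUP',  comment => 'low priority sessions');

  -- 100% of management level 2 for the high priority group
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'ENKITEC_DBRM', group_or_subplan => 'HIGHPRIO_GROUP',
    comment => 'all of mgmt_p2', mgmt_p2 => 100);

  -- 25% of what is left at level 3 for the low priority group
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'ENKITEC_DBRM', group_or_subplan => 'LOWPRIO_GROUP',
    comment => '25 percent of mgmt_p3', mgmt_p3 => 25);

  -- OTHER_GROUPS is mandatory in every plan; the 75% here is an assumption
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'ENKITEC_DBRM', group_or_subplan => 'OTHER_GROUPS',
    comment => 'everyone else', mgmt_p3 => 75);

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
END;
/

-- enabling the plan on the database is what pushes it to the cells as an IORM plan
ALTER SYSTEM SET resource_manager_plan = 'ENKITEC_DBRM' SCOPE=BOTH SID='*';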
After the setup was complete I created a SQL script that I planned to execute in 50 sessions: 25 in the low priority and 25 in the high priority consumer group. The script is a simple "select count(*) from reallybigtable" – in this case an 80 GB table with 256 million rows, uncompressed.
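The script itself is trivial; a hypothetical version (the file name is my choice, not the original) could look like this:

-- count_reallybigtable.sql: full scan of the test table, with timing
set timing on
select count(*) from reallybigtable;
exit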
Running SQLPLUS concurrently
I wrote a small shell script capable of launching a user-configurable number of SQLPLUS sessions against the database, and included some timing information to measure the time-to-completion of all sessions. With that at hand I started 25 sessions for the low priority and 25 for the high priority consumer group. In a third SSH session I connected to the first cell in the quarter rack (it's an X2 by the way) and repeatedly executed metric_iorm.pl. This little gem was written for cellsrv 11.2.x and lists IORM metrics for each database on the cell as well as a summary of IO activity against the cell. I explicitly mention cellsrv 11.2.x because the new IORM instrumentation for PDBs is missing from the output; so far you can only realistically monitor non-CDBs with it when using RDBMS 12c. The perl script is available from "Tool for Gathering I/O Resource Manager Metrics: metric_iorm.pl (Doc ID 1337265.1)". Deployed to celladmin's home directory, I was ready to start the test.
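My launcher isn't reproduced here, but a minimal sketch of the idea could look like the following. The file name, arguments and connect string handling are assumptions, not the original script.

#!/bin/bash
# Launch N concurrent SQLPLUS sessions running the same script and time them.
# usage: ./launch_sessions.sh <num_sessions> <connect_string> [script]
NUM_SESSIONS=${1:-25}
CONNECT=$2                              # e.g. lowprio/secret@dbm01
SCRIPT=${3:-count_reallybigtable.sql}

start=$(date +%s)

for ((i = 1; i <= NUM_SESSIONS; i++)); do
  sqlplus -s "$CONNECT" @"$SCRIPT" > /dev/null &
done

# wait for all background sqlplus processes to finish, then report elapsed time
wait
end=$(date +%s)
echo "$NUM_SESSIONS sessions completed in $((end - start)) seconds"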
Test result
Executing metric_iorm.pl immediately after initiating the heavy IO load against the cells does not report a lot of useful data; it appears as if the contents of "metriccurrent" are not updated straight away but with a slight delay. Note that there is no typical resource manager wait event such as "resmgr:cpu quantum" here:
SQL> select username,event,state,sql_id from v$session where username like '%PRIO';

USERNAME                       EVENT                          STATE               SQL_ID
------------------------------ ------------------------------ ------------------- -------------
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITED KNOWN TIME   6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITED SHORT TIME   6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz

50 rows selected.
Almost all of the sessions were happily scanning the table. I am not aware of a way of detecting IORM throttling at the RDBMS level, but that doesn't mean there isn't one! In the end the sessions of LOWPRIO_GROUP took 320 seconds to scan the table, while the sessions in HIGHPRIO_GROUP needed only 199 seconds to finish their scans. So clearly IORM was at work here.
IO Performance
The end result was to be expected, but I wanted to know more. In this case the interesting bit is found on the cell. Please note that I only looked at the first cell in this X2 quarter rack; I didn't check the combined throughput across all cells. The query I executed didn't make much use of Exadata features either: it was a brute-force scan across all rows of the table in the absence of a where-clause. I just needed a lot of IO to demonstrate the effect of IORM.
When all 50 sessions were active I could observe the following information from metric_iorm.pl for my database:
Database: DBM01
    Utilization:     Small=0%     Large=97%
    Flash Cache:     IOPS=45702
    Disk Throughput: MBPS=0
    Small I/O's:     IOPS=0.7     Avg qtime=8.8ms
    Large I/O's:     IOPS=1115    Avg qtime=1941ms
    Consumer Group: HIGHPRIO_GROUP
        Utilization:     Small=0%     Large=94%
        Flash Cache:     IOPS=43474
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=1064
    Consumer Group: _ORACLE_BACKGROUND_GROUP_
        Utilization:     Small=0%     Large=0%
        Flash Cache:     IOPS=6.6
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.7
        Large I/O's:     IOPS=0.0
    Consumer Group: LOWPRIO_GROUP
        Utilization:     Small=0%     Large=3%
        Flash Cache:     IOPS=2222
        Disk Throughput: MBPS=52
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=50.4
You can see IORM in action: the low priority group is all but starved out, which is correct considering the plan I implemented. I had previously put my reallybigtable to good use and ended up with a lot of it in the flash cache (from cellsrv 11.2.3.3.x onwards you benefit from flash cache during smart scans without having to pin the segment to it). Interestingly, the disk throughput is reported as 0 MB per second for the high priority group, which looks quite wrong to me; I need to debug the script. The low priority consumer group apparently does make use of spinning disk, although its combined throughput is much lower.
After the 25 sessions from the high priority resource consumer group had finished, things looked better for the sessions in the low priority group:
Database: DBM01
    Utilization:     Small=0%     Large=88%
    Flash Cache:     IOPS=44351
    Disk Throughput: MBPS=0
    Small I/O's:     IOPS=3.2     Avg qtime=0.0ms
    Large I/O's:     IOPS=1080    Avg qtime=830ms
    Consumer Group: _ORACLE_BACKGROUND_GROUP_
        Utilization:     Small=0%     Large=0%
        Flash Cache:     IOPS=8.4
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=3.2
        Large I/O's:     IOPS=0.0
    Consumer Group: LOWPRIO_GROUP
        Utilization:     Small=0%     Large=88%
        Flash Cache:     IOPS=44342
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=1080
Apart from the background processes, only LOWPRIO_GROUP is still active. Its number of IOPS from flash cache went up significantly, to the point that disk is no longer reported as being used at all. I need to verify that; it doesn't look quite right to me either.
Summary
IO Resource Manager sadly is a much underused tool, and few Exadata users seem to know that it is active as soon as you implement a DBRM plan (the objective parameter notwithstanding). I hope this post shows that DBRM/IORM on Exadata is worth investing time and effort in, as it is a key enabler for database consolidation and fancier topics such as DBaaS. And you don't get anything comparable on any other platform that runs the Oracle software.