Travel time is writing time and I have the perfect setting for a techie post. Actually I got quite excited about the subject causing the article to get a bit longer than initially anticipated. In this part you can read about block cleanouts when using buffered I/O. The next part will show how this works using direct path reads and Smart Scans.
The article ultimately aims at describing the enhancements Exadata brings to the table for direct path reads and delayed block cleanouts. Delayed block cleanouts are described in Jonathan Lewis’s “Oracle Core”, and in one of his blog posts, so here’s just a summary.
The delayed block cleanout
In a nutshell Oracle’s database writer (the process persisting your transactions to disk) is free to write blocks to disk in batches when it has to be done. A commit on its own won’t trigger a write of dirty (or modified) block to disk. If it were to trigger the write, the commit-time would be proportional to the number of blocks affected by the last transaction. The commit command however completes very quickly regardless of how much data has been affected. Unlike a rollback … It is entirely possible that a block modified by a transaction is written to disk before the transaction has been completed. A little later, once the commit has been acknowledged by Oracle there is no process that would read the blocks back into the buffer cache and clear them out-this happens later. It would also be quite inefficient to do so.
Defining “clean”
Now when database writer writes to disk it is possible that the block just written has an active transaction recorded in its header. You can see this by dumping a block with an active transaction – the ITL in the header will reference the XID, UBA and the number of rows affected plus other information. The individual rows that are part of the transaction have their lock byte set as you can see in the row directory. The number in the lb field refers back to an ITL you see further up in the block dump (don’t worry I’ll show you an example shortly).
What happened in Oracle before direct path reads became a lot more common is this:
- A user starts a transaction, for example by updating a large portion of the table
- In the meantime the database writer flushes some blocks to disk, including some of the ones affected by the transaction.
- The user commits a transaction
- The next user queries the block (let’s assume it is not in the buffer cache)
- The second session’s foreground process reads the block
- Realises that it has an active transaction recorded in it
- Checks if that transaction is still active
- And clears the block out if not
- The block with a valid CR image is now in the buffer cache
Huh?
Sounds too abstract-I agree? Let’s have an example.
SQL> select table_name,num_rows,compression,partitioned from tabs 2 where table_name = 'T1_100k'; TABLE_NAME NUM_ROWS COMPRESS PAR ------------------------------ ---------- -------- --- T1_100K 500000 DISABLED NO SQL> update T1_100k set state = 'MODIFIED'; 5000000 rows updated.
An easy way to simulate the flushing of blocks to disk is a slightly brute-force approach of flushing the buffer cache. Only recommended in the lab, really. Let’s have a look at what the block looks like:
SQL> select dbms_rowid.rowid_block_number(rowid), dbms_rowid.rowid_relative_fno(rowid) 2 from t1_100k where rownum < 11; DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) ------------------------------------ ------------------------------------ 3940811 5 3940811 5 3940811 5 3940811 5 3940811 5 3940811 5 3940812 5 3940812 5 3940812 5 3940812 5 10 rows selected. Elapsed: 00:00:00.01 SQL> alter system dump datafile 5 block 3940811; System altered. Elapsed: 00:00:00.01 SQL> select value from v$diag_info where name like 'Default%'; VALUE --------------------------------------------------------------------------- /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_94351.trc Elapsed: 00:00:00.02
The block dump for block 3940811 on datafile 5 (users tablespace) is now in that trace file:
Block header dump: 0x017c21cb Object id on Block? Y seg/obj: 0x12a13 csc: 0x00.2191ff7 itc: 3 flg: E typ: 1 - DATA brn: 0 bdba: 0x17c21c8 ver: 0x01 opc: 0 inc: 0 exflg: 0 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02191d10 0x02 0x0009.005.00002575 0x0003688a.0ef1.1f ---- 6 fsc 0x0000.00000000 0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000 bdba: 0x017c21cb data_block_dump,data header at 0x7f026361707c =============== tsiz: 0x1f80 hsiz: 0x1e pbl: 0x7f026361707c 76543210 flag=-------- ntab=1 nrow=6 ... 0xe:pti[0] nrow=6 offs=0 0x12:pri[0] offs=0x1b75 0x14:pri[1] offs=0x176a 0x16:pri[2] offs=0x135f 0x18:pri[3] offs=0xf54 0x1a:pri[4] offs=0xb49 0x1c:pri[5] offs=0x73e block_row_dump: tab 0, row 0, @0x1b75 tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 col 0: [ 2] c1 02 col 1: [999] 31 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 ...
The significant pieces of information here are in the ITL (interested transaction list) and the lock byte set, pointing to the second ITL entry. The absence of flags for ITL #2 indicates the transaction is ongoing (you can see this in v$transaction-this is a lab environment, in real databases you’d see more than 1 active transaction):
SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction; XIDUSN XIDSLOT XIDSQN STATUS START_SCN ---------- ---------- ---------- ---------------- ---------- 9 5 9589 ACTIVE 35200741
This is indeed the transaction that is referenced in v$transaction and the ITL 0x02 above (9589 is decimal for 2575 in hexadecimal notation, 9 and 5 don’t need converting to decimal numbers). If you are curious now about some more internals then you might find this post by Arup Nanda interesting.
Now what happens if another session queries the table? Since the query in session 1 has not yet committed the changes in it must not be seen by any other session (that would be a dirty read otherwise. We do not use dirty reads in Oracle RDBMS). So to give the user a consistent view of the data the block must be rolled back. Before the test I made sure there wasn’t a block in the buffer cache for the object in question:
SQL> select count(*), inst_id, status from gv$bh where objd = 2 (select data_object_id from dba_objects where object_name = 'T1_100K') 3 group by inst_id, status; COUNT(*) INST_ID STATUS ---------- ---------- ---------- 524327 1 free
Let’s test:
SQL> @mystats start SQL> select /* test004 */ count(*) from t1_100k; COUNT(*) ---------- 500000 Elapsed: 00:00:13.96 @mystats stop t=1 ========================================================================================== MyStats report : 22-FEB-2015 08:42:09 ========================================================================================== ------------------------------------------------------------------------------------------ 1. Summary Timings ------------------------------------------------------------------------------------------ Type Statistic Name Value ------ ---------------------------------------------------------------- ---------------- TIMER snapshot interval (seconds) 35.57 TIMER CPU time used (seconds) 3.21 ------------------------------------------------------------------------------------------ 2. Statistics Report ------------------------------------------------------------------------------------------ Type Statistic Name Value ------ ---------------------------------------------------------------- ---------------- STAT CPU used by this session 327 STAT CPU used when call started 327 STAT CR blocks created 83,334 STAT DB time 1,401 STAT Requests to/from client 16 STAT SQL*Net roundtrips to/from client 16 STAT active txn count during cleanout 83,334 STAT buffer is not pinned count 2 STAT bytes received via SQL*Net from client 13,279 STAT bytes sent via SQL*Net to client 5,106 STAT calls to get snapshot scn: kcmgss 758 STAT calls to kcmgas 83,332 STAT calls to kcmgcs 30 STAT cell flash cache read hits 8,045 STAT cell physical IO interconnect bytes 748,576,768 STAT change write time 48 STAT cleanout - number of ktugct calls 83,334 STAT cleanouts and rollbacks - consistent read gets 83,334 STAT consistent changes 591,717 STAT consistent gets 757,980 STAT consistent gets examination 674,254 STAT consistent gets examination (fastpath) 83,337 STAT consistent gets from cache 757,980 STAT consistent gets pin 83,726 STAT consistent gets pin (fastpath) 83,055 STAT data blocks consistent reads - undo records applied 590,917 STAT db block changes 84,134 STAT db block gets 2,301 STAT db block gets from cache 2,301 STAT enqueue releases 4 STAT enqueue requests 4 STAT execute count 13 STAT file io wait time 11,403,214 STAT free buffer requested 174,766 STAT gc local grants 174,713 STAT global enqueue gets sync 43 STAT global enqueue releases 41 STAT heap block compress 4,549 STAT hot buffers moved to head of LRU 4 STAT immediate (CR) block cleanout applications 83,334 STAT index fetch by key 1 STAT lob writes 375 STAT lob writes unaligned 375 STAT logical read bytes from cache 6,228,221,952 STAT messages sent 9 STAT non-idle wait count 17,446 STAT non-idle wait time 1,141 STAT opened cursors cumulative 13 STAT parse count (hard) 1 STAT parse count (total) 13 STAT physical read IO requests 8,715 STAT physical read bytes 748,576,768 STAT physical read requests optimized 8,045 STAT physical read total IO requests 8,715 STAT physical read total bytes 748,576,768 STAT physical read total bytes optimized 65,904,640 STAT physical read total multi block requests 654 STAT physical reads 91,379 STAT physical reads cache 91,379 STAT physical reads cache prefetch 82,664 STAT recursive calls 1,985 STAT recursive cpu usage 6 STAT redo entries 84,005 STAT redo entries for lost write detection 671 STAT redo size 9,016,740 STAT redo size for lost write detection 3,016,296 STAT redo subscn max counts 83,334 STAT rows fetched via callback 1 STAT session cursor cache count 4 STAT session cursor cache hits 9 STAT session logical reads 760,281 STAT session pga memory -65,536 STAT session uga memory 65,488 STAT session uga memory max 148,312 STAT shared hash latch upgrades - no wait 8,218 STAT table fetch by rowid 1 STAT table scan blocks gotten 83,334 STAT table scan disk non-IMC rows gotten 500,000 STAT table scan rows gotten 500,000 STAT table scans (short tables) 1 STAT temp space allocated (bytes) 2,097,152 STAT user I/O wait time 1,141 STAT user calls 22 STAT workarea executions - optimal 1 STAT workarea memory allocated -43 ------------------------------------------------------------------------------------------ 3. About ------------------------------------------------------------------------------------------ - MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net) - Based on the SNAP_MY_STATS utility by Jonathan Lewis ========================================================================================== End of report ==========================================================================================
I am using Adrian Billington’s mystats script here which I can only recommend to you-it’s very decent (and that’s an understatement). In the example above it took two snapshots and calculated the change of the session counters in v$mystat during the query execution. Have a look at the cleanout% statistics here! These are the block cleanouts. As this generates redo you can see that recorded here too. This is a tricky interview question: can a select generate redo? Yes sure can! There is also a fair amount of physical I/O going on. After all the buffer cache was empty.
The second execution of the query completes a lot faster due to the absence of physical I/O. Nevertheless, the session has to clean out all these blocks and perform consistent read processing again (remember the transaction is still uncommitted as seen in active txn%):
SQL> select count(*) from t1_100k; COUNT(*) ---------- 500000 Elapsed: 00:00:01.76 SQL> @mystats stop t=1 ========================================================================================== MyStats report : 22-FEB-2015 08:42:59 ========================================================================================== ------------------------------------------------------------------------------------------ 1. Summary Timings ------------------------------------------------------------------------------------------ Type Statistic Name Value ------ ---------------------------------------------------------------- ---------------- TIMER snapshot interval (seconds) 8.55 TIMER CPU time used (seconds) 1.75 ------------------------------------------------------------------------------------------ 2. Statistics Report ------------------------------------------------------------------------------------------ Type Statistic Name Value ------ ---------------------------------------------------------------- ---------------- STAT CPU used by this session 181 STAT CPU used when call started 181 STAT CR blocks created 83,334 STAT DB time 183 STAT Requests to/from client 16 STAT SQL*Net roundtrips to/from client 16 STAT active txn count during cleanout 83,334 STAT buffer is not pinned count 2 STAT bytes received via SQL*Net from client 13,280 STAT bytes sent via SQL*Net to client 5,177 STAT calls to get snapshot scn: kcmgss 757 STAT calls to kcmgas 83,331 STAT calls to kcmgcs 30 STAT change write time 28 STAT cleanout - number of ktugct calls 83,334 STAT cleanouts and rollbacks - consistent read gets 83,334 STAT consistent changes 591,708 STAT consistent gets 757,980 STAT consistent gets examination 674,254 STAT consistent gets examination (fastpath) 83,337 STAT consistent gets from cache 757,980 STAT consistent gets pin 83,726 STAT consistent gets pin (fastpath) 83,726 STAT cursor authentications 1 STAT data blocks consistent reads - undo records applied 590,917 STAT db block changes 84,125 STAT db block gets 2,895 STAT db block gets from cache 2,895 STAT execute count 13 STAT free buffer requested 83,382 STAT global enqueue gets sync 38 STAT global enqueue releases 36 STAT heap block compress 4,549 STAT immediate (CR) block cleanout applications 83,334 STAT index fetch by key 1 STAT lob writes 375 STAT lob writes unaligned 375 STAT logical read bytes from cache 6,233,088,000 STAT messages sent 7 STAT non-idle wait count 16 STAT opened cursors cumulative 13 STAT parse count (total) 13 STAT recursive calls 1,982 STAT recursive cpu usage 6 STAT redo entries 83,334 STAT redo size 6,000,400 STAT rows fetched via callback 1 STAT session cursor cache hits 12 STAT session logical reads 760,875 STAT session pga memory -1,769,472 STAT session uga memory -1,858,472 STAT shared hash latch upgrades - no wait 524 STAT table fetch by rowid 1 STAT table scan blocks gotten 83,334 STAT table scan disk non-IMC rows gotten 500,000 STAT table scan rows gotten 500,000 STAT table scans (short tables) 1 STAT user calls 22 STAT workarea executions - optimal 1 STAT workarea memory allocated -83 ------------------------------------------------------------------------------------------ 3. About ------------------------------------------------------------------------------------------ - MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net) - Based on the SNAP_MY_STATS utility by Jonathan Lewis ========================================================================================== End of report ==========================================================================================
I should have mentioned that these 2 examples were executed on an X2-2 quarter rack. To prevent Exadata from applying some clever optimisations I specifically disabled direct path reads and turned off Exadata features. Here is the view from the buffer cache again:
SQL> select count(*), inst_id, status from gv$bh where objd = 2 (select data_object_id from dba_objects where object_name = 'T1_100K') 3 group by inst_id, status; COUNT(*) INST_ID STATUS ---------- ---------- ---------- 351704 1 free 1 1 scur 83334 1 xcur 83334 1 cr 4 rows selected.
For each subsequent execution of the select statement consistent read processing has to be conducted. The number of CR blocks in the buffer cache will increase. For the third execution I can see the following:
SQL> r 1 select count(*), inst_id, status from gv$bh where objd = 2 (select data_object_id from dba_objects where object_name = 'T1_100K') 3* group by inst_id, status COUNT(*) INST_ID STATUS ---------- ---------- ---------- 267526 1 free 1 1 scur 83334 1 xcur 166668 1 cr 4 rows selected.
Things improve when the transaction commits.
The impact of a commit
What about a commit now? Committing in session 1 and running the query again in session 2 shows a very good response time. Please remember that I forced buffered reads. The next part deals with this case when using direct path reads instead (those are not buffered). The segment size would have qualified for direct path reads otherwise.
SQL> select /* test004 */ count(*) from t1_100k; COUNT(*) ---------- 500000 Elapsed: 00:00:00.18 ========================================================================================== MyStats report : 22-FEB-2015 09:40:08 ========================================================================================== ------------------------------------------------------------------------------------------ 1. Summary Timings ------------------------------------------------------------------------------------------ Type Statistic Name Value ------ ---------------------------------------------------------------- ---------------- TIMER snapshot interval (seconds) 19.73 TIMER CPU time used (seconds) 0.18 ------------------------------------------------------------------------------------------ 2. Statistics Report ------------------------------------------------------------------------------------------ Type Statistic Name Value ------ ---------------------------------------------------------------- ---------------- STAT CPU used by this session 25 STAT CPU used when call started 25 STAT DB time 25 STAT Heatmap SegLevel - Full Table Scan 1 STAT Requests to/from client 16 STAT SQL*Net roundtrips to/from client 16 STAT buffer is not pinned count 2 STAT bytes received via SQL*Net from client 13,280 STAT bytes sent via SQL*Net to client 5,107 STAT calls to get snapshot scn: kcmgss 757 STAT calls to kcmgcs 30 STAT consistent changes 800 STAT consistent gets 83,729 STAT consistent gets examination 3 STAT consistent gets examination (fastpath) 3 STAT consistent gets from cache 83,729 STAT consistent gets pin 83,726 STAT consistent gets pin (fastpath) 83,726 STAT db block changes 800 STAT db block gets 2,301 STAT db block gets from cache 2,301 STAT enqueue releases 3 STAT enqueue requests 3 STAT execute count 13 STAT free buffer requested 53 STAT global enqueue gets sync 39 STAT global enqueue releases 37 STAT index fetch by key 1 STAT lob writes 375 STAT lob writes unaligned 375 STAT logical read bytes from cache 704,757,760 STAT no work - consistent read gets 83,334 STAT non-idle wait count 16 STAT opened cursors cumulative 13 STAT parse count (total) 13 STAT recursive calls 1,984 STAT recursive cpu usage 7 STAT rows fetched via callback 1 STAT session cursor cache count 4 STAT session cursor cache hits 10 STAT session logical reads 86,030 STAT session pga memory 65,536 STAT session pga memory max 65,536 STAT session uga memory 65,488 STAT session uga memory max 148,312 STAT shared hash latch upgrades - no wait 525 STAT table fetch by rowid 1 STAT table scan blocks gotten 83,334 STAT table scan disk non-IMC rows gotten 500,000 STAT table scan rows gotten 500,000 STAT table scans (short tables) 1 STAT temp space allocated (bytes) 2,097,152 STAT user calls 22 STAT workarea executions - optimal 1 STAT workarea memory allocated -40 ------------------------------------------------------------------------------------------ 3. About ------------------------------------------------------------------------------------------ - MyStats v2.01 by Adrian Billington (http://www.oracle-developer.net) - Based on the SNAP_MY_STATS utility by Jonathan Lewis ========================================================================================== End of report ==========================================================================================
Access to blocks in the buffer cache is quick! There was no need to clean out any blocks anymore. The fastest way to do something is indeed not having to do it at all. Mind you, the block on disk still has the active transaction recorded!
SQL> alter system dump datafile 5 block 3940811; System altered. Elapsed: 00:00:00.01 SQL> select value from v$diag_info where name like 'Default%'; VALUE ------------------------------------------------------------------------------------------------------------------------ /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc SQL> exit $ vi /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc Block header dump: 0x017c21cb Object id on Block? Y seg/obj: 0x12a13 csc: 0x00.2317620 itc: 3 flg: E typ: 1 - DATA brn: 0 bdba: 0x17c21c8 ver: 0x01 opc: 0 inc: 0 exflg: 0 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02191d10 0x02 0x0009.005.00002575 0x0003688a.0ef1.1f --U- 6 fsc 0x0000.0232cfca 0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000 bdba: 0x017c21cb data_block_dump,data header at 0x7fc9f3cb407c ... $ grep lb -B1 /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_100297.trc tab 0, row 0, @0x1b75 tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 -- tab 0, row 1, @0x176a tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 -- tab 0, row 2, @0x135f tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 -- tab 0, row 3, @0xf54 tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 -- tab 0, row 4, @0xb49 tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 -- tab 0, row 5, @0x73e tl: 1035 fb: --H-FL-- lb: 0x2 cc: 6 --
The transaction recorded in the ITL slot 2 is gone though:
SQL> select xidusn, xidslot, xidsqn, status, start_scn from v$transaction 2 where xidusn = 9 and xidslot = 5 and xidsqn = 9589; no rows selected
Did you notice the Flag in the second ITL entry? That wasn’t there before. It indicates a fast commit according to Core Oracle by Jonathan Lewis.
Summary
Block cleanouts are simple with buffered I/O. Dirty blocks can be rolled back and kept in the buffer cache and can be accessed without having to undergo block cleanout.
Response
[…] « Understanding enhancements to block cleanouts in Exadata part 1 […]