This post is the result of some testing I performed on Exadata data file creation. You may know that Exadata offloading incorporates SQL optimisations as well as some infrastructure work. For quite some time Exadata allowed the DBA to create data files a lot quicker than on traditional systems. This has been documented before, for example by @mpnsh here
The final comment on his blog entry was a remark that data file creation is quite fast, but that is not true for online redo logs. Especially in environments where you duplicate production to a lower tier environment you have to wait for the online redo logs (including all members across all threads) to be zeroed. This is no longer an issue with Fast Data File Creation. If your system is configured to use Write-Back Flash Cache (WBFC from now on) and you are on Exadata 11.2.3.3 then you can benefit from super-fast file creation, including online redo logs. Here is an example, taken from a SQL trace:
TKPROF: Release 12.1.0.2.0 ... Trace file: /u01/app/oracle/diag/rdbms/mbach/MBACH1/trace/MBACH1_ora_124411.trc Sort options: default ******************************************************************************** count = number of times OCI procedure was executed cpu = cpu time in seconds executing elapsed = elapsed time in seconds executing disk = number of physical reads of buffers from disk query = number of buffers gotten for consistent read current = number of buffers gotten in current mode (usually for update) rows = number of rows processed by the fetch or execute call ******************************************************************************** SQL ID: fvt6psf2t3cdz Plan Hash: 0 alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.37 5.09 0 0 2 0 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 2 0.38 5.09 0 0 2 0 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: SYS Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ Disk file Mirror Read 2 0.00 0.00 control file sequential read 23 0.00 0.01 KSV master wait 19 0.04 0.09 ASM file metadata operation 6 0.00 0.00 CSS initialization 1 0.00 0.00 CSS operation: query 6 0.00 0.00 CSS operation: action 2 0.00 0.00 kfk: async disk IO 2 0.00 0.00 Disk file operations I/O 3 0.00 0.00 cell smart file creation 3258 0.13 4.64 log file single write 2 0.00 0.00 control file parallel write 10 0.07 0.07 log file sync 1 0.00 0.00 SQL*Net message to client 1 0.00 0.00 SQL*Net message from client 1 17.61 17.61 ********************************************************************************
I have tried to replicate the command issued in Martin Nash’s blog post. If memory serves me right he was waiting for 37 seconds for this command to complete. The new method uses about 5 seconds-not bad! Interestingly there is no new wait event-the one we know (cell smart file creation) is used to indicate the work that has been done. In the case of Fast Data File Creation only the metadata about the new file is persisted in the WBFC, the actual formatting has not happened when the prompt returned.
Session Counters?
I have spent _a lot_ of time on Exadata related session counters in 12.1.0.2 and earlier releases. I think they are fascinating, and since the wait events do not really show me what happened during the execution of the statement I used the session counters instead. I am using Adrian Billington’s mystats tool for this:
SQL> @scripts/mystats start SQL> alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m; Database altered. Elapsed: 00:00:04.87 SQL> @scripts/mystats stop t=1 ... STAT cell flash cache read hits 20 STAT cell logical write IO requests 14 STAT cell overwrites in flash cache 30 STAT cell physical IO bytes eligible for predicate offload 17,179,869,184 STAT cell physical IO bytes saved during optimized file creation 17,179,869,184 STAT cell physical IO interconnect bytes 1,361,920 STAT cell writes to flash cache 30 ... STAT physical write requests optimized 10 STAT physical write total IO requests 4,126 STAT physical write total bytes 17,180,197,888 STAT physical write total bytes optimized 163,840 STAT physical write total multi block requests 4,096
Let’s begin with the cell%-statistics. These indicate what happened on the storage layer. The command I executed was to create an online redo log group in thread 2, in disk groups DATA and RECO. Each file is approximately 4 GB in size.
SQL> select f.member, l.bytes/power(1024,2) m 2 from v$log l, v$logfile f 3 where l.group# = f.group# 4 and l.group# = 5; MEMBER M ------------------------------------------------------------ ---------- +DATA/MBACH/ONLINELOG/group_5.454.879407723 4096 +RECO/MBACH/ONLINELOG/group_5.1229.879407725 4096 2 rows selected.
So why does the database report 17,180,197,888 “bytes saved during optimized file creation” and later again in “physical write total bytes”? The answer is ASM mirroring. In this system DATA and RECO are protected using ASM normal redundancy, doubling the writes.
What I found amusing is that the exact number of bytes eventually written is “eligible for predicate offload”. Until quite recently I only associated Smart Scans with this statistic counter.
You can also see a few writes (and overwrites) to (Write Back) Flash Cache.