Exadata Fast Data File Creation

This post is the result of some testing I performed on Exadata data file creation. You may know that Exadata offloading incorporates SQL optimisations as well as some infrastructure work. For quite some time Exadata allowed the DBA to create data files a lot quicker than on traditional systems. This has been documented before, for example by @mpnsh here

The final comment on his blog entry was a remark that data file creation is quite fast, but that is not true for online redo logs. Especially in environments where you duplicate production to a lower tier environment you have to wait for the online redo logs (including all members across all threads) to be zeroed. This is no longer an issue with Fast Data File Creation. If your system is configured to use Write-Back Flash Cache (WBFC from now on) and you are on Exadata then you can benefit from super-fast file creation, including online redo logs. Here is an example, taken from a SQL trace:

TKPROF: Release 
Trace file: /u01/app/oracle/diag/rdbms/mbach/MBACH1/trace/MBACH1_ora_124411.trc
Sort options: default

count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call

SQL ID: fvt6psf2t3cdz Plan Hash: 0

alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.37       5.09          0          0          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.38       5.09          0          0          2           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file Mirror Read                           2        0.00          0.00
  control file sequential read                   23        0.00          0.01
  KSV master wait                                19        0.04          0.09
  ASM file metadata operation                     6        0.00          0.00
  CSS initialization                              1        0.00          0.00
  CSS operation: query                            6        0.00          0.00
  CSS operation: action                           2        0.00          0.00
  kfk: async disk IO                              2        0.00          0.00
  Disk file operations I/O                        3        0.00          0.00
  cell smart file creation                     3258        0.13          4.64
  log file single write                           2        0.00          0.00
  control file parallel write                    10        0.07          0.07
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1       17.61         17.61

I have tried to replicate the command issued in Martin Nash’s blog post. If memory serves me right he was waiting for 37 seconds for this command to complete. The new method uses about 5 seconds-not bad! Interestingly there is no new wait event-the one we know (cell smart file creation) is used to indicate the work that has been done. In the case of Fast Data File Creation only the metadata about the new file is persisted in the WBFC, the actual formatting has not happened when the prompt returned.

Session Counters?

I have spent _a lot_ of time on Exadata related session counters in and earlier releases. I think they are fascinating, and since the wait events do not really show me what happened during the execution of the statement I used the session counters instead. I am using Adrian Billington’s mystats tool for this:

SQL> @scripts/mystats start

SQL> alter database add logfile thread 2 group 5 ('+DATA','+RECO') size 4096m;

Database altered.

Elapsed: 00:00:04.87
SQL> @scripts/mystats stop t=1
STAT    cell flash cache read hits                                                      20
STAT    cell logical write IO requests                                                  14
STAT    cell overwrites in flash cache                                                  30
STAT    cell physical IO bytes eligible for predicate offload               17,179,869,184
STAT    cell physical IO bytes saved during optimized file creation         17,179,869,184
STAT    cell physical IO interconnect bytes                                      1,361,920
STAT    cell writes to flash cache                                                      30
STAT    physical write requests optimized                                               10
STAT    physical write total IO requests                                             4,126
STAT    physical write total bytes                                          17,180,197,888
STAT    physical write total bytes optimized                                       163,840
STAT    physical write total multi block requests                                    4,096

Let’s begin with the cell%-statistics. These indicate what happened on the storage layer. The command I executed was to create an online redo log group in thread 2, in disk groups DATA and RECO. Each file is approximately 4 GB in size.

SQL> select f.member, l.bytes/power(1024,2) m 
  2  from v$log l, v$logfile f
  3  where l.group# = f.group#
  4  and l.group# = 5;

MEMBER                                                                M
------------------------------------------------------------ ----------
+DATA/MBACH/ONLINELOG/group_5.454.879407723                        4096
+RECO/MBACH/ONLINELOG/group_5.1229.879407725                       4096

2 rows selected.

So why does the database report 17,180,197,888 “bytes saved during optimized file creation” and later again in “physical write total bytes”? The answer is ASM mirroring. In this system DATA and RECO are protected using ASM normal redundancy, doubling the writes.

What I found amusing is that the exact number of bytes eventually written is “eligible for predicate offload”. Until quite recently I only associated Smart Scans with this statistic counter.

You can also see a few writes (and overwrites) to (Write Back) Flash Cache.