Anyone who has looked at Exadata might ask this question, and I did so too. After all, cell smart table scan is in wait class User I/O, so there should be more information in the event's parameters, right? This is what you find for a smart scan:
NAME                           PARAMETER1           PARAMETER2           PARAMETER3                     WAIT_CLASS
------------------------------ -------------------- -------------------- ------------------------------ ---------------
cell smart table scan          cellhash#                                                                 User I/O
cell smart index scan          cellhash#                                                                 User I/O
Compare this to the traditional I/O requests:
NAME                           PARAMETER1           PARAMETER2           PARAMETER3                     WAIT_CLASS
------------------------------ -------------------- -------------------- ------------------------------ ---------------
db file sequential read        file#                block#               blocks                         User I/O
db file scattered read         file#                block#               blocks                         User I/O
direct path read               file number          first dba            block cnt                      User I/O
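Both listings are taken straight from V$EVENT_NAME. A query along these lines reproduces them (the column formatting is mine):

col name       format a30
col parameter1 format a20
col parameter2 format a20
col parameter3 format a30

-- list the parameter definitions for the smart scan and classic I/O wait events
select name, parameter1, parameter2, parameter3, wait_class
  from v$event_name
 where name in ('cell smart table scan', 'cell smart index scan',
                'db file sequential read', 'db file scattered read',
                'direct path read');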
At first I didn’t understand why that was the case, until I traced a lot more. Using the method described by my colleague Frits Hoogland I traced the Exadata calls as well as the IB interconnect in addition to a SQL trace (level 8). Please don’t do this outside the lab: the trace will get huge!
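For reference, the session settings were roughly of this shape. Treat the component names as an assumption on my part (based on Frits’ post and on what ends up in the trace file) and verify them with oradebug doc component on your own system first:

-- lab use only: these traces grow very quickly
-- component names are an assumption, verify with: oradebug doc component
alter session set events 'trace[KXD.*] disk highest';                     -- smart scan (kcfis) layer
alter session set events 'trace[LIBCELL.Client_Library.*] disk highest';  -- cell/interconnect calls
alter session set events 'sql_trace level 8';                             -- SQL trace including waits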
Now that produces an interesting file, in which the SQL trace information amounts to the occasional single line drowning in a sea of other output. So I used a sed one-liner to make those lines stand out a bit:
$ sed "/WAIT/{x;p;x;G;}" db12c1_ora_11916.trc > kxd_libcell_sqltace.trc
This inserts a blank line before and after each line containing WAIT. The SQL trace information is familiar:
WAIT #139970814085680: nam='cell smart table scan' ela= 251 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911250866
By the way, the obj# is the DATA_OBJECT_ID, not the OBJECT_ID in DBA_OBJECTS.
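So to resolve the obj# from the WAIT line back to a segment you have to join on DATA_OBJECT_ID, for example (61471 is the number from my trace):

-- map obj# from the wait event back to the segment it belongs to
select owner, object_name, object_type
  from dba_objects
 where data_object_id = 61471;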
Stepping through the trace
The trace information reflects what happens during the various smart scan processing steps. This first example shows some of the processing during the smart scan initialisation phase, when the cells are being initialised. I initially struggled to map the “appliance” hexadecimal number to an actual cell, but it is easy if you grep for kcfis_open_appliance_fd and match the cellhash to v$cell.cell_hashval. Continuing the example:
kcfis_open_appliance: throttle: initialized for appliance 0x7f4d7e9bc4c0
kcfis_find_appliance_fd: appliance fd not found. appliance 192.168.12.10/ Cached appliance fd count 0
kcfis_open_appliance_fd: 0x7f4d7e9bc4c0
oss_open called for o/192.168.12.10//box/predicate
Sending message RQ_Tag_2827385864_11: 0xf9b3f08 with frag # 0, to box my_box_refid: 2 (box inc: 12)
Command with band 1, payload length 116, payload 0xf9b4110
RQ_Tag_2827385864_11: Command name OPEN, refid 11
Open flags: readwrite - 1, sync - 0, char - 1. Device name - /box/predicate
Vrpc completion - RQ_Tag_2827385864_11: cmd 0xf9b3f48, req 0xf9b3a40, frag # 1
Vrpc reply 0xf9b4000 with refid 11 for existing message: 0xf9b3f08
Reply with band 0, payload length 0, payload (nil)
RQ_Tag_2827385864_11: Command name REPLY OPEN, refid 11
Returned open file descriptor 1 opening file number 11
WAIT #139970814085680: nam='cell smart table scan' ela= 259 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911167788
So here the cell with IP 192.168.12.10 maps to 0x7f4d7e9bc4c0 in the trace. As you can see, immediately before the wait event is written to the trace file a file is opened on the cell. This causes the line to be emitted in the SQL trace file. So far there hasn’t been any significant scanning activity against the data, but the trace line makes sense: the IO request was issued against the cell with hash 822451848. We don’t know what it did, though, or which file it accessed.
SYS:db12c1> select cell_path from v$cell where cell_hashval = '822451848';

CELL_PATH
----------------------------
192.168.12.10
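Rather than translating hash values one at a time you can of course dump the complete mapping in one go with a little helper query:

-- map every possible cellhash# to the corresponding cell IP address
select cell_path, cell_hashval
  from v$cell
 order by cell_path;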
I was quite pleased with myself when I found this out.
More trace information. A lot more.
Lesson learned from trawling through the trace: just because there is a single WAIT line in the trace file doesn’t mean there aren’t other requests either in flight or under preparation.
The next few thousand (!) lines show the mapping between RDBMS and cell addressing. This part is well explained in the post by Frits I referenced earlier. Further down in the trace it gets more interesting from a SQL trace perspective again: payload maps are generated and issued against the cells. Remember that these IO requests to the cells are performed asynchronously. I found quite a few calls to the code generating payloads for the cells between individual SQL trace WAIT lines. This makes reading the trace a bit confusing; it almost appears as if the SQL trace couldn’t keep up with the speed at which the requests are created and sent.
Looking at the file I found the following pattern preceding a smart scan WAIT line in the first half of the process (before data comes back from the cells), beginning immediately after the last WAIT line, in abbreviated format:
Ioctl completed. appliance 0x7f4d7e9bc4c0 : cellsrv pid: 9563: predicate: /box/predicate37843
...
kcfis_create_maps_payload. appliance 0x7f4d7ea65570 num_disks 12
throttle: allowing map push: appliance 0x7f4d7ea65570, maps_to_push 7
...
Pushing request : disknumber = 0, offset = 936606973952, len = 1032192 rdba: 43778306 version 0 reqid=10 cver=1
disk=0x7f4d7e9c0aa8 state=1
trans_req_element = 0x7f4d7e9cba00
The appliance name, the cellsrv (thread) ID, and the box predicate appear to remain constant for the lifetime of the smart scan. You can find them in other places in the trace as well (though not used in the same context).
The preceding lines are then followed by many more lines beginning with “Pushing request”, iterating over all 12 disks (-> num_disks) and a range of Data Block Addresses. Once the map payload has been created it is time to push it to the cell. Here it gets a bit murky: the KXD.* traces show requests being pushed, but not necessarily to the cell mentioned in the event line. Consider this:
kcfis_create_maps_payload. alloc_len 272 num maps 2
throttle: mappayload: maps_to_push 4 iosize_being_pushed 8388608
kcfis_push: pushing maps to appliance 0x7f4d7e9bc4c0
kcfis_issue_ioctl: payload_type 2
oss_ioctl open file: 11, opcode: 99
Sending message RQ_Tag_2827385864_54: 0xf9c5728 with frag # 0, to box my_box_refid: 2 (box inc: 12)
Command with band 1, payload length 272, payload 0x7f4d7e9b9ef8
RQ_Tag_2827385864_54: Command name IOCTL, refid 54
Ioctl arguments fd 1 opcode 99 size 272
oss_ioctl done
oss_wait called for request: 0xf9b3a40
Vrpc completion - RQ_Tag_2341095748_52: cmd 0xf9c6ec8, req 0xf9b3a40, frag # 1
Vrpc reply 0xf9c6f80 with refid 52 for existing message: 0xf9c6e88
Reply with band 1, payload length 40, payload 0x7f4d7ea65a90
RQ_Tag_2341095748_52: Command name REPLY IOCTL, refid 52
oss_wait done, request 0xf9b3a40, result 0x7f4d837e4c48
WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172
The kcfis_push call is addressed to appliance 0x7f4d7e9bc4c0, which maps to 192.168.12.10. The cell hash from the event (3249924569) maps to a different cell though. You can find the corresponding kcfis_push earlier in the trace. In fact, it is the first occurrence after the last WAIT event line.
trans_req_element = 0x7f4d7e870780
Pushing request : disknumber = 5, offset = 937011707904, len = 4194304 rdba: 44570624 version 0 reqid=1021 cver=1
disk=0x7f4d7e9b5980 state=1
disk=0x7f4d7e9b4968 state=1
disk=0x7f4d7e9b3460 state=1
disk=0x7f4d7e9b2f70 state=1
disk=0x7f4d7e9df5a0 state=1
disk=0x7f4d7e9df0b0 state=1
disk=0x7f4d7e9dea78 state=1
kcfis_create_maps_payload. alloc_len 2648 num maps 35
throttle: mappayload: maps_to_push 10 iosize_being_pushed 102039552
kcfis_push: pushing maps to appliance 0x7f4d7ea65570
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.8 OSS OS Pid 4541
oss_ioctl open file: 2, opcode: 99
Sending message RQ_Tag_2341095748_52: 0xf9c6e88 with frag # 0, to box my_box_refid: 0 (box inc: 16)
Command with band 3, payload length 2648, payload 0x7f4d7ea62e98
RQ_Tag_2341095748_52: Command name IOCTL, refid 52
Ioctl arguments fd 1 opcode 99 size 2648
oss_ioctl done
This is getting too complex, can you say it again?
Now let’s put this together again, simplified:
WAIT #139970814085680: nam='cell smart table scan' ela= 24 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911267234
kcfis_push: num-appliances 3. payload_size 0x7f4d837e4cf0 ioctl_issued 0x7f4d837e4cc8 results 0x7f4d837e4c48 payload_type 0x7f4d837e4d20
kcfis_create_maps_payload. appliance 0x7f4d7ea65570 num_disks 12
...
kcfis_create_maps_payload. alloc_len 2648 num maps 35
throttle: mappayload: maps_to_push 10 iosize_being_pushed 102039552
kcfis_push: pushing maps to appliance 0x7f4d7ea65570
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.8 OSS OS Pid 4541
oss_ioctl open file: 2, opcode: 99
Sending message RQ_Tag_2341095748_52: 0xf9c6e88 with frag # 0, to box my_box_refid: 0 (box inc: 16)
Command with band 3, payload length 2648, payload 0x7f4d7ea62e98
RQ_Tag_2341095748_52: Command name IOCTL, refid 52
Ioctl arguments fd 1 opcode 99 size 2648
oss_ioctl done
...
kcfis_create_maps_payload. appliance 0x7f4d7e9bf518 num_disks 12
...
kcfis_create_maps_payload. alloc_len 1208 num maps 15
throttle: mappayload: maps_to_push 10 iosize_being_pushed 34471936
kcfis_push: pushing maps to appliance 0x7f4d7e9bf518
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.9 OSS OS Pid 9547
...
oss_ioctl done
...
kcfis_create_maps_payload. appliance 0x7f4d7e9bc4c0 num_disks 12
...
kcfis_create_maps_payload. alloc_len 272 num maps 2
throttle: mappayload: maps_to_push 4 iosize_being_pushed 8388608
kcfis_push: pushing maps to appliance 0x7f4d7e9bc4c0
...
oss_ioctl done
...
oss_wait done, request 0xf9b3a40, result 0x7f4d837e4c48
WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172
Interestingly, the results for the requests come back in that order as well. That seems to be coincidence, judging by the rest of the trace. Compare the appliance values with the summary you just read:
WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172
Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7ea65570
appliance 0x7f4d7ea65570 : cellsrv pid: 4541: predicate: /box/predicate114
oss_wait called for request: 0xf9c69c0
Vrpc completion - RQ_Tag_2825762783_53: cmd 0xf9c6768, req 0xf9c69c0, frag # 1
Vrpc reply 0xf9c6820 with refid 53 for existing message: 0xf9c6728
Reply with band 1, payload length 40, payload 0x7f4d7e9bfa38
RQ_Tag_2825762783_53: Command name REPLY IOCTL, refid 53
oss_wait done, request 0xf9c69c0, result 0x7f4d837e4c68
WAIT #139970814085680: nam='cell smart table scan' ela= 64 cellhash#=674246789 p2=0 p3=0 obj#=61471 tim=3470911270269
Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7e9bf518
appliance 0x7f4d7e9bf518 : cellsrv pid: 9547: predicate: /box/predicate37211
oss_wait called for request: 0xf9c5260
Vrpc completion - RQ_Tag_2827385864_54: cmd 0xf9c5768, req 0xf9c5260, frag # 1
Vrpc reply 0xf9c5820 with refid 54 for existing message: 0xf9c5728
Reply with band 1, payload length 40, payload 0x7f4d7e9bc9e0
RQ_Tag_2827385864_54: Command name REPLY IOCTL, refid 54
oss_wait done, request 0xf9c5260, result 0x7f4d837e4c88
WAIT #139970814085680: nam='cell smart table scan' ela= 54 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911270351
Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7e9bc4c0
SQL trace wait events frequently seem to be written after an ioctl (a standard Linux system call) or oss_wait (Oracle) has completed. So it appears as if requests are created and handled on a cell-by-cell basis. This explains why you get so many cell smart table scan events in general. It’s not because they are necessarily small in size; the payloads always seem to include more than one disk to scan. Thinking about it now it makes sense, but as an analyst I would have appreciated more insight into what happens under the covers. But we can trace, right?
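If you only want to see how many of these waits a session racks up, without wading through a trace file, the wait interface shows the same picture. Something along these lines works (the SID is a placeholder):

-- count and time the smart scan waits for one session (:sid is a placeholder)
select event, total_waits, time_waited_micro
  from v$session_event
 where sid = :sid
   and event like 'cell smart%scan';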