Martins Blog

Trying to explain complex things in simple terms

JSON support in Exadata 12.1.2.1.0 and later

Posted by Martin Bach on July 8, 2015

Some time ago Oracle announced that RDBMS 12.1.0.2 has built-in support for JSON processing. A little later it was also mentioned that you have support for JSON in the Exadata storage servers for offloading. This is probably a lot more exciting to users of JSON than it is to me as I’m not a developer. However, whenever an announcement such as the one I’m referring to is made I would like to see for myself how much of it is implemented in software. Like I said, I’m not a developer so apologies for a silly example: what I’m showing you here can probably done differently and is not the best use of an Exadata. But all I really wanted to test is if JSON support actually exists. I am using cellsrv 12.1.2.1.0 and RDBMS 12.1.0.2.2 for this test.

JSON

I have to say I struggled a little bit to understand the use case for JSON and therefore did what probably everyone does and consulted the official documentation and oracle-base.com for Tim’s views on JSON. Here’s a summary of links I found useful to get started:

The Test

Ok so that was enough to get me started. I needed data, and a table to store this in. It appeared to me that an apache log could be a useful source for JSON records, so I converted my webserver’s log file to JSON using libee0 on OpenSuSE (yes I know, but it’s a great virtualisation platform). The converted file was named httpd_access_log.json and had records such as these:

{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:05 +0100", "request": "HEAD /ol70 HTTP/1.1", "status": "404", "size": "", "f1": "", "useragent": "Python-urllib/2.7"}
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:13:25 +0100", "request": "GET / HTTP/1.1", "status": "403", "size": "989", "f1": "", "useragent": "Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0"}

Sorry for the wide output-it’s an Apache log…

I then created the table to store the data. JSON appears to be pretty unstructured, so this will do:

SQL> create table jsontest (id number,
  2   jdata clob,
  3   constraint jsontest check (jdata is json)
  4  )
  5  lob (jdata) store as securefile (
  6   enable storage in row
  7  );

SQL> create sequence s_jsontest;

Sequence created

If you look closely then you’ll see that the JSON data is stored in an inline CLOB-that’s one of the pre-requisites for offloading LOBs in 12c.

Loading JSON

Now I needed a way to get the data into the table. I think I could have used SQLLDR but since I have rusty perl scripting skills I gave DBD::Oracle on 12.1.0.2 a go. The following script inserts records slow-by-slow or row-by-row into the table and is probably not the most efficient way to do this. But one of the reasons I blog is so that I don’t have to remember everything. If you ever wondered how to write a DBI/DBD::Oracle script here’s a starting point. Note the emphasis on “starting point” since the script has been trimmed for readability-essential error checking is not shown. Whenever you work with data make sure that your error handling is top-notch!

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use DBD::Oracle;
use Getopt::Long;

# these will be set by GetOpt::Long
my $service;            # has to be in tnsnames.ora
my $username;
my $jsonfile;

GetOptions (
  "service=s"   => \$service,
  "user=s"      => \$username,
  "jsonfile=s"  => \$jsonfile
);
die "usage: load_json.pl --service <servicename> --jsonfile [--user username] " if (!defined ($service ) || !defined ($jsonfile));

die "$jsonfile is not a file" unless ( -f $jsonfile );

print "connecting to service $service as user $username to load file $jsonfile\n";

# about to start...
my $dbh = DBI->connect ("dbi:Oracle:$service", "$username", "someCleverPasswordOrCatName")
  or die ("Cannot connect to service $service: DBI:errstr!");

print "connection to the database established, trying to load data...\n";

# prepare a cursor to loop over all entries in the file
my $sth = $dbh->prepare(q{
 insert into jsontest (id, jdata) values(s_jsontest.nextval, :json)
});

if (! open JSON, "$jsonfile")  {
  print "cannot open $jsonfile: $!\n";
  $dbh->disconnect();
  die "Cannot continue\n";
}

while (<JSON>) {
  chomp;
  $sth->bind_param(":json", $_);
  $sth->execute();
}

$dbh->disconnect();

close JSON;

print "done\n";

This script read the file and inserted all the data into the table. Again, essential error checking must be added, the script is far from being complete. You also need to set the Perl environment variables to the perl installation in $ORACLE_HOME for it to find the DBI and DBD::Oracle drivers.

Offloading or not?

It turned out that the data I inserted was of course not enough to trigger a direct path read that could turn into a Smart Scan. A little inflation of the table was needed. Once that was done I started to get my feet wet with JSON queries:

SQL> select jdata from jsontest where rownum < 6;

JDATA
--------------------------------------------------------------------------------
{"host": "192.168.100.1", "identity": "", "user": "", "date": "05/Feb/2015:12:26
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:
{"host": "192.168.100.156", "identity": "", "user": "", "date": "05/Feb/2015:12:

Interesting. Here are a few more examples with my data set. Again, refer to oracle-base.com and the official documentation set for more information about JSON and querying it in the database. It’s by no means an Exadata only feature.

SQL> select count(*) from jsontest where json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
   2195968

SQL> select count(*) from jsontest where not json_exists(jsontest.jdata, '$.host' false on error);

  COUNT(*)
----------
         0

And finally, here is proof that you can offload JSON data in Exadata; at least for some of the operations it should  be possible judging by the information in v$sqlfn_metadata:

SQL> select name,offloadable from v$sqlfn_metadata where name like '%JSON%'
  2  order by offloadable,name;

NAME                                               OFF
-------------------------------------------------- ---
JSON_ARRAY                                         NO
JSON_ARRAYAGG                                      NO
JSON_EQUAL                                         NO
JSON_OBJECT                                        NO
JSON_OBJECTAGG                                     NO
JSON_QUERY                                         NO
JSON_SERIALIZE                                     NO
JSON_TEXTCONTAINS2                                 NO
JSON_VALUE                                         NO
JSON                                               YES
JSON                                               YES
JSON_EXISTS                                        YES
JSON_QUERY                                         YES
JSON_VALUE                                         YES

14 rows selected.

The two entries named “JSON” are most likely “is JSON” and “is not JSON”.

And now with real data volumes on a real system using JSON_EXISTS:

SQL> select /*+ monitor am_I_offloaded */ count(*)
  2  from jsontest where json_exists(jsontest.jdata, '$.host' false on error);                                                                                        

   COUNT(*)
-----------
    2195968

Elapsed: 00:00:04.96

SQL> select * from table(dbms_xplan.display_cursor);                                                                                                        

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID  6j73xcww7hmcw, child number 0
-------------------------------------
select /*+ monitor am_I_offloaded */ count(*) from jsontest where
json_exists(jsontest.jdata, '$.host' false on error)

Plan hash value: 568818393

---------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |       |       | 91078 (100)|          |
|   1 |  SORT AGGREGATE            |          |     1 |   610 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| JSONTEST | 21960 |    12M| 91078   (1)| 00:00:04 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)
       filter(JSON_EXISTS2("JSONTEST"."JDATA" FORMAT JSON , '$.host' FALSE ON
              ERROR)=1)

So the execution plan looks promising-I can see “table access storage full” and a storage() predicate. Looking at V$SQL I get:

SQL> select sql_id, child_number,
  2  case when io_cell_offload_eligible_bytes = 0 then 'NO' else 'YES' end offloaded,
  3  io_cell_offload_eligible_bytes/power(1024,2) offload_eligible_mb,
  4  io_interconnect_bytes/power(1024,2) interconnect_mb,
  5  io_cell_offload_returned_bytes/power(1024,2) returned_mb,
  6  io_cell_offload_returned_bytes/io_cell_offload_eligible_bytes*100 offload_pct
  7   from v$sql where sql_id = '6j73xcww7hmcw';                                                                                                                                                   

SQL_ID        CHILD_NUMBER OFF OFFLOAD_ELIGIBLE_MB INTERCONNECT_MB RETURNED_MB OFFLOAD_PCT
------------- ------------ --- ------------------- --------------- ----------- -----------
6j73xcww7hmcw            0 YES         2606.695313     1191.731941 1191.724129 45.71781455

And to avoid any doubt, I have the SQL Trace as well:

PARSING IN CURSOR #140370400430072 len=120 dep=0 uid=65 oct=3 lid=65 tim=1784977054418 hv=1750582781 ad='5bfcebed8' sqlid='bfwd4t5n5gjgx'
select /*+ monitor am_I_offloaded */ count(*)   from jsontest where json_exists(jsontest.jdata, '$.host' false on error)
END OF STMT
PARSE #140370400430072:c=103984,e=272239,p=909,cr=968,cu=0,mis=1,r=0,dep=0,og=1,plh=568818393,tim=1784977054417
EXEC #140370400430072:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568818393,tim=1784977054587
WAIT #140370400430072: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=96305 tim=1784977054666
WAIT #140370400430072: nam='reliable message' ela= 826 channel context=27059892880 channel handle=27196561216 broadcast message=26855409216 obj#=96305 tim=1784977055727
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 159 name|mode=1263468550 2=65629 0=1 obj#=96305 tim=1784977055942
WAIT #140370400430072: nam='enq: KO - fast object checkpoint' ela= 229 name|mode=1263468545 2=65629 0=2 obj#=96305 tim=1784977056265
WAIT #140370400430072: nam='cell smart table scan' ela= 196 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977057370
WAIT #140370400430072: nam='cell smart table scan' ela= 171 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977057884
WAIT #140370400430072: nam='cell smart table scan' ela= 188 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977058461
WAIT #140370400430072: nam='cell smart table scan' ela= 321 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977061623
WAIT #140370400430072: nam='cell smart table scan' ela= 224 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977062053
WAIT #140370400430072: nam='cell smart table scan' ela= 254 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977062487
WAIT #140370400430072: nam='cell smart table scan' ela= 7 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977062969
WAIT #140370400430072: nam='cell smart table scan' ela= 25 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977063016
WAIT #140370400430072: nam='cell smart table scan' ela= 81 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977063115
WAIT #140370400430072: nam='cell smart table scan' ela= 1134 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977065442
WAIT #140370400430072: nam='cell smart table scan' ela= 6 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977065883
WAIT #140370400430072: nam='cell smart table scan' ela= 14 cellhash#=822451848 p2=0 p3=0 obj#=96298 tim=1784977065917
WAIT #140370400430072: nam='cell smart table scan' ela= 105 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066037
WAIT #140370400430072: nam='cell smart table scan' ela= 12 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977066207
WAIT #140370400430072: nam='cell smart table scan' ela= 6605 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977072866
WAIT #140370400430072: nam='cell smart table scan' ela= 27 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977073877
WAIT #140370400430072: nam='cell smart table scan' ela= 29 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977074903
WAIT #140370400430072: nam='cell smart table scan' ela= 907 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977077783
WAIT #140370400430072: nam='cell smart table scan' ela= 28 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977078753
WAIT #140370400430072: nam='cell smart table scan' ela= 24 cellhash#=3249924569 p2=0 p3=0 obj#=96298 tim=1784977080860
WAIT #140370400430072: nam='cell smart table scan' ela= 1077 cellhash#=674246789 p2=0 p3=0 obj#=96298 tim=1784977082935
...

Summary

So yet, it would appear as if JSON is offloaded.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: