Category Archives: Oracle

Oracle (Database and Middleware) related posts

Dead Connection Detection (DCD) and the Oracle database

Dead Connection Detection is a useful feature of the Oracle database: it allows for the cleanup of “dead” sessions so they don’t linger around consuming memory and other system resources. The idea is simple: if the database detects that a client process is no longer connected to its server process, it cleans up. This can happen in many ways, in most cases this kind of problem is triggered by an end user.

A dead connection shouldn’t be confused with idle connections: an idle connection still maintains the network link between client and server process, except that there is no activity. Idle connections aren’t maintained/controlled via DCD, there are other tools in the database handling such cases.

As a by product, DCD can also help with overly eager firewalls forcibly removing seemingly idle network connections. I found the following posts and the references therein very useful:

With Oracle 12c Oracle changed the way DCD works by no longer relying on its network layer but rather pushing the functionality into the TCP stack on platforms that support it. This change in behaviour is also explained in an Oracle white paper from 2014.

For the first part of this post I set sqlnet.expire_time to 1 as per a MOS note I found, your value is probably different. The parameter is documented in the Net*8 reference, please use it to work out what the best value is for you. As others have pointed out, this parameter has to go into the RDBMS home, more specifically $ORACLE_HOME/network/admin/sqlnet.ora. I am using Oracle 19.4 on Oracle Linux 7.7 for this blog post. I have seen the same behaviour in 12.1.0.2 as well in my lab.

Background

In addition to the proven methods of checking whether TCP_KEEPALIVE is enabled for a given session I wanted to show another one. Using the ss(8) utility it is possible to show socket options. I also tried lsof but on my system I couldn’t get it to print the options:

SQL> select spid from v$process where addr = (select paddr from v$session where username = 'MARTIN');

SPID
------------------------
13656

SQL> exit 

...

[root@server2 ~]# lsof -nP -p 13656 -T f
lsof: unsupported TCP/TPI info selection: f
lsof 4.87

Although the man-page for lsof reads:

       -T [t]   controls the reporting of some TCP/TPI information,  also  reported  by  net‐
                stat(1),  following  the network addresses.  In normal output the information
                appears in parentheses, each item except TCP or TPI state name identified  by
                a keyword, followed by `=', separated from others by a single space:

                     <TCP or TPI state name>
                     QR=<read queue length>
                     QS=<send queue length>
                     SO=<socket options and values>
                     SS=<socket states>
                     TF=<TCP flags and values>
                     WR=<window read length>
                     WW=<window write length>

                Not all values are reported for all UNIX dialects.  Items values (when avail‐
                able) are reported after the item name and '='.

                When the field output mode is in effect  (See  OUTPUT  FOR  OTHER  PROGRAMS.)
                each item appears as a field with a `T' leading character.

                -T with no following key characters disables TCP/TPI information reporting.

                -T with following characters selects the reporting of specific TCP/TPI infor‐
                mation:

                     f    selects reporting of socket options,
                          states and values, and TCP flags and
                          values.

So let’s try something else: ss(8) – another utility to investigate sockets

Revealing socket options

I have used ss(8) in the past when I didn’t have netstat available, which is more and more common now that netstat is deprecated and its designated successor is ss :)

As far as I know you can’t limit ss to show information just for a PID, I use grep to limit the output. The output is in fact very wide, which is why this might not look pretty on the blog depending on whether the renderer decides to wrap output or not.

[root@server2 ~]# ss -nop | egrep 'NetidState|13656'
NetidState Recv-Q Send-Q                                    Local Address:Port                                      Peer Address:Port                                                                                                           
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,4.412ms,0)        
[root@server2 ~]# 

I used the following options:

  • -n for “Do not try to resolve service names”
  • -o for “Show timer information” and finally
  • -p to “Show process using socket”.

The main option here is -o. As per the man page:

       -o, --options
              Show timer information. For tcp protocol, the output format is:

              timer:(<timer_name>,<expire_time>,<retrans>)

              <timer_name>
                     the name of the timer, there are five kind of timer names:

                     on: means one of these timers: tcp retrans timer, tcp early retrans timer and tail loss probe timer
                     keepalive: tcp keep alive timer
                     timewait: timewait stage timer
                     persist: zero window probe timer
                     unknown: none of the above timers

              <expire_time>
                     how long time the timer will expire

              <retrans>
                     how many times the retran occurs

With a little shell loop I can show how that timer is decrementing:

[root@server2 ~]# for i in $(seq 1 5); do ss -nop | grep 13656; sleep 1 ; done
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,20sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,19sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,18sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,17sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,16sec,0)          
[root@server2 ~]# 

Summary

Using the ss utility it is possible to check if a keepalive timer is implemented as a means to support DCD with 12.1 and later releases. Invoking ss(8) hasn’t caused any problems on my system, but as with every such tool you need to ensure it’s safe to use before attempting to look at an important system.

Advertisements

Little things worth knowing: keeping enq: TM enqueue at bay during direct path inserts

Direct path inserts are commonly found in processing where data are shifted from one source to another. There are many permutations of the theme, this post details the way SQL Loader (sqlldr) behaves.

I have previously written about sqlldr and concurrent direct path inserts. The behaviour for releases <= 12.1 is described in the first post, the most welcome changes in 12.2 went into the second post. Since the fastest way of doing something is not to do it at all, I thought it might be worth demonstrating a useful technique to keep the dreaded TM enqueue at bay. Please note that these do not only apply to sqlldr, inserts with the append hint for example behave the same way in my experience.

I used Oracle 19.4.0 on Oracle Linux 7 for this post, the content should apply to older database releases, too. The data model is once more provided by Swingbench, I’m using the SH benchmark this time. As I said on twitter yesterday, @TanelPoder scripts are simply too good to afford writing my own, so I’m using them in this post where possible.

Please note that I’m using partitioning in this post, which is a cost option on top of Enterprise Edition. Always ensure your system is properly licensed for features you use.

Preparations

Let’s assume for a moment that you would like to load another batch of records into the (Swingbench) sh.sales table. The table is range partitioned on time_id, and split into quarterly partitions – at least for the later ones. On my system I count 68 partitions, with the latest containing records up to January 1st 2014. I added a couple of partitions for the next 2 quarters to allow for the new data load.

SQL> select partition_name, partition_position,high_value 
  2  from user_tab_partitions where table_name = 'SALES' 
  3   and partition_position >= 68;

PARTITION_NAME     PARTITION_POSITION HIGH_VALUE
----------------- ------------------- -----------------------------------------------------------------------------------
SALES_Q4_2013                      68 TO_DATE(' 2014-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')   
SALES_Q1_2014                      69 TO_DATE(' 2014-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')   
SALES_Q2_2014                      70 TO_DATE(' 2014-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')   

I created CSV files of the data to be loaded by simply adding 3 and 6 months to the last populated partition.

spool /u01/app/oracle/admin/ORCL/sqlldr/sales_q1_2014.csv
select /*csv*/ 
 prod_id,
 cust_id,
 add_months(time_id,3) time_id,
 channel_id,
 promo_id,
 quantity_sold,
 seller,
 fulfillment_center,
 courier_org,
 tax_country,
 tax_region,
 amount_sold 
from sales partition (sales_q4_2013);

spool off
spool  /u01/app/oracle/admin/ORCL/sqlldr/sales_q2_2014.csv

select /*csv*/ 
 prod_id,
 cust_id,
 add_months(time_id,6) time_id,
 channel_id,
 promo_id,
 quantity_sold,
 seller,
 fulfillment_center,
 courier_org,
 tax_country,
 tax_region,
 amount_sold 
from sales partition (sales_q4_2013);

spool off
exit

Using the csv “hint” and executing this little script in sqlcl allows me to spool the files into the desired format (I made sure I have plenty of space available in /u01/app/oracle/admin). Now let’s try loading the data. As always, I use a control file although there might be better ways of doing this. It contains the following instructions:

options (direct=true,skip=1)
load data
append into table sales
fields terminated by ',' optionally enclosed by '"'
date format "dd-mon-yy"
trailing nullcols
(
  PROD_ID,                                                                        
  CUST_ID,                                                                        
  TIME_ID,                                                                        
  CHANNEL_ID,                                                                     
  PROMO_ID,                                                                       
  QUANTITY_SOLD,                                                                  
  SELLER,                                                                         
  FULFILLMENT_CENTER,                                                             
  COURIER_ORG,                                                                    
  TAX_COUNTRY,                                                                    
  TAX_REGION,                                                                     
  AMOUNT_SOLD
)

Test #1

Now let’s load the data! Since I’m not the most patient person I launched 2 sessions in parallel using a screen (1) session

$ sqlldr /@sh control=load_sales.ctl data=sales_q1_2014.csv log=load_q1.log bad=load_q1.bad direct_path_lock_wait=true

$ sqlldr /@sh control=load_sales.ctl data=sales_q2_2014.csv log=load_q2.log bad=load_q2.bad direct_path_lock_wait=true

While monitoring the load process I was surprised to see session 2 waiting:

SQL> select sid,seq#,event,state from v$session where program like '%sqlldr%';

       SID       SEQ# EVENT                                                            STATE
---------- ---------- ---------------------------------------------------------------- -------------------
      1228        578 SQL*Net more data from client                                    WAITED SHORT TIME
      1470         97 enq: TM - contention                                             WAITING

2 rows selected.

Each process should insert data into separate partitions, so why is there a wait? Maybe there is a clue in v$session_wait:

SQL> @sw 1470

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT P1                  P2                  P3                  P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------- ------------------- ------------------- ------------------------------------------
   1470 WAITING enq: TM - contention                             98           6 name|mode=          object #= 78758     table/partition= 0  0x544D0006: TM mode 6
                                                                                0x00000000544D0006


1 row selected.

I noticed that p3 mentions a table or partition flag, which triggered my interest. After a little bit of research I found this section in the documentation: 12.3.2 Direct Path Load of a Partitioned or Subpartitioned Table which provides some clues. Object# 78758 is the table in my database:

SQL> @oid 78758

owner                     object_name                    object_type        SUBOBJECT_NAME                 CREATED             LAST_DDL_TIME       status    DATA_OBJECT_ID
------------------------- ------------------------------ ------------------ ------------------------------ ------------------- ------------------- --------- --------------
SH                        SALES                          TABLE                                             2019-08-29 15:14:53 2019-08-29 23:06:27 VALID

What if I tried to insert into a partition straight away?

Test #2

Let’s try that with the following 2 control files. The main differences are a) referencing the input file directly and b) the specification of the target partition in the append clause:

$ cat load_sales_q1_2014.ctl
options (skip=1,direct=true)
load data
infile  /u01/app/oracle/admin/ORCL/sqlldr/sales_q1_2014.csv
append into table sales partition (sales_q1_2014)
fields terminated by ',' optionally enclosed by '"'
date format "dd-mon-yy"
trailing nullcols
( 
  PROD_ID,
  CUST_ID,
  TIME_ID,
  CHANNEL_ID,
  PROMO_ID,
  QUANTITY_SOLD,
  SELLER,
  FULFILLMENT_CENTER,
  COURIER_ORG,
  TAX_COUNTRY,
  TAX_REGION,
  AMOUNT_SOLD
)

$ cat load_sales_q2_2014.ctl
options (skip=1,direct=true)
load data
infile  /u01/app/oracle/admin/ORCL/sqlldr/sales_q2_2014.csv
append into table sales partition (sales_q2_2014)
fields terminated by ',' optionally enclosed by '"'
date format "dd-mon-yy"
trailing nullcols
( 
  PROD_ID,
  CUST_ID,
  TIME_ID,
  CHANNEL_ID,
  PROMO_ID,
  QUANTITY_SOLD,
  SELLER,
  FULFILLMENT_CENTER,
  COURIER_ORG,
  TAX_COUNTRY,
  TAX_REGION,
  AMOUNT_SOLD
)

The slightly adjusted commands for sqlldr are as follows:

$ sqlldr /@sh control=load_sales_q1_2014.ctl log=load_q1.log bad=load_q1.bad direct_path_lock_wait=true

$ sqlldr /@sh control=load_sales_q2_2014.ctl log=load_q2.log bad=load_q2.bad direct_path_lock_wait=true

With the change in place I couldn’t notice any TM enqueues when running these in parallel:

SQL> select sid,seq#,event,state from v$session where program like '%sqlldr%';

no rows selected

SQL>
SQL> r
  1* select sid,seq#,event,state from v$session where program like '%sqlldr%'

       SID       SEQ# EVENT                                                            STATE
---------- ---------- ---------------------------------------------------------------- -------------------
      1251        627 direct path write temp                                           WAITED KNOWN TIME
      1488        475 PGA memory operation                                             WAITED SHORT TIME

SQL> r
  1* select sid,seq#,event,state from v$session where program like '%sqlldr%'

       SID       SEQ# EVENT                                                            STATE
---------- ---------- ---------------------------------------------------------------- -------------------
      1251        819 direct path write temp                                           WAITING
      1488        664 direct path write temp                                           WAITING

SQL> r
  1* select sid,seq#,event,state from v$session where program like '%sqlldr%'

       SID       SEQ# EVENT                                                            STATE
---------- ---------- ---------------------------------------------------------------- -------------------
      1251       1070 SQL*Net more data from client                                    WAITED SHORT TIME
      1488        906 direct path write temp                                           WAITED KNOWN TIME

SQL> r
  1* select sid,seq#,event,state from v$session where program like '%sqlldr%'

       SID       SEQ# EVENT                                                            STATE
---------- ---------- ---------------------------------------------------------------- -------------------
      1251       1332 db file sequential read                                          WAITED SHORT TIME
      1488       1143 library cache lock                                               WAITING

SQL> r
  1* select sid,seq#,event,state from v$session where program like '%sqlldr

       SID       SEQ# EVENT                                                            STATE
---------- ---------- ---------------------------------------------------------------- -------------------
      1251       1824 db file sequential read                                          WAITED SHORT TIME
      1488       1372 db file sequential read                                          WAITING

That looks ok, and switching back through both screen sessions I can see both invocations of sqlldr completed ok:

$ sqlldr /@sh control=load_sales_q1_2014.ctl log=load_q1.log bad=load_q1.bad direct_path_lock_wait=true

SQL*Loader: Release 19.0.0.0.0 - Production on Thu Aug 29 22:44:53 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 652437.

Table SALES, partition SALES_Q1_2014:
  652437 Rows successfully loaded.

Check the log file:
  load_q1.log
for more information about the load.

$ sqlldr /@sh control=load_sales_q2_2014.ctl log=load_q2.log bad=load_q2.bad direct_path_lock_wait=true

SQL*Loader: Release 19.0.0.0.0 - Production on Thu Aug 29 22:44:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 652437.

Table SALES, partition SALES_Q2_2014:
  652437 Rows successfully loaded.

Check the log file:
  load_q2.log
for more information about the load.

Summary

When you are trying to load data into a table concurrently using direct path inserts, you will have to wait on a TM enqueue unless your table is partitioned and you are specifying different partitions as the target in each process. More generally speaking, Oracle will hold a TM enqueue on the segment you are loading into, so 2 processes attempting a direct path insert into a (sub) partition will equally have to serialise.

NB: if you are using direct path mode for sqlldr and your target segment is indexed, make sure to read chapter 12 in the Database Utilities Guide to learn about the implications of direct path load and index maintenance.

ORA-14300 partitioning key maps to a partition outside maximum permitted number of partitions and NULLs

While researching interval partitioning in Oracle 19c I came across a phenomenon I had already solved a long time ago but then forgot about. This time I decided to write about it so I don’t have to spend 5 minutes rediscovering the wheel. As always, if you are following along, make sure you are appropriately licensed for this feature

The Situation

Consider the following example:

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Aug 16 09:32:17 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Fri Aug 16 2019 09:07:55 +02:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0

SQL> create table t (id, vc, d) 
  2  partition by range (id) interval (5000)
  3  ( 
  4   partition p1 values less than (5000)
  5  ) as
  6  select object_id, object_name, created from dba_objects;
create table t (id, vc, d)
*
ERROR at line 1:
ORA-14300: partitioning key maps to a partition outside maximum permitted
number of partitions

SQL> !oerr ora 14300
14300, 00000, "partitioning key maps to a partition outside maximum permitted number of partitions"
// *Cause:  The row inserted had a partitioning key that maps to a partition number greater than 1048575
// *Action  Ensure that the partitioning key falls within 1048575 partitions or subpartitions.

I knew about the limit of 1048575 partitions, but that shouldn’t be the case here:

SQL> select max(object_id), max(object_id) / 5000 from dba_objects;

MAX(OBJECT_ID) MAX(OBJECT_ID)/5000
-------------- -------------------
         75743             15.1486

There shouldn’t be more than 16 partitions …

NEARBY: not having a not-null constraint on a partition key might be an indication of a sub-optimal design choice.

A quick search revealed an article on AskTom mentioning NULLs. I hadn’t thought about NULL values in dba_objects.object_id, but apparently there are:

SQL> select object_id, object_name, object_type from dba_objects where object_id is null;

 OBJECT_ID OBJECT_NAME                    OBJECT_TYPE
---------- ------------------------------ -----------------------
           SYS_HUB                        DATABASE LINK

Today I learned that database links don’t have object_ids. Neither is there a data_object_id.

Summary

Don’t try inserting NULLs as a partitioning key if that partition key uses interval partitioning. The error message, although strictly speaking correct, lead me astray for a moment:

SQL> create table t (id, vc, d) 
  2  partition by range (id) interval (5000)
  3  (
  4   partition p1 values less than (5000)
  5  )
  6  as
  7  select object_id, object_name, created from dba_objects 
  8  WHERE OBJECT_ID IS NOT NULL;

Table created.

Problem solved! Another, probably better approach, would be the addition of a not null constraint in the first place:

SQL> create table t (id not null, vc, d)
  2  partition by range (id) interval (5000)
  3  (
  4   partition p1 values less than (5000)
  5  )
  6  as select object_id, object_name, created from dba_objects;

ERROR at line 6:
ORA-01400: cannot insert NULL into ("MARTIN"."T"."ID")

The intention of my post was to help users who get an ORA-14300 although they don’t exceed the partition limit. It’s not a post about good design ;)

Interlude

This actually works for non-interval partitioned tables, but ONLY if you have a catch all partition:

SQL> select dbms_rowid.rowid_object(rowid) from t1 where id is null;

DBMS_ROWID.ROWID_OBJECT(ROWID)
------------------------------
                         75564

SQL> select object_name, subobject_name, object_id, data_object_id 
  2  from dba_objects where data_object_id = 75564
  3  /

OBJECT_NAME          SUBOBJECT_NAME        OBJECT_ID DATA_OBJECT_ID
-------------------- -------------------- ---------- --------------
T1                   PMAX                      75564          75564

SQL> select high_value from user_tab_partitions where table_name = 'T1' and partition_name = 'PMAX'
  2  /

HIGH_VALUE
------------------------------
MAXVALUE

Hope this helps!

sqlldr, direct path loads and concurrency in 12.2 and later

In my previous post I showed you that Oracle’s SQL loader (sqlldr) utility has a built-in timeout of 30 seconds waiting for locked resources before returning SQL*Loader-951/ORA-604/ORA-54 errors and failing to load data. This can cause quite some trouble! Before showing you the enhancement in 12.2 and later, here is the gist of the previous post.

Concurrency in Oracle sqlldr 12.1 and earlier

To show you how sqlldr times out I need to simulate an exclusive lock on the table in sqlplus for example. That’s quite simple:

SQL> set time on
10:17:00 SQL> lock table t2 in exclusive mode;

Table(s) Locked.

10:17:07 SQL>

Next I started a sqlldr process in another session. Please refer to the previous post for details, or take my word that I’m using a direct path insert strategy. The only difference is the size of the input file – I had to inflate it considerably to buy some time running standard diagnostic tools:

$ date; sqlldr /@loader control=t2_2.ctl ; date
Tue 23 Jul 10:22:35 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Tue Jul 23 10:22:35 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Tue 23 Jul 10:23:06 BST 2019

The “date” commands reveal a timeout of 30 seconds. Setting ddl_lock_timeout has no bearing on the timeout: the database is waiting on kpodplck wait before retrying ORA-54:

10:22:36 SQL> r
  1* select event, state from v$session where program like 'sqlldr%'

EVENT                                                            STATE
---------------------------------------------------------------- -------------------
kpodplck wait before retrying ORA-54                             WAITING

1 row selected.

This was where I left off with the previous post until I noticed there is another option!

Oracle 12.2 and later

In 12.2 and later you can instruct sqlldr to wait until the lock is released. There is a new parameter named direct_path_lock_timeout:

$ sqlldr | egrep 'Version|direct_path'
Version 19.4.0.0.0
direct_path_lock_wait -- wait for access to table when currently locked  (Default FALSE)

Interestingly there are no hits for direct_path_lock_wait in My Oracle Support’s knowledgebase. There are merely a few documentation references. So what does this parameter do? While the table is still locked in exclusive mode, let’s start the sqlldr process with the new option:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Fri Jul 26 10:13:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
[ ... waiting ... ]

The process now sits there and waits … and it does so for more than 30 seconds. And instead of kpodplck wait before retrying ORA-54 it waits on … drums please: enq: TM contention!

10:20:11 SQL> select seq#, event, state, round(seconds_in_wait/60, 2) mins_waiting
10:20:21   2  from v$session where program like 'sqlldr%';

      SEQ# EVENT                    STATE               MINS_WAITING
---------- ------------------------ ------------------- ------------
       119 enq: TM - contention     WAITING                     6.53

10:20:26 SQL> 

This is great news if your sqlldr processes compete for TM enqueues and your load process takes a little longer than the previously hard coded timeout of 30 seconds. The process eventually completed successfully after the enqueue was released:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Fri Jul 26 10:13:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 950936.

Table T2:
  950924 Rows successfully loaded.

Check the log file:
  t2_2.log
for more information about the load.
Fri 26 Jul 10:21:32 BST 2019

Caveats

As with every pro, there are usually cons associated. The downside to waiting (for a potentially very long time) is that you might not notice load processes beginning to stack up unless proper instrumentation and monitoring are in place. Waiting too long for data to be loaded is equally bad as not loading at all because the end result is identical. As with many features in the database Oracle gives you plenty of options, and it’s up to the developers and architects to make the correct decisions on how to use them.

Summary

Beginning with sqlldr 12.2 Oracle introduced the option to wait for enqueues on segments to be released instead of aborting after 30 seconds.

In the next post I’ll write about another possibility to prevent exclusive table locks in the first place when running multiple concurrent sqlldr sessions.

sqlldr, direct path loads and concurrency in 12.1 and earlier

I have recently come across an interesting issue related to concurrent data loading into the Oracle database using sqlldr’s direct path mode. Although I investigated the situation on 12.1.0.2, I found that the same holds true in 19.4 as well when using the defaults. I reconstructed the case, although it is simplified a little to bring the point home.

The environment I used to put this post together is Oracle Restart 19.4.0 on Oracle Linux 7.6.

Test overview

For this test I am running concurrent sqlldr sessions to demonstrate the case. I am conscious of that fact that I could have used external tables, but then I wouldn’t have been able to write this post :)

Assume there’s a table named t2:

SQL> create table t2 as select * from dba_objects where 1 = 0;

Table created.

I am planning on populating the table with data in CSV format. Using sqlcl it is dead easy to create an input file, simply specify the /*csv*/ hint in your query and spool the output to a file: job done. The resulting input file is named t2.dat.

Next up is the control file. I am a creature of habit and tend to use a control file although I could have tried the same process using express mode.

Using the documentation I figured the following control file should do. Since it’s easy enough to forget passing direct=true to the command line I added it to the options clause. When using sqlcl to create a CSV file it adds the column headings in line 1, and they won’t load so I’m skipping these records. Here is the full control file:

options (direct=true, skip=1)
load data 
infile 't2.dat'
into table t2
append
fields csv without embedded 
date format 'dd-mon-yy'
trailing nullcols
(
OWNER,                                                                          
OBJECT_NAME,                                                                    
SUBOBJECT_NAME,                                                                 
OBJECT_ID,                                                                      
DATA_OBJECT_ID,                                                                 
OBJECT_TYPE,                                                                    
CREATED date,                                                                        
LAST_DDL_TIME date,                                                                  
TIMESTAMP,                                                                      
STATUS,                                                                         
TEMPORARY,                                                                      
GENERATED,                                                                      
SECONDARY,                                                                      
NAMESPACE,                                                                      
EDITION_NAME,                                                                   
SHARING,                                                                        
EDITIONABLE,                                                                    
ORACLE_MAINTAINED,                                                              
APPLICATION,                                                                    
DEFAULT_COLLATION,                                                              
DUPLICATED,                                                                     
SHARDED,                                                                        
CREATED_APPID,                                                                  
CREATED_VSNID,                                                                  
MODIFIED_APPID,                                                                 
MODIFIED_VSNID
)

With the prerequisites at hand I’m ready to perform some testing.

Test #1: concurrent sessions using defaults

First of all, what happens when starting 5 concurrent sessions? Will they complete? Let’s try this in bash:

for i in $(seq 1 5) ; do
  echo starting iteration $i
  sqlldr /@loader control=t2.ctl log=session_${i}.log &
done

This ran to completion without any visible errors, but I prefer to run sanity checking anyway. The first thing to do is to see if all records have been loaded. The input file contains 73148 data records by the way.

$ grep 'Rows successfully loaded' session*log
session_1.log:  73148 Rows successfully loaded.
session_2.log:  73148 Rows successfully loaded.
session_3.log:  73148 Rows successfully loaded.
session_4.log:  73148 Rows successfully loaded.
session_5.log:  73148 Rows successfully loaded.

So this looks ok, what about the run times?

$ grep '^Run' *.log
session_1.log:Run began on Mon Jul 22 21:32:31 2019
session_1.log:Run ended on Mon Jul 22 21:32:44 2019
session_2.log:Run began on Mon Jul 22 21:32:31 2019
session_2.log:Run ended on Mon Jul 22 21:32:40 2019
session_3.log:Run began on Mon Jul 22 21:32:31 2019
session_3.log:Run ended on Mon Jul 22 21:32:46 2019
session_4.log:Run began on Mon Jul 22 21:32:31 2019
session_4.log:Run ended on Mon Jul 22 21:32:48 2019
session_5.log:Run began on Mon Jul 22 21:32:31 2019
session_5.log:Run ended on Mon Jul 22 21:32:42 2019

All in all these loads completed in a reasonably short time. Some took longer to finish than others though. That made me curious and I logged into the database to see what was going on. I never consciously saw this event:

SQL> select sid, event, state from v$session where program like 'sqlldr%';

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        44 kpodplck wait before retrying ORA-54                             WAITING
        52 kpodplck wait before retrying ORA-54                             WAITING
        54 kpodplck wait before retrying ORA-54                             WAITING
       290 Data file init write                                             WAITING
       291 kpodplck wait before retrying ORA-54                             WAITING

SQL> 

Using the output from my above query I could see that one session was active, all the others queued up behind it. Interestingly there is no “enq: TM – contention” which you’d see with direct path inserts in sqlplus for example:

-- in session 1
SQL> insert /*+ append */ into t2 select * from dba_objects;

73157 rows created.

-- notice there is no commit! 

-- Session 2 waits ...
SQL> insert /*+ append */ into t2 select * from dba_objects;

-- the DBA sees it all:
SQL> select sid, event, state from v$session where username = 'MARTIN'

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        51 enq: TM - contention                                             WAITING
       270 SQL*Net message from client                                      WAITING

What’s even more interesting is this: ddl_lock_timeout which I thought might have helped, has no effect on sqlldr operations:

SQL> show parameter ddl_lock_timeout

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
ddl_lock_timeout                     integer     0
SQL> 

The parameter isn’t set at all, yet I could see “kpodplck wait before retrying ORA-54” which is strange. ORA-54 is of course the dreaded “resource busy and acquire with nowait specified” error. I managed to trace kpodp back to the direct load path functionality.

Without the ability to control waiting via a database parameter I am aware of, there must be some other timeout. And sure enough, there is:

$ time sqlldr /@loader control=t2.ctl

SQL*Loader: Release 19.0.0.0.0 - Production on Mon Jul 22 21:46:59 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

real    0m32.038s
user    0m0.011s
sys     0m0.011s

It appears as if sqlldr times out waiting for a locked resource after round about 30 seconds, regardless of ddl_lock_timeout.

And this is the end of the story if you aren’t on 12.2 or later. In the next post I’ll show you how you can work around this problem with more current releases.

RBAL (ospid: nnn): terminating the instance due to error 27625 after patching Oracle Restart

I have come across an odd behaviour trying to patch an Oracle Restart environment to 12.1.0.2 January 2019. Based on a twitter conversation this isn’t necessarily limited to my patch combination, there might be others as well. I have used opatchauto to apply patch 28813884 to both RDBMS and GRID homes plus its corresponding OJVM (Java) patch. Before diving into details, this is the environment I have been working with:

  • Oracle Restart 12.1.0.2 with an earlier PSU/OJVM combo applied
  • Separation of duties with oracle as the RDBMS owner, and grid owning the GRID infrastructure installation. This is a key point!
  • Patches to be applied
    • Patch 28813884 (GI PSU)
    • Patch 28790654 (corresponding OJVM patch)
  • OPatch version as installed in the GRID|RDBMS home
    • opatchauto: 13.9.4.2.0
    • opatch: 12.2.0.1.17

Following the instructions in the readme file I upgraded OPatch to the required version. The file I downloaded was named p6880880_121010_Linux-x86-64.zip. I double-checked the readme, and to me this is the correct file. After upgrading OPatch in the RDBMS and GRID homes, I started patching.

After this process completed, I wanted to start the database in UPGRADE mode as required by the OJVM patch. This needs to be done via sqlplus since srvctl does not support an “upgrade” option in “start database”.

And that’s where it hit me: whenever I tried to open the database in UPGRADE mode, it threw an error:

 SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:15:34 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.
 
 SQL> startup upgrade
 
 ORACLE instance started.
 
 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 ORA-03113: end-of-file on communication channel
 Process ID: 22354
 Session ID: 14 Serial number: 41958

Looking at the alert log I can see that RBAL terminated the instance:

ALTER DATABASE   MOUNT
 ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.113)(PORT=1521))' SCOPE=MEMORY SID='ORCL';
 2019-07-16 12:15:43.404000 -04:00
 NOTE: ASMB mounting group 1 (DATA)
 WARNING: cellinit.ora is missing. RBAL is terminating the instance.
 RBAL (ospid: 22332): terminating the instance due to error 27625
 System state dump requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_diag_22306_20190716121543.trc
 Dumping diagnostic data in directory=[cdmp_20190716121543], requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 Instance terminated by RBAL, pid = 22332
 2019-07-16 12:15:49.164000 -04:00

But since this is Oracle Restart, Clusterware will simply restart the database. Unless of course you’ve been changing the default behaviour. And funny enough, this works (see further down in the article as to why). Although I’d appreciate this in most cases, the automatic restart isn’t appropriate in my situation: when started by Clusterware, the database is not in upgrade mode:

 SQL> select status from v$instance;

 STATUS
 ------------
 OPEN

Which is a problem for me. A MOS search about error 27625 didn’t reveal anything too useful, and it took me quite a while to realise the problem has to do with permissions. An Internet search finally gave me the right answer, a fellow blogger has pointed it out a little while ago …

So what exactly is the problem? The RDBMS “oracle” binary needs a specific set of permissions and ownership/group membership for Oracle Restart with separation of duties enabled. This is what it looked like before applying the patch:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:15 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

The file permissions are ‘6751’ with oracle owning the file but it belongs to the asmadmin group. Remember, I am using a separate user for Grid Infrastructure with its own specific operating system groups. After running opatch, this changed to:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle oinstall 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

Permissions remained the same, however the group changed from asmadmin to oinstall. Not quite what I had in mind, and it reproducibly causes instance crashes. There is a simple solution: make sure permissions are set correctly! Using “Database Creation on 11.2/12.1/12.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)” got me on the right track.

 $ ls -l $ORACLE_HOME/bin/oracle
 -rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

 $ sqlplus / as sysdba

 SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:34:45 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.

 SQL> startup upgrade
 ORACLE instance started.

 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 Database mounted.
 Database opened.

 SQL> select status from v$instance;

 STATUS
 ------------
 OPEN MIGRATE

 SQL> 

With the database in upgrade mode (I believe the “migrate” is a left over from the 9i days) I am reassured that running datapatch (the OJVM part actually) works as advertised.

By the way Clusterware corrects the group permissions when you issue a “srvctl start database -db …” command as documented in “Starting the database using srvctl changes the group setting for oracle binary (Doc ID 1508027.1)”. Which didn’t solve my problem as I can’t start the database in upgrade mode using srvctl.

Hope this helps you one day!

JSON_TABLE() and date/time columns in Oracle 19c

While researching the use of JSON in Oracle 19c I came some interesting behaviour that wasn’t immediately obvious (to me). With this post I am hoping to save you a couple of minutes scratching your head when working with JSON_TABLE(). This is Oracle 19.3.0 on Linux and I’m connecting to it using SQLcl 19.1.

Some background

As part of my JSON-support-in-Oracle research I had a good look at JSON_TABLE. Although complex at first sight, it is a lot less intimidating if you know how to use XMLTABLE :) My goal for this post is to convert a JSON document to a relational structure.

To convert JSON to (relational) rows and columns, I believe I need to use the JSON_value_column from the list of available options in JSON_TABLE‘s JSON_column_definition. Interestingly, and different from the syntax diagram for XMLTABLE(), I’m allowed to specify a JSON_value_return_type. There are quite a a few of them, and I guess the most commonly used ones are going to be varchar2, date and timestamp. This post is about the latter two, since parsing out a text string is pretty simple.

The Test Case

I created a very simple table to test the functionality:

SQL> create table t1 (
   2   id number not null,
   3   json_txt clob not null,
   4   constraint pk_t1 primary key (id),
   5   constraint c_is_json check (json_txt is json)
   6  );

 Table created.

Essentially a table to hold a JSON column plus a check constraint to make sure that I’m not inserting data that isn’t JSON. Here’s the initial JSON document I started with:

SQL> select json_query(json_txt, '$' pretty error on error) 
  2   from t1 where id = 1;

JSON_QUERY(JSON_TXT,'$'PRETTYERRORONERROR)                  
------------------------------------------------------------
{                                                           
  "results" :                                               
  [                                                         
    {                                                       
      "resultID" : 1,                                       
      "resultDate" : "13-JUN-19 08:10:00",                  
      "details" : "none"                                    
    }                                                       
  ]                                                         
}                 

Thanks to my check constraint I know this is valid JSON. I specified the pretty keyword to make the output more readable. When developing using new(-ish) features I prefer Oracle to raise an error so I’m going with ERROR ON ERROR.

These are my NLS settings by the way, this is a system installed with the default locale set to en_US. I have added these settings for reference and they will play a role later on.

SQL> select * from nls_session_parameters
  2* where regexp_like(parameter,'(DATE|TIME)');
PARAMETER                      VALUE                         
------------------------------ ------------------------------
NLS_DATE_FORMAT                DD-MON-RR                     
NLS_DATE_LANGUAGE              AMERICAN                      
NLS_TIME_FORMAT                HH24.MI.SSXFF                 
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH24.MI.SSXFF       
NLS_TIME_TZ_FORMAT             HH24.MI.SSXFF TZR             
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH24.MI.SSXFF TZR   

6 rows selected. 

I am now ready to start testing.

First Test

So let’s get started with the testing. My query is shown here:

SELECT
    jt.*
FROM
    t1,
    JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
            COLUMNS (
                resID   NUMBER   PATH '$.resultID',
                resDATE DATE     PATH '$.resultDate'
            )
        )
    AS jt
WHERE
    t1.id = 1;

Not too dissimilar to using XMLTABLE, I need to pass the column containing my JSON document to the function, followed by the path to the data within the JSON document I want to parse. I could have stored multiple results in the array (and I did initially, but considered that a little too confusing), hence the reference to all of them.

The columns() clause (of type JSON_value_column) allows me to define columns, their respective data type and the path relative to the second argument to JSON_TABLE. In my case that’s $.results.resultID and $.results.resultDate. These two are mapped to resID and resDATE in the output. I am limiting the query to my first row in t1.

When I execute this query against the first record in the table, I get an error:

SQL> SELECT
  2      jt.*
  3  FROM
  4      t1,
  5      JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
  6              COLUMNS (
  7                  resID   NUMBER   PATH '$.resultID',
  8                  resDATE DATE     PATH '$.resultDate'
  9              )
 10          )
 11      AS jt
 12  WHERE
 13      t1.id = 1;

Error starting at line : 1 in command -
SELECT
    jt.*
FROM
    t1,
    JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
            COLUMNS (
                resid   NUMBER   PATH '$.resultID',
                resdate DATE     PATH '$.resultDate'
            )
        )
    AS jt
WHERE
    t1.id = 1
Error report -
ORA-01861: literal does not match format string

This is kind of surprising, my suspicion at the time was that something is wrong with the NLS_DATE_FORMAT settings. But whatever I tried, I always got the same error. Using MOS and other resources on the net turned out blank, which might well be down to my inadequate search efforts. Tracing didn’t really help either.

Second Test

I tried a few other things until, with the help of an educated guess, I got lucky: I removed the time portion from the resultDate and formatted it to the output you get from selecting sysdate from dual, as shown here:

SQL> select json_query(json_txt, '$.results' pretty error on error) 
  2  from t1 where id = 2; 

JSON_QUERY(JSON_TXT,'$.RESULTS'PRETTYERRORONERROR)          
------------------------------------------------------------
[                                                           
  {                                                         
    "resultID" : 2,                                         
    "resultDate" : "13-JUN-19",                             
    "details" : "none"                                      
  }                                                         
]                                                           

This time around, the error message was different:

SQL> SELECT
  2      jt.*
  3  FROM
  4      t1,
  5      JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
  6              COLUMNS (
  7                  resID   NUMBER    PATH '$.resultID',
  8                  resDATE DATE      PATH '$.resultDate'
  9              )
 10          )
 11      AS jt
 12  WHERE
 13      t1.id = 2;

Error starting at line : 1 in command -
SELECT
    jt.*
FROM
    t1,
    JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
            COLUMNS (
                resID   NUMBER    PATH  '$.resultID',
                resDATE DATE      PATH '$.resultDate'
            )
        )
    AS jt
WHERE
    t1.id = 2
Error report -
ORA-40688: date or time stamp string value does not conform
to ISO 8601 format

Aha! That’s much better than ORA-01861: literal does not match format string. So it would appear as if the date the system expected to find in the JSON document must be ISO 8601 compliant. Ummm, what exactly is an ISO 8601 date format? The Internet knew!

Third Test

So maybe using an ISO 8601 time format in the JSON document is the answer? Worth a try, the JSON document now looks like this:

SQL> select json_query(json_txt, '$.results' pretty error on error) 
  2  from t1 where id = 3;

JSON_QUERY(JSON_TXT,'$.RESULTS'PRETTYERRORONERROR)          
------------------------------------------------------------
[                                                           
  {                                                         
    "resultID" : 3,                                         
    "resultDate" : "2019-06-13T16:39:00+00:00",             
    "details" : "none"                                      
  }                                                         
]     

It seems that ISO 8601 requires a slightly more complex time expression, much more like a timestamp in Oracle. Let’s see if this works:

SQL> SELECT
  2      jt.*
  3  FROM
  4      t1,
  5      JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
  6              COLUMNS (
  7                  resID   NUMBER    PATH  '$.resultID',
  8                  resDATE DATE      PATH '$.resultDate'
  9              )
 10          )
 11      AS jt
 12  WHERE
 13      t1.id = 3;

     RESID RESDATE  
---------- ---------
         3 13-JUN-19

Hurray, no errors, but result data for a change!

Caveat

But wait, that’s not the whole story. I have stored a time component in the input document, so how do I get that? I’d normally use to_char() for that purpose, so let’s give that a go:

SQL> SELECT
  2      to_char(jt.resDATE, 'dd.mm.yyyy hh24:mi:ss') as formattedDate
  3  FROM
  4      t1,
  5      JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
  6              COLUMNS (
  7                  resID   NUMBER    PATH  '$.resultID',
  8                  resDATE DATE      PATH '$.resultDate'
  9              )
 10          )
 11      AS jt
 12  WHERE
 13      t1.id = 3;

FORMATTEDDATE      
-------------------
13.06.2019 00:00:00

Hmmm, that’s midnight, and not 16:39:00+00:00 …. Unless I have missed something important it seems as if the DATE column type in the columns() clause stripped the input of its time component. Not to worry though, there’s a timestamp type. Maybe that can help?

SQL> SELECT
  2      jt.*
  3  FROM
  4      t1,
  5      JSON_TABLE ( t1.json_txt, '$.results[*]' ERROR ON ERROR
  6              COLUMNS (
  7                  resID   NUMBER    PATH '$.resultID',
  8                  resDATE TIMESTAMP PATH '$.resultDate'
  9              )
 10          )
 11      AS jt
 12  WHERE
 13      t1.id = 3;

     RESID RESDATE                     
---------- ----------------------------
         3 13-JUN-19 16.39.00.000000000

Voila! I’ve got a date and time.

Summary

From my very limited testing it appears as if input date in the JSON document has to be provided in ISO 8601 format, or otherwise Oracle raises an error. Unlike in SQLLDR’s control file there appears to be no option to tell Oracle about the formatting. Changing the NLS_DATE_FORMAT (not shown here) did affect the output of the query only. It didn’t appear to be applied when parsing the input.