Category Archives: Linux

sqlldr, direct path loads and concurrency in 12.2 and later

In my previous post I showed you that Oracle’s SQL loader (sqlldr) utility has a built-in timeout of 30 seconds waiting for locked resources before returning SQL*Loader-951/ORA-604/ORA-54 errors and failing to load data. This can cause quite some trouble! Before showing you the enhancement in 12.2 and later, here is the gist of the previous post.

Concurrency in Oracle sqlldr 12.1 and earlier

To show you how sqlldr times out I need to simulate an exclusive lock on the table in sqlplus for example. That’s quite simple:

SQL> set time on
10:17:00 SQL> lock table t2 in exclusive mode;

Table(s) Locked.

10:17:07 SQL>

Next I started a sqlldr process in another session. Please refer to the previous post for details, or take my word that I’m using a direct path insert strategy. The only difference is the size of the input file – I had to inflate it considerably to buy some time running standard diagnostic tools:

$ date; sqlldr /@loader control=t2_2.ctl ; date
Tue 23 Jul 10:22:35 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Tue Jul 23 10:22:35 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Tue 23 Jul 10:23:06 BST 2019

The “date” commands reveal a timeout of 30 seconds. Setting ddl_lock_timeout has no bearing on the timeout: the database is waiting on kpodplck wait before retrying ORA-54:

10:22:36 SQL> r
  1* select event, state from v$session where program like 'sqlldr%'

EVENT                                                            STATE
---------------------------------------------------------------- -------------------
kpodplck wait before retrying ORA-54                             WAITING

1 row selected.

This was where I left off with the previous post until I noticed there is another option!

Oracle 12.2 and later

In 12.2 and later you can instruct sqlldr to wait until the lock is released. There is a new parameter named direct_path_lock_timeout:

$ sqlldr | egrep 'Version|direct_path'
Version 19.4.0.0.0
direct_path_lock_wait -- wait for access to table when currently locked  (Default FALSE)

Interestingly there are no hits for direct_path_lock_wait in My Oracle Support’s knowledgebase. There are merely a few documentation references. So what does this parameter do? While the table is still locked in exclusive mode, let’s start the sqlldr process with the new option:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Fri Jul 26 10:13:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
[ ... waiting ... ]

The process now sits there and waits … and it does so for more than 30 seconds. And instead of kpodplck wait before retrying ORA-54 it waits on … drums please: enq: TM contention!

10:20:11 SQL> select seq#, event, state, round(seconds_in_wait/60, 2) mins_waiting
10:20:21   2  from v$session where program like 'sqlldr%';

      SEQ# EVENT                    STATE               MINS_WAITING
---------- ------------------------ ------------------- ------------
       119 enq: TM - contention     WAITING                     6.53

10:20:26 SQL> 

This is great news if your sqlldr processes compete for TM enqueues and your load process takes a little longer than the previously hard coded timeout of 30 seconds. The process eventually completed successfully after the enqueue was released:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Fri Jul 26 10:13:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 950936.

Table T2:
  950924 Rows successfully loaded.

Check the log file:
  t2_2.log
for more information about the load.
Fri 26 Jul 10:21:32 BST 2019

Caveats

As with every pro, there are usually cons associated. The downside to waiting (for a potentially very long time) is that you might not notice load processes beginning to stack up unless proper instrumentation and monitoring are in place. Waiting too long for data to be loaded is equally bad as not loading at all because the end result is identical. As with many features in the database Oracle gives you plenty of options, and it’s up to the developers and architects to make the correct decisions on how to use them.

Summary

Beginning with sqlldr 12.2 Oracle introduced the option to wait for enqueues on segments to be released instead of aborting after 30 seconds.

In the next post I’ll write about another possibility to prevent exclusive table locks in the first place when running multiple concurrent sqlldr sessions.

Advertisements

sqlldr, direct path loads and concurrency in 12.1 and earlier

I have recently come across an interesting issue related to concurrent data loading into the Oracle database using sqlldr’s direct path mode. Although I investigated the situation on 12.1.0.2, I found that the same holds true in 19.4 as well when using the defaults. I reconstructed the case, although it is simplified a little to bring the point home.

The environment I used to put this post together is Oracle Restart 19.4.0 on Oracle Linux 7.6.

Test overview

For this test I am running concurrent sqlldr sessions to demonstrate the case. I am conscious of that fact that I could have used external tables, but then I wouldn’t have been able to write this post :)

Assume there’s a table named t2:

SQL> create table t2 as select * from dba_objects where 1 = 0;

Table created.

I am planning on populating the table with data in CSV format. Using sqlcl it is dead easy to create an input file, simply specify the /*csv*/ hint in your query and spool the output to a file: job done. The resulting input file is named t2.dat.

Next up is the control file. I am a creature of habit and tend to use a control file although I could have tried the same process using express mode.

Using the documentation I figured the following control file should do. Since it’s easy enough to forget passing direct=true to the command line I added it to the options clause. When using sqlcl to create a CSV file it adds the column headings in line 1, and they won’t load so I’m skipping these records. Here is the full control file:

options (direct=true, skip=1)
load data 
infile 't2.dat'
into table t2
append
fields csv without embedded 
date format 'dd-mon-yy'
trailing nullcols
(
OWNER,                                                                          
OBJECT_NAME,                                                                    
SUBOBJECT_NAME,                                                                 
OBJECT_ID,                                                                      
DATA_OBJECT_ID,                                                                 
OBJECT_TYPE,                                                                    
CREATED date,                                                                        
LAST_DDL_TIME date,                                                                  
TIMESTAMP,                                                                      
STATUS,                                                                         
TEMPORARY,                                                                      
GENERATED,                                                                      
SECONDARY,                                                                      
NAMESPACE,                                                                      
EDITION_NAME,                                                                   
SHARING,                                                                        
EDITIONABLE,                                                                    
ORACLE_MAINTAINED,                                                              
APPLICATION,                                                                    
DEFAULT_COLLATION,                                                              
DUPLICATED,                                                                     
SHARDED,                                                                        
CREATED_APPID,                                                                  
CREATED_VSNID,                                                                  
MODIFIED_APPID,                                                                 
MODIFIED_VSNID
)

With the prerequisites at hand I’m ready to perform some testing.

Test #1: concurrent sessions using defaults

First of all, what happens when starting 5 concurrent sessions? Will they complete? Let’s try this in bash:

for i in $(seq 1 5) ; do
  echo starting iteration $i
  sqlldr /@loader control=t2.ctl log=session_${i}.log &
done

This ran to completion without any visible errors, but I prefer to run sanity checking anyway. The first thing to do is to see if all records have been loaded. The input file contains 73148 data records by the way.

$ grep 'Rows successfully loaded' session*log
session_1.log:  73148 Rows successfully loaded.
session_2.log:  73148 Rows successfully loaded.
session_3.log:  73148 Rows successfully loaded.
session_4.log:  73148 Rows successfully loaded.
session_5.log:  73148 Rows successfully loaded.

So this looks ok, what about the run times?

$ grep '^Run' *.log
session_1.log:Run began on Mon Jul 22 21:32:31 2019
session_1.log:Run ended on Mon Jul 22 21:32:44 2019
session_2.log:Run began on Mon Jul 22 21:32:31 2019
session_2.log:Run ended on Mon Jul 22 21:32:40 2019
session_3.log:Run began on Mon Jul 22 21:32:31 2019
session_3.log:Run ended on Mon Jul 22 21:32:46 2019
session_4.log:Run began on Mon Jul 22 21:32:31 2019
session_4.log:Run ended on Mon Jul 22 21:32:48 2019
session_5.log:Run began on Mon Jul 22 21:32:31 2019
session_5.log:Run ended on Mon Jul 22 21:32:42 2019

All in all these loads completed in a reasonably short time. Some took longer to finish than others though. That made me curious and I logged into the database to see what was going on. I never consciously saw this event:

SQL> select sid, event, state from v$session where program like 'sqlldr%';

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        44 kpodplck wait before retrying ORA-54                             WAITING
        52 kpodplck wait before retrying ORA-54                             WAITING
        54 kpodplck wait before retrying ORA-54                             WAITING
       290 Data file init write                                             WAITING
       291 kpodplck wait before retrying ORA-54                             WAITING

SQL> 

Using the output from my above query I could see that one session was active, all the others queued up behind it. Interestingly there is no “enq: TM – contention” which you’d see with direct path inserts in sqlplus for example:

-- in session 1
SQL> insert /*+ append */ into t2 select * from dba_objects;

73157 rows created.

-- notice there is no commit! 

-- Session 2 waits ...
SQL> insert /*+ append */ into t2 select * from dba_objects;

-- the DBA sees it all:
SQL> select sid, event, state from v$session where username = 'MARTIN'

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        51 enq: TM - contention                                             WAITING
       270 SQL*Net message from client                                      WAITING

What’s even more interesting is this: ddl_lock_timeout which I thought might have helped, has no effect on sqlldr operations:

SQL> show parameter ddl_lock_timeout

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
ddl_lock_timeout                     integer     0
SQL> 

The parameter isn’t set at all, yet I could see “kpodplck wait before retrying ORA-54” which is strange. ORA-54 is of course the dreaded “resource busy and acquire with nowait specified” error. I managed to trace kpodp back to the direct load path functionality.

Without the ability to control waiting via a database parameter I am aware of, there must be some other timeout. And sure enough, there is:

$ time sqlldr /@loader control=t2.ctl

SQL*Loader: Release 19.0.0.0.0 - Production on Mon Jul 22 21:46:59 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

real    0m32.038s
user    0m0.011s
sys     0m0.011s

It appears as if sqlldr times out waiting for a locked resource after round about 30 seconds, regardless of ddl_lock_timeout.

And this is the end of the story if you aren’t on 12.2 or later. In the next post I’ll show you how you can work around this problem with more current releases.

Ansible tips’n’tricks: checking if a systemd service is running

I have been working on an Ansible playbook to update Oracle’s Tracefile Analyser (TFA). If you have been following this blog over the past few months you might remember that I’m a great fan of the tool! Using Ansible makes my life a lot easier: when deploying a new system I can ensure that I’m also installing TFA. Under normal circumstances, TFA should be present when the (initial) deployment playbook finishes. At least in theory.

As we know, life is what happens when you’re making other plans, and I’d rather check whether TFA is installed/configured/running before trying to upgrade it. The command to upgrade TFA is different from the command I use to deploy it.

I have considered quite a few different ways to do this but in the end decided to check for the oracle-tfa service: if the service is present, TFA must be as well. There are probably other ways, maybe better ones, but this one works for me.

Checking for the presence of a service

Ansible offers a module, called service_facts since version 2.5 to facilitate working with services. I also tried the setup module but didn’t find what I needed. Consider the following output, generated on Oracle Linux 7.6 when gathering service facts:

TASK [get service facts] *******************************************************
 ok: [localhost] => {
     "ansible_facts": {
         "services": {
             "NetworkManager-wait-online.service": {
                 "name": "NetworkManager-wait-online.service", 
                 "source": "systemd", 
                 "state": "stopped"
             }, 
             "NetworkManager.service": {
                 "name": "NetworkManager.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 
             "auditd.service": {
                 "name": "auditd.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 

[ many more services ]

            "oracle-tfa.service": {
                 "name": "oracle-tfa.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 

[ many more services ]

This looks ever so slightly complicated! And indeed, it took a little while to work the syntax out. My first attempt were all but unsuccessful.

Getting the syntax right

Thankfully I wasn’t the only one with the problem, and with a little bit of research ended up with this code:

---
 - hosts: localhost
   connection: local
   become: true

   tasks:
   - name: get service facts
     service_facts:

   - name: try to work out how to access the service
     debug:
       var: ansible_facts.services["oracle-tfa.service"]

Awesome! When running this on a system with TFA installed, it works quite nicely:

TASK [try to work out how to access the service] *******************************
 ok: [localhost] => {
     "ansible_facts.services[\"oracle-tfa.service\"]": {
         "name": "oracle-tfa.service", 
         "source": "systemd", 
         "state": "running"
     }
 }
 

 PLAY RECAP *********************************************************************
 localhost                  : ok=3    changed=0    unreachable=0    failed=0

The same code fails on a system without TFA installed:

TASK [try to work out how to access the service] *******************************
 ok: [localhost] => {
     "ansible_facts.services[\"oracle-tfa.service\"]": "VARIABLE IS NOT DEFINED!
      'dict object' has no attribute 'oracle-tfa.service'"
 }
 

 PLAY RECAP *********************************************************************
 localhost                  : ok=3    changed=0    unreachable=0    failed=0

Now the trick is to ensure that I’m not referencing an undefined variable. This isn’t too hard either, here is a useable playbook:

---
 - hosts: localhost
   connection: local 
 
   tasks:
   - name: get service facts
     service_facts:
 
   - name: check if TFA is installed
     fail:
       msg: Tracefile Analyzer is not installed, why? It should have been there!
     when: ansible_facts.services["oracle-tfa.service"] is not defined

The “tasks” include getting service facts before testing for the presence of the oracle-tfa.service. I deliberately fail the upgrade process to make the user aware of a situation that should not have happened.

Hope this helps!

RBAL (ospid: nnn): terminating the instance due to error 27625 after patching Oracle Restart

I have come across an odd behaviour trying to patch an Oracle Restart environment to 12.1.0.2 January 2019. Based on a twitter conversation this isn’t necessarily limited to my patch combination, there might be others as well. I have used opatchauto to apply patch 28813884 to both RDBMS and GRID homes plus its corresponding OJVM (Java) patch. Before diving into details, this is the environment I have been working with:

  • Oracle Restart 12.1.0.2 with an earlier PSU/OJVM combo applied
  • Separation of duties with oracle as the RDBMS owner, and grid owning the GRID infrastructure installation. This is a key point!
  • Patches to be applied
    • Patch 28813884 (GI PSU)
    • Patch 28790654 (corresponding OJVM patch)
  • OPatch version as installed in the GRID|RDBMS home
    • opatchauto: 13.9.4.2.0
    • opatch: 12.2.0.1.17

Following the instructions in the readme file I upgraded OPatch to the required version. The file I downloaded was named p6880880_121010_Linux-x86-64.zip. I double-checked the readme, and to me this is the correct file. After upgrading OPatch in the RDBMS and GRID homes, I started patching.

After this process completed, I wanted to start the database in UPGRADE mode as required by the OJVM patch. This needs to be done via sqlplus since srvctl does not support an “upgrade” option in “start database”.

And that’s where it hit me: whenever I tried to open the database in UPGRADE mode, it threw an error:

 SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:15:34 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.
 
 SQL> startup upgrade
 
 ORACLE instance started.
 
 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 ORA-03113: end-of-file on communication channel
 Process ID: 22354
 Session ID: 14 Serial number: 41958

Looking at the alert log I can see that RBAL terminated the instance:

ALTER DATABASE   MOUNT
 ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.113)(PORT=1521))' SCOPE=MEMORY SID='ORCL';
 2019-07-16 12:15:43.404000 -04:00
 NOTE: ASMB mounting group 1 (DATA)
 WARNING: cellinit.ora is missing. RBAL is terminating the instance.
 RBAL (ospid: 22332): terminating the instance due to error 27625
 System state dump requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_diag_22306_20190716121543.trc
 Dumping diagnostic data in directory=[cdmp_20190716121543], requested by (instance=1, osid=22332 (RBAL)), summary=[abnormal instance termination].
 Instance terminated by RBAL, pid = 22332
 2019-07-16 12:15:49.164000 -04:00

But since this is Oracle Restart, Clusterware will simply restart the database. Unless of course you’ve been changing the default behaviour. And funny enough, this works (see further down in the article as to why). Although I’d appreciate this in most cases, the automatic restart isn’t appropriate in my situation: when started by Clusterware, the database is not in upgrade mode:

 SQL> select status from v$instance;

 STATUS
 ------------
 OPEN

Which is a problem for me. A MOS search about error 27625 didn’t reveal anything too useful, and it took me quite a while to realise the problem has to do with permissions. An Internet search finally gave me the right answer, a fellow blogger has pointed it out a little while ago …

So what exactly is the problem? The RDBMS “oracle” binary needs a specific set of permissions and ownership/group membership for Oracle Restart with separation of duties enabled. This is what it looked like before applying the patch:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:15 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

The file permissions are ‘6751’ with oracle owning the file but it belongs to the asmadmin group. Remember, I am using a separate user for Grid Infrastructure with its own specific operating system groups. After running opatch, this changed to:

$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle oinstall 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

Permissions remained the same, however the group changed from asmadmin to oinstall. Not quite what I had in mind, and it reproducibly causes instance crashes. There is a simple solution: make sure permissions are set correctly! Using “Database Creation on 11.2/12.1/12.2 Grid Infrastructure with Role Separation ( ORA-15025, KFSG-00312, ORA-15081 ) (Doc ID 1084186.1)” got me on the right track.

 $ ls -l $ORACLE_HOME/bin/oracle
 -rwsr-s--x. 1 oracle asmadmin 324518992 Jul 16 12:29 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle

 $ sqlplus / as sysdba

 SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 16 12:34:45 2019

 Copyright (c) 1982, 2014, Oracle.  All rights reserved.

 Connected to an idle instance.

 SQL> startup upgrade
 ORACLE instance started.

 Total System Global Area 1073741824 bytes
 Fixed Size                  2932632 bytes
 Variable Size             713031784 bytes
 Database Buffers          352321536 bytes
 Redo Buffers                5455872 bytes
 Database mounted.
 Database opened.

 SQL> select status from v$instance;

 STATUS
 ------------
 OPEN MIGRATE

 SQL> 

With the database in upgrade mode (I believe the “migrate” is a left over from the 9i days) I am reassured that running datapatch (the OJVM part actually) works as advertised.

By the way Clusterware corrects the group permissions when you issue a “srvctl start database -db …” command as documented in “Starting the database using srvctl changes the group setting for oracle binary (Doc ID 1508027.1)”. Which didn’t solve my problem as I can’t start the database in upgrade mode using srvctl.

Hope this helps you one day!

Learning Kubernetes: persistent storage with Minikube

As part of my talk at (the absolutely amazing) Riga Dev Days 2019 I deployed Oracle Restful Data Services (ORDS) in Minikube as my application’s endpoint. I’ll blog about deploying ORDS 19 in docker and running it on Kubernetes later, but before I can do so I want to briefly touch about persistent storage in Minikube because I saw it as a pre-requisite.

In a previous post I wrote about using Minikube as a runtime environment for learning Kubernetes. This post uses the same versions as before – Docker 18.09 and Kubernetes 1.14 are invoked by Minikube 1.1.0.

Quick & dirty intro to storage concepts for Kubernetes

Container storage is – at least to my understanding – ephemeral. So, in other words, if you delete the container, and create a new one, locally stored data is gone. In “standalone” Docker, you can use so-called volumes to store things more permanently. A similar concept exists in Kubernetes in form of a persistent volume (PV) and a corresponding persistent volume claim (PVC).

In a nutshell, and greatly simplified, a persistent volume is a piece of persistent storage pre-allocated on a Kubernetes cluster that can be “claimed”. Hence the names …

There is a lot more to say about persistent data in Kubernetes, and others have already done that, so you won’t find an explanation of storage concepts in Kubernetes here. Please head over to the documentation set for more details.

Be careful though, as the use of persistent volumes and persistent volume claims is quite an in-depth topic, especially outside the developer-scoped Minikube. I specifically won’t touch on the subject of persistent storage outside of Minikube, I’d rather save that for a later post.

Persistent Volumes for my configuration files

ORDS requires you to run an installation routine first, in the cause of which it will create a directory containing its runtime configuration. The shell script I’m using to initialise my container checks for the existence of the configuration directory and skips the installation step if it finds one. It proceeds straight to starting the Tomcat container. This is primarily done to speed the startup process up. If I were not to use the PV/PVC combination the pods in my deployment would have to run the installation each time they start up, something I wanted to avoid.

A less complex example please

I didn’t want to make things more complicated than necessary, so instead I’ll use a much simpler example before writing up on my ORDS deployment. My example is based on the official Ubuntu image, and I’m writing a “heartbeat” file into the mounted volume and see if I can find it on the underlying infrastructure.

The Minikube documentation informs us that Minikube preserves data in /data and a few other locations. Anything you try to put elsewhere will be lost. With that piece of information at hand I proceeded with the storage creation.

Experienced Minikube users might point out at this stage that pre-creating a volume isn’t needed as Minikube supports dynamic volume creation. From what I can tell that’s correct, but not what I chose to do.

Creating a persistent volume

Based on the previously mentioned documentation I created a persistent volume and accompanying volume claim like this:

$ cat persistent-volumes.yml 

kind: PersistentVolume
apiVersion: v1
metadata:
  name: research-vol
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/research-vol"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: research-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: research-vol
  resources:
    requests:
      storage: 1Gi

This approach is very specific to Minikube because I’m using persistent volumes of type HostPath.

Translated to plain English this means that I’m creating a 2 GB volume pointing to /data/research-vol on my Minikube system. And I’m asking for 1 GB in my persistent volume claim. The access mode (ReadWriteOnce) seems to be related to mounting the volume (concurrently) on multi-node clusters. Or it’s a bug because I successfully wrote to a single PVC from multiple pods as you can see in a bit… In my opinion the documentation wasn’t particularly clear on the subject.

Build the Docker image

Before I can deploy my example application to Minikube, I need to build a Docker image first. This is really boring, but I’ll show it here for the sake of completeness:

$ cat Dockerfile 
FROM ubuntu
COPY run.sh /usr/local/bin
RUN chmod +x /usr/local/bin/run.sh
ENTRYPOINT ["/usr/local/bin/run.sh"]

The shell script I’m running is shown here:

 $ cat run.sh 
 #!/usr/bin/env bash

 set -euxo pipefail

 if [[ ! -d $VOLUME ]]; then
   /bin/echo ERR: cannot find the volume $VOLUME, exiting
   exit 1
 fi

 while true; do
   /usr/bin/touch ${VOLUME}/$(hostname)-$(date +%Y%m%d-%H%M%S)
   /bin/sleep 10
 done

As you can see it is just an infinite loop writing heartbeat files into the directory indicated by the VOLUME environment variable. The image is available to Minikube after building it. Refer to my previous post on how to build Docker images for Minikube, or have a look at the documentation. I built the image as “research:v2”

Deploy

With the foundation laid, I can move on to defining the deployment. I’ll do that again with the help of a YAML file:

 $ cat research-deployment.yml 
 apiVersion: extensions/v1beta1
 kind: Deployment
 metadata:
   labels:
     app: research
   name: research
 spec:
   replicas: 2
   selector:
     matchLabels:
       app: research
   template:
     metadata:
       labels:
         app: research
     spec:
       containers:
       - image: research:v2
         env:
         - name: VOLUME
           value: "/var/log/heartbeat"
         name: research
         volumeMounts:
         - mountPath: "/var/log/heartbeat"
           name: research-vol
       volumes:
       - name: research-vol
         persistentVolumeClaim: 
           claimName: research-pvc
       restartPolicy: Always

If you haven’t previously worked with Kubernetes this might look daunting, but it isn’t actually. I’m asking for the deployment of 2 copies (“replicas”) of my research:v2 image. I am passing an environment variable – VOLUME – to the container image. It contains the path to the persistent volume’s mount point. I’m also mounting a PV named research-vol as /var/log/heartbeat/ in each container. This volume in the container scope is based on the definition found in the volumes: section of the YAML file. It’s important to match the information in volumes and volumeMounts.

Running

After a quick kubectl apply -f research-deployment.yml I have 2 pods happily writing to the persistent volume.

 $ kubectl get pods
 NAME                       READY   STATUS    RESTARTS   AGE
 research-f6668c975-98684   1/1     Running   0          6m22s
 research-f6668c975-z6pll   1/1     Running   0          6m22s

The directory exists as specified on the Minikube system. If you used the VirtualBox driver, use minikube ssh to access the VM and change to the PV’s directory:

# pwd
/data/research-vol
# ls -l *research-f6668c975-z6pll* | wc -l
53
# ls -l *research-f6668c975-98684* | wc -l
53

As you can see, both pods are able to write to the directory. The volume’s contents were preserved when I deleted the deployment:

$ kubectl delete deploy/research
deployment.extensions "research" deleted
$ kubectl get pods
No resources found.

This demonstrates that both pods from my deployment are gone. What does it look like back on the Minikube VM?

# ls -l *research-f6668c975-z6pll* | wc -l
63
# ls -l *research-f6668c975-98684* | wc -l
63

You can see that files are still present. The persistent volume lives up to its name.

Next I wanted to see if (new) Pods pick up what’s in the PV. After a short modification to the shell script and a quick docker build followed by a modification of the deployment to use research:v3, the container prints the number of files:

#!/usr/bin/env bash

set -euxo pipefail

if [[ ! -d $VOLUME ]]; then
  /bin/echo ERR: cannot find the volume $VOLUME, exiting
  exit 1
fi

/bin/echo INFO: found $(ls -l $VOLUME/* | wc -l) files in the volume

while true; do
  /usr/bin/touch ${VOLUME}/$(hostname)-$(date +%Y%m%d-%H%M%S)
  /bin/sleep 10
done

As proven by the logs:

 $ kubectl get pods
 NAME                        READY   STATUS        RESTARTS   AGE
 research-556cc9989c-bwzkg   1/1     Running       0          13s
 research-556cc9989c-chg76   1/1     Running       0          13s

 $ kubectl logs research-556cc9989c-bwzkg
 [[ ! -d /var/log/heartbeat ]]
 ++ wc -l
 ++ ls -l /var/log/heartbeat/research-7d4c7c8dc8-5xrtr-20190606-105014 
 ( a lot of output not shown )
 /bin/echo INFO: found 166 files in the volume
 INFO: found 166 files in the volume
 true
 ++ hostname
 ++ date +%Y%m%d-%H%M%S
 /usr/bin/touch /var/log/heartbeat/research-556cc9989c-bwzkg-20190606-105331
 /bin/sleep 10
 true
 ++ hostname
 ++ date +%Y%m%d-%H%M%S
 /usr/bin/touch /var/log/heartbeat/research-556cc9989c-bwzkg-20190606-105341
 /bin/sleep 10 

It appears as if I successfully created and used persistent volumes in Minikube.

Using the Secure External Password store with sqlcl

Sometimes it is necessary to invoke a SQL script in bash or otherwise in an unattended way. SQLcl has become my tool of choice because it’s really lightweight and can do a lot. If you haven’t worked with it yet, you really should give it a go.

So how does one go about invoking SQL scripts from the command line these days? There’s an age-old problem with unattended execution: how do you authenticate against the database? There are many ways to do so, some better than others. This post shows how to use the Secure External Password Store with SQLcl. As always, there is more than one way to do this, @FranckPachot recently wrote about a different approach on Medium which you might want to check out as well.

Please don’t store passwords in scripts

I have seen passwords embedded in shell scripts far too often, and that’s something I really don’t like for many, many reasons. Thankfully Oracle offers an alternative to storing clear text passwords in the form of the Secure External Password Store (SEPS).This post explains one of many ways to use a wallet to use sqlcl to connect to a database. It assumes that a Secure External Password store is set up with the necessary credentials. Components referenced in this post are:

  • sqlcl 19.1
  • Instant Client Basic 18.5
  • Oracle XE 18.4

The SEPS wallet is found in /home/oracle/seps with its corresponding tnsnames.ora and sqlnet.ora in /home/oracle/seps/tns. I have set TNS_ADMIN to /home/oracle/seps/tns and ensured that sqlnet.ora points to the correct wallet location.

First attempt

The first attempt at using sqlcl with the wallet resulted in the following error:

$ /home/oracle/sqlcl/bin/sql -L /@xepdb1

SQLcl: Release 19.1 Production on Fri May 17 05:56:56 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

  USER          = 
  URL           = jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oraclexe)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=XEPDB1)))
  Error Message = ORA-01017: invalid username/password; logon denied

I provided the -L flag to prevent sqlcl from asking me for different credentials after a failed login attempt. Using the -verbose flag in the next attempt I confirmed that sqlcl was indeed using my tnsnames.ora file in the directory specified by $TNS_ADMIN.

Investigating

So I started investigating … The first place to go to is the documentation, however I didn’t find anything relevant in the command line reference or FAQ shown on the product’s landing page. I then cast my net wider and found a few things on My Oracle Support (they didn’t apply to my version of sqlcl) and the Oracle forums.

I tried various things to get the thin client to cooperate with using the wallet but didn’t pursue that route further after learning about the option to use the OCI JDBC driver. After experimenting a little more I got on the right track.

Second attempt

The consensus in the Oracle forum posts I found seems to be to use the OCI flag when invoking the tool. So I tried that next:

$ /home/oracle/sqlcl/bin/sql -L -oci /@xepdb1

SQLcl: Release 19.1 Production on Fri May 17 06:09:29 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

  USER          = 
  URL           = jdbc:oracle:oci8:@xepdb1
  Error Message = no ocijdbc18 in java.library.path
  USER          = 
  URL           = jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oraclexe)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=XEPDB1)))
  Error Message = ORA-01017: invalid username/password; logon denied

No success yet, but there’s an important clue in the output the first URL indicates that indeed an OCI connection was tried, except that a shared library was missing from Java’s library path. I guessed correctly that ocijdbc18 is part of the instant client 18 basic installation. After installing the RPM for the latest 18c instant client I confirmed libocijdb18.so was part of the package.

From what I understand java doesn’t pick up the configuration created by ldconfig and you either have to set java.library.path manually (as in java -Djava.library.path=…) or set LD_LIBRARY_PATH. The latter is easier, and it gave me the desired result:

$ export LD_LIBRARY_PATH=/usr/lib/oracle/18.5/client64/lib:$LD_LIBRARY_PATH
$ echo "select user from dual" | /home/oracle/sqlcl/bin/sql -L -oci /@xepdb1

SQLcl: Release 19.1 Production on Fri May 17 06:15:29 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Fri May 17 2019 06:15:32 -04:00

Connected to:
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0

USER                                                                                                                            
------------------------------------------------------------------------
MARTIN                                                                                                                    

Disconnected from Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0

Result! I can use sqlcl to connect to a database using a wallet.

Oracle Instant Client RPM installation where to find things

Last week I blogged about the option to install Oracle’s Instant Client via the public YUM repository. If you go ahead and try this, there is one thing you will undoubtedly notice: file locations are rather unusual if you have worked with Oracle for a while. This is true at least for the 19c Instant Client, it might be similar for older releases although I didn’t check. I’d like to thank @oraclebase for prompting me to write this short article!

Installing the 19.3 “Basic” Instant Client package

So to start this post I am going to install the 19.3 “Basic” package on my Oracle Linux 7.6 lab environment:

$ sudo yum install oracle-instantclient19.3-basic
Loaded plugins: ulninfo
Resolving Dependencies
--> Running transaction check
---> Package oracle-instantclient19.3-basic.x86_64 0:19.3.0.0.0-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                     Arch   Version      Repository                Size
================================================================================
Installing:
 oracle-instantclient19.3-basic
                             x86_64 19.3.0.0.0-1 ol7_oracle_instantclient  51 M

Transaction Summary
================================================================================
Install  1 Package

Total download size: 51 M
Installed size: 225 M
Is this ok [y/d/N]: y
Downloading packages:
oracle-instantclient19.3-basic-19.3.0.0.0-1.x86_64.rpm     |  51 MB   00:09     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : oracle-instantclient19.3-basic-19.3.0.0.0-1.x86_64           1/1 
  Verifying  : oracle-instantclient19.3-basic-19.3.0.0.0-1.x86_64           1/1 

Installed:
  oracle-instantclient19.3-basic.x86_64 0:19.3.0.0.0-1                          

Complete!

With the software installed, let’s have a look at where everything is:

$ rpm -ql oracle-instantclient19.3-basic
/etc/ld.so.conf.d/oracle-instantclient.conf
/usr/lib/oracle
/usr/lib/oracle/19.3
/usr/lib/oracle/19.3/client64
/usr/lib/oracle/19.3/client64/bin
/usr/lib/oracle/19.3/client64/bin/adrci
/usr/lib/oracle/19.3/client64/bin/genezi
/usr/lib/oracle/19.3/client64/lib
/usr/lib/oracle/19.3/client64/lib/libclntsh.so
/usr/lib/oracle/19.3/client64/lib/libclntsh.so.10.1
/usr/lib/oracle/19.3/client64/lib/libclntsh.so.11.1
/usr/lib/oracle/19.3/client64/lib/libclntsh.so.12.1
/usr/lib/oracle/19.3/client64/lib/libclntsh.so.18.1
/usr/lib/oracle/19.3/client64/lib/libclntsh.so.19.1
/usr/lib/oracle/19.3/client64/lib/libclntshcore.so
/usr/lib/oracle/19.3/client64/lib/libclntshcore.so.19.1
/usr/lib/oracle/19.3/client64/lib/libipc1.so
/usr/lib/oracle/19.3/client64/lib/libmql1.so
/usr/lib/oracle/19.3/client64/lib/libnnz19.so
/usr/lib/oracle/19.3/client64/lib/libocci.so.19.1
/usr/lib/oracle/19.3/client64/lib/libociei.so
/usr/lib/oracle/19.3/client64/lib/libocijdbc19.so
/usr/lib/oracle/19.3/client64/lib/liboramysql19.so
/usr/lib/oracle/19.3/client64/lib/network
/usr/lib/oracle/19.3/client64/lib/network/admin
/usr/lib/oracle/19.3/client64/lib/network/admin/README
/usr/lib/oracle/19.3/client64/lib/ojdbc8.jar
/usr/lib/oracle/19.3/client64/lib/xstreams.jar
/usr/share/oracle
/usr/share/oracle/19.3
/usr/share/oracle/19.3/client64
/usr/share/oracle/19.3/client64/doc
/usr/share/oracle/19.3/client64/doc/BASIC_LICENSE
/usr/share/oracle/19.3/client64/doc/BASIC_README

As you can see in the RPM output, files are found under /usr/lib/oracle. That’s why I meant the file location is unusual. I for my part have followed the directory structure suggested by previous release’s Oracle Universal Installer (OUI) defaults and installed it under /u01/app/oracle/product/version/client_1. The actual location however doesn’t really matter.

No more manually calling ldconfig with 19.3

Note that before 19.3 you had to manually run ldconfig after installing the Instant Client RPM file. In 19c this is handled via a post-install script. The RPM adds its configuration in /etc/ld.so.conf.d/oracle-instantclient.conf.

$ rpm -q --scripts oracle-instantclient19.3-basic
postinstall scriptlet (using /bin/sh):
ldconfig
postuninstall scriptlet (using /bin/sh):
ldconfig

This is nicer – at least in my opinion – than setting LD_LIBRARY_PATH in a shell. This seemed to work just fine: I installed the SQLPlus package (oracle-instantclient19.3-sqlplus) and I could start it without any problems.

Network Configuration

If you have a requirement to add a TNS naming file, you should be able to do so by either setting TNS_ADMIN or place the file in /usr/lib/oracle/19.3/client64/lib/network/admin. I have a tnsnames.ora file pointing to my XE database here:

$ cat /usr/lib/oracle/19.3/client64/lib/network/admin/tnsnames.ora 
xepdb1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oraclexe)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = xepdb1)
    )
  )

I can connect to my database without setting any specific environment variables:

$ env | egrep -i 'tns|ld_library' | wc -l
0
$ sqlplus martin@xepdb1

SQL*Plus: Release 19.0.0.0.0 - Production on Mon May 13 09:38:12 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Mon May 13 2019 09:22:11 -04:00

Connected to:
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production
Version 18.4.0.0.0

The SQLPlus package places a symbolic link to /usr/lib/oracle/19.3/client64/bin/sqlplus into /usr/bin so I don’t even have to adjust the PATH variable.

Summary

Being able to use the Instant Client out of the box is very useful for automated deployments where all you have to do is add a repository followed by a call to yum. This should make a lot of peoples’ lives much easier.