Author Archives: Martin Bach

About Martin Bach

Oracle DBA and Linux enthusiast, part time author and presenter.

Do you know perf trace? It’s an almost perfect replacement for strace

Last night while I was working on my POUG 2021 presentation I looked up what’s new with perf(1) since I last spoke about profiling and tracing Linux on-premises and in the cloud back in 2017. Unsurprisingly, quite a few things have changed.

A quick warning before we begin

As with every tracer/profiler, using tools such as perf trace(1) and strace(1) outside of lab/playground environments is discouraged as they can unexpectedly change a process’s behaviour, slow it down, or even crash the process under investigation. This article is for educational purposes only and purely to be used for research on a lab VM where potential damage is limited to the individual experimenting with the tools.

For these reasons you certainly don’t want to use strace or any other ptrace(2)-based tool such as gdb(1) to figure out what’s going on in a production environment. People a lot smarter than me have written about that, for example Brendan Gregg: wow, so much syscall. It’s a great read, please have a look at the article.

Performance example

Recently I came across perf trace, a perf sub-command that seems much better suited for this kind of research as it is far less intrusive.

Just to prove the point on Ubuntu 20.04.2 with kernel 5.8.0-63-generic, this is what it comes down to. The classic example showing the effect of strace on processes involves running dd(1) but that’s been used so many times I thought of something else. Since I’m researching pread(2) a lot right now, fio seems appropriate. When using the psync IO engine I should be able to record quite a few I/O-related system calls.

The following versions have been used when writing the article:

  • Ubuntu 20.04.2 LTS
  • Kernel 5.8.0-63-generic
  • strace 5.5
  • perf as per linux-tools-5.8.0-63-generic
  • fio 3.16

I have to admit that fio 3.16 isn’t the latest and greatest version, but it ships with Ubuntu making it the convenient solution.
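In case fio isn’t present on your system yet, installing it on Ubuntu is a single command:

$ sudo apt-get install -y fio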

Creation of a baseline without profiling/tracing

The first test is to run fio without profiling to establish a baseline:

$ fio --name=test1 --rw=randread --size=1000M --filename=/nvme/fio/test --ioengine=psync
test1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=52.5MiB/s][r=13.4k IOPS][eta 00m:00s]
test1: (groupid=0, jobs=1): err= 0: pid=12472: Wed Jul 28 21:36:45 2021
  read: IOPS=13.2k, BW=51.5MiB/s (53.9MB/s)(1000MiB/19436msec)
    clat (usec): min=62, max=2244, avg=74.47, stdev= 6.20
     lat (usec): min=62, max=2244, avg=74.63, stdev= 6.23
    clat percentiles (usec):
     |  1.00th=[   69],  5.00th=[   71], 10.00th=[   71], 20.00th=[   72],
     | 30.00th=[   73], 40.00th=[   74], 50.00th=[   75], 60.00th=[   75],
     | 70.00th=[   76], 80.00th=[   77], 90.00th=[   79], 95.00th=[   82],
     | 99.00th=[   93], 99.50th=[   94], 99.90th=[  105], 99.95th=[  118],
     | 99.99th=[  123]
   bw (  KiB/s): min=44568, max=53872, per=100.00%, avg=52684.42, stdev=1549.03, samples=38
   iops        : min=11142, max=13468, avg=13171.11, stdev=387.26, samples=38
  lat (usec)   : 100=99.84%, 250=0.15%, 500=0.01%, 750=0.01%
  lat (msec)   : 4=0.01%
  cpu          : usr=2.87%, sys=14.03%, ctx=256017, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=51.5MiB/s (53.9MB/s), 51.5MiB/s-51.5MiB/s (53.9MB/s-53.9MB/s), io=1000MiB (1049MB), run=19436-19436msec

Disk stats (read/write):
  nvme0n1: ios=255515/1, merge=0/0, ticks=16882/3, in_queue=16888, util=99.18%
$ 

Undoubtedly the above is one of the simplest uses of fio. I asked it to perform random reads on my NVMe device, at a 4k block size with the IO engine instructed to use pread() calls to perform I/O. The following list summarises the important performance details:

  • IOPS: 13,200
  • Storage bandwidth: 51.5 MiB/s
  • Completion time: 19436 milliseconds.

And no, this isn’t anywhere near the NVMe device’s performance but that’s not the point ;)
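For reference, the same workload can also be expressed as a fio job file instead of command line flags. A rough equivalent of the invocation above (all values taken from the command line) looks like this:

[test1]
rw=randread
size=1000M
filename=/nvme/fio/test
ioengine=psync

Save it as test1.fio and run fio test1.fio to get the same kind of output.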

Interlude: the tricky bit about profiling/tracing fio worker processes

Unfortunately I can’t simply strace the fio command as it creates what I’d like to call a “controller” process, for lack of a better word. The “controller” is attached to the pts (= terminal). There are further worker processes, one for each job. In my previous example I only ran one concurrent job, but you can ask fio to run multiple jobs in parallel by specifying --numjobs. Here is an example running the default number of jobs:

$ fio --name=test2 --rw=randread --size=1000M --filename=/nvme/fio/test --ioengine=psync

Using ps I can see 2 fio processes:

$ ps -ef | grep 'martin.*fio.*psync'
martin     15280    7841  3 22:10 pts/0    00:00:00 fio --name=test2 --rw=randread ...
martin     15294   15280 17 22:10 ?        00:00:01 fio --name=test2 --rw=randread ...

The one attached to pts/0 doesn’t really do any of the I/O work; it seems to be limited to communicating with the worker(s).

The trick is to attach strace to the worker process as soon as it is started. I could have used strace -f fio ... to follow fio’s child process creation but then I’d end up with information about 2 processes in the trace when I really only wanted one.
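If you don’t want to race the worker’s startup by hand, a small polling loop can do the attaching for you. This is just a rough sketch under my lab’s assumptions (user martin, and the worker being the fio process with the highest PID), not what I used for the measurements below:

#!/usr/bin/env bash
# wait until the fio worker has been forked (i.e. a second fio process exists)
while [ "$(pgrep -c -u martin fio)" -lt 2 ]; do
    sleep 0.1
done

# attach strace to the worker, assumed to be the fio process with the highest PID
sudo strace -cp "$(pgrep -u martin fio | sort -n | tail -1)"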

Tracing fio execution using strace

This required two terminal sessions: session 1 is used to start fio, session 2 will host the strace command. For the record, this is the command I used to attach strace to the fio worker process:

$ sudo strace -cp $(pgrep -u martin fio | sort | tail -1)

The above command only works for a single worker process as you can imagine. Let’s have a look at the fio output after strace was attached to it:

$ fio --name=test2 --rw=randread --size=1000M --filename=/nvme/fio/test --ioengine=psync 
test2: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=37.4MiB/s][r=9585 IOPS][eta 00m:00s]
test2: (groupid=0, jobs=1): err= 0: pid=12797: Wed Jul 28 22:36:45 2021
  read: IOPS=9457, BW=36.9MiB/s (38.7MB/s)(1000MiB/27069msec)
    clat (usec): min=72, max=226788, avg=103.92, stdev=563.06
     lat (usec): min=72, max=226789, avg=104.07, stdev=563.06
    clat percentiles (usec):
     |  1.00th=[   86],  5.00th=[   91], 10.00th=[   94], 20.00th=[   96],
     | 30.00th=[   98], 40.00th=[  100], 50.00th=[  101], 60.00th=[  103],
     | 70.00th=[  105], 80.00th=[  108], 90.00th=[  112], 95.00th=[  117],
     | 99.00th=[  133], 99.50th=[  139], 99.90th=[  172], 99.95th=[  223],
     | 99.99th=[  701]
   bw (  KiB/s): min= 1184, max=39872, per=99.98%, avg=37822.56, stdev=5116.95, samples=54
   iops        : min=  296, max= 9968, avg=9455.63, stdev=1279.23, samples=54
  lat (usec)   : 100=42.53%, 250=57.43%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 250=0.01%
  cpu          : usr=2.83%, sys=23.20%, ctx=768059, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=36.9MiB/s (38.7MB/s), 36.9MiB/s-36.9MiB/s (38.7MB/s-38.7MB/s), io=1000MiB (1049MB), run=27069-27069msec

Disk stats (read/write):
  nvme0n1: ios=254356/0, merge=0/0, ticks=17648/0, in_queue=17648, util=99.39%

Performance is down significantly compared to the first execution:

  • 9,457 IOPS were recorded vs 13,200 in the baseline
  • Bandwidth dropped to 36.9 MiB/s from 51.5 MiB/s
  • All work completed in 27069 instead of 19436 milliseconds.

Not quite the same dramatic reduction in performance you typically see with dd, but still significant. Did I mention that strace slows things down?

Using perf trace

Following the same approach as just described, it is possible to perf trace the fio worker:

$ sudo perf trace -s -p $(pgrep -u martin fio | sort | tail -1)

Here is the corresponding fio output (while perf trace was busy recording what was going on):

$ fio --name=test3 --rw=randread --size=1000M --filename=/nvme/fio/test --ioengine=psync 
test3: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=45.9MiB/s][r=11.8k IOPS][eta 00m:00s]
test3: (groupid=0, jobs=1): err= 0: pid=12960: Wed Jul 28 22:40:12 2021
  read: IOPS=11.6k, BW=45.1MiB/s (47.3MB/s)(1000MiB/22160msec)
    clat (usec): min=65, max=264743, avg=84.96, stdev=523.12
     lat (usec): min=65, max=264744, avg=85.15, stdev=523.13
    clat percentiles (usec):
     |  1.00th=[   73],  5.00th=[   75], 10.00th=[   77], 20.00th=[   81],
     | 30.00th=[   82], 40.00th=[   84], 50.00th=[   85], 60.00th=[   85],
     | 70.00th=[   86], 80.00th=[   88], 90.00th=[   91], 95.00th=[   93],
     | 99.00th=[  105], 99.50th=[  113], 99.90th=[  125], 99.95th=[  131],
     | 99.99th=[  151]
   bw (  KiB/s): min=18656, max=47512, per=99.98%, avg=46200.75, stdev=4265.17, samples=44
   iops        : min= 4664, max=11878, avg=11550.18, stdev=1066.29, samples=44
  lat (usec)   : 100=98.46%, 250=1.54%, 500=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 500=0.01%
  cpu          : usr=2.93%, sys=22.58%, ctx=256053, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256000,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=45.1MiB/s (47.3MB/s), 45.1MiB/s-45.1MiB/s (47.3MB/s-47.3MB/s), io=1000MiB (1049MB), run=22160-22160msec

Disk stats (read/write):
  nvme0n1: ios=255885/0, merge=0/0, ticks=17328/0, in_queue=17329, util=99.49%

Performance Summary

Let’s put each run into perspective:

Metric      Without tracing/profiling   Tracing: strace   Change vs baseline   Tracing: perf trace   Change vs baseline
IOPS        13200                       9457              72%                  11600                 88%
Bandwidth   51.5 MiB/s                  36.9 MiB/s        72%                  45.1 MiB/s            88%
Duration    19436 msec                  27069 msec        140%                 22160 msec            115%

Performance impact of strace and perf trace on a fio workload

Please don’t extrapolate anything from the figures in the table: I haven’t run the benchmarks ten times to calculate averages, I just played around a bit to get a rough understanding of the matter. The important takeaway is that tracing and profiling have an effect on performance, and can cause trouble up to and including crashing processes. See my earlier warning :)

Summary

I hope this blog post showed you that tracing and profiling have an impact – as you can see in the above table. However, there are differences between the tools, with perf trace looking a lot better than strace. By the way, I investigated the use of --seccomp-bpf with strace but it’s not quite the same as perf trace: you have to pass the system calls you are interested in upfront, and combining it with the -p PID flag to attach to a running process doesn’t work in my version of strace.

The measurement intrusion effect of strace reduces throughput and IOPS to 72% of what was recorded in the baseline, and the fio benchmark took roughly 1.4 times as long to complete. Using perf trace is better, but not perfect: there is a 12 percent drop in IOPS and bandwidth, while the run takes about 15% longer to complete.

By the way I found the effect of perf trace exacerbated in VMs. This makes sense when considering how virtualisation works. This article was written on actual physical hardware without any hypervisor in between the O/S and the hardware.

As a rule of thumb, the higher the number of system calls, the higher the effect of any tool investigating them.

Do I really have to set resource limits for Oracle in systemd unit files?

TL;DR: it is very much required to set resource limits in systemd unit files. If you’d like to learn more about potential directives to put into a unit file, please have a look at the earlier post I wrote for a more thorough explanation. I also wrote a short post about a potential unit file used to start an Oracle single-instance database (without Oracle Restart) with a couple more details.

Failing to set resource limits in systemd unit files can lead to outages.

Large pages are a good example

Without the Limit-directives added to the unit file the database(s) might refuse to start. Using large pages is a good example. Switching a database to enforce the use of large pages is easy:

SQL> ALTER SYSTEM SET use_large_pages='ONLY' SCOPE=SPFILE;

System altered.

When setting this parameter you tell Oracle not to start the instance at all unless the entire System Global Area (SGA) fits into large pages. It also means the database won’t start if your configuration is wrong. Better get it right :)

How many large pages do I need?

On my host with a single Oracle database it couldn’t be easier to work out the number of large pages required. The alert.log tells me I need to allocate 2050 large pages if I want the SGA to fit completely. If there are multiple Oracle instances per host, you need to allocate the sum of all required large pages without over-allocating them (more on that later).

On my system 2050 large pages is comfortably below the mark of 70% of available memory. As a rule of thumb, don’t allocate more memory in large pages than 70% of your total available memory. The O/S, most processes and the PGA cannot use large pages. Allocating too many large pages is a sure way to create instability and it’s not easy to fix so please don’t do it.
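As a quick sanity check, the figure from the alert.log roughly matches a back-of-the-envelope calculation: divide the SGA size by the huge page size, which is 2 MiB on x86_64 unless configured otherwise. For the 4 GB SGA used in this example that works out as follows (the alert.log asks for a couple of extra pages on top of that):

$ grep Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB
$ echo $(( 4 * 1024 * 1024 / 2048 ))
2048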

Again, as there are no other Oracle instances on the host, defining the appropriate number of large pages is simple. To make the change permanent I need to either change /etc/sysctl.conf (not a good idea) or create a new file /etc/sysctl.d/99-oracle-large-pages.conf (preferred) containing the following directive:

vm.nr_hugepages = 2052

I was lucky and managed to allocate the necessary number of large pages at runtime saving the time of a reboot. Either way, the output of cat /proc/sys/vm/nr_hugepages needs to show the correct number of large pages available:

$ cat /proc/sys/vm/nr_hugepages
2052
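For the record, the runtime allocation boils down to a single sysctl call. Be aware that it can fail or only partially succeed on a system with fragmented memory, in which case a reboot is the safer option:

$ sudo sysctl -w vm.nr_hugepages=2052
vm.nr_hugepages = 2052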

Next I’m shutting the database down before removing LimitMEMLOCK from the unit file mentioned in my earlier post, and reloading the systemd daemon. As you might have imagined, the database doesn’t come up when the system reboots:

$ ps -ef | egrep "tns|smon"
root          37       2  0 09:30 ?        00:00:00 [netns]
oracle      1011       1  0 19:30 ?        00:00:00 /u01/app/oracle/product/19.0.0/dbhome_1/bin/tnslsnr LISTENER -inherit
vagrant     7829    7805  0 19:31 pts/0    00:00:00 grep -E --color=auto tns|smon

This shouldn’t be a surprise. The default memlock setting is 64 KiB, not quite enough for my 4 GB SGA:

$ whoami
vagrant
$ egrep "^Limit|Max locked memory" /proc/$$/limits  
Limit                     Soft Limit           Hard Limit           Units     
Max locked memory         65536                65536                bytes   

And sure enough, ${ORACLE_HOME}/rdbms/log/startup.log confirms the suspicion:

/u01/app/oracle/product/19.0.0/dbhome_1/bin/dbstart: Starting up database "ORCL"
Thu Jun 17 19:30:14 UTC 2021


SQL*Plus: Release 19.0.0.0.0 - Production on Thu Jun 17 19:30:15 2021
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

SQL> Connected to an idle instance.
SQL> ORA-27125: unable to create shared memory segment
Linux-x86_64 Error: 1: Operation not permitted
Additional information: 4641
Additional information: 10485760
SQL> Disconnected

/u01/app/oracle/product/19.0.0/dbhome_1/bin/dbstart: Database instance "ORCL" warm started.

Well, it wasn’t really warm started, was it ;) I had 2052 large pages available for the SGA; the problem is the shell limit: oracle isn’t allowed to lock more than the default 64 KiB of memory since systemd doesn’t use pam_limits(8).

Restoring the LimitMEMLOCK directive fixes that problem. After changing the unit file I rebooted once more and, voila:

$ ps -ef | egrep "tns|smon"
root          37       2  0 19:37 ?        00:00:00 [netns]
oracle      1016       1  0 19:37 ?        00:00:00 /u01/app/oracle/product/19.0.0/dbhome_1/bin/tnslsnr LISTENER -inherit
oracle      2761       1  0 19:37 ?        00:00:00 ora_smon_ORCL
vagrant     4438    4412  0 19:39 pts/0    00:00:00 grep -E --color=auto tns|smon

The database is up and running, and it uses large pages:

$ grep -i ^hugepages_ /proc/meminfo
HugePages_Total:    2052
HugePages_Free:        5
HugePages_Rsvd:        3
HugePages_Surp:        0
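To double-check that the new memlock limit applies to the instance you can inspect one of the background processes; a quick example (the pgrep pattern obviously depends on your SID):

$ grep "Max locked memory" /proc/$(pgrep -f ora_smon_ORCL)/limits

With LimitMEMLOCK=infinity both the soft and the hard limit should now report unlimited.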

Summary

You can use systemd on Oracle Linux 8 to start your databases along with the operating system. This is great in situations when you don’t want to/can’t use Oracle Restart and/or Oracle Real Application Clusters. Making use of systemd.directives(7) allows the database administrator to set appropriate limits for processes created by systemd as the latter doesn’t read settings in /etc/limits.d/* and /etc/limits.conf.

Installing perf on Ubuntu 20.04 LTS in Oracle Cloud Infrastructure

This is a very short post with instructions on how to install perf on Ubuntu 20.04 LTS in Oracle Cloud Infrastructure (OCI). I created my VM using the Canonical-Ubuntu-20.04-Minimal-2021.06.14-0 image.

Updating the image first

The first thing I do with each image I spin up in the cloud is to update all the packages. By the way, you log into Ubuntu instances using the ubuntu account (not opc):

$ ssh ubuntu@1.2.3.3
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.8.0-1034-oracle x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage


This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

0 updates can be applied immediately.

Last login: Tue Jul 13 19:52:40 2021 from 1.2.3.2

The command to update all packages in Ubuntu is a one-liner:

$ sudo apt-get update && sudo apt-get dist-upgrade -y 

In many cases a new kernel is installed as part of that exercise, mandating a reboot.

Installing perf

The current kernel at the time of writing was 5.8.0-1034-oracle. It’s nice to see that Oracle chose the HWE kernel (i.e. 5.8.x) over the GA kernel (5.4.x).

I used the following command to install perf on the VM:

$ sudo apt-get install linux-tools-$(uname -r) linux-tools-generic
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libdw1 libnuma1 libpci3 libslang2 libunwind8 linux-oracle-5.8-tools-5.8.0-1034
  linux-tools-5.4.0-77 linux-tools-5.4.0-77-generic linux-tools-common pci.ids
The following NEW packages will be installed:
  libdw1 libnuma1 libpci3 libslang2 libunwind8 linux-oracle-5.8-tools-5.8.0-1034
  linux-tools-5.4.0-77 linux-tools-5.4.0-77-generic linux-tools-5.8.0-1034-oracle
  linux-tools-common linux-tools-generic pci.ids
0 upgraded, 12 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.9 MB of archives.
After this operation, 59.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://eu-frankfurt-1-ad-3.clouds.archive.ubuntu.com/ubuntu focal/main amd64 libslang2 amd64 2.3.2-4 [429 kB]
...

Once the installation is complete I can use perf as usual:

$ perf --version
perf version 5.8.18

Tracepoints!

A quick check of static tracepoints reveals plenty in 5.8.0-1034-oracle:

$ sudo perf list tracepoint | \
> awk -F: '{ a[$1]++ } END { for (i in a) print i, a[i]}' | \
> column
  syscalls 670			  sync_trace 1			  interconnect 2
  scsi 5			  rcu 1				  skb 3
  io_uring 14			  i2c 4				  nmi 1
  exceptions 2			  rseq 2			  msr 3
  qdisc 4			  mmc 2				  xen 27
  drm 3				  erofs 9			  jbd2 17
  mce 1				  tlb 1				  napi 1
  libata 6			  x86_fpu 11			  clk 16
  block 18			  pwm 2				  fib 1
  rpm 5				  power 22			  workqueue 4
  xdp 12			  page_pool 4			  tcp 7
  iocost 5			  timer 13			  cpuhp 3
  thermal 5			  initcall 3			  filelock 12
  rtc 12			  printk 1			  smbus 4
  cros_ec 2			  intel_iommu 7			  pagemap 2
  writeback 34			  alarmtimer 4			  compaction 14
  gpio 2			  hwmon 3			  sched 24
  iomap 8			  regmap 15			  mdio 1
  devlink 5			  btrfs 79			  bpf_test_run 1
  sunrpc 109			  udp 1				  irq_vectors 34
  neigh 7			  net 18			  vmscan 18
  irq 5				  regulator 11			  irq_matrix 12
  devfreq 1			  random 15			  page_isolation 1
  hyperv 5			  task 2			  ftrace 2
  percpu 5			  huge_memory 4			  wbt 4
  migrate 1			  raw_syscalls 2		  iscsi 7
  thermal_power_allocator 2	  mmap 1			  fs_dax 14
  ras 6				  iommu 7			  oom 8
  sock 3			  fib6 1			  signal 2
  vsyscall 1			  swiotlb 1			  kmem 13
  bridge 4			  cgroup 13			  filemap 4
  spi 7				  module 5			  resctrl 3
  xhci-hcd 53			  ext4 105			  dma_fence 7
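Just to confirm the tooling works end to end, any of these tracepoints can be used right away. For example, counting block layer request issues system-wide for ten seconds (the event is an arbitrary pick from the list above):

$ sudo perf stat -e block:block_rq_issue -a -- sleep 10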

I’ll do a few more tests later, but for now it seems I’m set and ready to go. Should you find problems I’m happy to update the post so please let me know!

Using systemd to start an Oracle single instance database when booting

I don’t work with systemd frequently enough to remember its syntax and other intricacies so I thought I’d just write down how to start an Oracle 19c single instance database via systemd on Oracle Linux 8.4. Primarily so I can look it up later when I need it, but isn’t that the reason many bloggers write posts?

A little bit of background to this post

There are quite a few blog articles out there describing how to start an Oracle database when a server comes up, however some of them still use the old SysV init model, or upstart. Or a combination of both.

Since RedHat/Oracle Linux 7, systemd took over as the main component controlling the boot process (and a lot more …), so using systemd‘s native syntax and tooling sounds good to me. Please note that the technique you are about to read does not apply to Oracle Restart or Oracle Real Application Clusters. Furthermore, I haven’t had time to test any other combination than my lab VM running Oracle Linux 8.4 and Oracle database 19c (19.11.0 to be precise).

If you’d like to have some additional background information please have a look at Shell limits, Oracle Single Instance, and systemd in Oracle Linux 8.4. It explains my choice of location for the systemd unit file and some of the configuration directives.

As always, just because something works for me doesn’t mean it works for you as well, so test, test, and test some more if you’d like to make use of (parts of) this article.

Creating the systemd unit file

Based on the sources cited in the earlier post (such as the Red Hat 8 documentation and the always-excellent oracle-base.com) I put a unit file together. It lives in /etc/systemd/system/oracle-db.service.

[Unit]
Description=a service to start databases and listener automatically
After=syslog.target network.target

[Service]
LimitNOFILE=1024:65536
LimitNPROC=2047:16384
LimitSTACK=10485760:33554432
LimitMEMLOCK=infinity

Type=forking

User=oracle
Group=oinstall

ExecStart=/u01/app/oracle/product/19.0.0/dbhome_1/bin/dbstart /u01/app/oracle/product/19.0.0/dbhome_1
ExecStop=/u01/app/oracle/product/19.0.0/dbhome_1/bin/dbshut /u01/app/oracle/product/19.0.0/dbhome_1

RemainAfterExit=True
Restart=no

[Install]
WantedBy=multi-user.target

The unit file defines the properties of the oracle-db service. Most notably:

  • It sets the necessary process limits in the Service section
  • The service type indicates a process is forked by this unit file
  • Since dbstart and dbshut exit once their work is completed, I need to tell systemd to consider the service active via the RemainAfterExit directive
  • I also don’t want the service to be restarted automatically
  • The unit file relies on Oracle’s dbstart and dbshut scripts to start and stop the databases on this host

The unit file assumes locally attached storage. If you need NFS, have a look at the article on oracle-base.com for an example of how to define dependencies for NFSv3. Similarly, you might want to add dependencies on iSCSI or other network storage.

The next step is to modify /etc/oratab and change the start flag for my database (there is only one, ORCL):

$ grep $ORACLE_SID /etc/oratab
ORCL:/u01/app/oracle/product/19.0.0/dbhome_1:Y

Only those entries in oratab with a Y as their start flag will be considered by dbstart and dbshut.

Hint: the above is super easy to automate with Ansible and jinja2 templates :)

Next you should execute sudo systemctl daemon-reload to make systemd aware of the new unit file. Once systemd knows about it, it should be enabled via systemctl enable oracle-db.service.
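Putting the last steps together (starting the service immediately is optional, it will be started at boot anyway):

$ sudo systemctl daemon-reload
$ sudo systemctl enable oracle-db.service
$ sudo systemctl start oracle-db.service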

Starting and stopping the database

After a reboot I found the following entry for oracle-db.service in systemd‘s journal:

Jun 17 18:34:47 server1 systemd[1]: Starting a service to start databases and listener automatically...
Jun 17 18:34:48 server1 dbstart[973]: Processing Database instance "ORCL": log file /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/log/startup.log
Jun 17 18:35:04 server1 systemd[1]: Started a service to start databases and listener automatically.

That seemed to have worked. The unit file also took care of the database when the VM was shut down:

Jun 17 18:39:50 server1 systemd[1]: Stopping a service to start databases and listener automatically...
Jun 17 18:39:50 server1 dbshut[4486]: Processing Database instance "ORCL": log file /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/log/shutdown.log
Jun 17 18:40:12 server1 systemd[1]: oracle-db.service: Succeeded.
Jun 17 18:40:12 server1 systemd[1]: Stopped a service to start databases and listener automatically.

It appears as if my unit file does what I want it to do, hurray.

Geeking out on Linux: file descriptors and their open modes & flags

While preparing my talk for the upcoming POUG 2021 conference I wanted to know which flags Oracle passes to open(2) when accessing a data file. A file needs to be opened before its contents can be accessed (no surprises here :). There are quite a few ways of opening a file, depending on whether you want to use the file system cache or not, and whether you want to perform synchronous or asynchronous I/O. There are more alternatives as well but I haven’t come across them with the Oracle database so let’s stick with the arguments/flags/modes available with open(2).

Based on what I understand of how Linux works, you can use the following methods to get the flags passed to open(2):

  1. Use a tracer to catch the call to open(2) when it happens
  2. Find a way to look up flags passed to open(2) after the call completed

This post shows you both. I should point out though that tracing is an intrusive process and all steps you see in this post have been performed in my Oracle-on-vagrant build running on my laptop. The post is for educational purposes only, especially since you don’t need a tracer to get the answer.

WARNING: don’t use tracers in production environments! They can crash systems. If you are curious about Linux tracing and profiling, always experiment on lab VMs where you are on your own, and crashes don’t cause problems to others. Or worse.

Background

Consider the following example. I have created a session with sqlplus (any other client will do, too) and am about to issue my first query. This is an Oracle 19c database although I’m pretty sure this has been the standard behaviour for quite some time. Please note that the following steps don’t apply when you are using Automatic Storage Management (ASM). I also used filesystemio_options set to none. You see, this is really a post for educational purposes.

Tracing

At this stage, I haven’t touched a user-created segment. Oracle will have to open the data file containing my table next. Let’s see what happens when I run my select statement. Opening a second session on the host I can create a bpftrace probe and attach it to sys_enter_openat() on my server process ID (SPID, 21554). As soon as I start my query in session 1, Oracle opens the user tablespace’s only data file and my probe in session 2 fires:

[root@server1 ~]# bpftrace -e 'tracepoint:syscalls:sys_enter_openat /pid == 21554/ { 
    printf("%s opened file %s with flags %x mode %x\n", comm, str(args->filename), args->flags, args->mode); 
}'
Attaching 1 probe...
oracle_21554_or opened file /u02/oradata/ORCL/datafile/o1_mf_users_jdoz9om2_.dbf with flags 1002 mode 0

If you aren’t intimately familiar with Linux you might wonder how I knew which arguments to pass to printf(). That’s easy enough: the format of sys_enter_openat is available in /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format:

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 624
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
	field:int dfd;	offset:16;	size:8;	signed:0;
	field:const char * filename;	offset:24;	size:8;	signed:0;
	field:int flags;	offset:32;	size:8;	signed:0;
	field:umode_t mode;	offset:40;	size:8;	signed:0;

print fmt: "dfd: 0x%08lx, filename: 0x%08lx, flags: 0x%08lx, mode: 0x%08lx", ((unsigned long)(REC->dfd)), ((unsigned long)(REC->filename)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->mode))

Back to the output generated by bpftrace. Unfortunately this isn’t quite as straightforward as I had hoped. I’m printing the flags argument to openat() as a hexadecimal number, but the flag constants in the header files are defined in octal notation, which requires translating them. The obvious use of %o in printf() isn’t permitted. 1002 in base 16 equals 10002 in base 8.
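The translation itself is a shell one-liner (bash’s printf understands the 0x prefix, and %o prints octal):

$ printf '%o\n' 0x1002
10002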

But what does the flag really mean?

But that’s only half the job done. What does the flag actually mean? Researching a little bit I found a few header files with more details. Long story short I found what I needed in /usr/include/asm-generic/fcntl.h. But … Linux x86-64 doesn’t implement all of these. You can check whether a given flag is valid or not in /usr/src/kernels/$(uname -r)/include/linux/fcntl.h.

With all that research done I now can tell you that a flag of 10002 means the file was opened using O_RDWR and O_DSYNC.

[oracle@server1 ~]$ egrep 'define O.*[0-9]+' /usr/include/asm-generic/fcntl.h
...
#define O_RDONLY	00000000
#define O_WRONLY	00000001
#define O_RDWR		00000002
...
#define O_DSYNC		00010000	/* used to be O_SYNC, see below */
#define O_DIRECT	00040000	/* direct disk access hint */
#define O_LARGEFILE	00100000
#define O_DIRECTORY	00200000	/* must be a directory */
#define O_NOFOLLOW	00400000	/* don't follow links */
#define O_NOATIME	01000000
#define O_CLOEXEC	02000000	/* set close_on_exec */
...
[oracle@server1 ~]$ 

You test whether a flag is set using a bitwise AND. For O_RDWR that’s easy enough in bash (the leading zeroes tell bash to treat the numbers as octal):

$ echo $(( 010002 & 02 ))
2

If the result of the bitwise AND matches the flag’s value (in this case 2 equals 2) the flag is set.

And what does it mean? Oracle opened the data file read/write for synchronous I/O. Which was to be expected, given I’m using a file system with filesystemio_options set to none: Oracle will use neither direct I/O (bypassing the file system cache) nor asynchronous I/O.

Let’s recap for a minute. So far I have used a tracer to read the arguments to openat(2) and decoded the flag after translating its hexadecimal representation to an octal value. That’s quite a lot of work and, unless you know the flags to open(2) by heart, not very enjoyable. Furthermore, you won’t catch the arguments to open(2) if the process issued the call before you attached the tracer.

Determining the flags retrospectively

There are many ways to do so; the easiest is to check /proc/<pid>/fdinfo/<fd>. The first step is to identify the file descriptor of the data file opened by the Oracle process. In this case the user tablespace’s data file was opened with file descriptor 259:

[oracle@server1 ~]$ ls -l /proc/21554/fd
total 0
lr-x------. 1 oracle oinstall 64 Jul  9 19:49 0 -> /dev/null
l-wx------. 1 oracle oinstall 64 Jul  9 19:49 1 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 19:49 10 -> 'pipe:[127168]'
l-wx------. 1 oracle oinstall 64 Jul  9 19:49 13 -> 'pipe:[127169]'
l-wx------. 1 oracle oinstall 64 Jul  9 19:49 2 -> /dev/null
lrwx------. 1 oracle oinstall 64 Jul  9 19:50 256 -> /u02/oradata/ORCL/datafile/o1_mf_system_jdoz8275_.dbf
lrwx------. 1 oracle oinstall 64 Jul  9 19:50 257 -> /u02/oradata/ORCL/datafile/o1_mf_sysaux_jdoz95dk_.dbf
lrwx------. 1 oracle oinstall 64 Jul  9 19:50 258 -> /u02/oradata/ORCL/datafile/o1_mf_undotbs1_jdoz9nj3_.dbf
lrwx------. 1 oracle oinstall 64 Jul  9 19:50 259 -> /u02/oradata/ORCL/datafile/o1_mf_users_jdoz9om2_.dbf
lr-x------. 1 oracle oinstall 64 Jul  9 19:49 3 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 19:49 4 -> /u01/app/oracle/product/19.0.0/dbhome_1/rdbms/mesg/oraus.msb
lr-x------. 1 oracle oinstall 64 Jul  9 19:49 5 -> /proc/21554/fd
lr-x------. 1 oracle oinstall 64 Jul  9 19:49 6 -> /var/lib/sss/mc/passwd
lrwx------. 1 oracle oinstall 64 Jul  9 19:49 7 -> 'anon_inode:[eventpoll]'

I can grab the flags from fdinfo:

[oracle@server1 ~]$ grep flags /proc/21554/fdinfo/259
flags:	02110002

Huh? That’s 8 digits though, not 5 as in the previous example. Decoding the flag was a lot easier this time as I already knew the flags and their bitmasks.

02110002 decoded, digit by digit:

Octal digit   0         2           1             1         0         0         0         2
Flag          not set   O_CLOEXEC   O_LARGEFILE   O_DSYNC   not set   not set   not set   O_RDWR

Bitmask set in fdinfo, decoded

Still very much the same picture as before, with the addition of the O_LARGEFILE and O_CLOEXEC flags. Check the open(2) man page if you want to know what they imply.
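If you’d rather not decode the bitmask by hand, a small bash sketch based on the fcntl.h values shown earlier does the job. The leading zeroes make bash treat the numbers as octal; O_RDONLY/O_WRONLY are left out because the access mode needs special handling via O_ACCMODE:

#!/usr/bin/env bash
# decode the open(2) flags reported by /proc/<pid>/fdinfo/<fd>
flags=02110002

declare -A names=(
  [02]=O_RDWR [010000]=O_DSYNC [040000]=O_DIRECT
  [0100000]=O_LARGEFILE [0200000]=O_DIRECTORY [0400000]=O_NOFOLLOW
  [01000000]=O_NOATIME [02000000]=O_CLOEXEC
)

for mask in "${!names[@]}"; do
    if (( flags & mask )); then
        echo "${names[$mask]} is set"
    fi
done

For the fdinfo value above this prints O_RDWR, O_DSYNC, O_LARGEFILE and O_CLOEXEC (in no particular order), matching the decoded bitmask.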

Using lsof is cheating

If you are lucky enough to have lsof installed on your system you can use it as well, although it’s not much of a challenge since lsof decodes the flags for you ;) Well, most of them at least. The following output was generated on a different system, hence the PID and fd don’t match the earlier output:

[oracle@server1 ~]$ lsof -p 10009 +fg | egrep 'COMMAND|users'
COMMAND     PID   USER   FD      TYPE         FILE-FLAG DEVICE   SIZE/OFF      NODE NAME
oracle_10 10009 oracle  257u      REG RW,SYN,LG,0x80000   8,33   15736832  16777348 /u02/oradata/ORCL/datafile/o1_mf_users_jdoz9om2_.dbf
[oracle@server1 ~]$ 

And you don’t even need to be root to get these details. Unfortunately lsof as packaged in Oracle Linux 8.4 (4.93.2) doesn’t recognise O_CLOEXEC as per a bugzilla entry I found.

Summary

I love Linux since there are so many ways to do things. My talk at POUG 2021 will contain lots of references to BCC and bpftrace so this was a good starting point to learn about these tools. On the other hand it turned out I didn’t need to concern myself with tracing tools at all as Linux often keeps information in many places. You just need to know what to do with it.

Shell limits, Oracle Single Instance, and systemd in Oracle Linux 8.4

The more blog posts I write the harder it becomes to come up with a good title. So what is this post all about? In short, it’s about the implications of using systemd to start a single instance Oracle database in Oracle Linux and associated shell limits. In theory this should apply to Single Instance Oracle only – Oracle Restart/Real Application Clusters use different means to control the start of databases via their respective registries.

A short introduction to systemd

As of Oracle Linux 7 systemd replaced upstart in the same way upstart replaced SysV init earlier. systemd is a system and service manager for Linux operating systems and does a lot more than either of its predecessors. It’s also extremely well documented in man pages.

An Oracle administrator might use systemd to start a database automatically when a system boots up. This is done via a so-called unit file, containing the necessary instructions to start the Oracle database.

Which leads me back to the topic of this post. You can read at oracle-base.com and a few other places that systemd does not respect shell limits set by pam_limits(8). Which is considered a feature; feel free to check the bugzilla discussion for more details. Remember that pam_limits(8) and its associated limits.conf file is used to configure shell limits for the oracle user, as per the Database Installation Guide.

To demonstrate the effect I’m going to use my most recent Oracle Linux 8.4 lab VM. The oracle account has been created and configured via the 19c preinstall RPM. I’m using my Vagrant box, updated to Oracle Linux 8.4 including all patches up to June 14th.

NOTE: the settings you’ll see later in the unit files are for Oracle Linux 8.4. Although Oracle Linux 7 uses systemd as well it might not support the same directives and/or syntax.

Let’s have a look at systemd unit files.

Creating a oneshot systemd unit file

I picked /etc/systemd/system as the location for my new unit file. At least in Oracle Linux 8.4 that seems to be the preferred location for custom unit files.

WARNING: This is totally lab VM/playground territory, you should never do this in a real (live) environment!

If you find the examples confusing because you aren’t familiar with systemd please head over to the documentation for more information.

# cat /etc/systemd/system/limitstest.service 
[Unit]
Description=A oneshot service to test whether systemd respects PAM limits

[Service]
User=oracle
Group=oinstall
Type=oneshot
ExecStart="/home/oracle/test.sh"

This unit file merely fires off /home/oracle/test.sh (once) as oracle:oinstall. The little shell script in /home/oracle/test.sh couldn’t be simpler:

$ cat /home/oracle/test.sh 
#!/usr/bin/env bash

echo "-----------------------------------------------------------"
echo "test run starts at $(/usr/bin/date):"
echo "I am $(/usr/bin/whoami)"

echo ""
echo "my own PID is: $$"
echo ""
/usr/bin/ps -ef| /usr/bin/grep $$ | /usr/bin/grep -v grep

echo ""
echo "hard limits according to ulimit:"
echo ""
ulimit -Ha

echo ""
echo "soft limits according to ulimit:"
echo ""
ulimit -Sa

echo ""
echo "actual limits as per /proc":
echo ""
/usr/bin/cat /proc/$$/limits

echo -----------------------------------------------------------

It prints a few PIDs and shell limits (both hard and soft limits). Finally, it looks into the /proc file system to get the shell limits of itself (${$}).

Unless you restart your (virtual) server, systemd won’t know about the new unit file. Alternatively you can run systemctl daemon-reload to reload the daemon.

Running the unit file

So let’s go ahead and see what the output of the unit file is. I started it using systemctl start limitstest.service.

# journalctl -u limitstest.service

...

Jun 15 19:44:07 server1 systemd[1]: Starting A oneshot service to test whether systemd respects PAM limits...
Jun 15 19:44:07 server1 test.sh[13046]: -----------------------------------------------------------
Jun 15 19:44:07 server1 test.sh[13046]: test run starts at Tue Jun 15 19:44:07 UTC 2021:
Jun 15 19:44:07 server1 test.sh[13046]: I am oracle
Jun 15 19:44:07 server1 test.sh[13046]: my own PID is: 13046
Jun 15 19:44:07 server1 test.sh[13046]: oracle     13046       1  0 19:44 ?        00:00:00 bash /home/oracle/test.sh
Jun 15 19:44:07 server1 test.sh[13046]: oracle     13049   13046  0 19:44 ?        00:00:00 /usr/bin/ps -ef
Jun 15 19:44:07 server1 test.sh[13046]: hard limits according to ulimit:
Jun 15 19:44:07 server1 test.sh[13046]: core file size          (blocks, -c) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: data seg size           (kbytes, -d) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: scheduling priority             (-e) 0
Jun 15 19:44:07 server1 test.sh[13046]: file size               (blocks, -f) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: pending signals                 (-i) 30554
Jun 15 19:44:07 server1 test.sh[13046]: max locked memory       (kbytes, -l) 64
Jun 15 19:44:07 server1 test.sh[13046]: max memory size         (kbytes, -m) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: open files                      (-n) 262144
Jun 15 19:44:07 server1 test.sh[13046]: pipe size            (512 bytes, -p) 8
Jun 15 19:44:07 server1 test.sh[13046]: POSIX message queues     (bytes, -q) 819200
Jun 15 19:44:07 server1 test.sh[13046]: real-time priority              (-r) 0
Jun 15 19:44:07 server1 test.sh[13046]: stack size              (kbytes, -s) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: cpu time               (seconds, -t) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: max user processes              (-u) 30554
Jun 15 19:44:07 server1 test.sh[13046]: virtual memory          (kbytes, -v) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: file locks                      (-x) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: soft limits according to ulimit:
Jun 15 19:44:07 server1 test.sh[13046]: core file size          (blocks, -c) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: data seg size           (kbytes, -d) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: scheduling priority             (-e) 0
Jun 15 19:44:07 server1 test.sh[13046]: file size               (blocks, -f) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: pending signals                 (-i) 30554
Jun 15 19:44:07 server1 test.sh[13046]: max locked memory       (kbytes, -l) 64
Jun 15 19:44:07 server1 test.sh[13046]: max memory size         (kbytes, -m) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: open files                      (-n) 1024
Jun 15 19:44:07 server1 test.sh[13046]: pipe size            (512 bytes, -p) 8
Jun 15 19:44:07 server1 test.sh[13046]: POSIX message queues     (bytes, -q) 819200
Jun 15 19:44:07 server1 test.sh[13046]: real-time priority              (-r) 0
Jun 15 19:44:07 server1 test.sh[13046]: stack size              (kbytes, -s) 8192
Jun 15 19:44:07 server1 test.sh[13046]: cpu time               (seconds, -t) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: max user processes              (-u) 30554
Jun 15 19:44:07 server1 test.sh[13046]: virtual memory          (kbytes, -v) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: file locks                      (-x) unlimited
Jun 15 19:44:07 server1 test.sh[13046]: actual limits as per /proc:
Jun 15 19:44:07 server1 test.sh[13046]: Limit                     Soft Limit           Hard Limit           Units
Jun 15 19:44:07 server1 test.sh[13046]: Max cpu time              unlimited            unlimited            seconds
Jun 15 19:44:07 server1 test.sh[13046]: Max file size             unlimited            unlimited            bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max data size             unlimited            unlimited            bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max stack size            8388608              unlimited            bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max core file size        unlimited            unlimited            bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max resident set          unlimited            unlimited            bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max processes             30554                30554                processes
Jun 15 19:44:07 server1 test.sh[13046]: Max open files            1024                 262144               files
Jun 15 19:44:07 server1 test.sh[13046]: Max locked memory         65536                65536                bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max address space         unlimited            unlimited            bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max file locks            unlimited            unlimited            locks
Jun 15 19:44:07 server1 test.sh[13046]: Max pending signals       30554                30554                signals
Jun 15 19:44:07 server1 test.sh[13046]: Max msgqueue size         819200               819200               bytes
Jun 15 19:44:07 server1 test.sh[13046]: Max nice priority         0                    0
Jun 15 19:44:07 server1 test.sh[13046]: Max realtime priority     0                    0
Jun 15 19:44:07 server1 test.sh[13046]: Max realtime timeout      unlimited            unlimited            us
Jun 15 19:44:07 server1 test.sh[13046]: -----------------------------------------------------------
Jun 15 19:44:07 server1 systemd[1]: limitstest.service: Succeeded.
Jun 15 19:44:07 server1 systemd[1]: Started A oneshot service to test whether systemd respects PAM limits.

...

All right, that worked and we are off to a good start. Those of us who installed Oracle a few times might already spot a few details in the above output. It’s not quite what I had in mind.

Executing test.sh in an interactive shell

For comparison, this is the output when logged in as oracle (in an interactive shell)

$ ./test.sh 
-----------------------------------------------------------
test run starts at Tue Jun 15 20:13:41 UTC 2021:
I am oracle

my own PID is: 13173

oracle     13173   13150  0 20:13 pts/0    00:00:00 bash ./test.sh
oracle     13176   13173  0 20:13 pts/0    00:00:00 /usr/bin/ps -ef

hard limits according to ulimit:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30554
max locked memory       (kbytes, -l) 134217728
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

soft limits according to ulimit:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30554
max locked memory       (kbytes, -l) 134217728
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

actual limits as per /proc:

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             33554432             bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             16384                16384                processes 
Max open files            1024                 65536                files     
Max locked memory         137438953472         137438953472         bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       30554                30554                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
-----------------------------------------------------------

As you can see, there are quite a few differences, making it necessary to set limits via systemd-specific syntax in the unit file itself. I can only conclude that systemd – as documented – does not pay attention to the configuration set by pam_limits(8).

OK, so shell limits aren’t respected, what next?

So if systemd doesn’t make use of pam_limits(8) there needs to be a different solution. There is a whole raft of options documented in systemd.directives(7) for Oracle Linux 8. Again, this might be different for Oracle Linux 7. The database needs the following shell limits to be set:

  • open file descriptors (“nofile”)
  • number of processes available to a single user (“nproc”)
  • Size of the stack segment per process (“stack”)
  • Maximum locked memory limit (“memlock”)

These map to the following systemd.directives(7):

  • LimitNOFILE
  • LimitNPROC
  • LimitSTACK
  • LimitMEMLOCK

You should really go ahead and read the man pages for systemd, they are great!

Amending the unit file

The next step is to add the above directives to the unit file, reload systemd, start the service and see what happens. Note that most sources I found only set LimitMEMLOCK and LimitNOFILE for Oracle. I am using the values from the Oracle documentation – your system might require different settings.

[Unit]
Description=A oneshot service to test whether systemd respects PAM limits

[Service]
LimitNOFILE=1024:65536
LimitNPROC=2047:16384
LimitSTACK=10485760:33554432
LimitMEMLOCK=infinity
User=oracle
Group=oinstall
Type=oneshot
ExecStart="/home/oracle/test.sh"

Apart from the additional Limit.* directives, it’s the same file. The syntax in Oracle Linux 8 is LimitDIRECTIVE=<soft limit>:<hard limit>.

With the unit file changed, the following limits have been recorded:

Jun 16 20:30:57 server1 systemd[1]: Starting A oneshot service to test whether systemd respects PAM limits...
Jun 16 20:30:57 server1 test.sh[13821]: -----------------------------------------------------------
Jun 16 20:30:57 server1 test.sh[13821]: test run starts at Wed Jun 16 20:30:57 UTC 2021:
Jun 16 20:30:57 server1 test.sh[13821]: I am oracle
Jun 16 20:30:57 server1 test.sh[13821]: my own PID is: 13821
Jun 16 20:30:57 server1 test.sh[13821]: oracle     13821       1  0 20:30 ?        00:00:00 bash /home/oracle/test.sh
Jun 16 20:30:57 server1 test.sh[13821]: oracle     13824   13821  0 20:30 ?        00:00:00 /usr/bin/ps -ef
Jun 16 20:30:57 server1 test.sh[13821]: hard limits according to ulimit:
Jun 16 20:30:57 server1 test.sh[13821]: core file size          (blocks, -c) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: data seg size           (kbytes, -d) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: scheduling priority             (-e) 0
Jun 16 20:30:57 server1 test.sh[13821]: file size               (blocks, -f) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: pending signals                 (-i) 30554
Jun 16 20:30:57 server1 test.sh[13821]: max locked memory       (kbytes, -l) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: max memory size         (kbytes, -m) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: open files                      (-n) 65536
Jun 16 20:30:57 server1 test.sh[13821]: pipe size            (512 bytes, -p) 8
Jun 16 20:30:57 server1 test.sh[13821]: POSIX message queues     (bytes, -q) 819200
Jun 16 20:30:57 server1 test.sh[13821]: real-time priority              (-r) 0
Jun 16 20:30:57 server1 test.sh[13821]: stack size              (kbytes, -s) 32768
Jun 16 20:30:57 server1 test.sh[13821]: cpu time               (seconds, -t) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: max user processes              (-u) 16384
Jun 16 20:30:57 server1 test.sh[13821]: virtual memory          (kbytes, -v) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: file locks                      (-x) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: soft limits according to ulimit:
Jun 16 20:30:57 server1 test.sh[13821]: core file size          (blocks, -c) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: data seg size           (kbytes, -d) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: scheduling priority             (-e) 0
Jun 16 20:30:57 server1 test.sh[13821]: file size               (blocks, -f) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: pending signals                 (-i) 30554
Jun 16 20:30:57 server1 test.sh[13821]: max locked memory       (kbytes, -l) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: max memory size         (kbytes, -m) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: open files                      (-n) 1024
Jun 16 20:30:57 server1 test.sh[13821]: pipe size            (512 bytes, -p) 8
Jun 16 20:30:57 server1 test.sh[13821]: POSIX message queues     (bytes, -q) 819200
Jun 16 20:30:57 server1 test.sh[13821]: real-time priority              (-r) 0
Jun 16 20:30:57 server1 test.sh[13821]: stack size              (kbytes, -s) 10240
Jun 16 20:30:57 server1 test.sh[13821]: cpu time               (seconds, -t) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: max user processes              (-u) 2047
Jun 16 20:30:57 server1 test.sh[13821]: virtual memory          (kbytes, -v) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: file locks                      (-x) unlimited
Jun 16 20:30:57 server1 test.sh[13821]: actual limits as per /proc:
Jun 16 20:30:57 server1 test.sh[13821]: Limit                     Soft Limit           Hard Limit           Units
Jun 16 20:30:57 server1 test.sh[13821]: Max cpu time              unlimited            unlimited            seconds
Jun 16 20:30:57 server1 test.sh[13821]: Max file size             unlimited            unlimited            bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max data size             unlimited            unlimited            bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max stack size            10485760             33554432             bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max core file size        unlimited            unlimited            bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max resident set          unlimited            unlimited            bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max processes             2047                 16384                processes
Jun 16 20:30:57 server1 test.sh[13821]: Max open files            1024                 65536                files
Jun 16 20:30:57 server1 test.sh[13821]: Max locked memory         unlimited            unlimited            bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max address space         unlimited            unlimited            bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max file locks            unlimited            unlimited            locks
Jun 16 20:30:57 server1 test.sh[13821]: Max pending signals       30554                30554                signals
Jun 16 20:30:57 server1 test.sh[13821]: Max msgqueue size         819200               819200               bytes
Jun 16 20:30:57 server1 test.sh[13821]: Max nice priority         0                    0
Jun 16 20:30:57 server1 test.sh[13821]: Max realtime priority     0                    0
Jun 16 20:30:57 server1 test.sh[13821]: Max realtime timeout      unlimited            unlimited            us
Jun 16 20:30:57 server1 systemd[1]: limitstest.service: Succeeded.
Jun 16 20:30:57 server1 test.sh[13821]: -----------------------------------------------------------
Jun 16 20:30:57 server1 systemd[1]: Started A oneshot service to test whether systemd respects PAM limits.

Thankfully this works, and the settings as documented by Oracle are implemented.

Summary

I always found it hard to come up with a working systemd unit file to start a single instance Oracle database. There is a wide range of articles covering SysV init, upstart, and systemd. It certainly can be confusing for a DBA to pick the right one. I have never been a great fan of using /etc/init.d for startup scripts in modern Linux distributions, even though Oracle Linux has a compatibility wrapper to deal with legacy startup scripts. After spending time with systemd I’m now comfortable writing my own start/stop script for RDBMS and listener. Maybe I’ll publish the results in a different post.

I hope you found the article useful for writing your own systemd unit files in Oracle Linux 7 and 8.

Automating Vagrant Box versioning

The longer I work in IT the more I dislike repetitive processes. For example, when updating my Oracle Linux 8 Vagrant Base Box I repeat the same process over and over:

  • Boot the VirtualBox (source) VM
  • Enable port forwarding for SSH
  • SSH to the VM to initiate the update via dnf update -y && reboot
  • Run vagrant package, calculate the SHA256 sum, modify the metadata file
  • Use vagrant box update to make it known to vagrant

There has to be a better way to do that, and in fact there is. A little bit of shell scripting later, all I need to do is run my “update base box” script and grab a coffee while it’s all done behind the scenes. Most of the exercise laid out above is quite boring, but I thought I’d share how I’m modifying the metadata file in the hope of saving you a little bit of time and effort. If you would like a more thorough explanation of the process please head over to my previous post.
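
To give you an idea, here is a condensed sketch of the wrapper script. The VM name, the SSH user, the paths, and the name of the metadata helper (discussed below) are all assumptions specific to my setup, so don’t take this as a finished implementation:

#!/usr/bin/env bash
# sketch only: refresh an Oracle Linux 8 base box and register the new version
set -euo pipefail

VM="ol8-base"                               # VirtualBox (source) VM name
NEWVER="8.4.1"                              # new box version
BOXFILE="/vagrant/boxes/ol8_${NEWVER}.box"  # package.box to be created
METADATA="/vagrant/boxes/ol8.json"          # metadata file

# enable SSH port forwarding and boot the source VM headless
VBoxManage modifyvm "${VM}" --natpf1 "guestssh,tcp,,2222,,22"
VBoxManage startvm "${VM}" --type headless

# update all packages and reboot, then give the VM time to settle
ssh -p 2222 vagrant@localhost 'sudo dnf update -y && sudo reboot' || true
sleep 120

# shut the VM down cleanly before packaging it
VBoxManage controlvm "${VM}" acpipowerbutton
sleep 60

# run vagrant package, calculate the SHA256 sum, modify the metadata file
vagrant package --base "${VM}" --output "${BOXFILE}"
SHA256="$(sha256sum "${BOXFILE}" | awk '{print $1}')"
./add_box_version.py "${NEWVER}" "${SHA256}" "${BOXFILE}" "${METADATA}"

# finally, make the new version known to vagrant
vagrant box update --box ol8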

Updating the Metadata File

If you would like to version-control your vagrant boxes locally, you need a metadata file, maybe something similar to ol8.json shown below. It defines my Oracle Linux 8 boxes (at the moment there is only one):

$ cat ol8.json 
{
  "name": "ol8",
  "description": "Martins Oracle Linux 8",
  "versions": [
    {
      "version": "8.4.0",
      "providers": [
        {
          "name": "virtualbox",
          "url": "file:///vagrant/boxes/ol8_8.4.0.box",
          "checksum": "b28a3413d33d4917bc3b8321464c54f22a12dadd612161b36ab20754488f4867",
          "checksum_type": "sha256"
        }
      ]
    }
  ]
}

For the sake of argument, let’s assume I want to upgrade my Oracle Linux 8.4.0 box to the latest and greatest packages that were available at the time of writing. As it’s a minor update I’ll call the new version 8.4.1. To keep the post short and (hopefully) entertaining I’m skipping the upgrade of the VM.

Option (1): jq

Fast forward to the metadata update: I need to add a new element to the versions array. I could have used jq for that purpose and it would have been quite easy:

$ jq '.versions += [{
>       "version": "8.4.1",
>       "providers": [
>         {
>           "name": "virtualbox",
>           "url": "file:///vagrant/boxes/ol8_8.4.1.box",
>           "checksum": "ecb3134d7337a9ae32c303e2dee4fa6e5b9fbbea5a38084097a6b5bde2a56671",
>           "checksum_type": "sha256"
>         }
>       ]
>     }]' ol8.json
{
  "name": "ol8",
  "description": "Martins Oracle Linux 8",
  "versions": [
    {
      "version": "8.4.0",
      "providers": [
        {
          "name": "virtualbox",
          "url": "file:///vagrant/boxes/ol8_8.4.0.box",
          "checksum": "b28a3413d33d4917bc3b8321464c54f22a12dadd612161b36ab20754488f4867",
          "checksum_type": "sha256"
        }
      ]
    },
    {
      "version": "8.4.1",
      "providers": [
        {
          "name": "virtualbox",
          "url": "file:///vagrant/boxes/ol8_8.4.1.box",
          "checksum": "ecb3134d7337a9ae32c303e2dee4fa6e5b9fbbea5a38084097a6b5bde2a56671",
          "checksum_type": "sha256"
        }
      ]
    }
  ]
}

That would be too easy ;) Sadly I don’t have jq available on all the systems I’d like to run this script on. But wait, I have Python available.

Option (2): Python

Although I’m certainly late to the party I truly enjoy working with Python. Below you’ll find a (shortened) version of a Python script to take care of the metadata addition.

Admittedly it does a few additional things compared to the very basic jq example. For instance, it takes a backup of the metadata file and parses command line arguments, etc. It’s a bit longer than a one-liner though ;)

#!/usr/bin/env python3

# PURPOSE
# add metadata about a new box version to the metadata file
# should also work with python2

import json
import argparse
import os
import sys
from time import strftime
import shutil

# Parsing the command line. Use -h to print help
parser = argparse.ArgumentParser()
parser.add_argument("version",       help="the new version of the vagrant box to be added. Must be unique")
parser.add_argument("sha256sum",     help="the sha256 sum of the newly created package.box")
parser.add_argument("box_file",      help="full path to the package.box, eg /vagrant/boxes/ol8_8.4.1.box")
parser.add_argument("metadata_file", help="full path to the metadata file, eg /vagrant/boxes/ol8.json")
args = parser.parse_args()

# this is the JSON element to add
new_box_version = {
    "version": args.version,
    "providers": [
        {
            "name": "virtualbox",
            "url": "file://" + args.box_file,
            "checksum": args.sha256sum,
            "checksum_type": "sha256"
        }
    ]
}

...

# check if the box_file exists
if (not os.path.isfile(args.box_file)):
    sys.exit("FATAL: Vagrant box file {} does not exist".format(args.box_file))

# read the existing metadata file
try:
    with open(args.metadata_file, 'r+') as f:
        metadata = json.load(f)
except OSError as err:
    sys.exit ("FATAL: Cannot open the metadata file {} for reading: {}".format(args.metadata_file, err))

# check if the version to be added exists already (exact match on the version string,
# rather than a substring search across the whole structure)
all_versions = metadata["versions"]
if any(v["version"] == args.version for v in all_versions):
    sys.exit("FATAL: new version {} to be added is a duplicate".format(args.version))

# if the new box doesn't exist already, it's ok to add it
metadata['versions'].append(new_box_version)

# create a backup of the existing file before writing
try:
    bkpfile = args.metadata_file + "_" + strftime("%y%m%d_%H%M%S")
    shutil.copy(args.metadata_file, bkpfile)
except OSError as err:
    sys.exit ("FATAL: cannot create a backup of the metadata file {}".format(err))

# ... and write changes to disk
try:
    with open(args.metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2)
except OSError as err:
    sys.exit ("FATAL: cannot save metadata to {}: {}".format(args.metadata_file, err))

print("INFO: process completed successfully")

That’s it! Next time I need to upgrade my Vagrant boxes I can rely on a fully automated process, saving me quite a bit of time when I’m instantiating a new Vagrant-based environment.

SQLcl autotrace is way more than “just” (SQL*Plus) autotrace

As part of the research I did for the 2nd edition of our Exadata book I used session statistics quite heavily. Session statistics can provide additional insights in situations where the wait interface on its own doesn’t get you any further.

On the command line, my tools of choice are either Tanel Poder’s excellent Session Snapper or Adrian Billington’s mystats utility. There are of course others; it just so happens that I’ve been using these two quite heavily. If you prefer a graphical front-end such as SQL Developer, you can grab the change in session statistics easily as described in an earlier post.

I’m not sure why it never occurred to me to check if SQL Developer’s best friend, sqlcl, has the same functionality. Well it does, and that’s great news. The other great news is that you can download sqlcl directly from Oracle’s website now.

UPDATE: when I first wrote about sqlcl’s ability to display session statistics it had a problem getting the actual execution plan from the database (it used explain plan for <sql> internally). This is fixed with the current release and I updated the article accordingly.

Downloading SQLcl the new way

The first step to complete is to pull the latest version of sqlcl. Many thanks to Connor McDonald who told me there is a super-easy way to always get the latest version. Just use the following URL:

https://download.oracle.com/otn_software/java/sqldeveloper/sqlcl-latest.zip

Great! I don’t even need to worry about the version:

$ wget https://download.oracle.com/otn_software/java/sqldeveloper/sqlcl-latest.zip
--2021-07-29 17:11:23--  https://download.oracle.com/otn_software/java/sqldeveloper/sqlcl-latest.zip
Resolving download.oracle.com (download.oracle.com)... 104.76.200.85
Connecting to download.oracle.com (download.oracle.com)|104.76.200.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37581054 (36M) [application/zip]
Saving to: ‘sqlcl-latest.zip’

sqlcl-latest.zip           100%[=====================================>]  35.84M  2.43MB/s    in 18s     

2021-07-29 17:11:42 (1.95 MB/s) - ‘sqlcl-latest.zip’ saved [37581054/37581054]

Note that I’m downloading sqlcl-latest.zip, which translates to 21.2.1.195.1252 at the time of writing:

$ cd sqlcl/bin && ./sql -V
SQLcl: Release 21.2.1.0 Production Build: 21.2.1.195.1252
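
If you’d rather not change into the bin directory every time, you can unzip the archive and put it on the PATH; a quick sketch (unzipping into the current directory is simply my choice):

$ unzip -q sqlcl-latest.zip
$ export PATH="${PWD}/sqlcl/bin:${PATH}"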

With the latest sqlcl release unzipped it’s time to grab some session statistics.

Session Statistics, please!

To keep matters reasonably simple I’m using a non-partitioned table for my demonstration. As always, my examples use the excellent Swingbench (Order Entry) benchmark schema. Once connected to the database, I can start investigating. By the way, I’m running this query on Oracle 19c in a VM on my laptop.

The first step is to enable auto-trace as shown here:

SQL> show version
Oracle SQLDeveloper Command-Line (SQLcl) version: 21.2.1.0 build: 21.2.1.195.1252
SQL> help set autotrace
SET AUTOTRACE
  SET AUTOT[RACE] {OFF|ON|TRACE[ONLY]}
SQL> set autotrace on 
Autotrace Enabled
Shows the execution plan as well as statistics of the statement.

Let’s run a query against the orders table:

SQL> var oid number
SQL> exec :oid := 1

PL/SQL procedure successfully completed.

SQL> select /*+ gather_plan_statistics */ /* sqlcl is cool */ 
  2* order_id, order_date, order_mode from soe.orders o where o.order_id = :oid;

   ORDER_ID                         ORDER_DATE    ORDER_MODE 
___________ __________________________________ _____________ 
          1 07-MAY-11 06.00.00.000000000 AM    direct        

PLAN_TABLE_OUTPUT 
______________________________________________________________________________________________________________ 
SQL_ID  f746puksv24jv, child number 0                                                                          
-------------------------------------                                                                          
select /*+ gather_plan_statistics */ /* sqlcl is cool */  order_id,                                            
order_date, order_mode from soe.orders o where o.order_id = :oid                                               
                                                                                                               
Plan hash value: 4043159647                                                                                    
                                                                                                               
-----------------------------------------------------------------------------------------------------------    
| Id  | Operation                   | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |    
-----------------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT            |          |      1 |        |      1 |00:00:00.01 |       4 |      3 |    
|   1 |  TABLE ACCESS BY INDEX ROWID| ORDERS   |      1 |      1 |      1 |00:00:00.01 |       4 |      3 |    
|*  2 |   INDEX UNIQUE SCAN         | ORDER_PK |      1 |      1 |      1 |00:00:00.01 |       3 |      2 |    
-----------------------------------------------------------------------------------------------------------    
                                                                                                               
Predicate Information (identified by operation id):                                                            
---------------------------------------------------                                                            
                                                                                                               
   2 - access("O"."ORDER_ID"=:OID)                                                                             
                                                                                                               

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               1  CPU used when call started
               1  DB time
              44  Requests to/from client
              44  SQL*Net roundtrips to/from client
               3  buffer is not pinned count
            1299  bytes received via SQL*Net from client
           84726  bytes sent via SQL*Net to client
               4  calls to get snapshot scn: kcmgss
               4  calls to kcmgcs
           24576  cell physical IO interconnect bytes
               4  consistent gets
               4  consistent gets examination
               1  consistent gets examination (fastpath)
               4  consistent gets from cache
               1  enqueue releases
               1  enqueue requests
               4  execute count
             195  file io wait time
               3  free buffer requested
               1  index fetch by key
           32768  logical read bytes from cache
              56  non-idle wait count
               4  opened cursors cumulative
               1  opened cursors current
               4  parse count (total)
               3  physical read IO requests
           24576  physical read bytes
               3  physical read total IO requests
           24576  physical read total bytes
               3  physical reads
               3  physical reads cache
               1  rows fetched via callback
               4  session logical reads
               3  shared hash latch upgrades - no wait
               1  sorts (memory)
            2010  sorts (rows)
               1  table fetch by rowid
              46  user calls 

This is pretty nifty in my opinion. There’s an execution plan and a whole raft of relevant session statistics.

Unlike in earlier sqlcl releases, the execution plan is now fetched from the database. I ran a SQL trace to confirm, although the presence of a SQL ID and a child number in the above output already indicates this is the “real” plan.

Happy Troubleshooting!

Installing instant clients on Oracle Linux 8 in Oracle Cloud Infrastructure

I recently presented my views on upgrading to Oracle 19c to the DOAG community. As part of that I pointed out how easy it was to deploy the instant client on a VM using yum and dnf. I previously wrote about the installation of the instant client in an on-premises configuration.

This post describes the (admittedly short) procedure to deploy the current instant client to an Oracle Linux 8.3 installation. The steps in this post should also allow you to deploy the instant client in an on-premises environment the same way.

The OCI VM I created is as vanilla as it gets; I didn’t even put it under OS Management control.

Initial Deployment

After the VM has been provisioned, I found the following repositories enabled:

$ sudo dnf repolist
repo id               repo name
ol8_MySQL80           MySQL 8.0 for Oracle Linux 8 (x86_64)
ol8_UEKR6             Latest Unbreakable Enterprise Kernel Release 6 for Oracle Linux 8 (x86_64)
ol8_addons            Oracle Linux 8 Addons (x86_64)
ol8_appstream         Oracle Linux 8 Application Stream (x86_64)
ol8_baseos_latest     Oracle Linux 8 BaseOS Latest (x86_64)
ol8_ksplice           Ksplice for Oracle Linux 8 (x86_64)
ol8_oci               Oracle Linux 8 OCI Packages (x86_64)
ol8_oci_included      Oracle Software for OCI users on Oracle Linux 8 (x86_64)

A quick search for the instant client reveals the following hits:

$ sudo dnf search instantclient
Last metadata expiration check: 0:22:35 ago on Fri 21 May 2021 06:38:23 GMT.
=================================== Name Matched: instantclient ====================================
oracle-instantclient-release-el8.x86_64 : Oracle Instant Client yum repository configuration
oracle-instantclient-release-el8.src : Oracle Instant Client yum repository configuration

That isn’t the actual instant client though:

$ sudo dnf info oracle-instantclient-release-el8
Last metadata expiration check: 0:23:34 ago on Fri 21 May 2021 06:38:23 GMT.
Available Packages
Name         : oracle-instantclient-release-el8
Version      : 1.0
Release      : 1.el8
Architecture : src
Size         : 15 k
Source       : None
Repository   : ol8_baseos_latest
Summary      : Oracle Instant Client yum repository configuration
License      : GPLv2
Description  : This package contains the  Oracle Instant Client yum repository configuration.

Add the Repository

So let’s add the repository:

$ sudo dnf install oracle-instantclient-release-el8
Last metadata expiration check: 0:25:04 ago on Fri 21 May 2021 06:38:23 GMT.
Dependencies resolved.
====================================================================================================
 Package                                Architecture Version          Repository               Size
====================================================================================================
Installing:
 oracle-instantclient-release-el8       x86_64       1.0-1.el8        ol8_baseos_latest        16 k

Transaction Summary
====================================================================================================
Install  1 Package

Total download size: 16 k
Installed size: 18 k
Is this ok [y/N]: y
Downloading Packages:
oracle-instantclient-release-el8-1.0-1.el8.x86_64.rpm               212 kB/s |  16 kB     00:00    
----------------------------------------------------------------------------------------------------
Total                                                               201 kB/s |  16 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                            1/1 
  Installing       : oracle-instantclient-release-el8-1.0-1.el8.x86_64                          1/1 
  Running scriptlet: oracle-instantclient-release-el8-1.0-1.el8.x86_64                          1/1 
  Verifying        : oracle-instantclient-release-el8-1.0-1.el8.x86_64                          1/1 

Installed:
  oracle-instantclient-release-el8-1.0-1.el8.x86_64                                                 

Complete!

With the repository configuration in place I should be able to install the instant client next.

Installing the instant client

The next search for the instant client is more successful. The new repository configuration is added and refreshed automatically, and dnf presents the following hits:

$ sudo dnf search instantclient
Oracle Instant Client 21 for Oracle Linux 8 (x86_64)                 17 kB/s | 4.2 kB     00:00    
=================================== Name Matched: instantclient ====================================
oracle-instantclient-basic.x86_64 : Oracle Instant Client Basic package
oracle-instantclient-basiclite.x86_64 : Oracle Instant Client Light package
oracle-instantclient-devel.x86_64 : Development header files for Oracle Instant Client.
oracle-instantclient-jdbc.x86_64 : Supplemental JDBC features for the Oracle Instant Client
oracle-instantclient-odbc.x86_64 : Oracle Instant Client ODBC
oracle-instantclient-release-el8.x86_64 : Oracle Instant Client yum repository configuration
oracle-instantclient-release-el8.src : Oracle Instant Client yum repository configuration
oracle-instantclient-sqlplus.x86_64 : Oracle Instant Client SQL*Plus package
oracle-instantclient-tools.x86_64 : Tools for Oracle Instant Client

I wanted to use SQL*Plus, which should pull in the dependent oracle-instantclient-basic package as well. And it does:

$ sudo dnf install oracle-instantclient-sqlplus
Last metadata expiration check: 0:01:12 ago on Fri 21 May 2021 07:03:52 GMT.
Dependencies resolved.
====================================================================================================
 Package                          Arch       Version           Repository                      Size
====================================================================================================
Installing:
 oracle-instantclient-sqlplus     x86_64     21.1.0.0.0-1      ol8_oracle_instantclient21     706 k
Installing dependencies:
 oracle-instantclient-basic       x86_64     21.1.0.0.0-1      ol8_oracle_instantclient21      54 M

Transaction Summary
====================================================================================================
Install  2 Packages

Total download size: 54 M
Installed size: 241 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): oracle-instantclient-sqlplus-21.1.0.0.0-1.x86_64.rpm         7.1 MB/s | 706 kB     00:00    
(2/2): oracle-instantclient-basic-21.1.0.0.0-1.x86_64.rpm            35 MB/s |  54 MB     00:01    
----------------------------------------------------------------------------------------------------
Total                                                                35 MB/s |  54 MB     00:01     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                            1/1 
  Installing       : oracle-instantclient-basic-21.1.0.0.0-1.x86_64                             1/2 
  Running scriptlet: oracle-instantclient-basic-21.1.0.0.0-1.x86_64                             1/2 
  Installing       : oracle-instantclient-sqlplus-21.1.0.0.0-1.x86_64                           2/2 
  Running scriptlet: oracle-instantclient-sqlplus-21.1.0.0.0-1.x86_64                           2/2 
  Verifying        : oracle-instantclient-basic-21.1.0.0.0-1.x86_64                             1/2 
  Verifying        : oracle-instantclient-sqlplus-21.1.0.0.0-1.x86_64                           2/2 

Installed:
  oracle-instantclient-basic-21.1.0.0.0-1.x86_64  oracle-instantclient-sqlplus-21.1.0.0.0-1.x86_64 

Complete!

Thankfully, sqlplus is in the path and can be executed without having to set PATH or LD_LIBRARY_PATH manually:

$ sqlplus /nolog

SQL*Plus: Release 21.0.0.0.0 - Production on Fri May 21 07:07:56 2021
Version 21.1.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Nice! A quick check on yum.oracle.com reveals the same instant client is available over there, too.

Have fun!

Installation of Oracle REST Data Services 20.4 without using SYSDBA

It’s really hard to come up with good, concise, twitter-compatible blog titles, so let’s try with a few more words. What I’d like to share is how to install Oracle REST Data Services (ORDS) v20.4 without having to connect to the database as SYSDBA. There are good reasons not to connect as SYSDBA, and I’m glad Oracle made it possible to grant the necessary privileges for an ORDS installation to a regular database account. It’s not a new feature at all; the option has been around for a while, but I haven’t had time to write about it until now.

Some background before getting started

The installation of Oracle REST Data Services (ORDS) consists of two parts:

  1. Deploying the ORDS distribution to a Docker container or virtual machine
  2. Connecting to and configuring its database

Initially it was necessary to connect with the highly privileged SYSDBA role to complete step 2. Thankfully this isn’t necessary anymore. This post describes how to install and configure ORDS 20.4 against an Oracle 19c (non-container) database.

Deploying ORDS

The first step consists of downloading ORDS from Oracle’s website. I am planning on deploying it in one of my Oracle Linux 8 vagrant boxes. There is more to deploying ORDS than the basic configuration shown here; I’m leaving that to a later post. More specifically, to keep this post simple I won’t concern myself with integrating ORDS into a Docker container or even Tomcat 9.x.

The Ansible playbook I use creates an Oracle account out of habit. It also creates the location I want it to use – /opt/ords – and changes the directory’s ownership to oracle:oinstall. Finally, it unzips ORDS.
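
If you don’t use Ansible, the manual equivalent boils down to a handful of commands. A rough sketch, assuming the ORDS 20.4 zip has already been downloaded to the current directory (the file name pattern is an assumption):

$ sudo groupadd oinstall                       # unless the group exists already
$ sudo useradd -m -g oinstall oracle           # unless the account exists already
$ sudo mkdir -p /opt/ords
$ sudo unzip -q ords-20.4.*.zip -d /opt/ords
$ sudo chown -R oracle:oinstall /opt/ords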

Configuring ORDS

ORDS stores its configuration in a directory of your choice. Since I’m lazy I use /opt/ords/config for this purpose. Once the directory is created on the file system you tell ORDS where to find its configuration:

[oracle@ords ords]$ java -jar /opt/ords/ords.war configdir /opt/ords/config
2021-04-21T18:48:23.156Z INFO        Set config.dir to /opt/ords/config in: /opt/ords/ords.war

Once this step is complete it’s time to configure ORDS and its database connection. This step is #2 in the above list and referred to as “installation” in the documentation.

Creating the less-privileged user in the database

As per the Install Guide you need to run a script to grant a non-privileged user the rights to configure ORDS. The script doesn’t create the user, so you have to ensure it exists first. The user didn’t seem to require any elevated privileges, so I went ahead and created a dedicated installation account:

SQL> create user ordsinstall identified by values '...';

User created.

SQL> grant create session to ordsinstall;

Grant succeeded.

With the user created I could run the script granting the minimum set of privileges:

SQL> @installer/ords_installer_privileges ordsinstall

The script takes one argument: the account you intend to use for the installation (ordsinstall).

Installing ORDS in the database

The final step is to install ORDS in the database. This can be done in many ways. Trying to keep it simple I went with the interactive installation.

Have a look at the screen output; it should be self-explanatory for the most part. When prompted for the administrator username you provide the account just created (ordsinstall in my case). Since I wanted to try SQL Developer Web, I chose that option. Your mileage may vary.

[oracle@ords ords]$ java -jar ords.war install advanced
Specify the database connection type to use.
Enter number for [1] Basic  [2] TNS  [3] Custom URL [1]:1
Enter the name of the database server [localhost]:server3
Enter the database listen port [1521]:
Enter 1 to specify the database service name, or 2 to specify the database SID [1]:1
Enter the database service name:ORCL
Enter 1 if you want to verify/install Oracle REST Data Services schema or 2 to skip this step [1]:
Enter the database password for ORDS_PUBLIC_USER:
Confirm password:
Requires to login with administrator privileges to verify Oracle REST Data Services schema.

Enter the administrator username:ordsinstall
Enter the database password for ordsinstall:
Confirm password:
Connecting to database user: ordsinstall url: jdbc:oracle:thin:@//server3:1521/ORCL

Retrieving information.
Enter the default tablespace for ORDS_METADATA [SYSAUX]:
Enter the temporary tablespace for ORDS_METADATA [TEMP]:
Enter the default tablespace for ORDS_PUBLIC_USER [SYSAUX]:
Enter the temporary tablespace for ORDS_PUBLIC_USER [TEMP]:
Enter 1 if you want to use PL/SQL Gateway or 2 to skip this step.
If using Oracle Application Express or migrating from mod_plsql then you must enter 1 [1]:2
Enter a number to select a feature to enable:
   [1] SQL Developer Web  (Enables all features)
   [2] REST Enabled SQL
   [3] Database API
   [4] REST Enabled SQL and Database API
   [5] None
Choose [1]:1
2021-04-21T19:38:18.012Z INFO        reloaded pools: []
Installing Oracle REST Data Services version 20.4.3.r0501904
... Log file written to /home/oracle/ords_install_core_2021-04-21_193818_00414.log
... Verified database prerequisites
... Created Oracle REST Data Services proxy user
Warning: Nashorn engine is planned to be removed from a future JDK release
... Created Oracle REST Data Services schema
... Granted privileges to Oracle REST Data Services
... Created Oracle REST Data Services database objects
... Log file written to /home/oracle/ords_install_datamodel_2021-04-21_193842_00226.log
... Log file written to /home/oracle/ords_install_apex_2021-04-21_193846_00491.log
Completed installation for Oracle REST Data Services version 20.4.3.r0501904. Elapsed time: 00:00:32.177 

Enter 1 if you wish to start in standalone mode or 2 to exit [1]:2
[oracle@ords ords]$

That’s it! ORDS has been configured in the database, and I didn’t have to connect as SYSDBA. I think that’s a big step ahead, and I have been meaning to write about this topic for a while.

Have fun!