Category Archives: Linux

Silent installation: Oracle Restart 19c, ASM Filter Driver, RHCK edition

As promised in the earlier post, here are my notes about installing Oracle Restart 19c on Oracle Linux 7.7 using the Red Hat Compatible Kernel (RHCK). Please also consult the ACFS/ASMFD compatibility matrix, My Oracle Support DocID 1369107.1, for the latest information about ASMFD compatibility with various kernels.

Why am I starting the series with a seemingly “odd” kernel, at least from the point of view of Oracle Linux? If you try to install the Oracle Restart base release with UEK 5, you get strange error messages back from gridSetup telling you about invalid ASM disks. While that’s probably true, it’s a secondary error. The main cause of the problem is this:

[root@server5 bin]# ./afddriverstate supported
AFD-620: AFD is not supported on this operating system version: '4.14.35-1902.300.11.el7uek.x86_64'
AFD-9201: Not Supported
AFD-9294: updating file /etc/sysconfig/oracledrivers.conf 

This is easy to run into since gridSetup doesn’t validate this for you when running in silent mode. The GUI version of the installer protects you from the mistake though. Upgrading to the latest UEK 5 doesn’t change this message; you need to check the certification matrix to learn that Oracle Restart 19.4.0 and later are required for UEK 5 if you’d like to use ASMFD (or ACFS for that matter). This scenario will be covered in a later post.
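Since the silent installer skips this check, a wrapper script could perform it before anything else happens. The following is only a sketch: the function name afd_supported is my own invention, and it merely looks for the “AFD-9200: Supported” message in whatever text it reads on standard input. The sample input is the unsupported UEK 5 case from above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the text on stdin contains the
# "AFD-9200: Supported" message printed by "afddriverstate supported".
afd_supported() {
    grep -q '^AFD-9200: Supported'
}

# Sample input: the "not supported" message from the UEK 5 attempt.
if printf '%s\n' "AFD-9201: Not Supported" | afd_supported; then
    echo "ASMFD supported, safe to continue"
else
    echo "ASMFD not supported on this kernel, stop here"
fi
```

In a real wrapper the input would come from the tool itself, for example ./afddriverstate supported | afd_supported || exit 1, run from the Grid Home's bin directory.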

Using the Red Hat Compatible Kernel alleviates this problem for me. Just be aware of the usual caveats when using the Red Hat Kernel on Oracle Linux, such as YUM changing the default kernel during yum upgrade. I’d also like to reiterate that this post isn’t an endorsement for ASM Filter Driver, but since the documentation was a little unclear I thought I’d write up how I got to a working installation. It is up to you to ensure that ASMFD is a workable solution for your environment by following industry best known practices.

Configuration Options

In the post introducing this series I claimed to have identified 2 options for installing Oracle Restart 19c using ASMFD: the first one is to use UDEV to prepare ASM block devices, the second one is to label the ASM disks using asmcmd afd_label.

Huh, UDEV? That hasn’t really been blogged about at all in the context of ASMFD, or at least I didn’t find anyone who did. I’m inferring the possibility of using UDEV from “Configuring Oracle ASM Filter Driver During Installation” (link to documentation):

“If you do not use udev on the system where the Oracle Grid Infrastructure is installed, then you can also complete the following procedure to provision disks for Oracle ASMFD before the installer is launched”

You actually only have to choose one of them. Let’s start with the more frequently covered approach of labelling disks using asmcmd.

My environment

I have applied all the patches up to March 26th to my lab environment. The Oracle Linux release I’m using is 7.7:

[root@server4 ~]# cat /etc/oracle-release
Oracle Linux Server release 7.7 

The KVM VM I’m using for this blog post uses the latest Red Hat Compatible Kernel at the time of writing (kernel-3.10.0-1062.18.1.el7.x86_64). You will notice that I’m using the virtio driver, leading to “strange” device names. Instead of /dev/sd it’s /dev/vd. My first two block devices are reserved for the O/S and Oracle, the remaining ones are going to be used for ASM. I have an old (bad?) habit of partitioning block devices for ASM as you might notice. Most of the Oracle setup is done by the 19c preinstall RPM, which I used.

I created a grid owner – grid – to own the Oracle Restart installation. Quite a few blog posts I came across referenced group membership, and I’d like to do the same:

[root@server4 ~]# id -a grid 
uid=54322(grid) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54328(asmadmin),54327(asmdba) 

The block devices I’m intending to use for ASM are /dev/vdc to /dev/vdf – the first 2 are intended for +DATA, the other 2 will become part of +RECO. As you can see they are partitioned:

[root@server4 ~]# lsblk --ascii
vdf                   251:80   0   10G  0 disk 
`-vdf1                251:81   0   10G  0 part 
vdd                   251:48   0   10G  0 disk 
`-vdd1                251:49   0   10G  0 part 
vdb                   251:16   0   50G  0 disk 
`-vdb1                251:17   0   50G  0 part 
  `-oraclevg-orabinlv 252:2    0   50G  0 lvm  /u01
sr0                    11:0    1 1024M  0 rom  
vde                   251:64   0   10G  0 disk 
`-vde1                251:65   0   10G  0 part 
vdc                   251:32   0   10G  0 disk 
`-vdc1                251:33   0   10G  0 part 
vda                   251:0    0   12G  0 disk 
|-vda2                251:2    0 11.5G  0 part 
| |-rootvg-swaplv     252:1    0  768M  0 lvm  [SWAP]
| `-rootvg-rootlv     252:0    0 10.8G  0 lvm  /
`-vda1                251:1    0  500M  0 part /boot  

With all that out of the way it is time to cover the installation.

Labeling disks

I’m following the procedure documented in the 19c Administrator’s Guide chapter 20, section “Configuring Oracle ASM Filter Driver During Installation”. I have prepared my environment up to the step where I’d have to launch the installer. This is a fairly well known process, and I won’t repeat it here.

Once the 19c install image has been extracted to my future Grid Home, the first step is to check if my system is supported:

[root@server4 ~]# cd /u01/app/grid/product/19.0.0/grid/bin
[root@server4 bin]# ./afddriverstate supported
AFD-9200: Supported 
[root@server4 bin]# uname -r

“AFD-9200: Supported” tells me that I can start labeling disks. This requires me to be root, and I have to set ORACLE_HOME and ORACLE_BASE. For some reason, the documentation suggests using /tmp as ORACLE_BASE, which I’ll use as well:

[root@server4 bin]# pwd
[root@server4 bin]# export ORACLE_BASE=/tmp
[root@server4 bin]# export ORACLE_HOME=/u01/app/grid/product/19.0.0/grid
[root@server4 bin]# ./asmcmd afd_label DATA1 /dev/vdc1 --init
[root@server4 bin]# ./asmcmd afd_label DATA2 /dev/vdd1 --init 

[root@server4 bin]# ./asmcmd afd_lslbl /dev/vdc1
Label                     Duplicate  Path
DATA1                                 /dev/vdc1

[root@server4 bin]# ./asmcmd afd_lslbl /dev/vdd1
Label                     Duplicate  Path
DATA2                                 /dev/vdd1  

Note the use of the --init flag. This is only needed if Grid Infrastructure isn’t installed yet.
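With more than a couple of disks, typing the label commands by hand gets tedious. Here is a minimal dry-run sketch that only prints the commands for review and executes nothing. The function name and the numbering scheme (prefix plus running number) are my assumptions, modelled on the DATA1/DATA2 labels used above.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints one "asmcmd afd_label" command per
# device, numbering the labels PREFIX1, PREFIX2, ... It runs nothing.
print_afd_label_cmds() {
    local prefix=$1 n=1 dev
    shift
    for dev in "$@"; do
        printf './asmcmd afd_label %s%d %s --init\n' "$prefix" "$n" "$dev"
        n=$((n + 1))
    done
}

# Prints the two commands used in this post for the +DATA disks.
print_afd_label_cmds DATA /dev/vdc1 /dev/vdd1
```

Once the printed commands look right they can be pasted into a root shell that has ORACLE_HOME and ORACLE_BASE set as shown earlier.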

Labeling the disks did not have an effect on the block devices’ permissions. Right after finishing the 2 calls to label my 2 block devices, this is the output from my file system:

[root@server4 bin]# ls -l /dev/vd[c-d]*
brw-rw----. 1 root disk 252, 32 Mar 27 09:46 /dev/vdc
brw-rw----. 1 root disk 252, 33 Mar 27 12:55 /dev/vdc1
brw-rw----. 1 root disk 252, 48 Mar 27 09:46 /dev/vdd
brw-rw----. 1 root disk 252, 49 Mar 27 12:58 /dev/vdd1
[root@server4 bin]#  

The output of afd_lslbl indicated that both of my disks are ready to become part of an ASM disk group, so let’s start the installer.


I wasn’t able to make sense of the options in the response file until I started the installer in GUI mode and created a response file based on my choices. To cut a long story short, here is my call to the installer:

[grid@server4 ~]$ /u01/app/grid/product/19.0.0/grid/ -silent \
> INVENTORY_LOCATION=/u01/app/oraInventory \
> ORACLE_BASE=/u01/app/grid \
> -waitforcompletion -ignorePrereqFailure -lenientInstallMode \
> oracle.install.option=HA_CONFIG \
> oracle.install.asm.OSDBA=asmdba \
> oracle.install.asm.OSASM=asmadmin \
> \
> oracle.install.asm.diskGroup.disks=/dev/vdc1,/dev/vdd1 \
> oracle.install.asm.diskGroup.diskDiscoveryString=/dev/vd* \
> oracle.install.asm.diskGroup.redundancy=EXTERNAL \
> oracle.install.asm.diskGroup.AUSize=4 \
> oracle.install.asm.configureAFD=true \
> \
> oracle.install.asm.SYSASMPassword=thinkOfASuperSecretPassword \
> oracle.install.asm.monitorPassword=thinkOfASuperSecretPassword
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:

You can find the log of this install session at:

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/
        2. /u01/app/grid/product/19.0.0/grid/

Execute /u01/app/grid/product/19.0.0/grid/ on the following nodes:

Successfully Setup Software.
As install user, execute the following command to complete the configuration.
/u01/app/grid/product/19.0.0/grid/ -executeConfigTools -responseFile /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp [-silent]
Note: The required passwords need to be included in the response file.
Moved the install session logs to:
[grid@server4 ~]$

It took a little while to work out that despite labeling the disks for ASMFD I didn’t have to put any reference to AFD into the call to the installer. Have a look at the ASM disk string and the block devices: that’s what I’d use if I were using UDEV rules for device name persistence. The syntax might appear counter-intuitive. However, there’s a “configureAFD” flag you need to set to true.

Since this is a lab environment I’m ok with external redundancy. Make sure you pick a redundancy level appropriate for your use case.

Running the configuration tools

The remaining steps are identical to a non-ASMFD setup. First you run the two root scripts listed in the installer output, one after the other. The output of the second one showed this for me, indicating success:

[root@server4 ~]# /u01/app/grid/product/19.0.0/grid/
Check /u01/app/grid/product/19.0.0/grid/install/root_server4_2020-03-27_13-11-05-865019723.log for the output of root script

[root@server4 ~]#
[root@server4 ~]# cat /u01/app/grid/product/19.0.0/grid/install/root_server4_2020-03-27_13-11-05-865019723.log
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/19.0.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/grid/product/19.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
2020/03/27 13:11:13 CLSRSC-363: User ignored prerequisites during installation
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node server4 successfully pinned.
2020/03/27 13:13:55 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

server4     2020/03/27 13:16:59     /u01/app/grid/crsdata/server4/olr/backup_20200327_131659.olr     724960844
2020/03/27 13:17:54 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
[root@server4 ~]# 

Well, that looks ok, now on to the final step, configuration! As indicated in the output, you need to update the response file (/u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp) with the required passwords. For me that was oracle.install.asm.monitorPassword and oracle.install.asm.SYSASMPassword. Once the response file was updated, I called the installer once again:

[grid@server4 ~]$ /u01/app/grid/product/19.0.0/grid/ -executeConfigTools -responseFile /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp -silent
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the logs of this session at:

You can find the log of this install session at:
Successfully Configured Software. 

And that’s it! The software has been configured successfully. Don’t forget to remove the passwords from the response file!
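Removing the passwords is easy to forget, so it may be worth scripting. The sketch below blanks the values of the two password keys mentioned above and leaves everything else alone. The function name is hypothetical, the demonstration works on a throwaway file rather than a real response file, and it assumes GNU sed as shipped with Oracle Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: blanks the values of the two password keys in a
# response file, leaving every other line untouched. Assumes GNU sed.
scrub_rsp_passwords() {
    sed -i -E \
        's/^(oracle\.install\.asm\.(SYSASMPassword|monitorPassword))=.*/\1=/' \
        "$1"
}

# Demonstration on a throwaway file, not on a real response file.
demo=$(mktemp)
printf '%s\n' \
    'oracle.install.option=HA_CONFIG' \
    'oracle.install.asm.SYSASMPassword=secret' \
    'oracle.install.asm.monitorPassword=secret' > "$demo"
scrub_rsp_passwords "$demo"
cat "$demo"
rm -f "$demo"
```

The keys stay in the file with empty values, so the file remains usable as a template for the next installation.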


After a little while I have been able to configure Oracle Restart 19c/ASMFD on Oracle Linux 7.7/RHCK. Let’s check what this implies.

I’ll first look at the status of ASM Filter Driver:

[grid@server4 ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM
The Oracle base has been set to /u01/app/grid
[grid@server4 ~]$ afddriverstate installed
AFD-9203: AFD device driver installed status: 'true'
[grid@server4 ~]$ afddriverstate loaded
AFD-9205: AFD device driver loaded status: 'true'
[grid@server4 ~]$ afddriverstate version
AFD-9325:     Driver OS kernel version = 3.10.0-862.el7.x86_64.
AFD-9326:     Driver build number = 190222.
AFD-9212:     Driver build version =
AFD-9547:     Driver available build number = 190222.
AFD-9548:     Driver available build version =
[grid@server4 ~]$  

That’s encouraging: ASMFD is loaded and works on top of kernel-3.10 (RHCK).

I am indeed using the base release (and have to patch now!)

[grid@server4 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
29585399;OCW RELEASE UPDATE (29585399)
29517247;ACFS RELEASE UPDATE (29517247)
29517242;Database Release Update : (29517242)
29401763;TOMCAT RELEASE UPDATE (29401763)

OPatch succeeded. 

And … I’m also using ASMFD:

SQL> col name for a20
SQL> col path for a10
SQL> col library for a50
SQL> set lines 120
SQL> select name, path, library from v$asm_disk where group_number <> 0;

NAME                 PATH       LIBRARY
-------------------- ---------- --------------------------------------------------
DATA1                AFD:DATA1  AFD Library - Generic , version 3 (KABI_V3)
DATA2                AFD:DATA2  AFD Library - Generic , version 3 (KABI_V3)

SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/vd*, AFD:*

This concludes the setup of my lab environment.

Vagrant tips’n’tricks: changing /etc/hosts automatically for Oracle Universal Installer

Oracle Universal Installer, or OUI for short, doesn’t at all like it if the hostname resolves to an IP address in the loopback range. At best it complains, at worst it starts installing and configuring software only to abort and bury the real cause deep in the logs.

I am a great fan of HashiCorp’s Vagrant as you might have guessed reading some of the previous articles, and as such wanted a scripted solution to changing the hostname to something more sensible before I begin provisioning software. I should probably add that I’m using my own base boxes; the techniques in this post should equally apply to other boxes as well.

Each of the Vagrant VMs I’m creating is given a private network for communication with its peers. This is mainly done to prevent me from having to deal with port forwarding on the NAT device. If you haven’t used Vagrant before you might not know that by default, each Vagrant VM will come up with a single NIC that has to use NAT. The end goal for this post is to ensure that my VM’s hostname maps to the private network’s IP address, not to a loopback address as it would normally do.

Setting the scene

By default, Vagrant doesn’t seem to mess with the hostname of the VM. This can be changed by using a configuration variable. Let’s start with the Vagrantfile for my Oracle Linux 7 box:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.define "ol7guest" do |u|
    # this is a base box I created and stored locally
    u.vm.box = "oracleLinux7Base"

    u.ssh.private_key_path = "/path/to/key"

    u.vm.hostname = "ol7guest"
    u.vm.network "private_network", ip: ""

    u.vm.provider "virtualbox" do |v|
      v.memory = 2048
      v.name = "ol7guest"
      v.cpus = 1
    end
  end
end

Please ignore the fact that my Vagrantfile is slightly more complex than it needs to be. I do like having speaking names for my VMs, rather than “default” showing up in vagrant status. Using this terminology in the Vagrantfile also makes it easier to add more VMs to the configuration should I so need.

Apart from what you just read, the only remarkable thing to mention about this file is this line:

    u.vm.hostname = "ol7guest"

As per the Vagrant documentation, I can use this directive to set the hostname of the VM. And indeed, it does:

$ vagrant ssh ol7guest
Last login: Thu Jan 09 21:14:59 2020 from
[vagrant@ol7guest ~]$  

The hostname is set, however it resolves to a loopback address as per /etc/hosts:

[vagrant@ol7guest ~]$ cat /etc/hosts
    ol7guest    ol7guest
   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 

Not quite what I had in mind, but apparently expected behaviour. So the next step is to change the first line in /etc/hosts to match the private IP address I assigned to the second NIC. As an Ansible fan I am naturally leaning towards using a playbook, but I also understand that not everyone has Ansible installed on the host and using the ansible_local provisioner might take longer than necessary unless your box has Ansible pre-installed.

The remainder of this post deals with an Ansible solution and the least common denominator, the shell provisioner.

Using an Ansible playbook

Many times I’m using Ansible playbooks to deploy software to Vagrant VMs anyway, so embedding a little piece of code into my playbooks to change /etc/hosts isn’t a lot of work. The first step is to amend the Vagrantfile to reference the Ansible provisioner. One possible way to do this in the context of my example is this:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.define "ol7guest" do |u|
    # this is a base box I created and stored locally
    u.vm.box = "oracleLinux7Base"

    u.ssh.private_key_path = "/path/to/key"

    u.vm.hostname = "ol7guest"
    u.vm.network "private_network", ip: ""

    u.vm.provision "ansible" do |ansible|
      ansible.playbook = "change_etc_hosts.yml"
      ansible.verbose = "v"
    end

    u.vm.provider "virtualbox" do |v|
      v.memory = 2048
      v.name = "ol7guest"
      v.cpus = 1
    end
  end
end

It is mostly the same file with the addition of the call to Ansible. As you can imagine the playbook is rather simple:

- hosts: ol7guest
  become: yes
  tasks:
  - name: change /etc/hosts
    lineinfile:
      path: '/etc/hosts'
      regexp: '.*ol7guest.*'
      line: '   ol7guest'
      backup: yes

It uses the lineinfile module to find lines containing ol7guest and replaces that line with the “correct” IP address. The resulting hosts file is exactly what I need:

[vagrant@ol7guest ~]$ cat /etc/hosts
   ol7guest
   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
[vagrant@ol7guest ~]$ 

The first line of the original file has been replaced with the private IP which should enable OUI to progress past this potential stumbling block.

Using the shell provisioner

The second solution involves the shell provisioner, which – unlike Ansible – isn’t distribution agnostic and needs to be tailored to the target platform. On Oracle Linux, the following worked for me:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# keep a copy of the original file, drop all lines mentioning the
# hostname, then append the new entry
$script = <<-SCRIPT
/usr/bin/cp /etc/hosts /root && \
/usr/bin/sed -i -e '/ol7guest/d' /etc/hosts && \
/usr/bin/echo ' ol7guest' >> /etc/hosts
SCRIPT

Vagrant.configure("2") do |config|
  config.vm.define "ol7guest" do |u|
    # this is a base box I created and stored locally
    u.vm.box = "oracleLinux7Base"

    u.ssh.private_key_path = "/path/to/key"

    u.vm.hostname = "ol7guest"
    u.vm.network "private_network", ip: ""

    u.vm.provision "shell", inline: $script

    u.vm.provider "virtualbox" do |v|
      v.memory = 2048
      v.name = "ol7guest"
      v.cpus = 1
    end
  end
end

The script copies /etc/hosts to root’s home directory and then changes it to match my needs. At the end, the file is in exactly the shape I need it to be in.
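The same idea can be tried out safely on a scratch file before it goes anywhere near a real /etc/hosts. The following sketch is not the provisioner script from above: the function name is made up, it matches the hostname as a whole field rather than as a substring, and the addresses in the demonstration (including 192.0.2.10 from the documentation range) are placeholders I picked for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keeps a .bak copy of FILE, drops every line that
# lists HOST as a name (field 2 or later), then appends "IP HOST".
set_hosts_entry() {
    local file=$1 host=$2 ip=$3
    cp "$file" "$file.bak" || return 1
    awk -v h="$host" '{ for (i = 2; i <= NF; i++) if ($i == h) next; print }' \
        "$file.bak" > "$file" || return 1
    printf '%s %s\n' "$ip" "$host" >> "$file"
}

# Demonstration on a scratch file with placeholder addresses.
demo=$(mktemp)
printf '%s\n' \
    '127.0.0.1 ol7guest ol7guest' \
    '127.0.0.1 localhost localhost.localdomain' \
    '::1 localhost localhost.localdomain' > "$demo"
set_hosts_entry "$demo" ol7guest 192.0.2.10
cat "$demo"
rm -f "$demo" "$demo.bak"
```

Matching whole fields avoids deleting lines for hosts whose names merely contain the string, which a plain substring match would do.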


Whether you go with the shell provisioner or embed the change to the hostname in an (existing) Ansible playbook doesn’t matter much. I would definitely argue in support of having the code embedded in a playbook if that’s what will provision additional software anyways. If installing Ansible on the host isn’t an option, using the shell as a fallback mechanism is perfectly fine, too. Happy hacking!

Tips’n’tricks: finding the (injected) private key pair used in Vagrant boxes

In an earlier article I described how you could use SSH keys to log into a Vagrant box created by the Virtualbox provider. The previous post emphasised my preference for using custom Vagrant boxes and my own SSH keys.

Nevertheless there are occasions when you can’t create your own Vagrant box, and you have to resort to the Vagrant insecure-key-pair-swap procedure instead. If you are unsure about these security related discussion points, review the documentation about creating one’s own Vagrant boxes (section “Default User Settings”) for some additional background information.

Continuing the discussion from the previous post, what does a dynamically injected SSH key imply for the use with the SSH agent?

Vagrant cloud, boxes, and the insecure key pair

Let’s start with an example to demonstrate the case. I have decided to use the latest Ubuntu 16.04 box from HashiCorp’s Vagrant cloud for no particular reason. In hindsight I should have gone for 18.04 instead, as it’s much newer. For the purpose of this post it doesn’t really matter though.

$ vagrant up ubuntu
Bringing machine 'ubuntu' up with 'virtualbox' provider...
==> ubuntu: Importing base box 'ubuntu/xenial64'...
==> ubuntu: Matching MAC address for NAT networking...
==> ubuntu: Checking if box 'ubuntu/xenial64' version '20191204.0.0' is up to date...
==> ubuntu: Setting the name of the VM: ubuntu
==> ubuntu: Fixed port collision for 22 => 2222. Now on port 2200.
==> ubuntu: Clearing any previously set network interfaces...
==> ubuntu: Preparing network interfaces based on configuration...
    ubuntu: Adapter 1: nat
    ubuntu: Adapter 2: hostonly
==> ubuntu: Forwarding ports...
    ubuntu: 22 (guest) => 2200 (host) (adapter 1)
==> ubuntu: Running 'pre-boot' VM customizations...
==> ubuntu: Booting VM...
==> ubuntu: Waiting for machine to boot. This may take a few minutes...
    ubuntu: SSH address:
    ubuntu: SSH username: vagrant
    ubuntu: SSH auth method: private key
    ubuntu: Vagrant insecure key detected. Vagrant will automatically replace
    ubuntu: this with a newly generated keypair for better security.
    ubuntu: Inserting generated public key within guest...
    ubuntu: Removing insecure key from the guest if it's present...
    ubuntu: Key inserted! Disconnecting and reconnecting using new SSH key...
==> ubuntu: Machine booted and ready!
==> ubuntu: Checking for guest additions in VM...
    ubuntu: The guest additions on this VM do not match the installed version of
    ubuntu: VirtualBox! In most cases this is fine, but in rare cases it can
    ubuntu: prevent things such as shared folders from working properly. If you see
    ubuntu: shared folder errors, please make sure the guest additions within the
    ubuntu: virtual machine match the version of VirtualBox you have installed on
    ubuntu: your host and reload your VM.
    ubuntu: Guest Additions Version: 5.1.38
    ubuntu: VirtualBox Version: 6.0
==> ubuntu: Setting hostname...
==> ubuntu: Mounting shared folders...
    ubuntu: /vagrant => /home/martin/vagrant/ubunutu 

This started my “ubuntu” VM (I don’t like it when my VMs are called “default”, so I tend to give them better designations):

$ vboxmanage list vms | grep ubuntu
"ubuntu" {a507ba0c-...24bb} 

You may have noticed that 2 network interfaces are brought online in the output created by vagrant up. This is done to stay in line with the story of the previous post and not something that’s strictly speaking necessary.

The key message in the context of this blog post, found in the logs, is this:

    ubuntu: SSH auth method: private key
    ubuntu: Vagrant insecure key detected. Vagrant will automatically replace
    ubuntu: this with a newly generated keypair for better security.
    ubuntu: Inserting generated public key within guest...
    ubuntu: Removing insecure key from the guest if it's present...
    ubuntu: Key inserted! Disconnecting and reconnecting using new SSH key... 

As you can read, the insecure key was detected and replaced. But where can I find the replaced key?

Locating the new private key

This took me a little while to find out, and I’m hoping this post saves you a minute. The key information (drum roll please) can be found in the output of vagrant ssh-config:

$ vagrant ssh-config ubuntu
Host ubuntu
  User vagrant
  Port 2200
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key
  IdentitiesOnly yes
  LogLevel FATAL 

This contains all the information you need to SSH into the machine! It doesn’t seem to print information about the second NIC though, but that’s ok as I can always look at its details in the Vagrantfile itself.
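If you would rather not copy and paste the path, it can be pulled out of that output with a one-liner. This is a small sketch with a function name of my own choosing. It reads the ssh-config text on standard input and prints the first IdentityFile value. Paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first IdentityFile value found in
# ssh-config style text on stdin. Paths with spaces are not supported.
identity_file_from_ssh_config() {
    awk '$1 == "IdentityFile" { print $2; exit }'
}

# Sample input copied from the vagrant ssh-config output in this post.
printf '%s\n' \
    'Host ubuntu' \
    '  User vagrant' \
    '  Port 2200' \
    '  IdentityFile /home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key' \
    '  IdentitiesOnly yes' | identity_file_from_ssh_config
```

Against a live box the input would be vagrant ssh-config ubuntu piped into the function.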


Using the information from above, I can connect to the system using either port 2200 (forwarded on the NAT device), or the private IP (which has not been shown here):

$ ssh -p 2200 \
> -i /home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key \
> vagrant@localhost hostname

$ ssh -i /home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key \
> vagrant@ hostname

This should be all you need to get cracking with the Vagrant box. But wait! The full path to the key is somewhat lengthy, and that makes it a great candidate for storing it with the SSH agent. That’s super-easy, too:

$ ssh-add /home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key
Identity added: /home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key (/home/martin/vagrant/ubunutu/.vagrant/machines/ubuntu/virtualbox/private_key)

Apologies for the formatting. But it was worth it!

$ ssh vagrant@ hostname

That’s a lot less typing than before…

By the way, it should be easy to spot this key in the output of ssh-add -l as it’s most likely the one with the longest path. If that doesn’t help you identify the key, ssh-keygen -lf /path/to/key prints the key’s fingerprint, for which you can grep in the output of ssh-add -l.

Have fun!

Tips’n’tricks: understanding “too many authentication failures” in SSH

Virtualbox VMs powered by Vagrant require authentication via SSH keys so you don’t have to provide a password each time vagrant up is doing its magic. Provisioning tools you run as part of the vagrant up command also rely on the SSH key based authentication to work properly. This is documented in the official Vagrant documentation set.

I don’t want to use unknown SSH keys with my own Vagrant boxes as a matter of principle. Whenever I create a new custom box I resort to a dedicated SSH key I’m using just for this purpose. This avoids the trouble with Vagrant’s “insecure key pair”, all I need to do is add config.ssh.private_key_path = "/path/to/key" to the Vagrantfile.

The documentation further states that I have to use a NAT device as the first network card in the VM. For some of my VMs I define an additional NIC using a host-only, private network for communication between, for example, the middle tier and the database layer. I don’t want to mess around with port forwarding to enable communication between my VMs, and Vagrant makes it super easy to define another NIC.

This sounds interesting, but what does that have to do with this post?

Please bear with me, I’m building up a story ;) It will all make sense in a minute…

Connecting to the VM’s second interface

With all that in place it’s easy to SSH into my Vagrant box. Assume I have a Vagrant VM with a private IP address to which I want to connect via SSH. Remember when I said I have a dedicated SSH key for my Vagrant boxes? The SSH key is stored in ~/.ssh/vagrant. The SSH command to connect to the environment is simple:

$ ssh -i ~/.ssh/vagrant vagrant@

… and this connects me without having to provide a password.

Saving time for the lazy

Providing the path to the SSH key to use gets a little tedious after a while. There are a couple of solutions to this; there might be more, but I only know about these two:

  • Create a configuration in ~/.ssh/config. Except that doesn’t work particularly well with keys for which you defined a passphrase, as you now have to enter the passphrase each time
  • Add the key to the SSH agent

On Linux and macOS I prefer the second method, especially since I’m relying on passphrases quite heavily. Recently I encountered a problem with this approach, though. When trying to connect to the VM, I received the following error message:

$ ssh vagrant@
Received disconnect from port 22:2: Too many authentication failures
Disconnected from port 22

What’s that all about? I am sure I have the necessary key added to the agent:

$ ssh-add -l | grep -c vagrant

Well it turns out that if you have too many non-matching keys, you can run into the pre-authentication problem like I did. The first step in troubleshooting SSH connections (at least to me) is to enable the verbose option:

$ ssh -v vagrant@

[ ... more detail ... ]

debug1: Will attempt key: key1 ... redacted ... agent
debug1: Will attempt key: key10 ... redacted ... agent
debug1: Will attempt key: key2 ... redacted ... agent
debug1: Will attempt key: key3 ... redacted ... agent
debug1: Will attempt key: key4 ... redacted ... agent
debug1: Will attempt key: key5 ... redacted ... agent
debug1: Will attempt key: key6 ... redacted ... agent
debug1: Will attempt key: key7 ... redacted ... agent
debug1: Will attempt key: key8 ... redacted ... agent
debug1: Will attempt key: key9 ... redacted ... agent

[ ... ]

debug1: Next authentication method: publickey
debug1: Offering public key: key1 ... redacted ... agent
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password


debug1: Offering public key: key5 ... redacted ... agent
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
Received disconnect from port 22:2: Too many authentication failures
Disconnected from port 22

It is my understanding that SSH is querying the agent for SSH keys, and it receives them. After trying key1 through key5 and not finding a match, it decides to stop and returns said error message.
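To see how many keys the client offered before the server gave up, the verbose output can simply be counted. A minimal sketch, assuming the debug lines look like the ones above. The function name is hypothetical, and the sample input is made up from the redacted lines in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts the "Offering public key" lines in
# "ssh -v" output read from stdin.
count_offered_keys() {
    grep -c '^debug1: Offering public key:'
}

# Sample input modelled on the redacted debug output above.
printf '%s\n' \
    'debug1: Next authentication method: publickey' \
    'debug1: Offering public key: key1 ... redacted ... agent' \
    'debug1: Authentications that can continue: publickey,password' \
    'debug1: Offering public key: key2 ... redacted ... agent' | count_offered_keys
```

Since ssh writes its debug messages to standard error, a live session needs 2>&1 before the pipe.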

There are quite a few keys currently added to my running agent:

$ ssh-add -l | wc -l

The Solution

The solution is quite straightforward: I need to store keys with the agent, but I have to indicate which of the stored keys to use to log in to my VM. This is probably best done in ~/.ssh/config:

$ cat ~/.ssh/config
    IdentityFile ~/.ssh/vagrant

In summary, I’m now using a combination of the 2 approaches I outlined above to great effect: now I can log in without having to worry about the keys stored by my agent, and the order in which they are stored.
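For more than one VM it can be handy to generate such entries rather than type them. The following sketch prints a stanza for a given host and key. The function name and the address 192.0.2.10 are placeholders of mine, and the IdentitiesOnly line is borrowed from the vagrant ssh-config output in the earlier post rather than from my own configuration. It tells the client to offer only the configured key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ssh_config stanza that pins one key to
# one host. IdentitiesOnly keeps ssh from offering other agent keys.
ssh_config_stanza() {
    printf 'Host %s\n    IdentityFile %s\n    IdentitiesOnly yes\n' "$1" "$2"
}

# 192.0.2.10 is a placeholder address, not the one used in this post.
ssh_config_stanza 192.0.2.10 '~/.ssh/vagrant'
```

The output can be reviewed and then appended to ~/.ssh/config by hand.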

Ansible Tips’n’tricks: rebooting Vagrant boxes after a kernel upgrade

Occasionally I have to reboot my Vagrant boxes after kernel updates have been installed as part of an Ansible playbook during the “vagrant up” command execution.

I create my own Vagrant base boxes because that’s more convenient for me than pulling them from Vagrant’s cloud. However they, too, need TLC and updates. So long story short, I run a yum upgrade after spinning up Vagrant boxes in Ansible to have access to the latest and greatest (and hopefully most secure) software.

To stay in line with Vagrant’s philosophy, Vagrant VMs are lab and playground environments I create quickly. And I can dispose of them equally quickly, because all that I’m doing is controlled via code. This isn’t something you’d do with Enterprise installations!

Vagrant and Ansible for lab VMs!

Now how do you reboot a Vagrant controlled VM in Ansible? Here is how I’m doing this for VirtualBox 6.0.14 and Vagrant 2.2.6. Ubuntu 18.04.3 comes with Ansible 2.5.1.

Finding out if a kernel upgrade is needed

My custom Vagrant boxes are all based on Oracle Linux 7 and use UEK as the kernel of choice. That is important because it determines how I can find out if yum upgraded the kernel (eg UEK) as part of a “yum upgrade”.

There are many ways to do so; I have been using the following code snippet with some success:

  - name: check if we need to reboot after a kernel upgrade
    shell: if [ $(/usr/bin/rpm -q kernel-uek|/usr/bin/tail -n 1) != kernel-uek-$(uname -r) ]; then /usr/bin/echo 'reboot'; else /usr/bin/echo 'no'; fi
    register: must_reboot

So in other words I compare the last line from rpm -q kernel-uek to the name of the running kernel. If they match – all good. If they don’t, it seems there is a newer kernel-uek* RPM on disk than that of the running kernel. If the variable “must_reboot” contains “reboot”, I guess I have to reboot.
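The comparison described above can also be written as a small shell function, which makes it easy to try out by hand. This is only a minimal sketch: the function name needs_reboot is made up for illustration, and like the one-liner it assumes that the last line printed by rpm -q kernel-uek is the newest installed kernel package.

```shell
#!/usr/bin/env bash
# needs_reboot is a hypothetical helper, not part of the playbook.
# $1: newest installed kernel package (last line of "rpm -q kernel-uek")
# $2: release of the running kernel (output of "uname -r")
# Prints "reboot" if the two differ, "no" otherwise.
needs_reboot() {
    local newest_pkg="$1" running_release="$2"
    if [ "$newest_pkg" != "kernel-uek-${running_release}" ]; then
        echo 'reboot'
    else
        echo 'no'
    fi
}
```

On an Oracle Linux box running UEK it could be called as needs_reboot "$(rpm -q kernel-uek | tail -n 1)" "$(uname -r)".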


Ansible introduced a reboot module in version 2.7, however my Ubuntu 18.04 system’s Ansible version (2.5.1) is too old for that and I wanted to stay with the distribution’s package. I needed an alternative.

There are lots of code snippets out there to reboot systems in Ansible, but none of them worked for me. So I decided to write the process up in this post :)

The following block worked for my very specific setup:

  - name: reboot if needed
    block:
    - shell: sleep 5 && systemctl reboot
      async: 300
      poll: 0
      ignore_errors: true

    - name: wait for system to come back online
      wait_for_connection:
        delay: 60
        timeout: 300
    when: '"reboot" in must_reboot.stdout'

This works nicely with the systems I’m using.

Except there’s a catch lurking in the code: when installing Oracle the software is made available via VirtualBox’s shared folders as defined in the Vagrantfile. When rebooting a Vagrant box outside the Vagrant interface (eg not using the vagrant reload command), shared folders aren’t mounted automatically. In other words, my playbook will fail trying to unzip binaries because it can’t find them. Which isn’t what I want. To circumvent this situation I add the following instruction into the block you just saw:

    - name: re-mount the shared folder after a reboot
      mount:
        path: /mnt
        src: mnt
        fstype: vboxsf
        state: mounted

This re-mounts my shared folder, and I’m good to go!
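To check by hand whether the shared folder came back after a reboot, something along these lines should do. It is a rough sketch only: is_mounted is a hypothetical helper name, and the optional second argument exists merely so the function can be tried against a copy of /proc/mounts.

```shell
#!/usr/bin/env bash
# is_mounted is a hypothetical helper, not part of the playbook.
# $1: mount point to look for, e.g. /mnt
# $2: mounts table to read, defaults to /proc/mounts
# Returns 0 if something is mounted on $1, 1 otherwise.
is_mounted() {
    local path="$1" mounts="${2:-/proc/mounts}"
    # the second field of each line in the mounts table is the mount point
    awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$mounts"
}
```

With the share name (mnt) and mount point (/mnt) used in the task above, a manual re-mount as root could then look like this: is_mounted /mnt || mount -t vboxsf mnt /mnt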


Before installing Oracle software in Vagrant for lab and playground use I always want to make sure I have all the latest and greatest patches installed as part of bringing a Vagrant box online for the first time.

Using Ansible I can automate the entire process from start to finish, even including kernel updates in the process. These are applied before I install the Oracle software!

Upgrading the kernel (or any other software components for that matter) post Oracle installation is more involved, and I usually don’t need to do this during the lifetime of the Vagrant (playground/lab) VM. Which is why Vagrant is beautiful, especially when used together with Ansible.

orachk can now warn about unwanted cleanup of files in /var/tmp/.oracle

Some time ago @martinberx mentioned on Twitter that one of his Linux systems suffered from Clusterware issues for which there wasn’t a readily available explanation. It turned out that the problem he faced was unwanted (from an Oracle perspective at least) automatic cleanup operations in /var/tmp/.oracle. You can read more at the original blog post.

The short version is this: systemd (1) – successor to SysV init and Upstart – tries to be helpful removing unused files in a number of “temp” directories. However some of the files it can remove are essential for Clusterware, and without them all sorts of trouble ensue.

A note about this post’s shelf life and versions used

In case you found this post via a search engine, here are the key properties of the system I worked on while compiling this post: an Oracle Linux 7.7 VM and orachk 19.2.0_20190717. Everything in IT has a shelf life, and this post is no different. It is likely to become obsolete with new releases of the operating system and/or orachk versions.


My Oracle Support (MOS) has been updated: there are finally hits when searching for “tmpfiles.d” for Oracle/RedHat Linux 7. Exadata users find the issue documented as EX50. More importantly, orachk 19.2.0_20190717 (and potentially earlier releases, I haven’t checked) warns you about this potential stability issue as you can see in figure 1:

Figure 1: orachk warns about potential problems

This is great news for database administrators as this might have gone undetected otherwise. It should be noted though that the suggested solution underneath Action/Repair is incomplete. You cannot simply copy and paste the 3 lines mentioned in that section as the documentation for tmpfiles.d (5) reveals. Subsequent orachk runs confirm this by still flagging the outcome of the check as critical.

Ensuring the check passes

A little more digging into the issue and corresponding MOS notes revealed that a different syntax is needed. This has already been covered in a couple of other blog posts, I’m adding the information here to save you time. With the amended configuration added towards the end of /usr/lib/tmpfiles.d/tmp.conf, orachk was happy:

[root@server1]# cat /usr/lib/tmpfiles.d/tmp.conf
# This file is part of systemd.
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

# See tmpfiles.d(5) for details

[ ... more output ... ]

x /tmp/.oracle*
x /var/tmp/.oracle*
x /usr/tmp/.oracle*

I re-ran the orachk command and thankfully, the test succeeded as you can see this in figure 2:

Figure 2: with a little extra configuration the check passes.

I have no idea if systemd picks the change in my configuration file up without restarting the timer, so I’m also doing this for good measure:

[root@server1 ~]# systemctl restart systemd-tmpfiles-clean.timer

With these remediation steps in place, you have done everything Oracle documented to be safe from Clusterware issues caused by systemd. Shortly before hitting the publish button @FritsHoogland let me know that Oracle’s Grid Infrastructure RU 19.4 has a go at fixing this issue. It doesn’t add all the lines to satisfy orachk though, and you need to review /usr/lib/tmpfiles.d/tmp.conf after applying the 19.4 Grid Infrastructure RU.
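Since the file needs reviewing after patching, a quick check for the three exclusion lines shown earlier might come in handy. The following is a minimal sketch and not what orachk itself runs; the function name missing_oracle_exclusions is made up, and it only matches the lines exactly as written above (single space, no trailing whitespace).

```shell
#!/usr/bin/env bash
# missing_oracle_exclusions is a hypothetical helper for illustration only.
# $1: tmpfiles.d configuration file to inspect, defaults to the file edited in this post
# Prints every expected exclusion line that is NOT present in the file.
missing_oracle_exclusions() {
    local conf="${1:-/usr/lib/tmpfiles.d/tmp.conf}" dir
    for dir in /tmp /var/tmp /usr/tmp; do
        # -F: fixed string (the "*" is literal), -x: whole line must match, -q: no output
        grep -Fxq "x ${dir}/.oracle*" "$conf" || echo "x ${dir}/.oracle*"
    done
}
```

If the function prints nothing, all three lines are in place; anything it prints still has to be added to the file.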

Ansible tips’n’tricks: executing a loop conditionally

When writing playbooks, I occasionally add optional tasks. These tasks are only executed if a corresponding configuration variable is defined. I usually define configuration variables either in group_vars/* or alternatively in the role’s roleName/defaults/ directory.

The “when” keyword can be used to test for the presence of a variable and execute a task if the condition evaluates to “true”. However this isn’t always straight-forward to me, and recently I stumbled across some interesting behaviour that I found worth mentioning. I would like to point out that I’m merely an Ansible enthusiast, and by no means a pro. In case there is a better way to do this, please let me know and I’ll update the post :)

Before showing you my code, I’d like to add a little bit of detail here in case someone finds this post via a search engine:

  • Ansible version: ansible 2.8.2
  • Operating system: Fedora 29 on Linux x86-64

The code

This is the initial code I started with:

$ tree
├── inventory.ini
├── roles
│   └── example
│       ├── defaults
│       │   └── main.yml
│       └── tasks
│           └── main.yml
└── variables.yml

4 directories, 4 files

$ nl variables.yml 
      1  ---
      2  - hosts: blogpost
      3    become: yes
      4    roles:
      5    - example

$ nl roles/example/defaults/main.yml 
     1  #
     2  # some variables
     3  #

     4  oracle_disks: ''

$ nl roles/example/tasks/main.yml
     1  ---
     2  - name: print lenght of oracle_disks variable
     3    debug: 
     4      msg: "The variable has a length of {{ oracle_disks | length }}"

     5  - name: format disk devices
     6    parted:
     7      device: "{{ item }}"
     8      number: 1
     9      state: present
    10      align: optimal
    11      label: gpt
    12    loop: "{{ oracle_disks }}"
    13    when: oracle_disks | length > 0

This will not work, as you can see in a minute.

The error

And indeed, the execution of my playbook (variables.yml) failed:

$ ansible-playbook -vi inventory.ini variables.yml 
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ******************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [server6]

TASK [example : print lenght of oracle_disks variable] ***************************************************************
ok: [server6] => {}


The variable has a length of 0

TASK [example : format disk devices] *********************************************************************************
fatal: [server6]: FAILED! => {}


Invalid data passed to 'loop', it requires a list, got this instead: . 
Hint: If you passed a list/dict of just one element, try adding wantlist=True 
to your lookup invocation or use q/query instead of lookup.

PLAY RECAP ***********************************************************************************************************
server6                    : ok=2    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

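The intention was to not execute the task named “format disk devices” if oracle_disks has a length of 0. This seems to be evaluated too late though: as far as I understand it, when a task combines loop and when, Ansible evaluates the condition once per item, so the loop expression has to resolve to a list before the condition is ever looked at. It turned out to be the wrong check anyway. I tried various permutations of the scheme, but none were successful while oracle_disks was set to the empty string. Which is wrong, but please bear with me …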

No errors with meaningful values

The loop syntax in the role’s tasks/main.yml file is correct though, once I set the variable to a list, it worked:

$ nl roles/example/defaults/main.yml
      1  #
      2  # some variables
      3  #
      4  oracle_disks: 
      5  - /dev/vdc
      6  - /dev/vdd

$ ansible-playbook -vi inventory.ini variables.yml
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ******************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [server6]

TASK [example : print lenght of oracle_disks variable] ***************************************************************
ok: [server6] => {}


The variable has a length of 2

TASK [example : format disk devices] *********************************************************************************
changed: [server6] => (item=/dev/vdc) => {
    "ansible_loop_var": "item",
    "changed": true,
    "disk": {
        "dev": "/dev/vdc",
        "logical_block": 512,
        "model": "Virtio Block Device",
        "physical_block": 512,
        "size": 10485760.0,
        "table": "gpt",
        "unit": "kib"
    },
    "item": "/dev/vdc",
    "partitions": [
        {
            "begin": 1024.0,
            "end": 10484736.0,
            "flags": [],
            "fstype": "",
            "name": "primary",
            "num": 1,
            "size": 10483712.0,
            "unit": "kib"
        }
    ],
    "script": "unit KiB mklabel gpt mkpart primary 0% 100%"
}
changed: [server6] => (item=/dev/vdd) => {
    "ansible_loop_var": "item",
    "changed": true,
    "disk": {
        "dev": "/dev/vdd",
        "logical_block": 512,
        "model": "Virtio Block Device",
        "physical_block": 512,
        "size": 10485760.0,
        "table": "gpt",
        "unit": "kib"
    },
    "item": "/dev/vdd",
    "partitions": [
        {
            "begin": 1024.0,
            "end": 10484736.0,
            "flags": [],
            "fstype": "",
            "name": "primary",
            "num": 1,
            "size": 10483712.0,
            "unit": "kib"
        }
    ],
    "script": "unit KiB mklabel gpt mkpart primary 0% 100%"
}

PLAY RECAP ***********************************************************************************************************
server6                    : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

So what gives? It once more goes to show that as soon as you do things right, they start working.

Checking if a variable is defined

How can I prevent the task from being executed? There are probably a great many ways of achieving this goal, I learned that not defining oracle_disks seems to work for me. Here I’m commenting out all references to the variable before trying again:

$ cat roles/example/defaults/main.yml 
# some variables

#oracle_disks:
#- /dev/vdc
#- /dev/vdd

$ cat roles/example/tasks/main.yml 
- name: print lenght of oracle_disks variable
  debug:
    msg: "The variable has a length of {{ oracle_disks | length }}"
  when: oracle_disks is defined

- name: format disk devices
  parted:
    device: "{{ item }}"
    number: 1
    state: present
    align: optimal
    label: gpt
  loop: "{{ oracle_disks }}"
  when: oracle_disks is defined

$ ansible-playbook -vi inventory.ini variables.yml 
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ******************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [server6]

TASK [example : print lenght of oracle_disks variable] ***************************************************************
skipping: [server6] => {}

TASK [example : format disk devices] *********************************************************************************
skipping: [server6] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

PLAY RECAP ***********************************************************************************************************
server6                    : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0 

With the variable not defined, the task is skipped as intended.

As you read earlier, using the empty string ('') isn’t the right way to mark a variable as “empty”. I guess this is where my other programming languages influenced me a bit (cough * perl * cough). The proper way to indicate there are no items in the list (as per the documentation) is this:

$ nl roles/example/defaults/main.yml 
     1  #
     2  # some variables
     3  #

     4  oracle_disks: []

$ nl roles/example/tasks/main.yml 
     1  ---
     2  - name: print lenght of oracle_disks variable
     3    debug: 
     4      msg: "The variable has a length of {{ oracle_disks | length }}"
     5    when: oracle_disks is defined

     6  - name: format disk devices
     7    parted:
     8      device: "{{ item }}"
     9      number: 1
    10      state: present
    11      align: optimal
    12      label: gpt
    13    loop: "{{ oracle_disks | default([]) }}" 

The default() assignment in tasks/main.yml line 13 shouldn’t be necessary with the assignment completed in defaults/main.yml line 4. It doesn’t seem to hurt either. Instead of the conditional check message you will see the task executed, but since there is nothing to loop over, it finishes straight away:

$ ansible-playbook -vi inventory.ini variables.yml 
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ***********************************************************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [server6]

TASK [example : print lenght of oracle_disks variable] ********************************************************************************************************************************
ok: [server6] => {}


The variable has a length of 0

TASK [example : format disk devices] **************************************************************************************************************************************************

PLAY RECAP ****************************************************************************************************************************************************************************
server6                    : ok=2    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Happy coding!