Category Archives: Automation

Tips'n'tricks: understanding "too many authentication failures" in SSH

VirtualBox VMs powered by Vagrant require authentication via SSH keys so you don’t have to provide a password each time vagrant up does its magic. Provisioning tools run as part of the vagrant up command also rely on key-based SSH authentication to work properly. This is documented in the official Vagrant documentation set.

I don’t want to use unknown SSH keys with my own Vagrant boxes as a matter of principle. Whenever I create a new custom box I use a dedicated SSH key reserved just for this purpose. This avoids the trouble with Vagrant’s “insecure key pair”; all I need to do is add config.ssh.private_key_path = "/path/to/key" to the Vagrantfile.
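
In the Vagrantfile this boils down to a single line inside the configure block, roughly like this (the path is a placeholder and the rest of the file is omitted):

Vagrant.configure("2") do |config|
  config.ssh.private_key_path = "/path/to/key"

  # box, network and provisioner definitions go here
end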

The documentation further states that I have to use a NAT device as the first network card in the VM. For some of my VMs I define an additional NIC on a host-only, private network for communication between, say, the middle tier and the database layer. I don’t want to mess around with port forwarding to enable communication between my VMs, and Vagrant makes it super easy to define another NIC.

This sounds interesting, but what does that have to do with this post?

Please bear with me, I’m building up a story ;) It will all make sense in a minute…

Connecting to the VM’s second interface

With all that in place it’s easy to SSH into my Vagrant box. Assume I have a Vagrant VM with an IP address of 192.168.56.202 to which I want to connect via SSH. Remember when I said I have a dedicated SSH key for my Vagrant boxes? The SSH key is stored in ~/.ssh/vagrant. The SSH command to connect to the environment is simple:

$ ssh -i ~/.ssh/vagrant vagrant@192.168.56.202

… and this connects me without having to provide a password.

Saving time for the lazy

Providing the path to the SSH key to use gets a little tedious after a while. There are a couple of solutions to this; there might be more, but I only know about these two:

  • Create a configuration in ~/.ssh/config. Except that this doesn’t work particularly well with keys for which you defined a passphrase, as you now have to enter the passphrase each time
  • Add the key to the SSH agent (a one-liner, shown right below)
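
For the second option, adding the key to a running agent is a single command. On my system it looks something along these lines (the passphrase prompt only appears because the key is protected by one):

$ ssh-add ~/.ssh/vagrant
Enter passphrase for /home/martin/.ssh/vagrant: 
Identity added: /home/martin/.ssh/vagrant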

On Linux and macOS I prefer the second method, especially since I rely on passphrases quite heavily. Recently I encountered a problem with this approach, though. When trying to connect to the VM, I received the following error message:

$ ssh vagrant@192.168.56.202
Received disconnect from 192.168.56.202 port 22:2: Too many authentication failures
Disconnected from 192.168.56.202 port 22

What’s that all about? I am sure I have the necessary key added to the agent:

$ ssh-add -l | grep -c vagrant
1

Well, it turns out that if the agent holds too many non-matching keys, you can run into this pre-authentication failure just like I did. The first step in troubleshooting SSH connections (at least for me) is to enable the verbose option:

$ ssh -v vagrant@192.168.56.202

[ ... more detail ... ]

debug1: Will attempt key: key1 ... redacted ... agent
debug1: Will attempt key: key10 ... redacted ... agent
debug1: Will attempt key: key2 ... redacted ... agent
debug1: Will attempt key: key3 ... redacted ... agent
debug1: Will attempt key: key4 ... redacted ... agent
debug1: Will attempt key: key5 ... redacted ... agent
debug1: Will attempt key: key6 ... redacted ... agent
debug1: Will attempt key: key7 ... redacted ... agent
debug1: Will attempt key: key8 ... redacted ... agent
debug1: Will attempt key: key9 ... redacted ... agent

[ ... ]

debug1: Next authentication method: publickey
debug1: Offering public key: key1 ... redacted ... agent
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password

[...]

debug1: Offering public key: key5 ... redacted ... agent
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
Received disconnect from 192.168.56.202 port 22:2: Too many authentication failures
Disconnected from 192.168.56.202 port 22

As I understand it, the SSH client queries the agent for keys and offers them to the server one by one. After key1 through key5 fail to match, the server gives up and disconnects with said error message.
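
As far as I can tell, the limit itself comes from the server side: OpenSSH’s MaxAuthTries setting defaults to 6, and every public key the client offers counts as one authentication attempt. You can check the effective value on the server (the Vagrant box in this case) with something like this:

$ sudo sshd -T | grep -i maxauthtries
maxauthtries 6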

There are quite a few keys currently added to my running agent:

$ ssh-add -l | wc -l
11

The Solution

The solution is quite straightforward: I can keep storing keys with the agent, but I have to tell SSH which key to use when logging in to my VM. This is probably best done in ~/.ssh/config:

$ cat ~/.ssh/config 

Host 192.168.56.202
    IdentityFile ~/.ssh/vagrant
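
For completeness, the Host block can also carry IdentitiesOnly, which tells ssh to offer only the identities configured for that host rather than everything the agent holds. This is optional and not something the setup above strictly requires; a sketch:

Host 192.168.56.202
    IdentityFile ~/.ssh/vagrant
    IdentitiesOnly yes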

In summary, I’m now using a combination of the two approaches I outlined above to great effect: I can log in without having to worry about which keys my agent stores, or the order in which they are stored.

Ansible Tips’n’tricks: rebooting Vagrant boxes after a kernel upgrade

Occasionally I have to reboot my Vagrant boxes after kernel updates have been installed as part of an Ansible playbook during the “vagrant up” command execution.

I create my own Vagrant base boxes because that’s more convenient for me than pulling them from Vagrant’s cloud. However they, too, need TLC and updates. So long story short, I run a yum upgrade after spinning up Vagrant boxes in Ansible to have access to the latest and greatest (and hopefully most secure) software.

To stay in line with Vagrant’s philosophy, Vagrant VMs are lab and playground environments I create quickly. And I can dispose of them equally quickly, because all that I’m doing is controlled via code. This isn’t something you’d do with Enterprise installations!

Vagrant and Ansible for lab VMs!

Now how do you reboot a Vagrant-controlled VM from Ansible? Here is how I’m doing it with VirtualBox 6.0.14 and Vagrant 2.2.6; my Ubuntu 18.04.3 host comes with Ansible 2.5.1.

Finding out if a kernel upgrade is needed

My custom Vagrant boxes are all based on Oracle Linux 7 and use UEK as the kernel of choice. That is important because it determines how I can find out whether yum upgraded the kernel (UEK in my case) as part of a “yum upgrade”.

There are many ways to do so, I have been using the following code snippet with some success:

  - name: check if we need to reboot after a kernel upgrade
    shell: if [ $(/usr/bin/rpm -q kernel-uek | /usr/bin/tail -n 1) != kernel-uek-$(uname -r) ]; then /usr/bin/echo 'reboot'; else /usr/bin/echo 'no'; fi
    register: must_reboot

In other words, I compare the last line of rpm -q kernel-uek to the name of the running kernel. If they match, all is good. If they don’t, there is a newer kernel-uek* RPM installed than the one currently running. If the registered variable “must_reboot” contains “reboot”, I have to reboot.
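
To illustrate what is being compared, this is roughly what the two sides look like on one of my boxes after an upgrade (the version numbers are made up for the example):

$ /usr/bin/rpm -q kernel-uek | /usr/bin/tail -n 1
kernel-uek-4.14.35-1902.6.6.el7uek.x86_64
$ uname -r
4.14.35-1902.3.2.el7uek.x86_64

The newest installed RPM no longer matches the running kernel, so the registered variable ends up containing “reboot”.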

Rebooting

Ansible introduced a reboot module recently; however, my Ubuntu 18.04 system’s Ansible version is too old for that and I wanted to stay with the distribution’s package, so I needed an alternative.
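
For reference, on Ansible 2.7 or later the built-in reboot module would most likely condense the whole dance into a single task, roughly like this (untested in my setup, so treat it as a sketch):

  - name: reboot if needed
    reboot:
      reboot_timeout: 300
    when: '"reboot" in must_reboot.stdout'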

There are lots of code snippets out there to reboot systems in Ansible, but none of them worked for me. So I decided to write the process up in this post :)

The following block worked for my very specific setup:

  - name: reboot if needed
    block:
    - shell: sleep 5 && systemctl reboot
      async: 300
      poll: 0
      ignore_errors: true

    - name: wait for system to come back online
      wait_for_connection:
        delay: 60
        timeout: 300
    when: '"reboot" in must_reboot.stdout'

This works nicely with the systems I’m using.

Except there’s a catch lurking in the code: when installing Oracle, the software is made available via VirtualBox’s shared folders as defined in the Vagrantfile. When rebooting a Vagrant box outside the Vagrant interface (i.e. not using the vagrant reload command), shared folders aren’t mounted automatically. In other words, my playbook would fail trying to unzip the binaries because it can’t find them, which isn’t what I want. To circumvent this situation I add the following instruction to the block you just saw:

    - name: re-mount the shared folder after a reboot
      mount:
        path: /mnt
        src: mnt
        fstype: vboxsf
        state: mounted

This re-mounts my shared folder, and I’m good to go!
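
For context, the mount maps back to a synced folder definition in the Vagrantfile along these lines (copied from my setup, paths shortened). As far as I can tell Vagrant derives the VirtualBox share name from the guest mount point, which is why src is simply “mnt” here:

config.vm.synced_folder "/path/to/stuff", "/mnt",
  mount_options: ["uid=54321", "gid=54321"]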

Summary

Before installing Oracle software in Vagrant for lab and playground use I always want to make sure I have all the latest and greatest patches installed as part of bringing a Vagrant box online for the first time.

Using Ansible I can automate the entire process from start to finish, kernel updates included. These are applied before I install the Oracle software!

Upgrading the kernel (or any other software components for that matter) post Oracle installation is more involved, and I usually don’t need to do this during the lifetime of the Vagrant (playground/lab) VM. Which is why Vagrant is beautiful, especially when used together with Ansible.

Ansible tips’n’tricks: executing a loop conditionally

When writing playbooks, I occasionally add optional tasks. These tasks are only executed if a corresponding configuration variable is defined. I usually define configuration variables either in group_vars/* or in the role’s roleName/defaults/ directory.

The “when” keyword can be used to test for the presence of a variable and execute a task if the condition evaluates to “true”. However this isn’t always straightforward to me, and recently I stumbled across some interesting behaviour that I found worth mentioning. I would like to point out that I’m merely an Ansible enthusiast, and by no means a pro. In case there is a better way to do this, please let me know and I’ll update the post :)

Before showing you my code, I’d like to add a little bit of detail here in case someone finds this post via a search engine:

  • Ansible version: ansible 2.8.2
  • Operating system: Fedora 29 on Linux x86-64

The code

This is the initial code I started with:

$ tree
.
├── inventory.ini
├── roles
│   └── example
│       ├── defaults
│       │   └── main.yml
│       └── tasks
│           └── main.yml
└── variables.yml

4 directories, 4 files

$ nl variables.yml 
      1  ---
      2  - hosts: blogpost
      3    become: yes
      4    roles:
      5    - example

$ nl roles/example/defaults/main.yml 
     1  #
     2  # some variables
     3  #

     4  oracle_disks: ''

$ nl roles/example/tasks/main.yml
     1  ---
     2  - name: print length of oracle_disks variable
     3    debug: 
     4      msg: "The variable has a length of {{ oracle_disks | length }}"

     5  - name: format disk devices
     6    parted:
     7      device: "{{ item }}"
     8      number: 1
     9      state: present
    10      align: optimal
    11      label: gpt
    12    loop: "{{ oracle_disks }}"
    13    when: oracle_disks | length > 0

This will not work, as you will see in a minute.

The error

And indeed, the execution of my playbook (variables.yml) failed:

$ ansible-playbook -vi inventory.ini variables.yml 
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ******************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [server6]

TASK [example : print length of oracle_disks variable] ***************************************************************
ok: [server6] => {}

MSG:

The variable has a length of 0


TASK [example : format disk devices] *********************************************************************************
fatal: [server6]: FAILED! => {}

MSG:

Invalid data passed to 'loop', it requires a list, got this instead: . 
Hint: If you passed a list/dict of just one element, try adding wantlist=True 
to your lookup invocation or use q/query instead of lookup.


PLAY RECAP ***********************************************************************************************************
server6                    : ok=2    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

The intention was not to execute the task named “format disk devices” if oracle_disks has a length of 0. However, when a task combines loop with when, the loop is expanded first and the condition is evaluated per item, so the loop expression must be a valid list regardless of the condition. An empty string isn’t a list, hence the error. I tried various permutations of the scheme, but none were successful while oracle_disks was set to the empty string. Which is the wrong way to express “no disks” anyway, but please bear with me…

No errors with meaningful values

The loop syntax in the role’s tasks/main.yml file is correct though. Once I set the variable to a proper list, it worked:

$ nl roles/example/defaults/main.yml
      1  #
      2  # some variables
      3  #
        
      4  oracle_disks: 
      5  - /dev/vdc
      6  - /dev/vdd

$ ansible-playbook -vi inventory.ini variables.yml
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ******************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [server6]

TASK [example : print length of oracle_disks variable] ***************************************************************
ok: [server6] => {}

MSG:

The variable has a length of 2

TASK [example : format disk devices] *********************************************************************************
changed: [server6] => (item=/dev/vdc) => {
    "ansible_loop_var": "item",
    "changed": true,
    "disk": {
        "dev": "/dev/vdc",
        "logical_block": 512,
        "model": "Virtio Block Device",
        "physical_block": 512,
        "size": 10485760.0,
        "table": "gpt",
        "unit": "kib"
    },
    "item": "/dev/vdc",
    "partitions": [
        {
            "begin": 1024.0,
            "end": 10484736.0,
            "flags": [],
            "fstype": "",
            "name": "primary",
            "num": 1,
            "size": 10483712.0,
            "unit": "kib"
        }
    ],
    "script": "unit KiB mklabel gpt mkpart primary 0% 100%"
}
changed: [server6] => (item=/dev/vdd) => {
    "ansible_loop_var": "item",
    "changed": true,
    "disk": {
        "dev": "/dev/vdd",
        "logical_block": 512,
        "model": "Virtio Block Device",
        "physical_block": 512,
        "size": 10485760.0,
        "table": "gpt",
        "unit": "kib"
    },
    "item": "/dev/vdd",
    "partitions": [
        {
            "begin": 1024.0,
            "end": 10484736.0,
            "flags": [],
            "fstype": "",
            "name": "primary",
            "num": 1,
            "size": 10483712.0,
            "unit": "kib"
        }
    ],
    "script": "unit KiB mklabel gpt mkpart primary 0% 100%"
}

PLAY RECAP ***********************************************************************************************************
server6                    : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

So what gives? It once more goes to show that as soon as you do things right, they start working.

Checking if a variable is defined

How can I prevent the task from being executed? There are probably a great many ways of achieving this goal; not defining oracle_disks at all works for me. Here I’m commenting out all references to the variable before trying again:

$ cat roles/example/defaults/main.yml 
#
# some variables
#

#oracle_disks: 
#- /dev/vdc
#- /dev/vdd

$ cat roles/example/tasks/main.yml 
---
- name: print length of oracle_disks variable
  debug: 
    msg: "The variable has a length of {{ oracle_disks | length }}"
  when: oracle_disks is defined

- name: format disk devices
  parted:
    device: "{{ item }}"
    number: 1
    state: present
    align: optimal
    label: gpt
  loop: "{{ oracle_disks }}" 
  when: oracle_disks is defined

$ ansible-playbook -vi inventory.ini variables.yml 
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ******************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************
ok: [server6]

TASK [example : print length of oracle_disks variable] ***************************************************************
skipping: [server6] => {}

TASK [example : format disk devices] *********************************************************************************
skipping: [server6] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

PLAY RECAP ***********************************************************************************************************
server6                    : ok=1    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0 

With the variable not defined, the task is skipped as intended.

As you read earlier, using the empty string ('') isn’t the right way to mark a variable as “empty”. I guess this is where other programming languages influenced me a bit (cough * perl * cough). The proper way to indicate that there are no items in the list (as per the documentation) is this:

$ nl roles/example/defaults/main.yml 
     1  #
     2  # some variables
     3  #

     4  oracle_disks: []

$ nl roles/example/tasks/main.yml 
     1  ---
     2  - name: print length of oracle_disks variable
     3    debug: 
     4      msg: "The variable has a length of {{ oracle_disks | length }}"
     5    when: oracle_disks is defined

     6  - name: format disk devices
     7    parted:
     8      device: "{{ item }}"
     9      number: 1
    10      state: present
    11      align: optimal
    12      label: gpt
    13    loop: "{{ oracle_disks | default([]) }}" 

The default() assignment in tasks/main.yml line 13 shouldn’t be necessary given the assignment in defaults/main.yml line 4, but it doesn’t seem to hurt either. Instead of the conditional-check message you will see the task start, but since there is nothing to loop over, it finishes straight away:

$ ansible-playbook -vi inventory.ini variables.yml 
Using /etc/ansible/ansible.cfg as config file

PLAY [blogpost] ***********************************************************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [server6]

TASK [example : print length of oracle_disks variable] ********************************************************************************************************************************
ok: [server6] => {}

MSG:

The variable has a length of 0


TASK [example : format disk devices] **************************************************************************************************************************************************

PLAY RECAP ****************************************************************************************************************************************************************************
server6                    : ok=2    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Happy coding!

Ansible tips’n’tricks: checking if a systemd service is running

I have been working on an Ansible playbook to update Oracle’s Trace File Analyzer (TFA). If you have been following this blog over the past few months you might remember that I’m a great fan of the tool! Using Ansible makes my life a lot easier: when deploying a new system I can ensure that I’m also installing TFA. Under normal circumstances, TFA should be present when the (initial) deployment playbook finishes. At least in theory.

As we know, life is what happens when you’re making other plans, and I’d rather check whether TFA is installed/configured/running before trying to upgrade it. The command to upgrade TFA is different from the command I use to deploy it.

I have considered quite a few different ways to do this but in the end decided to check for the oracle-tfa service: if the service is present, TFA must be as well. There are probably other ways, maybe better ones, but this one works for me.
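
For comparison, the interactive equivalent of that check is a quick systemctl call; the playbook further down is essentially the automated version of this (sketch from a box with TFA present):

$ systemctl is-enabled oracle-tfa.service
enabled

On a system without TFA the same command complains about a missing unit file instead.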

Checking for the presence of a service

Ansible has offered a module called service_facts since version 2.5 to facilitate working with services. I also tried the setup module but didn’t find what I needed there. Consider the following output, generated on Oracle Linux 7.6 when gathering service facts:

TASK [get service facts] *******************************************************
 ok: [localhost] => {
     "ansible_facts": {
         "services": {
             "NetworkManager-wait-online.service": {
                 "name": "NetworkManager-wait-online.service", 
                 "source": "systemd", 
                 "state": "stopped"
             }, 
             "NetworkManager.service": {
                 "name": "NetworkManager.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 
             "auditd.service": {
                 "name": "auditd.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 

[ many more services ]

            "oracle-tfa.service": {
                 "name": "oracle-tfa.service", 
                 "source": "systemd", 
                 "state": "running"
             }, 

[ many more services ]

This looks ever so slightly complicated! And indeed, it took a little while to work the syntax out. My first attempts were unsuccessful.

Getting the syntax right

Thankfully I wasn’t the only one with this problem, and with a little bit of research I ended up with this code:

---
 - hosts: localhost
   connection: local
   become: true

   tasks:
   - name: get service facts
     service_facts:

   - name: try to work out how to access the service
     debug:
       var: ansible_facts.services["oracle-tfa.service"]

Awesome! When running this on a system with TFA installed, it works quite nicely:

TASK [try to work out how to access the service] *******************************
 ok: [localhost] => {
     "ansible_facts.services[\"oracle-tfa.service\"]": {
         "name": "oracle-tfa.service", 
         "source": "systemd", 
         "state": "running"
     }
 }
 

 PLAY RECAP *********************************************************************
 localhost                  : ok=3    changed=0    unreachable=0    failed=0

The same code fails on a system without TFA installed:

TASK [try to work out how to access the service] *******************************
 ok: [localhost] => {
     "ansible_facts.services[\"oracle-tfa.service\"]": "VARIABLE IS NOT DEFINED!
      'dict object' has no attribute 'oracle-tfa.service'"
 }
 

 PLAY RECAP *********************************************************************
 localhost                  : ok=3    changed=0    unreachable=0    failed=0

Now the trick is to ensure that I’m not referencing an undefined variable. This isn’t too hard either; here is a usable playbook:

---
 - hosts: localhost
   connection: local 
 
   tasks:
   - name: get service facts
     service_facts:
 
   - name: check if TFA is installed
     fail:
       msg: Tracefile Analyzer is not installed, why? It should have been there!
     when: ansible_facts.services["oracle-tfa.service"] is not defined

The “tasks” include getting service facts before testing for the presence of the oracle-tfa.service. I deliberately fail the upgrade process to make the user aware of a situation that should not have happened.

Hope this helps!

Ansible tips’n’tricks: provision multiple machines in parallel with Vagrant and Ansible

Vagrant is a great tool that I’m regularly using for building playground environments on my laptop. I recently came across a slight inconvenience with Vagrant’s VirtualBox provider: occasionally I would like to spin up a Data Guard environment and provision both VMs in parallel to save time. Sadly you can’t bring up multiple machines in parallel using the VirtualBox provider, according to the documentation. This was true as of April 11 2019 and might change in the future, so keep an eye on the reference.

I very much prefer to save time by doing things in parallel, and so I started digging around how I could achieve this goal.

The official documentation mentions something that looks like a for loop to wait for all machines to be up. This isn’t really an option for me since I wanted more control over machine names and IP addresses. So I came up with this approach; it may not be the best, but it falls into the “good enough for me” category.

Vagrantfile

The Vagrantfile is actually quite simple and might remind you of a previous article:

  1 Vagrant.configure("2") do |config|
  2   config.ssh.private_key_path = "/path/to/key"
  3 
  4   config.vm.define "server1" do |server1|
  5     server1.vm.box = "ansibletestbase"
  6     server1.vm.hostname = "server1"
  7     server1.vm.network "private_network", ip: "192.168.56.11"
  8     server1.vm.synced_folder "/path/to/stuff", "/mnt",
  9       mount_options: ["uid=54321", "gid=54321"]
 10 
 11     config.vm.provider "virtualbox" do |vb|
 12       vb.memory = 2048
 13       vb.cpus = 2
 14     end
 15   end
 16 
 17   config.vm.define "server2" do |server2|
 18     server2.vm.box = "ansibletestbase"
 19     server2.vm.hostname = "server2"
 20     server2.vm.network "private_network", ip: "192.168.56.12"
 21     server2.vm.synced_folder "/path/to/stuff", "/mnt",
 22       mount_options: ["uid=54321", "gid=54321"]
 23 
 24     config.vm.provider "virtualbox" do |vb|
 25       vb.memory = 2048
 26       vb.cpus = 2
 27     end
 28   end
 29 
 30   config.vm.provision "ansible" do |ansible|
 31     ansible.playbook = "hello.yml"
 32     ansible.groups = {
 33       "oracle_si" => ["server[1:2]"],
 34       "oracle_si:vars" => { 
 35         "install_rdbms" => "true",
 36         "patch_rdbms" => "true",
 37       }
 38     }
 39   end
 40 
 41 end

The box “ansibletestbase” is my custom Oracle Linux 7 image that I keep updated for personal use. I define a couple of machines, server1 and server2, and from line 30 onwards let Ansible provision them.

A little bit of an inconvenience

Now here is the inconvenient bit: if I provided an elaborate playbook to provision Oracle in line 31 of the Vagrantfile, it would be run serially: first for server1, and only after it completed (or failed…) would server2 be created and provisioned. This is the reason for a rather plain playbook, hello.yml:

$ cat hello.yml 
---
- hosts: oracle_si
  tasks:
  - name: say hello
    debug: var=ansible_hostname

This takes literally no time to execute, so no harm is done running it serially once per VM. Quite the contrary, in fact: Vagrant discovered the Ansible provisioner in the Vagrantfile and created a suitable inventory file for me. I’ll gladly use it later.

How does this work out?

Enough talking, time to put this to the test and bring up both machines. As you can see in the captured output, the machines are handled one by one: each is imported, booted and provisioned before Vagrant moves on to the next.

$ vagrant up 
Bringing machine 'server1' up with 'virtualbox' provider...
Bringing machine 'server2' up with 'virtualbox' provider...
==> server1: Importing base box 'ansibletestbase'...
==> server1: Matching MAC address for NAT networking...

[...]

==> server1: Running provisioner: ansible...

[...]

    server1: Running ansible-playbook...

PLAY [oracle_si] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server1]

TASK [say hello] ***************************************************************
ok: [server1] => {
    "ansible_hostname": "server1"
}

PLAY RECAP *********************************************************************
server1                    : ok=2    changed=0    unreachable=0    failed=0   

==> server2: Importing base box 'ansibletestbase'...
==> server2: Matching MAC address for NAT networking...

[...]

==> server2: Running provisioner: ansible...

[...]
    server2: Running ansible-playbook...

PLAY [oracle_si] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server2]

TASK [say hello] ***************************************************************
ok: [server2] => {
    "ansible_hostname": "server2"
}

PLAY RECAP *********************************************************************
server2                    : ok=2    changed=0    unreachable=0    failed=0

As always, the Ansible provisioner created an inventory file I can use in ./.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory. The inventory looks exactly as described in the Vagrantfile’s ansible block, and it has the all-important global variables as well.

$ cat ./.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory
# Generated by Vagrant
server2 ansible_host=127.0.0.1 ansible_port=2201 ansible_user='vagrant' ansible_ssh_private_key_file='/path/to/key'
server1 ansible_host=127.0.0.1 ansible_port=2200 ansible_user='vagrant' ansible_ssh_private_key_file='/path/to/key'

[oracle_si]
server[1:2]

[oracle_si:vars]
install_rdbms=true
patch_rdbms=true

After ensuring that both machines are up using the “ping” module, I can run the actual playbook. You might have to confirm the servers’ ssh keys the first time you run this:

$ ansible -i ./.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory -m ping oracle_si
server1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
server2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

All good to go! Let’s call the actual playbook to provision my machines.

$ ansible-playbook -i ./.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory provisioning/oracle.yml 

TASK [Gathering Facts] ***************************************************************
ok: [server1]
ok: [server2]

TASK [test for Oracle Linux 7] *******************************************************
skipping: [server1]
skipping: [server2]

And we’re off to the races. Happy automating!

Ansible tips’n’tricks: testing and debugging Ansible scripts using Vagrant

At last year’s UKOUG I presented about Vagrant and how to use this great piece of software to test and debug Ansible scripts easily. Back then in December I promised a write-up, but for various reasons only now got around to finishing it.

Vagrant’s Ansible Provisioner

Vagrant offers two different Ansible provisioners: “ansible” and “ansible_local”. The “ansible” provisioner depends on an Ansible installation on the host. If this isn’t feasible, you can use “ansible_local” instead; as the name implies it executes code on the VM instead of on the host. This post is about the “ansible” provisioner.

Most people use Vagrant with the default VirtualBox provider, and so do I in this post.

A closer look at the Vagrantfile

It all starts with a Vagrantfile. A quick “vagrant init” will get you one. The test image I use for deploying the Oracle database comes with all the necessary block devices and packages, saving me quite some time. Naturally I’ll start with that one.

$ cat -n Vagrantfile 
     1    # -*- mode: ruby -*-
     2    # vi: set ft=ruby :
     3    
     4    Vagrant.configure("2") do |config|
     5    
     6      config.ssh.private_key_path = "/path/to/ssh/key"
     7    
     8      config.vm.box = "ansibletestbase"
     9      config.vm.define "server1" do |server1|
    10        server1.vm.box = "ansibletestbase"
    11        server1.vm.hostname = "server1"
    12        server1.vm.network "private_network", ip: "192.168.56.15"
    13    
    14        config.vm.provider "virtualbox" do |vb|
    15          vb.memory = 2048
    16          vb.cpus = 2 
    17        end 
    18      end 
    19    
    20      config.vm.provision "ansible" do |ansible|
    21        ansible.playbook = "blogpost.yml"
    22        ansible.groups = { 
    23          "oracle_si" => ["server1"],
    24          "oracle_si:vars" => { 
    25            "install_rdbms" => "true",
    26            "patch_rdbms" => "true",
    27            "create_db" => "true"
    28          }   
    29        }   
    30      end 
    31    
    32    end

Since I have decided to create my own custom image without relying on the “insecure key pair” I need to keep track of my SSH keys. This is done in line 6. Otherwise there wouldn’t be an option to connect to the system and Vagrant couldn’t bring the VM up.

Lines 8 to 18 define the VM – which image to derive it from, and how to configure it. The settings are pretty much self-explanatory so I won’t go into too much detail. Only this much:

  • I usually want a host-only network instead of just a NAT device, and I create one in line 12. The IP address maps to an address on vboxnet0 in my configuration. If you don’t have a host-only network and want one, you can create it in VirtualBox’s preferences (or on the command line, as sketched right after this list).
  • In lines 14 to 17 I set some properties of my VM: I want it to come up with 2 GB of RAM and 2 CPUs.
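
Creating a host-only network on the command line should work as well, something along these lines (VBoxManage syntax quoted from memory, so double-check it before relying on it):

$ VBoxManage hostonlyif create
$ VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1 --netmask 255.255.255.0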

Integrating Ansible into the Vagrantfile

The Ansible configuration is found on lines 20 to 30. As soon as the VM comes up I want Vagrant to run the Ansible provisioner and execute my playbook named “blogpost.yml”.

Most of my playbooks rely on global variables I define in the inventory file. Vagrant will create an inventory for me when it finds an Ansible provisioner in the Vagrantfile. The inventory it creates by default doesn’t fit my needs, but that is easy to change: recent Vagrant versions allow me to create the inventory just as I need it. You can see this in lines 22 to 28. The resulting inventory file is created in .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory and looks like this:

$ cat .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory 
# Generated by Vagrant

server1 ansible_host=127.0.0.1 ansible_port=2222 ansible_user='vagrant' ansible_ssh_private_key_file='/path/to/ssh/key'

[oracle_si]
server1

[oracle_si:vars]
install_rdbms=true
patch_rdbms=true
create_db=true

That’s exactly what I’d use if I manually edited the inventory file, except that I don’t need to use “vagrant ssh-config” to figure out what the current SSH configuration is.
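
For comparison, this is roughly what vagrant ssh-config reports for the same VM (output trimmed):

$ vagrant ssh-config server1
Host server1
  HostName 127.0.0.1
  User vagrant
  Port 2222
  IdentityFile /path/to/ssh/key

[ ... ]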

I define a group of hosts, and a few global variables for my playbook. This way all I need to do is change the Vagrantfile to control the execution of my playbook, rather than maintaining information in two places (Vagrantfile and static inventory).

Ansible Playbook

The final piece of information is the actual Ansible playbook. To keep the example simple, I’m not going to use the inventory’s variables apart from the host group.

$ cat -n blogpost.yml 
     1    ---
     2    - name: blogpost
     3      hosts: oracle_si
     4      vars:
     5      - oravg_pv: /dev/sdb
     6      become: yes
     7      tasks:
     8      - name: say hello
     9        debug: msg="hello from {{ ansible_hostname }}"
    10    
    11      - name: partitioning PVs for the volume group
    12        parted:
    13          device: "{{ oravg_pv }}"
    14          number: 1
    15          state: present
    16          align: optimal
    17          label: gpt

Expressed in plain English, it reads: take the block device indicated by the variable oravg_pv and create a single partition on it spanning the entire device.

As soon as I “vagrant up” the VM, it all comes together:

$ vagrant up
Bringing machine 'server1' up with 'virtualbox' provider…
==> server1: Importing base box 'ansibletestbase'…
==> server1: Matching MAC address for NAT networking…
==> server1: Setting the name of the VM: blogpost_server1_1554188252201_2080
==> server1: Clearing any previously set network interfaces…
==> server1: Preparing network interfaces based on configuration…
server1: Adapter 1: nat
server1: Adapter 2: hostonly
==> server1: Forwarding ports…
server1: 22 (guest) => 2222 (host) (adapter 1)
==> server1: Running 'pre-boot' VM customizations…
==> server1: Booting VM…
==> server1: Waiting for machine to boot. This may take a few minutes…
server1: SSH address: 127.0.0.1:2222
server1: SSH username: vagrant
server1: SSH auth method: private key
==> server1: Machine booted and ready!
==> server1: Checking for guest additions in VM…
==> server1: Setting hostname…
==> server1: Configuring and enabling network interfaces…
server1: SSH address: 127.0.0.1:2222
server1: SSH username: vagrant
server1: SSH auth method: private key
==> server1: Mounting shared folders…
server1: /vagrant => /home/martin/vagrant/blogpost
==> server1: Running provisioner: ansible…

Vagrant has automatically selected the compatibility mode '2.0' according to the Ansible version installed (2.7.7).

Alternatively, the compatibility mode can be specified in your Vagrantfile:
https://www.vagrantup.com/docs/provisioning/ansible_common.html#compatibility_mode

server1: Running ansible-playbook...

PLAY [blogpost] *************************************************************

TASK [Gathering Facts] ******************************************************
ok: [server1]

TASK [say hello] ************************************************************
ok: [server1] => {
    "msg": "hello from server1"
}

TASK [partitioning PVs for the volume group] ********************************
changed: [server1]

PLAY RECAP ******************************************************************
server1                    : ok=3    changed=1    unreachable=0    failed=0

Great! But I forgot to partition /dev/sd[cd] in the same way as I partition /dev/sdb! That’s a quick fix:

---
- name: blogpost
  hosts: oracle_si
  vars:
  - oravg_pv: /dev/sdb
  - asm_disks:
      - /dev/sdc
      - /dev/sdd
  become: yes
  tasks:
  - name: say hello
    debug: msg="hello from {{ ansible_hostname }}"

  - name: partitioning PVs for the Oracle volume group
    parted:
      device: "{{ oravg_pv }}"
      number: 1
      state: present
      align: optimal
      label: gpt

  - name: partition block devices for ASM
    parted:
      device: "{{ item }}"
      number: 1
      state: present
      align: optimal
      label: gpt
    loop: "{{ asm_disks }}"

Re-running the provisioning script couldn’t be easier. Vagrant has a command for this: “vagrant provision”. This command re-runs the provisioning code against a VM. A quick “vagrant provision” later my system is configured exactly the way I want:

$ vagrant provision
==> server1: Running provisioner: ansible...
Vagrant has automatically selected the compatibility mode '2.0'
according to the Ansible version installed (2.7.7).

Alternatively, the compatibility mode can be specified in your Vagrantfile:
https://www.vagrantup.com/docs/provisioning/ansible_common.html#compatibility_mode

    server1: Running ansible-playbook...

PLAY [blogpost] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server1]

TASK [say hello] ***************************************************************
ok: [server1] => {
    "msg": "hello from server1"
}

TASK [partitioning PVs for the Oracle volume group] ****************************
ok: [server1]

TASK [partition block devices for ASM] *****************************************
changed: [server1] => (item=/dev/sdc)
changed: [server1] => (item=/dev/sdd)

PLAY RECAP *********************************************************************
server1                    : ok=4    changed=1    unreachable=0    failed=0   

This is it! Using just a few commands I can spin up VMs, test my Ansible scripts and later on when I’m happy with them, check the code into source control.

Ansible tips’n’tricks: understanding your Ansible configuration

When writing automation scripts I tend to use a local Ansible configuration file. This has certain advantages for me, such as being able to keep it in a version control system (VCS). It is also a valid option for developers without access to the global configuration file installed by the package manager, and it is more convenient than setting environment variables.

WARNING: There are some very important security considerations though, which you must be aware of before using a local configuration file.

Until now I haven’t spent a lot of time thinking about configuration variables and the order of precedence, but that is exactly what I’d like to do in this post.

When putting this short article together I used Ansible 2.7.6 on Oracle Linux 7.6.

What does the documentation have to say?

The official documentation does a really good job explaining the configuration settings. You have the choice of:

  • An environment variable
  • A configuration file (ansible.cfg) in your local directory
  • A playbook-independent configuration file in ~/.ansible.cfg
  • The global settings in /etc/ansible/ansible.cfg
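
A quick way to confirm which configuration file actually ends up being used is ansible --version, which prints the active config file. A sketch from my lab (output trimmed):

[vagrant@server1 ansible]$ ansible --version
ansible 2.7.6
  config file = /etc/ansible/ansible.cfg

[ ... ]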

Let’s try this

This is an example of a very minimalist project folder in my lab, deliberately omitting a local configuration file.

[vagrant@server1 ansible]$ ls -la
total 16
drwxrwxr-x. 3 vagrant vagrant   53 Feb 14 09:09 .
drwx------. 5 vagrant vagrant 4096 Feb 14 08:40 ..
drwxrwxr-x. 7 vagrant vagrant 4096 Feb 14 09:09 .git
-rw-rw-r--. 1 vagrant vagrant   17 Feb 14 09:09 .gitignore
-rw-rw-r--. 1 vagrant vagrant   94 Feb 14 08:40 hello.yml

The example is kept super-short on purpose.

Precedence

Without a local configuration file (~/.ansible.cfg or $(pwd)/ansible.cfg) and/or defined environment variables, the settings made by your friendly system administrator in /etc/ansible/ansible.cfg govern every execution. You can use ansible-config to view the non-default values:

[vagrant@server1 ansible]$ ansible-config dump --only-changed
ANSIBLE_NOCOWS(/etc/ansible/ansible.cfg) = True
[vagrant@server1 ansible]$ 

As you can see, the global configuration file prevents the use of cowsay.

Now if I, as an automation developer, wanted to override a certain setting, I could use an environment variable. The documentation tricked me for a moment: when you see “DEFAULT_LOG_PATH” for example, exporting that variable does not have any effect. What you need to use is shown in the same section, but further down, next to the Environment: label. To set the default log path in your shell, use ANSIBLE_LOG_PATH as in this example:

[vagrant@server1 ansible]$ export ANSIBLE_LOG_PATH=/home/vagrant/ansible/logs/my_ansible.log
[vagrant@server1 ansible]$ ansible-config dump --only-changed
ANSIBLE_NOCOWS(/etc/ansible/ansible.cfg) = True
DEFAULT_LOG_PATH(env: ANSIBLE_LOG_PATH) = /home/vagrant/ansible/logs/my_ansible.log

Thankfully ansible-config shows me the origin for each setting.

Using a local configuration file

Now how does a local configuration file play into this? Let’s try! I keep the global configuration file as it is, but I unset the environment variable. Here is the result:

[vagrant@server1 ansible]$ unset ANSIBLE_LOG_PATH
[vagrant@server1 ansible]$ cat ansible.cfg 
[defaults]

stdout_callback = debug
log_path = /home/vagrant/ansible/logs/ansible_blogpost.log
[vagrant@server1 ansible]$ ansible-config dump --only-changed
DEFAULT_LOG_PATH(/home/vagrant/ansible/ansible.cfg) = /home/vagrant/ansible/logs/ansible_blogpost.log
DEFAULT_STDOUT_CALLBACK(/home/vagrant/ansible/ansible.cfg) = debug
[vagrant@server1 ansible]$ 

What’s interesting is that my “nocows” setting from the global configuration file in /etc/ansible/ansible.cfg isn’t merged into the configuration. Without the use of environment variables, the only settings coming into play are those of the local configuration file. The same seems to apply if a local configuration file exists in addition to ~/.ansible.cfg. The file in the current working directory always took precedence in my testing.

This did not affect environment variables; they have always been considered by ansible-config. If, for example, I wanted to temporarily save the logfile to a different place and was too lazy to fire up vim, I could use this approach:

[vagrant@server1 ansible]$ ANSIBLE_LOG_PATH=/home/vagrant/ansible/logs/overriding.log ansible-config dump --only-changed
DEFAULT_LOG_PATH(env: ANSIBLE_LOG_PATH) = /home/vagrant/ansible/logs/overriding.log
DEFAULT_STDOUT_CALLBACK(/home/vagrant/ansible/ansible.cfg) = debug
[vagrant@server1 ansible]$ 

Happy scripting!