Category Archives: Automation

Vagrant Ansible Provisioner: working with the Ansible Inventory – addendum

Recently I wrote a post about one of my dream combinations, Ansible and Vagrant. After hitting the publish button I noticed that there might be a need for a part II – passing complex data types such as lists and dicts to Ansible via a Vagrantfile.

I wrote a similar post for when you are in a situation where you invoke an Ansible playbook directly from the command line. For this article the invocation of the Ansible playbook happens as part of a call to vagrant up or vagrant provision.

Setup

I’m going to reuse the Vagrantfile from the previous article:

Vagrant.configure("2") do |config|
  
  config.vm.box = "debianbase"
  config.vm.hostname = "debian"

  config.ssh.private_key_path = "/home/martin/.ssh/debianbase"

  config.vm.provider "virtualbox" do |vb|
    vb.vcpus = 2
    vb.memory = "1024"
    vb.name = "debian"
  end

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/example01.yml"
    ansible.verbose = "v"
    # ...
  end
end

The directory/file layout is also identical, repeated here for convenience:

$ tree provisioning/
provisioning/
├── example01.yml
├── example02.yml
├── group_vars
│   └── all.yml
└── roles
    └── role1
        └── tasks
            └── main.yml

I used Ubuntu 22.04, patched to 230306 with both Ansible and Vagrant versions as provided by the distribution:

  • Ansible 2.10.8
  • Vagrant 2.2.19

Passing lists to the Ansible playbook

This time however I’d like to pass a list to the playbook indicating which block devices to partition. The type of variable is a list, with either 1 or more elements. The Ansible code iterates over the list and performs the action on the current item. Here’s the code from the playbook example01.yml:

- hosts: default

  tasks: 
  - ansible.builtin.debug:
      var: blkdevs

  - name: print block devices to be partitioned
    ansible.builtin.debug:
      msg: If this was a call to community.general.parted I'd partition {{ item }} now
    loop: "{{ blkdevs }}"

The question is: how can I pass a list to the playbook? As with scalar data types I wrote about yesterday you use host_vars in the Vagrantfile:

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/example01.yml"
    ansible.verbose = "v"
    ansible.host_vars = {
      "default" => {
        "blkdevs" => '[ "/dev/sdb", "/dev/sdc" ]'
      }
    }
  end

Note the use of single and double quotes! Without quotes around the entire RHS expression Ansible will complain about a syntax error in the dynamically generated inventory. The provisioner does what it’s supposed to do:

PLAY [default] *****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [default]

TASK [ansible.builtin.debug] ***************************************************
ok: [default] => {
    "blkdevs": [
        "/dev/sdb",
        "/dev/sdc"
    ]
}

TASK [print block devices to be partitioned] ***********************************
ok: [default] => (item=/dev/sdb) => {
    "msg": "If this was a call to community.general.parted I'd partition /dev/sdb now"
}
ok: [default] => (item=/dev/sdc) => {
    "msg": "If this was a call to community.general.parted I'd partition /dev/sdc now"
}

PLAY RECAP *********************************************************************
default                    : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 

Passing Dicts to the Ansible playbook

Passing a dict works exactly the same way, which is why I feel like I can keep this section short. The Vagrantfile uses the same host_var, blkdevs, but this time it’s a dict with keys indicating the intended use of the block devices. Each key is associated with a list of values containing the actual block device(s). Lists are perfectly fine even if they only contain a single item ;)

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/example02.yml"
    ansible.verbose = "v"
    ansible.host_vars = {
      "default" => {
        "blkdevs" => 
          '{ "binaries": ["/dev/sdb"], "database": ["/dev/sdc", "/dev/sdd"], "fast_recovery_area": ["/dev/sde"] }'
      }
    }
  end

The playbook iterates over the list of block devices provided as the dict’s values:

- hosts: default
  become: true

  tasks: 
  - name: format block devices for Oracle binaries
    ansible.builtin.debug:
      msg: If this was a call to community.general.parted I'd partition {{ item }} now
    loop: "{{ blkdevs.binaries }}"
  
  - name: format block devices for Oracle database files
    ansible.builtin.debug:
      msg: If this was a call to community.general.parted I'd partition {{ item }} now
    loop: "{{ blkdevs.database }}"
  
  - name: format block devices for Oracle database Fast Recovery Area
    ansible.builtin.debug:
      msg: If this was a call to community.general.parted I'd partition {{ item }} now
    loop: "{{ blkdevs.fast_recovery_area }}"

Using lists as the dict’s values solves the problem of having to distinguish between a scalar variable like /dev/sdc and multiple block devices like /dev/sdc, /dev/sdd to be used.

Et voila! Here’s the result:

PLAY [default] *****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [default]

TASK [format block devices for Oracle binaries] ********************************
ok: [default] => (item=/dev/sdb) => {
    "msg": "If this was a call to community.general.parted I'd partition /dev/sdb now"
}

TASK [format block devices for Oracle database files] **************************
ok: [default] => (item=/dev/sdc) => {
    "msg": "If this was a call to community.general.parted I'd partition /dev/sdc now"
}
ok: [default] => (item=/dev/sdd) => {
    "msg": "If this was a call to community.general.parted I'd partition /dev/sdd now"
}

TASK [format block devices for Oracle database Fast Recovery Area] *************
ok: [default] => (item=/dev/sde) => {
    "msg": "If this was a call to community.general.parted I'd partition /dev/sde now"
}

PLAY RECAP *********************************************************************
default                    : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 

Happy automating!

Advertisement

Ansible tips’n’tricks: gather facts in an ad-hoc fashion

There are times when I really need to get some ansible_facts from a host to work out details about, say the network card, storage, or Linux Distribution to continue coding. And I don’t want to/have the patience to run add a debug step in my Ansible playbook either :) Thankfully Ansible has just the right tool for the case, called ad-hoc command execution.

Since I can never remember how to gather ansible_facts I decided to write it down, hopefully this saves me (and you!) 5 minutes next time.

Setup

I am using ansible-5.9.0-1.fc36.noarch as provided by Fedora 36 (which includes ansible-core-2.12.10-1.fc36.noarch) on Linux x86-64. Vagrant 2.3.4 has been provided by the HashiCorp repository.

Gathering facts: using Vagrant’s dynamic Ansible inventory

If you are using the Ansible provisioner with your Vagrant box, Vagrant will create a suitable inventory for you. Assuming there is only a single VM defined in your Vagrantfile you can use the following command to gather facts:

ansible -i .vagrant/provisioners/ansible/inventory/ default -m setup
default | SUCCESS => {
    "ansible_facts": {
        "ansible_all_ipv4_addresses": [
            "10.0.2.15"
        ],
        "ansible_all_ipv6_addresses": [
            "fe80::a00:27ff:fec0:f04e"
        ],
        "ansible_apparmor": {
            "status": "enabled"

If you have multiple VMs defined in your Vagrantfile you need to either specify all or the VM name as defined in the inventory.

Gathering facts without an inventory

If you have a VM you can SSH to there is an alternative option available to you: simply specify the IP address or DNS name of the VM as the Ansible inventory followed by a ",", like so:

ansible -i nginx, nginx -u ansible --private-key ~/.ssh/ansible -m ansible.builtin.setup | head
nginx | SUCCESS => {
    "ansible_facts": {
        "ansible_all_ipv4_addresses": [
            "10.0.2.15",
            "192.168.56.43"
        ],
        "ansible_all_ipv6_addresses": [
            "fe80::a00:27ff:fe8d:7f5f",
            "fe80::a00:27ff:fe37:33f6"
        ],

That’s all there is to gathering ansible_facts in an ad-hoc fashion. Happy automating!

Vagrant Ansible Provisioner: working with the Ansible Inventory

Vagrant and Ansible are a great match: using Vagrant it’s very easy to work with virtual machines. Creating, updating, and removing VMs is just a short command away. Vagrant provides various provisioners to configure the VM, and Ansible is one of these. This article covers the ansible provisioner as opposed to ansible_local.

Earlier articles I wrote might be of interest in this context:

The post was written using Ubuntu 22.04 patched to 230306, I used Ansible and Vagrant as provided by the distribution:

  • Ansible 2.10.8
  • Vagrant 2.2.19

Configuring the Ansible Inventory

Very often the behaviour of an Ansible playbook is controlled using variables. Providing variables to Ansible from a Vagrantfile is quite neat and subject of this article.

Let’s have a look at the most basic Vagrantfile:

Vagrant.configure("2") do |config|
  
  config.vm.box = "debianbase"
  config.vm.hostname = "debian"

  config.ssh.private_key_path = "/home/martin/.ssh/debianbase"

  config.vm.provider "virtualbox" do |vb|
    vb.cpus = 2
    vb.memory = "2048"
    vb.name = "debian"
  end
  
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/blogpost.yml"
    ansible.verbose = "v"
  end
end

I frequently use a flag indicating if the Ansible script should reboot the VM after the update of all packages completed. Within the provisioning folder I store group_vars, roles, and the main playbook as per the recommendation in the docs:

$ tree provisioning/
provisioning/
├── blogpost.yml
├── group_vars
│   └── all.yml
└── roles
    └── role1
        └── tasks
            └── main.yml

All global variables I don’t necessarily expect to change are stored in group_vars/all.yml. This includes the reboot_flag flag that defaults to false. The playbook does not need to list the variable in its own vars section, in fact doing so would grant the variable a higher precedence and my way of providing a variable to Ansible via Vagrant would fail. Here is the playbook:

- hosts: default
  become: true

  tasks: 
  - debug:
      var: reboot_flag

  - name: reboot
    ansible.builtin.reboot:
    when: reboot_flag | bool

Since rebooting can be a time consuming task I don’t want to do this by default, which is fine by me as I understand that I have to reboot manually later.

Let’s see what happens when the VM is provisioned:

PLAY [default] *****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "reboot_flag": false
}

TASK [reboot] ******************************************************************
skipping: [default] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

PLAY RECAP *********************************************************************
default                    : ok=2    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Overriding variables

In case I want to override the flag I can do so without touching my Ansible playbook only by changing the Vagrantfile. Thanks to host_vars I can pass variables to Ansible via the inventory. Here’s the changed section in the Vagrantfile:

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/blogpost.yml"
    ansible.verbose = "v"
    ansible.host_vars = {
      "default" => {
        "reboot_flag" => true
      }
    }
  end

All host_vars for my default VM are then appended to the inventory in .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory.

Next time I run vagrant provision the flag is changed to true, and the VM is rebooted:

PLAY [default] *****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "reboot_flag": "true"
}

TASK [reboot] ******************************************************************
changed: [default] => {
    "changed": true,
    "elapsed": 20,
    "rebooted": true
}

PLAY RECAP *********************************************************************
default                    : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 

Summary

Vagrant offers a very neat way of creating an Ansible inventory on the fly. If your Ansible playbooks are written in a way that different execution paths/options are configurable via variables a single playbook is highly flexible and can be used for many things. In the age of version control it’s very convenient not having to touch the source code of an Ansible playbook as that might interfere with other projects. Variables, passed at runtime, are much better suited to create flexible automation scripts.

Avoiding pitfalls when using cURL in CI/CD pipelines

Continuous Integration/Continuous Delivery (or Deployment, depending on your point of view) pipelines are at the core of many successful software projects. When designing your pipelines you sooner or later end up using REST calls to perform certain tasks. cURL is a popular command line tool to invoke REST APIs, and is commonly used in pipelines. Before you start using cURL in that capacity I’d like to draw your attention to a potential pitfall you can run into.

CI/CD Pipelines

A CI/CD pipeline typically consists of a series of tasks executed after a (git) commit is pushed to the remote registry. The idea is to ensure compliance with coding standards, formatting, and code quality, amongst a great wealth of other things. A pipeline is typically sub-divided into stages such as “build”, “lint”, “deploy” or anything else you can think of. Each stage consists of one or more tasks.

Whether or not the pipeline progresses to the next stage depends on the success or failure of tasks. Return codes are usually used to determine success or failure: a return code of 0 implies success, everything else usually terminates the pipeline’s execution.

Experimenting with cURL Exit Codes

In order to use cURL effectively in a CI pipeline it’s important to understand its error codes. Consider the following simulated API using node and express.js:

import express from 'express'
const app = express()
const port = 8080
const host = '0.0.0.0'

// allow for a successful test
app.get('/', (req, res) => {
  res.set('Content-Type', 'text/plain')
  res.send('test successful')
})

// invoke this URL to provoke a HTTP 400 (bad request) error
// see https://expressjs.com/en/4x/api.html#res.set for details
app.get('/failure', (req, res) => {
  res.set('Content-Type', 'text/plain')
  res.status(400).send('Bad Request')
})

app.listen(port, host, () => {
  console.log(`Simulated API server available on ${host}:${port}!`)
})

I created a small container image with the above code using the node:lts image (that’s node 18.14.2 and express 4.18.2, the most current versions at the time of writing, Feb 25th) and ran it.

“But what about security?” I hear you ask. You will undoubtedly have noted that this isn’t production code, it lacks authentication and other security features, logging, and basically everything apart from returning a bit of text and a HTTP status code. I’m also going to use HTTP calls for the API – enabling HTTPS would have been overkill for my example. In the real world you wouldn’t run APIs without TLS protection, would you? Since this post is about HTTP status codes and cURL in CI pipelines none of the extra bells and whistles are necessary, and crucially they’d probably distract from the actual problem. If you’re coding your APIs you should always adhere to industry best practices!

Starting the container

I stated the container as follows:

podman run --rm -it --name some-api --publish 8080:8080 api:0.5

The CMD directive in the project’s Dockerfile starts node and passes the api.mjs file to it. The API is now ready for business:

Simulated API server available on 0.0.0.0:8080!

Scenario 1: normal, successful completion

Let’s start with the successful invocation of the simulated API:

$ curl http://localhost:8080
test successful
$ echo $?
0

OK, nothing to see here, moving on… This is what was expected and shown for reference ;)

Scenario 2: Bad Request

I’m pointing curl to http://localhost:8080/failure next:

$ curl http://localhost:8080/failure
Bad Request
$ echo $?
0

Hmm, so that’s odd, curl‘s return code is 0 (= success) despite the error? Let’s dig a little deeper by using the verbose option and returning the headers

$ curl -iv http://localhost:8080/failure
*   Trying ::1:8080...
* Connected to localhost (::1) port 8080 (#0)
> GET /failure HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
HTTP/1.1 400 Bad Request
< X-Powered-By: Express
X-Powered-By: Express
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Content-Length: 11
Content-Length: 11
< ETag: W/"b-EFiDB1U+dmqzx9Mo2UjcZ1SJPO8"
ETag: W/"b-EFiDB1U+dmqzx9Mo2UjcZ1SJPO8"
< Date: Sat, 25 Feb 2023 11:34:16 GMT
Date: Sat, 25 Feb 2023 11:34:16 GMT
< Connection: keep-alive
Connection: keep-alive
< Keep-Alive: timeout=5
Keep-Alive: timeout=5

< 
* Connection #0 to host localhost left intact
Bad Request

So it’s pretty clear that the HTTP status code is 400 (Bad Request). But that’s not reflected in the return code. Let’s fix this!

Instructing cURL to fail

A look at the cURL manual page reveals this interesting option:

       -f, --fail
              (HTTP) Fail silently (no output at all) on server  errors.  This
              is  mostly done to enable scripts etc to better deal with failed
              attempts. In normal cases when an HTTP server fails to deliver a
              document,  it  returns  an HTML document stating so (which often
              also describes why and more). This flag will prevent  curl  from
              outputting that and return error 22.

              This  method is not fail-safe and there are occasions where non-
              successful response codes will slip through, especially when au‐
              thentication is involved (response codes 401 and 407).

Which looks like exactly what I need. Let’s try this option:

$ curl --fail http://localhost:8080/failure
curl: (22) The requested URL returned error: 400 Bad Request
$ echo $?
22

Well that’s better! There’s a non-zero return code now.

Summary

The --fail option in curl (or --fail-with-body if your version of curl is 7.76 or later) allows DevOps engineers to architect their pipelines with greater resilience. Rather than manually parsing the cURL output checking for errors you can now rely on the REST API call’s return code to either proceed with the pipeline or stop execution. Please note that the –fail option isn’t fail-safe, as per the above comment in the man-page. Neither does it protect you from an API returning a HTTP-200 code if in fact an error occurred. But it’s definitely something I’ll use from now on by default.

Rendering .adoc include directives properly on GitHub

I recently worked on an issue where a perfectly fine ASCIIDoc file didn’t render properly in GitHub. At first I thought it was a broken file reference, but when I used the preview in my IDE I noticed that the syntax and link to the file are both correct. And yes, you can tell I’m new to ASCIIDoc :)

Here’s a screenshot of my document not rendering the way I intended in GitHub:

The Terraform example in section 2.3 should have displayed the contents of ./main.tf instead of a line starting with link:

As it turns out this is a long standing problem, see issue 1095 in GitHub. The relevant section from the source file reads:

=== Python Example

This is equally useless Python code.

[source, python]
----
import os
print ("hello ASCII doc")
----

=== Terraform Example

Unlike the previous examples the Terraform code is imported. This might not render properly in Github.

[source, hcl]
----
include::./main.tf[lines=1..-1]
----

You should see the configuration of a `provider {}` block

I didn’t go to great length with the Terraform code, all it does is show the definition of the Oracle Cloud Infrastructure provider:

#
# configure the Terraform Provider for Oracle Cloud Infrastructure
#
provider "oci" {
  fingerprint          = var.api_fingerprint
  private_key_path     = var.api_private_key_path
  region               = var.region
  tenancy_ocid         = var.tenancy_id
  user_ocid            = var.user_id
}

# add some actual code next...

Interestingly most IDEs render the ASCIIDoc correctly, they shown the combined text from both files even in preview mode. It’s really down to the aforementioned issue in GitHub that my file doesn’t render the way I have in mind.

Working around the problem

In an attempt at trying to save you 5 minutes I’d like to show you a potential workaround using asciidoctor-reducer. I opted to install it in a container, this way it should be easier to use it in my CI pipeline. The exact way you choose to invoke the tool does not matter, the actual call is most likely very similar to this:

$ asciidoctor-reducer my.adoc -o my-reduced.adoc

If you can use GitHub actions in your project you might want to have a look at an example featuring GitHub actions instead.

The asciidoctor-reducer post-processor combined the files, instead of an include directive the contents of main.tf was present in the ASCIIDoc file.

$ diff my.adoc my-reduced.adoc
55c55,67
< include::./main.tf[lines=1..-1]
---
> 
> #
> # configure the Terraform Provider for Oracle Cloud Infrastructure
> #
> provider "oci" {
>   fingerprint          = var.api_fingerprint
>   private_key_path     = var.api_private_key_path
>   region               = var.region
>   tenancy_ocid         = var.tenancy_id
>   user_ocid            = var.user_id
> }
> 
> # add some actual code next...
63d74
< 

This is a solution that works quite well for me personally. All I have to do is plug the container into my CI pipeline and have the tool create the combined document for me. Since I can review/test all inputs separately there is no need for me to check the generated file back into git. An alternative way of automating the generation of the reduced document is to create a pre-commit git hook. As with everything, possibilities are endless. Just pick the one that works for you.

Summary

Until there is no support for the include directive in GitHub, ASCII Doc workarounds are needed for documents to be rendered with the correct information. Using asciidoctor-reducer is a great option since it generates the desired results without requiring duplication of content.

As with all open-source tools make sure their license is compatible with your use case/company. I haven’t tested the tool thoroughly yet, so please ensure you comfortable with the way it works, especially with regards to unwanted side effects. This post is not an endorsement of either ASCIIDoc nor asciidoctor-reducer: use at your own risk and always have a backup ;)

Vagrant: always provision virtual machines

Since Spectre and Meltdown (2 infamous side channel attack vectors on CPUs) have become public I thought about better, more secure ways to browse the web. When I read that a commercial vendor for operating systems created a solution where a browser is started in a disposable sandbox that gets discarded when you exit the browser session I thought of ways to implement this feature myself.

Since I’m a great fan of both Virtualbox and Vagrant I decided to use the combination of the two to get this done. My host runs Ubuntu 22.04 LTS, and I’m using Vagrant 2.2.19 (the one shipping with the distribution, it’s not the latest version!) as well as Virtualbox 6.1.40. Whilst the solution presented in this article provides a more secure (notice how I didn’t claim this to be secure ;) ) approach to web browsing it doesn’t keep the host up to date. Security updates for the host O/S and hypervisor (read: Virtualbox) are crucial, too.

Please be super-careful when thinking of implementing a strategy where provisioners are run always, it can and potentially will break your system! For most use cases provisioning a VM each time it starts is not what you want.

Building a “browser” VM

I started off by creating a small “browser” VM with a minimal GUI and a web browser – nothing else – and registered this system as a vagrant box. This is the first step towards my solution: being able to create/tear down the sandbox. Not perfect, and there are more secure ways, but I’m fine with my approach.

The one thing that’s necessary though is updating the VM, ideally performed automatically, at each start. Vagrant provisioners can help with that.

Defining one or more provisioners in the Vagrantfile is a great way to initially configure a VM when it is created for the first time and works really well. Provisioners thankfully do NOT run with each subsequent start of the VM. If they were run each time it would probably be a disaster for all of my other Vagrant VMs. For my sandbox browser VM though I want all packages to be updated automatically.

Switching from on-demand provisioning to automatic provisioning

As I said, VMs are provisioned once by default, subsequent starts won’t run the provisioners as you can see in the output:

$ vagrant up

[output of vagrant bringing my VM up skipped]

==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> default: flag to force provisioning. Provisioners marked to run always will still run.

The section detailing provisioners in my Vagrantfile is super simple because it has to run in Linux and Windows and I’m too lazy to install Ansible on my Windows box. The above output was caused by the following directive:

Vagrant.configure("2") do |config|

  # ... more directives ...

  config.vm.provision "shell",
    inline: "sudo apt-get update --error-on=any && sudo apt-get dist-upgrade -y"

  # ... even more directives ...

Looking at the command you may have guessed that this is a Debian-based VM, and I’m neither using Flatpack nor Snaps. All packages in this environment are DEBs. That’s easier to maintain for me.

To change the provision section to always run, simply tell it to:

Vagrant.configure("2") do |config|

  # ... more directives ...

  config.vm.provision "shell",
    inline: "sudo apt-get update --error-on=any && sudo apt-get dist-upgrade -y",
    run: "always"

Next time the vagrant VM starts, the provisioner marked as “run: always” will be triggered, even though the VM wasn’t created from scratch:

$ vagrant up

[output of vagrant bringing my VM up skipped once more]

==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> default: flag to force provisioning. Provisioners marked to run always will still run.
==> default: Running provisioner: shell...
    default: Running: inline script

[output of apt-get omitted for brevity]

There you go! I could have achieved the same by telling vagrant to provision the VM using the --provision flag but I’m sure I would have forgotten that half the time.

Anyone using Ansible can benefit from running provisioners always, too:

Vagrant.configure("2") do |config|

  # ... more directives ...

  config.vm.provision "ansible", run: "always" do |ansible|
      ansible.playbook = "/path/to/ansible/playbook.yaml"
  end

Next time the VM is started by vagrant the Ansible playbook will be executed.

Summary

Vagrant can be instructed to run provisioners always if the use case merits it. For the most part it’s not advisable to run provisioners each time the VM comes up as it might well mess with the installation already present.

Retrieving passwords from OCI Vault for use in Terraform

This post is written with the intention to complement the excellent “A comprehensive guide to managing secrets in your Terraform code” by Yevgeniy Brikman. Its aim is to detail how Oracle Cloud Infrastructure Vault (OCI Vault) can be used to securely store credentials and subsequently use them in Terraform scripts.

If you haven’t done so I recommend reading Yevgeni’s post to get some background information as to why storing passwords anywhere in code, even dot-configuration files, is a Truly Bad Idea. This article provides an example for his third technique: using a dedicated secrets store.

Never, ever, store any credentials in code. Just . don’t . do it. It’s disaster waiting to happen

– every security conscious person, always

Standard disclaimer: please be advised that creating cloud resources most likely costs you money, and keeping them running even more so. Don’t create any cloud resources unless you are authorised to spend that money and know about the implications of creating the resources mentioned in this post.

The problem with the Terraform state file

Whilst using OCI Vault for storing and retrieving secrets is without a doubt a great step towards safer code management, there is still an unsolved issue with Terraform: the state file is considered sensitive information by HashiCorp at the time of writing (2022-05-30). When using the local backend (eg the default) passwords and other sensitive information are stored in clear text in a JSON file. Storing sensitive information in clear text is very much counter-productive to the article’s goals. Alternative backends providing encryption at rest are most likely better suited. Please ensure you remain compliant with your IT security department’s policies regarding the Terraform state file.

Overview

In this article you can read how to create an Autonomous Database (ADB) instance using a tiny Terraform script. Compared to some other tutorials about the subject you won’t find the ADMIN password provided in the code.

Rather than providing the ADB instance’s ADMIN password as an environment variable, the password is retrieved from an OCI Vault secret and passed to the ADB resource. The ADB instance is just one potential use case for using OCI Vault in Terraform: anywhere secrets need to be used to create/maintain resources, the technique detailed for ADB applies as well.

Secrets in the context of OCI Vault are credentials such as passwords, certificates, SSH keys, or authentication tokens that you use with Oracle Cloud Infrastructure services. An OCI Vault Secret cannot be looked up as such: secrets are wrapped into what’s referred to as a secret bundle. A secret bundle consists of the secret contents, properties of the secret and secret version (such as version number or rotation state), and user-provided contextual metadata for the secret.

To keep this article short-ish, it is assumed that a secret has already been created and its Oracle Cloud Identifier (OCID) is known. The secret’s OCID is passed to the Terraform script via a variable.

An Autonomous Database instance is perfectly suited to demonstrate the use of a Terraform Data Source for looking up vault secrets as it does not require any supporting resources such as Virtual Cloud Networks, or any elaborate network security settings. The Terraform script will create a publicly accessible ADB instance protected by an Access Control List (ACL) allowing only specific IP addresses to connect. Furthermore, mutual TLS is enabled for even stronger security.

Using an OCI Vault Secret

Lookup operations in Terraform are performed using Data Sources. There are data sources for most cloud resources, including the aforementioned secret bundle. Provided the secret’s OCID is passed via a variable, the lookup using an oci_secrets_secretbundle data source could be performed as follows:

data "oci_secrets_secretbundle" "bundle" {

  secret_id = var.secret_ocid
}

Thankfully the OCI Terraform provider is smart enough to retrieve the current, active version of the secret. Once the secret has been retrieved, it can be used for the creation of an ADB instance. Since secrets are base64 encoded, they have to be decoded before they can be used. The following snippet demonstrates the use of the data source inside the ADB resource:

resource "oci_database_autonomous_database" "demo_adb_21c" {
  compartment_id              = var.compartment_ocid
  db_name                     = "DEMO"
  admin_password              = base64decode(data.oci_secrets_secretbundle.bundle.secret_bundle_content.0.content)
  cpu_core_count              = 1
  data_storage_size_in_tbs    = 1
  db_version                  = "21c"
  db_workload                 = "OLTP"
  display_name                = "ADB Free Tier 21c"
  is_free_tier                = true
  is_mtls_connection_required = true
  ocpu_count                  = 1
  whitelisted_ips             = var.allowed_ip_addresses
}

A call to terraform plan followed by a terraform apply will initiate the creation of the ADB instance. As long as the admin password complies with the password complexity rules of the ADB resource, the database will be created. Once its lifecycle status changed to running, the database will be accessible to IP addresses specified in var.allowed_addresses (a list of strings). Should you invoke the Terraform script from a Linux shell, this might be a way to set the variable:

$ export TF_VAR_allowed_ip_addresses='[ "1.2.3.4", "4.5.6.7" ]'
$ terraform plan -out myplan

Summary

Using OCI Vault to store sensitive information is a secure way to mitigate against many password-handling problems. The Terraform state file remains a concern, especially when using the local backend as it stores all information in clear text. The IT security department should be consulted as to how this potential security vulnerability should be treated. Other backends than the local backend exist and might suit the IT security team’s needs better.

Once a Vault secret has been looked up, it can be used in any Terraform resource. Referencing data sources should lead to more secure code deployments.

Happy Automating!

Vagrant: mapping a Virtualbox VM to a Vagrant environment

This is a small post hopefully saving you a few minutes mapping Vagrant and VirtualBox environments.

I typically have lots of Vagrant environments defined. I love Vagrant as a technology, it makes it super easy to spin up Virtual Machines (VMs) and learn about new technologies.

Said Vagrant environments obviously show up as VMs in VirtualBox. To make it more interesting I have a few more VirtualBox VMs that don’t map to a Vagrant environment. Adding in a naming convention that’s been growing organically over time I occasionally find myself at a loss as to which VirtualBox VM maps to a Vagrant environment. Can this be done? Yep, and creating a mapping is quite simple actually. Here is what I found useful.

Directory structure

My Vagrant directory structure is quite simple: I defined ${HOME}/vagrant as top-level directory with a sub-directory containing all my (custom) boxes. Apart from ~/vagrant/boxes I create further sub-directories for each project. For example:

[martin@ryzen: vagrant]$ ls -ld *oracle* boxes
drwxrwxr-x 2 martin martin 4096 Nov 23 16:52 boxes
drwxrwxr-x 3 martin martin   41 Feb 16  2021 oracle_19c_dg
drwxrwxr-x 3 martin martin   41 Nov 19  2020 oracle_19c_ol7
drwxrwxr-x 3 martin martin   41 Jan  6  2021 oracle_19c_ol8
drwxrwxr-x 3 martin martin   41 Nov 25 12:54 oracle_xe

But … which of my VirtualBox VMs belongs to the oracle_xe environment?

Mapping a Vagrant environment to a VirtualBox VM

Vagrant keeps a lot of metadata in the project’s .vagrant directory. Continuing with the oracle_xe example, here is what it stores:

[martin@buildhost: oracle_xe]$ tree .vagrant/
.vagrant/
├── machines
│   └── oraclexe
│       └── virtualbox
│           ├── action_provision
│           ├── action_set_name
│           ├── box_meta
│           ├── creator_uid
│           ├── id
│           ├── index_uuid
│           ├── synced_folders
│           └── vagrant_cwd
├── provisioners
│   └── ansible
│       └── inventory
│           └── vagrant_ansible_inventory
└── rgloader
    └── loader.rb

7 directories, 10 files

Looking at the above output I guess I should look at .vagrant/machines/

The machine name (oraclexe) is derived from the Vagrantfile. I create a config.vm.define section per VM out of habit (even when I create just 1 VM), as you can see here in my shortened Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  
  config.vm.define "oraclexe" do |xe|
    xe.vm.box = "ol7"
    xe.vm.box_url = "file:///home/martin/vagrant/boxes/ol7.json"

    ...

    xe.vm.provision "ansible" do |ansible|
      ansible.playbook = "setup.yml"
    end
  end
end

In case you don’t give your VMs a name you should find a directory named default instead.

As I’m using Vagrant together with VirtualBox I’m not surprised to find a sub-directory named virtualbox.

Finally! You see the VM’s metadata in that directory. The VM’s ID can be found in .vagrant/machines/oraclexe/virtualbox/id. The file contains the internal ID VirtualBox uses to identify VMs. Using that knowledge to my advantage I can create the lookup as shown here:

[martin@buildhost: oracle_xe]$ vboxmanage list vms | grep $(cat .vagrant/machines/oraclexe/virtualbox/id)
"oraclexe" {67031773-bad9-4325-937b-e471d02a56a3}

Voila! This wasn’t particularly hard since the VM name is oracelxe as well. Nevertheless I found this technique works well regardless of how you curated your Vagrantfile.

Happy Automating!

Configuring a VM using Ansible via the OCI Bastion Service

In my previous post I wrote about the creation of a Bastion Service using Terraform. As I’m incredibly lazy I prefer to configure the system pointed at by my Bastion Session with a configuration management tool. If you followed my blog for a bit you might suspect that I’ll use Ansible for that purpose. Of course I do! The question is: how do I configure the VM accessible via a Bastion Session?

Background

Please have a look at my previous post for a description of the resources created. In a nutshell the Terraform code creates a Virtual Cloud Network (VCN). There is only one private subnet in the VCN. A small VM without direct access to the Internet resides in the private subet. Another set of Terraform code creates a bastion session allowing me to connect to the VM.

I wrote this post on Ubuntu 20.04 LTS using ansible 4.8/ansible-core 2.11.6 by the way. From what I can tell these were current at the time of writing.

Connecting to the VM via a Bastion Session

The answer to “how does one connect to a VM via a Bastion Session?” isn’t terribly difficult once you know how to. The clue to my solution is with the SSH connection string as shown by the Terraform output variable. It prints the contents of oci_bastion_session.demo_bastionsession.ssh_metadata.command

$ terraform output
connection_details = "ssh -i <privateKey> -o ProxyCommand=\"ssh -i <privateKey> -W %h:%p -p 22 ocid1.bastionsession.oc1.eu-frankfurt-1.a...@host.bastion.eu-frankfurt-1.oci.oraclecloud.com\" -p 22 opc@10.0.2.39"

If I can connect to the VM via SSH I surely can do so via Ansible. As per the screen output above you can see that the connection to the VM relies on a proxy in form of the bastion session. See man 5 ssh_config for details. Make sure to provide the correct SSH keys in both locations as specified in the Terraform code. I like to think of the proxy session as a Jump Host to my private VM (its internal IP is 10.0.2.39). And yes, I am aware of alternative options to SSH, the one shown above however is the most compatible (to my knowledge).

Creating an Ansible Inventory and running a playbook

Even though it’s not the most flexible option I’m a great fan of using Ansible inventories. The use of an inventory saves me from typing a bunch of options on the command line.

Translating the Terraform output into the inventory format, this is what worked for me:

[blogpost]
privateinst ansible_host=10.0.2.39 ansible_user=opc ansible_ssh_common_args='-o ProxyCommand="ssh -i ~/.oci/oci_rsa -W %h:%p -p 22 ocid1.bastionsession.oc1.eu-frankfurt-1.a...@host.bastion.eu-frankfurt-1.oci.oraclecloud.com"'

Let’s run some Ansible code! Consider this playbook:

- hosts: blogpost
  tasks:
  - name: say hello
    ansible.builtin.debug:
      msg: hello from {{ ansible_hostname }}

With the inventory set, it’s now possible to run the playbook:

$ ansible-playbook -vi inventory.ini blogpost.yml 
Using /tmp/ansible/ansible.cfg as config file

PLAY [blogpost] *********************************************************************************************************

TASK [Gathering Facts] **************************************************************************************************
ok: [privateinst]

TASK [say hello] ********************************************************************************************************
ok: [privateinst] => {}

MSG:

hello from privateinst

PLAY RECAP **************************************************************************************************************
privateinst                : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

The playbook is of course very simple, but it can be easily extended. The tricky bit was establishing the connection, once the connection is established the sky is the limit!

Create an OCI bastion service via Terraform

Maintaining bastion hosts (a “jump box” or other network entry point directly exposed to the Internet) is somewhat frowned upon by security conscious architects, for good reasons. In my opinion the only way to connect on-premises systems to the cloud is by means of a dedicated, low-latency/high-bandwidth, and most importantly well-secured link.

I never liked the idea of exposing systems to the Internet – too much can go wrong and you’d be surprised about the number of port-scans you see, followed by attempts at breaking in. Sometimes of course opening a system to the Internet is unavoidable: a website offering services to the public is quite secure if it cannot be reached but won’t generate a lot of revenue that way. Thankfully there are ways to expose such applications safely to the Internet, a topic that’s out of scope of this post though.

My very personal need for the bastion service

I create lots of demos using Oracle Cloud Infrastructure (OCI) and setting up a dedicated link isn’t always practical. The solution for me is to use Oracle’s bastion service. This way I can ensure time-based secure access to my resources in a private subnet. Most importantly there is no need to connect a VM directly to the Internet. And since it’s all fully automated it doesn’t cause any more work than terraform up followed by a terraform destroy when the demo completed.

This blog post describes how I create a VCN with a private subnet containing a VM. The entire infrastructure is intended as a DEMO only. None of the resources will live longer than for the duration of a conference talk. Please don’t follow this approach if you would like to deploy systems in the cloud for > 45 minutes. Also be aware that it’s entirely possible for you to incur cost when calling terraform up on the code. As always, the code will be available on Github.

Creating a Bastion Service

The bastion service is created by Terraform. Following the advice from the excellent Terraform Up and Running (2nd ed) I separated the resource creation into three directories:

  • Network
  • Compute
  • Bastion

To keep things reasonably simple I refrained from creating modules.

Directory layout

Please have a look at the book for more details about the directory structure. You’ll notice that I simplified the example a little.

$ tree .
.
├── bastionsvc
│   ├── main.tf
│   ├── terraform.tfstate
│   └── variables.tf
├── compute
│   ├── compute.tf
│   ├── main.tf
│   ├── outputs.tf
│   ├── terraform.tfstate
│   ├── terraform.tfstate.backup
│   └── variables.tf
├── network
│   ├── network.tf
│   ├── outputs.tf
│   ├── terraform.tfstate
│   ├── terraform.tfstate.backup
│   └── variables.tf
├── readme.md
└── variables.tf

I decided to split the network code into a generic section and the bastion service for reason explained later.

Generic Network Code

The network code is responsible for creating the Virtual Cloud Network (VCN) including subnets, security lists, necessary gateways etc. When I initially used the bastion service I struggled a bit with Network Security Groups (NSG) and went with a security list instead. I guess I should re-visit that decision at some point.

The network must be created first. In addition to creating all the necessary infrastructure it exports an output variable used by the compute and bastion code. These read remote state to get the necessary OCIDs.

Note that the choice of a remote data source has its drawbacks as described in the documentation. These don’t apply for my demos as I’m the only user of the code. And while I’m at it, using local state is acceptable only because I know I’m the only one using the code. Local state doesn’t necessarily work terribly well for team-development.

Here are some key features of the network code. As these tend to go stale over time, have a look at the Github repository for the latest and greatest revision.

resource "oci_core_vcn" "vcn" {

  compartment_id = var.compartment_ocid
  cidr_block     = "10.0.2.0/24"
  defined_tags   = var.network_defined_tags
  display_name   = "demovcn"
  dns_label      = "demo"

}

# --------------------------------------------------------------------- subnet

resource "oci_core_subnet" "private_subnet" {

  cidr_block                 = var.private_sn_cidr_block
  compartment_id             = var.compartment_ocid
  vcn_id                     = oci_core_vcn.vcn.id
  defined_tags               = var.network_defined_tags
  display_name               = "private subnet"
  dns_label                  = "private"
  prohibit_public_ip_on_vnic = true
  prohibit_internet_ingress  = true
  route_table_id             = oci_core_route_table.private_rt.id
  security_list_ids          = [
    oci_core_security_list.private_sl.id
  ]
}

The security list allows SSH only from within the same subnet:

# --------------------------------------------------------------------- security list

resource "oci_core_security_list" "private_sl" {

  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.vcn.id

...

  egress_security_rules {

    destination = var.private_sn_cidr_block
    protocol    = "6"

    description      = "SSH outgoing"
    destination_type = ""

    stateless = false
    tcp_options {

      max = 22
      min = 22

    }
  }

  ingress_security_rules {

    protocol = "6"
    source   = var.private_sn_cidr_block

    description = "SSH inbound"

    source_type = "CIDR_BLOCK"
    tcp_options {

      max = 22
      min = 22

    }

  }
}

The bastion service and its corresponding session are going to be created in the same private subnet as the compute instance for the sake of simplicity.

Compute Instance

The compute instance is created as a VM.Standard.E3.Flex shape with 2 OCPUs. There’s nothing too special about the resource, except maybe that I’m explicitly enabling the bastion plugin agent, a prerequisite for using the service.

resource "oci_core_instance" "private_instance" {
  agent_config {
    is_management_disabled = false
    is_monitoring_disabled = false

...

    plugins_config {
      desired_state = "ENABLED"
      name = "Bastion"
    }
  }

  defined_tags = var.compute_defined_tags

  create_vnic_details {
    
    assign_private_dns_record = true
    assign_public_ip = false
    hostname_label = "privateinst"
    subnet_id = data.terraform_remote_state.network_state.outputs.private_subnet_id
    nsg_ids = []
  }

...

Give it a couple of minutes for all agents to start.

Bastion Service

Once the VM’s bastion agent is up it is possible to create the bastion service:

resource "oci_bastion_bastion" "demo_bastionsrv" {

  bastion_type     = "STANDARD"
  compartment_id   = var.compartment_ocid
  target_subnet_id = data.terraform_remote_state.network_state.outputs.private_subnet_id

  client_cidr_block_allow_list = [
    var.local_laptop_id
  ]

  defined_tags = var.network_defined_tags

  name = "demobastionsrv"
}


resource "oci_bastion_session" "demo_bastionsession" {

  bastion_id = oci_bastion_bastion.demo_bastionsrv.id
  defined_tags = var.network_defined_tags
  
  key_details {
  
    public_key_content = var.ssh_bastion_key
  }

  target_resource_details {

    session_type       = "MANAGED_SSH"
    target_resource_id = data.terraform_remote_state.compute_state.outputs.private_instance_id

    target_resource_operating_system_user_name = "opc"
    target_resource_port                       = "22"
  }

  session_ttl_in_seconds = 3600

  display_name = "bastionsession-private-host"
}

output "connection_details" {
  value = oci_bastion_session.demo_bastionsession.ssh_metadata.command
}

The Bastion is set up in the private subnet created by the network code. Note that I’m defining the session’s client_cidr_block_allow_list specifically to only allow my external IP to access the service. The session is of type Managed SSH, thus requires a Linux host.

And this is all I can say about the creation of a bastion session in Terraform.

Terraform in action

Once all the resources have been created all I need to do is adapt the SSH command provided by my output variable shown here:

connection_details = "ssh -i <privateKey> -o ProxyCommand=\"ssh -i <privateKey> -W %h:%p -p 22 ocid1.bastionsession.oc1.eu-frankfurt-1.am...@host.bastion.eu-frankfurt-1.oci.oraclecloud.com\" -p 22 opc@10.0.2.94"

After adopting the SSH command I can connect to the instance.

$ ssh -i ...
The authenticity of host '10.0.2.94 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:Ot...
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.0.2.94' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

[opc@privateinst ~]$ hostname
privateinst
[opc@privateinst ~]$ logout

That’s it! I am connected to the instance and experiment with my demo.

Another reason I love Terraform: when the demo has concluded I can simply tear down all resources with very few commands.