Learning Kubernetes: persistent storage with Minikube

As part of my talk at (the absolutely amazing) Riga Dev Days 2019 I deployed Oracle REST Data Services (ORDS) in Minikube as my application’s endpoint. I’ll blog about deploying ORDS 19 in Docker and running it on Kubernetes later, but before I can do so I want to briefly touch on persistent storage in Minikube, which I consider a prerequisite.

In a previous post I wrote about using Minikube as a runtime environment for learning Kubernetes. This post uses the same versions as before – Docker 18.09 and Kubernetes 1.14 are invoked by Minikube 1.1.0.

Quick & dirty intro to storage concepts for Kubernetes

Container storage is – at least to my understanding – ephemeral: delete the container and create a new one, and any locally stored data is gone. In “standalone” Docker you can use so-called volumes to store data more permanently. A similar concept exists in Kubernetes in the form of a persistent volume (PV) and a corresponding persistent volume claim (PVC).

In a nutshell, and greatly simplified, a persistent volume is a piece of persistent storage pre-allocated on a Kubernetes cluster that can be “claimed”. Hence the names …
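As a quick aside, kubectl can list both object types, which is an easy way to see what is already defined in a cluster:

$ kubectl get pv
$ kubectl get pvc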

There is a lot more to say about persistent data in Kubernetes, and others have already done that, so you won’t find an explanation of storage concepts in Kubernetes here. Please head over to the documentation set for more details.

Be careful though: the use of persistent volumes and persistent volume claims is quite an in-depth topic, especially outside the developer-scoped Minikube. I specifically won’t touch on persistent storage outside of Minikube; I’d rather save that for a later post.

Persistent Volumes for my configuration files

ORDS requires you to run an installation routine first, in the course of which it creates a directory containing its runtime configuration. The shell script I’m using to initialise my container checks for the existence of that configuration directory and skips the installation step if it finds one, proceeding straight to starting Tomcat. This is primarily done to speed up the startup process: if I didn’t use the PV/PVC combination, the pods in my deployment would have to run the installation each time they start up, something I wanted to avoid.
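To illustrate the idea – this is not the actual ORDS start script, just a minimal sketch, and ORDS_CONFIG is a made-up variable name:

#!/usr/bin/env bash
# ORDS_CONFIG is a hypothetical variable pointing at the configuration directory on the PV
if [[ -d "${ORDS_CONFIG}" ]]; then
  echo "INFO: existing configuration found, skipping installation"
else
  echo "INFO: no configuration found, running the ORDS installation"
  # ... run the ORDS installation here ...
fi
# in either case, start Tomcat afterwards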

A less complex example please

I didn’t want to make things more complicated than necessary, so I’ll use a much simpler example before writing up my ORDS deployment. It is based on the official Ubuntu image and writes “heartbeat” files into the mounted volume, allowing me to check whether they show up on the underlying infrastructure.

The Minikube documentation informs us that Minikube only preserves data stored in /data and a few other directories; anything written elsewhere does not survive a restart of the VM. With that piece of information at hand I proceeded with the storage creation.

Experienced Minikube users might point out at this stage that pre-creating a volume isn’t needed, as Minikube supports dynamic volume provisioning. From what I can tell that’s correct, but it’s not what I chose to do here.
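For completeness, a dynamically provisioned claim would look roughly like the sketch below. It relies on Minikube’s default storage class (named standard on my system), and the claim name research-dynamic-pvc is made up for this example – I haven’t used this variant in the rest of the post:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: research-dynamic-pvc
spec:
  accessModes:
    - ReadWriteOnce
  # Minikube ships a default storage class called "standard";
  # omitting storageClassName should have the same effect
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi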

Creating a persistent volume

Based on the previously mentioned documentation I created a persistent volume and accompanying volume claim like this:

$ cat persistent-volumes.yml 

kind: PersistentVolume
apiVersion: v1
metadata:
  name: research-vol
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/research-vol"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: research-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: research-vol
  resources:
    requests:
      storage: 1Gi

This approach is very specific to Minikube because I’m using a persistent volume of type hostPath.

Translated to plain English this means that I’m creating a 2 GB volume backed by /data/research-vol on my Minikube system, and I’m asking for 1 GB of it in my persistent volume claim. The access mode (ReadWriteOnce) means the volume can be mounted read-write by a single node, not by a single pod – which is why multiple pods can happily write to the same PVC, as you’ll see in a bit: Minikube only has one node, so all pods end up on it. In my opinion the documentation wasn’t particularly clear on the subject.
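Creating the persistent volume and the claim is a simple matter of applying the file; kubectl get then tells you whether the claim has been bound to the volume (the STATUS column should read Bound for both):

$ kubectl apply -f persistent-volumes.yml
$ kubectl get pv research-vol
$ kubectl get pvc research-pvc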

Build the Docker image

Before I can deploy my example application to Minikube, I need to build a Docker image first. This is really boring, but I’ll show it here for the sake of completeness:

$ cat Dockerfile 
FROM ubuntu
COPY run.sh /usr/local/bin
RUN chmod +x /usr/local/bin/run.sh
ENTRYPOINT ["/usr/local/bin/run.sh"]

The shell script I’m running is shown here:

 $ cat run.sh 
 #!/usr/bin/env bash

 # abort on errors and unset variables, and trace every command (handy when reading kubectl logs)
 set -euxo pipefail

 # VOLUME is passed in by the deployment and points at the persistent volume's mount path
 if [[ ! -d $VOLUME ]]; then
   /bin/echo ERR: cannot find the volume $VOLUME, exiting
   exit 1
 fi

 # write a heartbeat file named after the pod and a timestamp every 10 seconds
 while true; do
   /usr/bin/touch ${VOLUME}/$(hostname)-$(date +%Y%m%d-%H%M%S)
   /bin/sleep 10
 done

As you can see, it is just an infinite loop writing heartbeat files into the directory indicated by the VOLUME environment variable. The image is available to Minikube straight after building it against Minikube’s Docker daemon. Refer to my previous post on how to build Docker images for Minikube, or have a look at the documentation. I built the image as “research:v2”.
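In case you don’t want to follow the link, the short version is to point your Docker client at Minikube’s Docker daemon and build the image there, roughly like this:

$ eval $(minikube docker-env)
$ docker build -t research:v2 .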

Deploy

With the foundation laid, I can move on to defining the deployment. I’ll do that again with the help of a YAML file:

 $ cat research-deployment.yml 
 apiVersion: extensions/v1beta1
 kind: Deployment
 metadata:
   labels:
     app: research
   name: research
 spec:
   replicas: 2
   selector:
     matchLabels:
       app: research
   template:
     metadata:
       labels:
         app: research
     spec:
       containers:
       - image: research:v2
         env:
         - name: VOLUME
           value: "/var/log/heartbeat"
         name: research
         volumeMounts:
         - mountPath: "/var/log/heartbeat"
           name: research-vol
       volumes:
       - name: research-vol
         persistentVolumeClaim: 
           claimName: research-pvc
       restartPolicy: Always

If you haven’t previously worked with Kubernetes this might look daunting, but it really isn’t. I’m asking for the deployment of 2 copies (“replicas”) of my research:v2 image. I am passing an environment variable – VOLUME – to the container image; it contains the path to the persistent volume’s mount point. I’m also mounting a volume named research-vol as /var/log/heartbeat/ in each container. That volume is defined in the volumes: section of the YAML file and is backed by the persistent volume claim research-pvc, which in turn is bound to my persistent volume. It’s important that the names in volumes and volumeMounts match.

Running

After a quick kubectl apply -f research-deployment.yml I have 2 pods happily writing to the persistent volume.

 $ kubectl get pods
 NAME                       READY   STATUS    RESTARTS   AGE
 research-f6668c975-98684   1/1     Running   0          6m22s
 research-f6668c975-z6pll   1/1     Running   0          6m22s

The directory exists as specified on the Minikube system. If you used the VirtualBox driver, use minikube ssh to access the VM and change to the PV’s directory.
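In case you want to follow along, this should get you there; I used a root shell, hence the # prompt in the listing below:

$ minikube ssh
$ sudo -i
# cd /data/research-vol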

# pwd
/data/research-vol
# ls -l *research-f6668c975-z6pll* | wc -l
53
# ls -l *research-f6668c975-98684* | wc -l
53

As you can see, both pods are able to write to the directory. The volume’s contents were preserved when I deleted the deployment:

$ kubectl delete deploy/research
deployment.extensions "research" deleted
$ kubectl get pods
No resources found.
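Both pods are gone. Deleting the deployment does not touch the persistent volume or its claim, though; a quick check on the kubectl side should still report both with a STATUS of Bound:

$ kubectl get pv,pvc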

But what does it look like back on the Minikube VM?

# ls -l *research-f6668c975-z6pll* | wc -l
63
# ls -l *research-f6668c975-98684* | wc -l
63

You can see that files are still present. The persistent volume lives up to its name.

Next I wanted to see if (new) pods pick up what’s already in the PV. I made a short modification to the shell script, rebuilt the image as research:v3 and pointed the deployment at the new tag (see the sketch after the script). The container now prints the number of files it finds in the volume:

#!/usr/bin/env bash

set -euxo pipefail

if [[ ! -d $VOLUME ]]; then
  /bin/echo ERR: cannot find the volume $VOLUME, exiting
  exit 1
fi

# new in v3: report how many files are already in the volume at startup
/bin/echo INFO: found $(ls -l $VOLUME/* | wc -l) files in the volume

while true; do
  /usr/bin/touch ${VOLUME}/$(hostname)-$(date +%Y%m%d-%H%M%S)
  /bin/sleep 10
done
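For reference, this is roughly how the new image ends up in the deployment; kubectl set image is one way of doing it, editing the YAML file and re-applying it is another:

$ eval $(minikube docker-env)
$ docker build -t research:v3 .
$ kubectl set image deployment/research research=research:v3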

As proven by the logs:

 $ kubectl get pods
 NAME                        READY   STATUS        RESTARTS   AGE
 research-556cc9989c-bwzkg   1/1     Running       0          13s
 research-556cc9989c-chg76   1/1     Running       0          13s

 $ kubectl logs research-556cc9989c-bwzkg
 [[ ! -d /var/log/heartbeat ]]
 ++ wc -l
 ++ ls -l /var/log/heartbeat/research-7d4c7c8dc8-5xrtr-20190606-105014 
 ( a lot of output not shown )
 /bin/echo INFO: found 166 files in the volume
 INFO: found 166 files in the volume
 true
 ++ hostname
 ++ date +%Y%m%d-%H%M%S
 /usr/bin/touch /var/log/heartbeat/research-556cc9989c-bwzkg-20190606-105331
 /bin/sleep 10
 true
 ++ hostname
 ++ date +%Y%m%d-%H%M%S
 /usr/bin/touch /var/log/heartbeat/research-556cc9989c-bwzkg-20190606-105341
 /bin/sleep 10 

It appears as if I successfully created and used persistent volumes in Minikube.