RAC in KVM is possible without iSCSI
Posted by Martin Bach on September 3, 2013
(This post is for Jerry. He will know when he reads it)
I have been a great supporter of many flavours of virtualisation; my earliest experience with Xen goes back to Oracle VM 2, which was based on RHEL 4 and an early version of Xen. Why am I saying this? Because Xen is (was?) simple and elegant, especially for building RAC systems: a paravirtualised Linux guest and a dual-core machine were all you needed. Xen is very lightweight, even though recent advances in processor architecture (nested page tables, single root I/O virtualisation and others) make it more attractive to use hardware virtualisation with paravirtualised drivers. This is what this post is about!
Shared storage in Xen
As you know, RAC needs shared block devices: voting disks, OCR, data files, redo logs, the lot. In Xen that's as straightforward as it gets:
disk = [ 'file:/var/lib/xen/images/node1/root.img,xvda,w',
         ...
         'file:/var/lib/xen/images/node1/ocr001.img,xvdd,w!' ]
The "w!" flag marks the block device as shareable. Simple: make the file available to the second RAC node and you are done.
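For illustration, the second node's domain config would then simply point at the same image file with the same "w!" mode (the paths below are made-up examples, not from the original setup):

```
# hypothetical disk section of node2's Xen config: its own root disk,
# plus the very same shared OCR image file used by node1
disk = [ 'file:/var/lib/xen/images/node2/root.img,xvda,w',
         'file:/var/lib/xen/images/node1/ocr001.img,xvdd,w!' ]
```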
In 2009 (yikes!) I asked whether you could build RAC on KVM with the same technique. Who would have known then that Karl would be a colleague one day? Fast forward a few years and I have implemented such a solution, and it works in the lab.
However, it somehow annoys me to run an iSCSI server in the lab just for this. Sitting around a table in London earlier this year I had a conversation with Jerry who, like me, has a Xen background. He likes that you can use sparse files (those with holes in them) for shared RAC storage in Xen; you can't in many other virtualisation solutions! So there is the challenge I set myself:
- Create 2 node RAC
- … without using LVM for shared storage
- … sparse files for shared storage
- … on KVM
It took me a little while to work this out, but: it is possible! Now I have the time to write it up.
KVM is short for Kernel-based Virtual Machine. It is Red Hat's favourite virtualisation solution (I think), and for some time it looked as if it would be the nail in Xen's coffin. KVM was easier to handle at the time: it uses the distribution's own kernel, so no separately compiled kernel or multi-booting was needed. Back in the day Xen required a "xenified" kernel with the Xen patches ported forward (adding dom0 support). Today things are better: dom0 support has been part of the mainline kernel since around version 3.0.
Back to KVM. I first tried the paravirtualised block interface, virtio. It works like a charm, but each virtio disk depends on a PCI slot in the guest, and there is not an unlimited number of those! Furthermore, I couldn't get the devices to be shareable either.
Enter virtio-scsi, described for example here: http://www.linux-kvm.com/content/virtio-scsi
Virtio itself is nothing new, and UEK 2 has supported virtio devices all along. What UEK 2 did not support was virtio-scsi: that functionality was introduced in a kernel later than the one UEK 2 is based on. I believe it was 3.4, but I am not sure about that.
Update 140212: UEK 3 supports virtio-scsi. If you are using Oracle Linux 6.5 for your RAC testing then you shouldn't have to change the grub configuration to use the RHEL-compatible kernel.
Why does it matter? Because I managed to satisfy my requirements with virtio-scsi. If you are using stock Oracle Linux 6.4 then you need to boot the Red Hat kernel. Yes, it says 2.6.32, but it may have little in common with the original 2.6.32. Nevertheless, the RH kernel allows you to use virtio-scsi! If you really want an Oracle kernel and it's a toy environment, you could try the UEK 3 beta (current at the time of writing) or even the playground channel and install an even newer kernel; 3.11 was released yesterday.
Creating the files on the host
You don't have to create the files for the VMs as a first step; the virt-install tool does that for you. You could begin the installation as shown here:
# virt-install --name racnode1 --connect qemu:///system --ram 4096 --vcpus 2 \
  --disk path=/var/lib/kvm/images/racnode1/disk.img,format=raw,bus=virtio,cache=none \
  --network=bridge:br0,model=virtio --vnc --os-type=linux --os-variant=rhel6 \
  --cdrom /m/V37084-01.iso --accelerate --keymap=en-gb
Have a look at the man page for the tool; the above command creates an Oracle Linux system with 2 CPUs, 4 GB RAM and an 8 GB disk (a sparse file again) to be installed from CDROM. If you are really sophisticated you give it a kickstart file and do the whole installation unattended... Or better still: PXE boot it and supply the kickstart file as part of the process!
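For the unattended route, a minimal kickstart sketch could look like this. Every value below, including the install server URL and the root password, is a made-up example; the file would be injected via virt-install or fetched over the network when PXE booting:

```
# minimal, hypothetical kickstart for an unattended Oracle Linux 6 install
install
url --url=http://installserver/ol64/
lang en_GB.UTF-8
keyboard uk
rootpw --plaintext changeme
timezone Europe/London
bootloader --location=mbr
clearpart --all --initlabel
autopart
reboot
%packages
@core
%end
```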
Adding the storage for RAC is done in 3 steps for me:
- Add space for Oracle binaries
- Add 3 x 5 GB LUNs for the OCR and voting disks (I always store them separately)
- Add LUNs for DATA and RECO disk groups.
The process is always the same for every disk, so I'll show it for +OCR. Here I am adding the first of the 3 ASM disks:
# dd if=/dev/zero of=ocr01.img bs=1 count=1 seek=5G
1+0 records in
1+0 records out
1 byte (1 B) copied, 3.1759e-05 s, 31.5 kB/s
...
That was quick, wasn’t it? And it’s all fake!
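The remaining disks are created the same way: a single byte written at the 5 GB offset, leaving a hole before it. A small loop (run in the VM's images directory) saves some typing:

```shell
# same dd trick for the remaining ASM disk images: each file gets an
# apparent size of 5 GB plus one byte while allocating almost no space
for disk in ocr02 ocr03; do
    dd if=/dev/zero of="${disk}.img" bs=1 count=1 seek=5G
done
```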
# ls -lh
total 2.1G
-rwxr-xr-x 1 qemu qemu 8.0G Sep  2 23:57 disk.img
-rw-r--r-- 1 root root 5.1G Sep  2 23:56 ocr01.img
-rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr02.img
-rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr03.img
Add the -s flag to see the real size:
# ls -lhs
total 2.1G
2.1G -rwxr-xr-x 1 qemu qemu 8.0G Sep  2 23:58 disk.img
4.0K -rw-r--r-- 1 root root 5.1G Sep  2 23:56 ocr01.img
4.0K -rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr02.img
4.0K -rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr03.img
So you can see we are not actually using any space yet.
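Another way to make the sparseness visible is du, which can report the apparent size and the real disk usage side by side. A quick demonstration with a throwaway file (the file name is just an example):

```shell
# create a sparse 5 GB test file and compare apparent size with real usage
dd if=/dev/zero of=sparse_test.img bs=1 count=1 seek=5G
du -h --apparent-size sparse_test.img   # apparent size, roughly 5G
du -h sparse_test.img                   # real usage: a few KB at most
```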
Adding disks to VMs
The next step is to add the storage to the VM. If you are using the virt-manager that ships with Oracle Linux 6.4 you are out of luck, as it doesn't know about virtio-scsi. There is a solution though (drum roll please): the command line! libvirt in Oracle Linux 6.4 understands the virtio-scsi syntax, so here you go.
At this point you should create a backup of your domain configuration file:
# virsh dumpxml <domain you want to modify> > /path/to/backup/location/domainname.xml
First we need the virtio-scsi controller. Edit the libvirt XML file using virsh edit and add something similar to this example (the syntax reference is the libvirt page). Controllers belong in the domain's devices section.
<controller type='scsi' index='0' model='virtio-scsi'/>
Adapt the example to your environment, using the libvirt documentation as the source, then add the disks. They also go into the devices section and should look similar to this:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/kvm/images/racnode1/ocr01.img'/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
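Subsequent disks only differ in the source file, the target device name and the unit number on the controller. A sketch for the second disk (the path is again an example from my setup):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/kvm/images/racnode1/ocr02.img'/>
  <target dev='sdb' bus='scsi'/>
  <shareable/>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
```

Remember that the second node's domain needs matching disk definitions pointing at the same image files, each with the shareable element set, otherwise the storage is not shared.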
Make the appropriate changes to the configuration, add the other disks, and off we go! When booting, make sure you boot into the Red Hat kernel (unless you are on UEK 3), since UEK 2 doesn't know the virtio-scsi driver. And voilà, here is the disk:
[root@racnode1 ~]# fdisk -l /dev/sda

Disk /dev/sda: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
If you don't see the disk /dev/sda even though it is defined in the domain, then you are booting off a kernel that doesn't support virtio-scsi.
The remaining steps aren’t different from any other RAC installation, and I leave them as an exercise to the reader until I get that blog post written.
If you don't like hacking your domain configuration files directly you might as well use the virsh API to do so; have a look at these two references from the Fedora documentation set: