Martins Blog

Trying to explain complex things in simple terms

RAC in KVM is possible without iSCSI

Posted by Martin Bach on September 3, 2013

(This post is for Jerry. He will know when he reads it)

I have long been a supporter of many flavours of virtualisation, and my earliest experience with Xen goes back to Oracle VM 2, which was based on RHEL 4 and an early version of Xen. Why am I saying this? Because Xen is (was?) simple and elegant, especially for building RAC systems: all you needed was paravirtualised Linux and a dual-core machine. Xen is very lightweight, even though recent advances in processor architecture (nested page tables, single root I/O virtualisation and others) make hardware virtualisation with paravirtualised drivers the more attractive option. That combination is what this post is about!

Shared storage in Xen

As you know, RAC needs shared block devices for the voting disks, OCR, data files, redo logs, the lot. In Xen that is as straightforward as it gets:

disk = [
  'file:/var/lib/xen/images/node1/root.img,xvda,w',
  ...
  'file:/var/lib/xen/images/node1/ocr001.img,xvdd,w!'
]

The "w!" mode marks the block device as writable and shared, so Xen allows more than one domain to open it for writing. Simple. Make the file available to the second RAC node and you are done.

In 2009 (yikes!) I asked whether you could have RAC on KVM using the same technique. Who would have known then that Karl would one day be a colleague? Fast forward a few years and I have implemented such a solution, and it works in the lab.

However, it somehow annoys me to run an iSCSI server in the lab just for this. Sitting around a table in London earlier this year, I had a conversation with Jerry who, like me, has a Xen background. He likes the fact that you can use sparse files (files with holes in them) for shared RAC storage in Xen, which you can't in many other virtualisation solutions! So here is the challenge I set myself:

  • Create a 2-node RAC
  • … without using LVM for shared storage
  • … using sparse files for shared storage
  • … on KVM

It took me a little while to work this out, but it is possible! Now I finally have the time to write it up.

Background

KVM is short for Kernel-based Virtual Machine. It is Red Hat's favourite virtualisation solution (I think), and for some time it looked as if it would be the final nail in Xen's coffin. KVM was easier to handle at the time: it came with the distribution's own kernel, so no separately compiled kernel or multiboot setup was needed. Back in the day Xen required a "xenified" kernel with the Xen patches ported forward (adding dom0 support). Today things are better: these patches have been part of the mainline kernel since around kernel 3.0.

Back to KVM. I first tried KVM with the paravirtualised block interface, virtio (virtio-blk). It works like a charm, but each virtio disk occupies a PCI slot in the guest, and there is only a limited number of those! Furthermore, I couldn't get the disks to be shareable either.

Enter virtio-scsi, described for example here: http://www.linux-kvm.com/content/virtio-scsi

Virtio itself is nothing new, and UEK 2 has supported those devices all along. What UEK 2 did not support was virtio-scsi, which was introduced in a later kernel than the one UEK 2 is based on. I believe it was 3.4, but I am not sure about that.

Update 140212: UEK 3 supports virtio-scsi. If you are using Oracle Linux 6.5 for your RAC testing, you shouldn't have to change the GRUB configuration to boot the RHEL-compatible kernel.

Why does it matter? Because virtio-scsi is what let me satisfy my two requirements. If you are using stock Oracle Linux 6.4, you need to boot the Red Hat kernel. Yes, its version string says 2.6.32, but it may have little in common with the original 2.6.32 release. Either way, the Red Hat kernel lets you use virtio-scsi! If you really want an Oracle kernel and it's a toy environment, you could try the UEK 3 beta available at the time of writing, or even the playground channel to install an even newer kernel: 3.11 was released yesterday.
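To find out which installed kernels can see virtio-scsi disks, you can look for the virtio_scsi module in each kernel's module tree; for the running kernel alone, modinfo virtio_scsi gives the answer. The sketch below checks all installed kernels at once. list_virtio_scsi_kernels is a hypothetical helper, MODULES_ROOT is a hypothetical override that defaults to /lib/modules, and it assumes the driver is named virtio_scsi, as in mainline kernels.

```shell
#!/bin/bash
# Sketch: list the installed kernels whose module tree ships virtio_scsi,
# i.e. the kernels a guest can boot to see virtio-scsi disks.
# MODULES_ROOT is a hypothetical override (handy for testing); it defaults
# to the standard /lib/modules directory.
list_virtio_scsi_kernels() {
    local root=${MODULES_ROOT:-/lib/modules} dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # modules.dep lists loadable modules, modules.builtin lists drivers
        # compiled into the kernel; either one means virtio-scsi is available
        if grep -qs 'virtio_scsi\.ko' "${dir}modules.dep" "${dir}modules.builtin"; then
            basename "$dir"
        fi
    done
}

list_virtio_scsi_kernels
```

The kernel versions it prints are the candidates to select in the GRUB configuration; a kernel missing from the list will not show the virtio-scsi disks.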

Creating the files on the host

You don't need to create the image files for the VMs yourself in the first step: virt-install does that for you. You could begin the installation like this:

# virt-install --name racnode1 --connect qemu:///system --ram 4096 --vcpus 2 \
--disk path=/var/lib/kvm/images/racnode1/disk.img,size=8,format=raw,bus=virtio,cache=none \
--network=bridge:br0,model=virtio --vnc --os-type=linux --os-variant=rhel6 \
--cdrom /m/V37084-01.iso --accelerate --keymap=en-gb

Have a look at the virt-install man page for details. The above command creates an Oracle Linux VM with 2 vCPUs, 4 GB RAM and an 8 GB disk (a sparse file again), to be installed from CD-ROM. If you are really sophisticated, you give it a kickstart file and do the whole installation unattended… Or better still: PXE boot it and supply the kickstart file as part of the process!

Creating storage

Adding the storage for RAC is done in 3 steps for me:

  • Add space for Oracle binaries
  • Add 3 x 5 GB LUNs for the OCR and voting disk (I always store them separately)
  • Add LUNs for DATA and RECO disk groups.
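Every item on that list boils down to creating image files, so the whole list can be scripted. The following is a minimal sketch, not the method used in this post: it uses GNU truncate instead of the dd call shown next, and the function name create_shared_images is made up for illustration. One small difference: dd with bs=1 count=1 seek=5G writes a single byte at offset 5 GiB, so the file ends up 5 GiB plus one byte (which is why ls reports 5.1G), whereas truncate -s 5G produces exactly 5 GiB. The guest only sees whole sectors either way, as the fdisk output further down shows.

```shell
#!/bin/bash
# Sketch: create a batch of sparse image files for the shared ASM disks.
# Each argument after the directory is NAME:SIZE, e.g. ocr01:5G; the names
# and sizes you pass are up to you.
create_shared_images() {
    local dir=$1 spec name size
    shift
    mkdir -p "$dir" || return 1
    for spec in "$@"; do
        name=${spec%%:*}
        size=${spec#*:}
        if [ -e "$dir/$name.img" ]; then
            # never clobber an image that a VM may already be using
            echo "skipping existing $dir/$name.img" >&2
            continue
        fi
        # GNU truncate only sets the file length: no data blocks are written,
        # so the file is sparse and takes (almost) no space on the host
        truncate -s "$size" "$dir/$name.img" || return 1
    done
}
```

For example, create_shared_images /var/lib/kvm/images/racnode1 ocr01:5G ocr02:5G ocr03:5G creates the three +OCR images; the DATA and RECO images work the same way with sizes of your choosing.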

The process is always the same, so I'll only show it for the +OCR disks. Here I am adding the first of the 3 ASM disks for +OCR; the other two only differ in the file name:

# dd if=/dev/zero of=ocr01.img bs=1 count=1 seek=5G
1+0 records in
1+0 records out
1 byte (1 B) copied, 3.1759e-05 s, 31.5 kB/s
...

That was quick, wasn’t it? And it’s all fake!

# ls -lh
total 2.1G
-rwxr-xr-x 1 qemu qemu 8.0G Sep  2 23:57 disk.img
-rw-r--r-- 1 root root 5.1G Sep  2 23:56 ocr01.img
-rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr02.img
-rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr03.img

Add the -s flag to see how much space is actually allocated:

# ls -lhs
total 2.1G
2.1G -rwxr-xr-x 1 qemu qemu 8.0G Sep  2 23:58 disk.img
4.0K -rw-r--r-- 1 root root 5.1G Sep  2 23:56 ocr01.img
4.0K -rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr02.img
4.0K -rw-r--r-- 1 root root 5.1G Sep  2 23:57 ocr03.img

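As you can see, the OCR files occupy just 4 KB each despite their apparent size of 5.1 GB; only the 8 GB root disk has real data in it (2.1 GB so far).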
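The same comparison can be scripted, for example to check a whole image directory after moving files around, since copying a sparse file with tools that are not sparse-aware (scp, for instance) fills in the holes. A minimal sketch, assuming GNU coreutils stat and its -c format option; sparse_report is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: compare the apparent size of image files with the space actually
# allocated on the host, the same information as "ls -lhs" but in bytes.
# GNU stat format: %s = size in bytes, %b = allocated blocks,
# %B = size of each of those blocks in bytes.
sparse_report() {
    local f size blocks bsize
    for f in "$@"; do
        read -r size blocks bsize < <(stat -c '%s %b %B' "$f") || return 1
        printf '%s apparent=%d allocated=%d\n' \
            "${f##*/}" "$size" "$((blocks * bsize))"
    done
}
```

Run it as sparse_report /var/lib/kvm/images/racnode1/*.img. For a quick look without a script, du -h --apparent-size and plain du -h on the same files show the two numbers.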

Adding disks to VMs

The next step is to add the storage to the VM. If you are using the virt-manager that comes with Oracle Linux 6.4, you are out of luck: it doesn't know about virtio-scsi. There is a solution though: (drum roll please) the command line! libvirt in Oracle Linux 6.4 understands the virtio-scsi syntax. So here you go.

Before you change anything, create a backup of your domain configuration!

# virsh dumpxml <domain you want to modify> > /path/to/backup/location/domainname.xml
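Since both RAC nodes are about to be edited, it can help to save every domain in one go. A minimal sketch; backup_domains and the default directory under /root are hypothetical names. It uses dumpxml --inactive, so for a running domain it saves the persistent definition that virsh edit changes rather than the live one.

```shell
#!/bin/bash
# Sketch: save the XML definition of every libvirt domain before editing.
# The default backup directory is an example; pass your own as $1.
backup_domains() {
    local dir=${1:-/root/libvirt-backup/$(date +%Y%m%d-%H%M%S)} dom
    mkdir -p "$dir" || return 1
    # --all includes shut-off domains, --name prints one name per line
    virsh list --all --name | while read -r dom; do
        [ -n "$dom" ] || continue      # the list ends with an empty line
        virsh dumpxml --inactive "$dom" > "$dir/$dom.xml"
    done
    echo "$dir"                        # tell the caller where the files are
}
```

To roll back, feed a saved file to virsh define, for example virsh define /root/libvirt-backup/<timestamp>/racnode1.xml.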

First, we need the virtio-scsi controller. Edit the domain definition using virsh edit and add something similar to the following example (the syntax reference is the libvirt domain XML format documentation). Controllers belong in the <devices> section of the domain:

<controller type='scsi' index='0' model='virtio-scsi'/>

Adapt the example to your environment, using the libvirt documentation as the reference. Then add the disks. They also belong in <devices>, further up where the existing disks are defined, and should look similar to this:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/kvm/images/racnode1/ocr01.img'/>
      <target dev='sda' bus='scsi'/>
      <shareable/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
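With several LUNs per node, writing these elements by hand is tedious and error-prone, because every disk needs its own target device and unit number. A small generator helps. This is a sketch under the same assumptions as the example above (controller 0, bus 0, target 0), and scsi_disk_xml is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print a shareable virtio-scsi <disk> element for each image file,
# numbering the target devices (sda, sdb, ...) and SCSI units (0, 1, ...).
# Assumptions: every disk sits on controller 0, bus 0, target 0, and there
# are no more than 26 disks (sda..sdz).
scsi_disk_xml() {
    local letters=abcdefghijklmnopqrstuvwxyz unit=0 img
    for img in "$@"; do
        cat <<EOF
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='$img'/>
  <target dev='sd${letters:unit:1}' bus='scsi'/>
  <shareable/>
  <address type='drive' controller='0' bus='0' target='0' unit='$unit'/>
</disk>
EOF
        unit=$((unit + 1))
    done
}
```

For example, scsi_disk_xml /var/lib/kvm/images/racnode1/ocr0{1,2,3}.img prints three elements (sda to sdc, units 0 to 2) that can be pasted into virsh edit for each RAC node; both nodes must reference the same image files.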

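Make the appropriate changes to the configuration and add the other disks, each with its own target device (sdb, sdc, …) and unit number. Add the same <disk> elements, pointing at the very same image files, to the second RAC node's domain, and off we go! Make sure you boot into the Red Hat kernel (unless you are on UEK 3), since UEK 2 doesn't have the virtio-scsi driver. And voilà, here is the disk: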

[root@racnode1 ~]# fdisk -l /dev/sda

Disk /dev/sda: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

If /dev/sda doesn't show up although it is defined in the domain, you have most likely booted a kernel that doesn't support virtio-scsi.

The remaining steps aren’t different from any other RAC installation, and I leave them as an exercise to the reader until I get that blog post written.

Appendix

If you don't like hacking your domain configuration files directly, you can make the same changes with virsh commands instead; the Fedora documentation set has two useful references on the subject.
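As one example of the command-line route, virsh attach-disk can add the shared disks without touching the XML. This is a sketch under several assumptions: racnode2 is a hypothetical name for the second node, IMAGE_DIR and DOMAINS are hypothetical variables, the virtio-scsi controller is already defined in each domain (otherwise libvirt may add a SCSI controller of a different default model), and your virsh supports --targetbus and --mode shareable, which you can check with virsh help attach-disk.

```shell
#!/bin/bash
# Sketch: attach the same sparse images to every RAC node with virsh
# attach-disk instead of editing the XML by hand.
# IMAGE_DIR and DOMAINS are hypothetical variables; racnode2 is a
# hypothetical name for the second node.
IMAGE_DIR=${IMAGE_DIR:-/var/lib/kvm/images/racnode1}
DOMAINS=${DOMAINS:-"racnode1 racnode2"}

attach_shared_disks() {
    local dom img i targets=(sda sdb sdc)
    for dom in $DOMAINS; do
        i=0
        for img in ocr01 ocr02 ocr03; do
            # --config changes the persistent definition only; the disk
            # appears the next time the domain starts
            virsh attach-disk "$dom" "$IMAGE_DIR/$img.img" "${targets[i]}" \
                --targetbus scsi --driver qemu --subdriver raw \
                --mode shareable --config || return 1
            i=$((i + 1))
        done
    done
}
```

Because --config only changes the persistent definition, the disks show up the next time each node starts; virsh dumpxml shows the resulting <disk> elements.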


3 Responses to “RAC in KVM is possible without iSCSI”

  1. Can you clarify the host OS and guest OS version? I take it you are running some version of RHEL 6 in the guest (with the distribution's default kernel)? Are there special considerations for the host to support virtio-scsi?

    • Martin Bach said

      Yes, I think I can.

      The host was openSUSE 12.3 with the KVM that came with the distribution. The guest is Oracle Linux 6.4 with the _Red Hat_ kernel. At the time of writing UEK did not support virtio-scsi, which is needed for the virtio-scsi controller and the disks. In the guest, modinfo shows whether the virtio_scsi module is available, modprobe loads it and lsmod shows whether it is loaded. I have to admit I got the idea after using virt-manager, when the GUI showed the various storage options in a drop-down list.

  2. recurrentnull said

    Hi Martin,

    this was extremely useful for me, thanks a lot for the post! In fact when setting up my first “home RAC” I was googling for how to handle the shared storage problem, found the discussion on oracle-l and ended up using iscsi (which I documented, for myself rather ;-), in http://recurrentnull.wordpress.com/2012/11/01/rac-on-a-laptop-kvm-libvirt/)

    But this is much nicer, and worked fine on my combination of Fedora host (Heisenbug) and Centos guests (6.5) :-)
