Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘KVM’ Category

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

Posted by Martin Bach on October 7, 2016

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and how to recover from it. This post takes the example a step further: I am simulating another disk failure, but this time I won't pretend I can fix the issue and put the disk back. In other words, I simulate the effect of disk_repair_time hitting zero.

Most of what I am covering here is an extension of the previous post. I'll mention the main details here for your benefit, but would like to invite you to refer to the previous post for more detail.

The idea is to show you the output in the ASM alert.log and the effect of the lost disk on the V$ views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest and also runs Oracle Linux 7.2. There are 9 ASM disks: 8 for +DATA, which uses normal redundancy with 2 failure groups. I added +RECO with external redundancy for the sake of completeness; this post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.

Removing the disk

As with the previous example I am using libvirt to remove a "LUN" temporarily from the guest.
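The command is the same one used in the previous post; a quick reminder (the host-side device name sdh comes from that post's setup and will differ on your system):

[root@host ~]# virsh detach-disk asmtest sdh --live

And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f: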

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported

In the previous post I keep referring to, I copied the part of the ASM instance's alert.log that showed the disk repair timer ticking down. This time I am simulating the case where, for whatever reason, the transient failure could not be fixed in time. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1

The last line in the above output is definitely NOT what you want to see. If at all possible, you should fix whatever is keeping DATA_0001 (in this example) offline before the timer expires. Once you see the message about the disk being dropped, you are facing the inevitable rebalance operation.
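If the repair is simply going to take longer than the current countdown allows, the timer can be extended rather than letting it expire. A minimal sketch, to be checked against the ALTER DISKGROUP documentation before use: re-issuing OFFLINE DISK with a DROP AFTER clause resets the countdown for an already-offline disk, while the disk group attribute controls the default for future offline events.

SQL> alter diskgroup DATA offline disk DATA_0001 drop after 12h;
SQL> alter diskgroup DATA set attribute 'disk_repair_time' = '12h';

In my scenario I deliberately let the timer run out. Here is the continued output from the ASM alert.log: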

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, so the rebalance took almost no time. But indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Space problems are the last thing you want to have when rebalancing. I did some research about this subject earlier and documented it in a blog post.
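If you want to check beforehand whether a disk group can afford to lose a disk, V$ASM_DISKGROUP has the relevant columns; a quick sketch (no output shown):

SQL> select name, type, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
  2  from v$asm_diskgroup where name = 'DATA';

A low or negative USABLE_FILE_MB is a strong hint that losing a disk and the subsequent rebalance would run the disk group out of space.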

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Posted in 12c Release 1, Automatic Storage Management, KVM, Linux | Leave a Comment »

Little things worth knowing: transient ASM disk failures in 12c

Posted by Martin Bach on September 30, 2016

For quite some time we have been treated nicely by ASM when it comes to transient disk failures. Since 11.1 (if memory serves me right), a transient failure won't cause an ASM disk to be dropped immediately. This is good, because it can potentially save a lot of time! When a disk is dropped from an ASM disk group, a rebalance operation is unavoidable. And there is more than a chance of another one following it, because ultimately you want to add the disk back to the disk group. To be fair, this only applies to normal and high redundancy disk groups.

The feature I just described very briefly is referred to as ASM Fast Mirror Resync, and is documented in the ASM Administrator's Guide in the section Administering Oracle ASM Disk Groups.

By the way, there is another new feature you might want to look at, called FAILGROUP_REPAIR_TIME, but I’ll keep that for another post.
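Both the disk-level and failgroup-level timers are exposed as disk group attributes (disk_repair_time and failgroup_repair_time; the defaults are 3.6 hours and 24 hours respectively). If you want to see what they are currently set to, a quick query against V$ASM_ATTRIBUTE does the trick; group number 1 is my DATA disk group:

SQL> select name, value from v$asm_attribute
  2  where group_number = 1 and name like '%repair_time%';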

In this post I’d like to demonstrate how easy it is to simulate a transient failure of an ASM disk using libvirt and KVM on a host running Oracle Linux 7.2. I also wanted to have the output from the O/S and ASM alert.log as personal reference.

As usual, this procedure is strictly for YOUR HOME LAB USE ONLY and for educational purposes.

The setup

When creating the VM (named "asmtest") I created a boot volume and a 50 GB volume for the Oracle binaries. To keep it manageable I went for an Oracle 12.1 Restart configuration, patched to the July PSU. There are 9 ASM disks, presented as SCSI devices (not virtio); I'll explain why in a bit. The guest O/S is Oracle Linux 7.2, as is my lab server. All virtual disks on the VM are partitioned, although strictly speaking that is not necessary.

Using the lsscsi tool (part of the sg3_utils), I created udev rules following the excellent description I found on oracle-base.com.

[oracle@asmtest ~]$ cat /etc/udev/rules.d/61-asm.rules 

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0", SYMLINK+="asm-disk01", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-1", SYMLINK+="asm-disk02", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-2", SYMLINK+="asm-disk03", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-3", SYMLINK+="asm-disk04", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-0", SYMLINK+="asm-disk05", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-1", SYMLINK+="asm-disk06", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-2", SYMLINK+="asm-disk07", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-3", SYMLINK+="asm-disk08", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-0", SYMLINK+="asm-disk09", OWNER="oracle", GROUP="asmdba", MODE="0660"

This is slightly more complicated than my usual setup, but it was needed for device name persistence. Unfortunately libvirt does not report the same block device names as my guest VM: whilst the host "sees" sd[h-v], my guest reported these devices as sd[a-i]. Persistent device names to the rescue! The only issue I had initially was that disks presented to the VM via virtio don't report the SCSI IDs I need for my udev rules; in other words, lsscsi returned no data. Presenting the block devices via the SCSI interface did the trick.
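To double-check that the symlinks point at the devices you think they do, you can compare the SCSI IDs on the guest directly; a small sketch using the same scsi_id call the udev rules rely on:

[root@asmtest ~]# lsscsi
[root@asmtest ~]# /usr/lib/udev/scsi_id -g -u -d /dev/sda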

The hypothesis

This snippet from the documentation (referenced above) is exactly what I wanted to test:

After you repair the disk, run the SQL statement ALTER DISKGROUP ONLINE DISK. This statement brings a repaired disk group back online to enable writes so that no new writes are missed. This statement also starts a procedure to copy all of the extents that are marked as stale on their redundant copies.
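Translated into SQL this means onlining either the repaired disk by name or simply everything that is currently offline in the disk group; both forms are standard syntax, and the second one is what I use further down:

SQL> alter diskgroup DATA online disk DATA_0001;
SQL> alter diskgroup DATA online all;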

Here is the current configuration I am using for my DATA disk group, as seen by ASM:

SQL> select name,state,type,offline_disks from v$asm_diskgroup
  2  where name = 'DATA';

NAME                           STATE       TYPE   OFFLINE_DISKS
------------------------------ ----------- ------ -------------
DATA                           MOUNTED     NORMAL             0



SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number;

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP        REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- ---------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0

You can see 2 failure groups, batch1 and batch2, and all disks are available/online. The operating system mapping works perfectly thanks to my udev rules:

[oracle@asmtest ~]$ ls -l /dev/asm-disk0*
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk01 -> sda1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk02 -> sdd1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk03 -> sdc1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk04 -> sdb1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk05 -> sde1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk06 -> sdh1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk07 -> sdg1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk08 -> sdf1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk09 -> sdi1

Let’s detach a disk and cause a transient failure

libvirt is really cool: it allows me to modify my VM configuration at runtime. One example is removing disks, using the detach-disk command:

[root@host ~]# virsh help detach-disk
  NAME
    detach-disk - detach disk device

  SYNOPSIS
    detach-disk <domain> <target> [--persistent] [--config] [--live] [--current]

  DESCRIPTION
    Detach disk device.

  OPTIONS
    [--domain] <string>  domain name, id or uuid
    [--target] <string>  target of disk device
    --persistent     make live change persistent
    --config         affect next boot
    --live           affect running domain
    --current        affect current domain

But which disk to remove? The disks attached to a VM can be listed using the domblklist command:

[root@host ~]# virsh domblklist asmtest
Target     Source
------------------------------------------------
vda        /path/to/slow/disk/asmtest
vdb        /path/to/slow/disk/asmtest-1
sdh        /path/to/fast/disk/asmtest-asm01.img
sdi        /path/to/fast/disk/asmtest-asm02.img
sdj        /path/to/fast/disk/asmtest-asm03.img
sdk        /path/to/fast/disk/asmtest-asm04.img
sdo        /path/to/fast/disk/asmtest-asm05.img
sdp        /path/to/fast/disk/asmtest-asm06.img
sdq        /path/to/fast/disk/asmtest-asm07.img
sdr        /path/to/fast/disk/asmtest-asm08.img
sdv        /path/to/fast/disk/asmtest-asm09.img

I mentioned previously that the information provided by the host does not necessarily match that of the guest. Since I have persistent device names in my VM, though, there shouldn't be an issue. Let's take a disk away, but only temporarily! Since my O/S and Oracle binaries reside on disks presented to the guest as virtio devices, I know they are named vd[a-z] and are better left alone. That leaves sdh as the first disk to remove.

[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully

And the disk is gone.

And what is happening?

The disk removal has immediate effect on the ASM instance. Switching over to my VM I can see the following information recorded in the ASM alert.log:

2016-09-29 13:45:13.936000 +01:00
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/asm-disk01
 incarnation:0xe969ccd4 asynchronous result:'I/O error'
 subsys:System krq:0x7fda4f0b8210 bufp:0x7fda4f20b000 osderr1:0x69b5 osderr2:0x0
 IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: Hbeat write to PST disk 1.3916025044 in group 1 failed. [2]
NOTE: process _b000_+asm (4346) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 13 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message (844741558) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x7f is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 15 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: cache closing disk 1 of grp 1: DATA_0001
NOTE: PST update grp = 1 completed successfully
2016-09-29 13:46:53.300000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1

I also had a journalctl -f running in a different session. Unsurprisingly the O/S also noticed the disk disappearing:

Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Add. Sense: Logical unit not supported

A transient failure

In yet another session I can see that the disk is about to go: the repair timer has started ticking down and is decremented as time passes. This did not happen straight away; there was a delay of maybe a couple of minutes after the disk was removed.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   600
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   416
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

Another day I would like to run an additional test to see what happens when the repair_timer is actually down to 0, but for now let’s add the disk back in. On the host, I use libvirt again for this:

[root@host ~]# virsh attach-disk asmtest /path/to/fast/disk/asmtest-asm01.img sdh --live
Disk attached successfully

As soon as the prompt returns, I can see the guest O/S picking the disk up.

Sep 29 13:54:09 asmtest kernel: scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.5. PQ: 0 ANSI: 5
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 29 13:54:09 asmtest kernel:  sda: sda1
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Attached SCSI disk

The documentation states that I should now be able to online the disk again, which I tried. The result is shown below:

SQL> alter diskgroup data online all;

Diskgroup altered.

I am showing you the complete ASM alert.log output here; feel free to skip to the bottom:

2016-09-29 13:54:44.774000 +01:00
SQL> alter diskgroup data online all
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: initiating resync of disk group 1 disks
DATA_0001 (1)
NOTE: process _user3750_+asm (3750) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (4172832432) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x1 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 17 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 18 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance membership refresh for group=1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x11, op = assign
GMON updating disk modes for group 1 at 19 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance disk validation for group=1
NOTE: disk validation pending for 1 disk in group 1/0xceb93c35 (DATA)
NOTE: Found /dev/asm-disk01 for disk DATA_0001
NOTE: completed disk validation for 1/0xceb93c35 (DATA)
NOTE: discarding redo for group 1 disk 1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x19, op = assign
GMON updating disk modes for group 1 at 20 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 21 for pid 16, osid 2535
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/asm-disk01
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x5d, op = assign
GMON updating disk modes for group 1 at 22 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7d, op = assign
GMON updating disk modes for group 1 at 23 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
SUCCESS: alter diskgroup data online all
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xceb93c35 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=23, OS id=4986
NOTE: assigning ARB0 to group 1/0xceb93c35 (DATA) with 1 parallel I/O
2016-09-29 13:54:48.265000 +01:00
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7f, op = assign
GMON updating disk modes for group 1 at 24 for pid 23, osid 4986
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 1
NOTE: completed online of disk group 1 disks
DATA_0001 (1)
2016-09-29 13:54:49.580000 +01:00
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
SUCCESS: rebalance completed for group 1/0xceb93c35 (DATA)
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 25 for pid 16, osid 2535
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
2016-09-29 13:54:51.286000 +01:00
NOTE: Attempting voting file refresh on diskgroup DATA

This looks promising, but what about V$ASM_DISK?

SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number
  3  /

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> 

Hurray, I have my disk back! And at no time did I see a full rebalance shuffling data around, only the quick resync of the stale extents. Thanks ASM!

Posted in 12c Release 1, KVM, Linux | 1 Comment »

Installing Oracle 12.1.0.2 RAC on Oracle Linux 7-part 2

Posted by Martin Bach on January 12, 2015

In the first part of the article series you could read how a kickstart file made the installation of Oracle Linux 7 a lot more bearable. In this part of the series it’s all about configuring the operating system. The installation of Grid Infrastructure and the Oracle database is for another set of posts.

There are quite some differences between Oracle Linux 6 and 7

To me the transition from Oracle Linux 6 to 7 feels like the step from Solaris 9 to 10 at the time. Personally I think that a lot has changed. Although, to be fair, it has been quite some time since it was announced that the network stack commands we know and love are deprecated and might go away. Even with Oracle Linux 6 there was a threat that NetworkManager would be the only tool to modify your network settings (which thankfully was not the case). A lot of the Linux community's efforts have now come to fruition, and it's time to adjust to the future. Even when it's painful (and it is, at least a bit).

Configuring the network

The warning has been out there quite a while, but now it seems to be true: no more system-config-network-tui to configure the network! No more ifconfig! Oh dear, quite a bit of learning to be done. Luckily someone else has done all the legwork and documented the changes. A good example is this one:

https://dougvitale.wordpress.com/2011/12/21/deprecated-linux-networking-commands-and-their-replacements/

So first of all, don't fear: although all network interfaces are configured using NetworkManager now, you can still use a command line tool: nmtui. After trying it out I have to say I'm not really convinced about its usability. What appears better is nmcli, the NetworkManager command line tool. Its use is quite confusing, and it appears to me as if the whole NetworkManager toolset was developed for laptop users, not servers. But I digress. I have a few interfaces in my RAC VM; the first was configured during the installation, eth[1-3] aren't configured yet.

[root@localhost ~]# nmcli connection show
NAME         UUID                                  TYPE            DEVICE
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
[root@localhost ~]# nmcli device status
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     System eth0
eth1    ethernet  disconnected  --
eth2    ethernet  disconnected  --
eth3    ethernet  disconnected  --
lo      loopback  unmanaged     --
[root@localhost ~]#

At this point I have used eth0 as the management network (similar to the way Exadata does) and will use the other networks for the database. eth1 will act as the public network, eth2 and eth3 will be private.

Although the network interfaces can be named differently for device name persistence, I stick with the old naming for now. I don't want to run into trouble with the installer just yet. On physical hardware you are very likely to see very different network interface names; the kernel uses a naming scheme identifying where the cards are (on the main board, or in extension cards, for example). I'll write another post about that soon.

Using dnsmasq (on the host) I configure my hosts for these addresses:

[root@ol62 ~]# grep rac12pri /etc/hosts
192.168.100.107	rac12pri1.example.com		rac12pri1
192.168.100.108	rac12pri1-vip.example.com	rac12pri1-vip
192.168.100.109	rac12pri2.example.com		rac12pri2
192.168.100.110	rac12pri2-vip.example.com	rac12pri2-vip
192.168.100.111	rac12pri-scan.example.com	rac12pri-scan
192.168.100.112	rac12pri-scan.example.com	rac12pri-scan
192.168.100.113	rac12pri-scan.example.com	rac12pri-scan

Configuring the interfaces is actually not too hard once you get the hang of it. It took me a little while to get there though… It almost appears as if something that was simple and easy to use was made difficult.

[root@localhost ~]# nmcli con add con-name eth1 ifname eth1 type ethernet ip4 192.168.100.107/24 gw4 192.168.100.1
[root@localhost ~]# nmcli con add con-name eth2 ifname eth2 type ethernet ip4 192.168.101.107/24
[root@localhost ~]# nmcli con add con-name eth3 ifname eth3 type ethernet ip4 192.168.102.107/24 

[root@localhost ~]# nmcli con show
NAME         UUID                                  TYPE            DEVICE
eth2         ccc7f592-b563-4b9d-a36b-2b45809e4643  802-3-ethernet  eth2
eth1         ae897dee-42ff-4ccd-843b-7c97ba0d5315  802-3-ethernet  eth1
System eth0  77e3f8a9-76d0-4051-a8f2-cbbe39dab089  802-3-ethernet  eth0
eth3         b6074c9a-dcc4-4487-9a8a-052e4c60bbca  802-3-ethernet  eth3

I can now verify the IP addresses using the "ip" tool (ifconfig is not installed; I haven't yet checked whether there is a compatibility package):

[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:6e:6f:67 brd ff:ff:ff:ff:ff:ff
    inet 192.168.150.111/24 brd 192.168.150.255 scope global eth0
    inet6 fe80::5054:ff:fe6e:6f67/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:96:ad:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.107/24 brd 192.168.100.255 scope global eth1
    inet6 fe80::5054:ff:fe96:ad88/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:c1:cc:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.107/24 brd 192.168.101.255 scope global eth2
    inet6 fe80::5054:ff:fec1:cc8e/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:7e:59:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.102.107/24 brd 192.168.102.255 scope global eth3
    inet6 fe80::5054:ff:fe7e:5945/64 scope link
       valid_lft forever preferred_lft forever

Now what's left is setting the hostname, which is a simple call to hostnamectl --static set-hostname rac12pri1. nmcli gives you an interface to change the hostname as well. I repeated the steps for node 2; they are identical except for the IP addresses of course.
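For reference, this is roughly what the calls look like on node 1 (a sketch; hostnamectl status simply confirms the change):

[root@localhost ~]# hostnamectl --static set-hostname rac12pri1
[root@localhost ~]# hostnamectl status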

So that concludes the network setup.

Managing linux daemons

If you are curious about setting services at runlevel, then there’ll be another surprise:

[root@rac12pri2 ~]# chkconfig --list

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

      If you want to list systemd services use 'systemctl list-unit-files'.
      To see services enabled on particular target use
      'systemctl list-dependencies [target]'.

iprdump        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprinit        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
iprupdate      	0:off	1:off	2:on	3:on	4:on	5:on	6:off
netconsole     	0:off	1:off	2:off	3:off	4:off	5:off	6:off
network        	0:off	1:off	2:on	3:on	4:on	5:on	6:off
pmcd           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmie           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmlogger       	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmmgr          	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmproxy        	0:off	1:off	2:off	3:off	4:off	5:off	6:off
pmwebd         	0:off	1:off	2:off	3:off	4:off	5:off	6:off
rhnsd          	0:off	1:off	2:on	3:on	4:on	5:on	6:off
[root@rac12pri2 ~]#

If you have just got familiar with upstart then there is some bad news: upstart has been replaced with systemd… This might be the right time to read up on it if you aren't familiar with it yet:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/chap-Managing_Services_with_systemd.html

Things are a little different with systemd, so here is an example of how to enable and start the NTP service. It has to be installed first if that hasn't happened yet. You should also add the -x flag in /etc/sysconfig/ntpd. First I would like to see if the service is available. You use systemctl for this: instead of a chkconfig --list ntpd you call systemctl as shown:

[root@rac12pri ~]# systemctl list-units --type service --all | grep ntpd
ntpd.service                                                                              loaded inactive dead    Network Time Service
ntpdate.service                                                                           loaded inactive dead    Set time via NTP

I have to get used to the new syntax: previously you used “service <whatever> status” and then, if you needed, typed backspace a few times and changed status to start. The new syntax is closer to human language but less practical: systemctl status <service>. Changing status to start requires more typing.

The check proved that the service exists (i.e. the NTP package is installed), but it is not started. We can change this:

[root@rac12pri ~]# systemctl enable ntpd.service
[root@rac12pri ~]# systemctl start ntpd.service
[root@rac12pri ~]# systemctl status ntpd.service
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since Tue 2014-12-16 15:38:47 GMT; 1s ago
Process: 5179 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 5180 (ntpd)
CGroup: /system.slice/ntpd.service
└─5180 /usr/sbin/ntpd -u ntp:ntp -g -x

Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 8 eth1 fe80::5054:ff:fe96:ad88 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 9 eth2 fe80::5054:ff:fec1:cc8e UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 10 eth3 fe80::5054:ff:fe7e:5945 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listen normally on 11 eth0 fe80::5054:ff:fe6e:6f67 UDP 123
Dec 16 15:38:47 rac12pri ntpd[5180]: Listening on routing socket on fd #28 for interface updates
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c016 06 restart
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c012 02 freq_set ntpd 0.000 PPM
Dec 16 15:38:47 rac12pri ntpd[5180]: 0.0.0.0 c011 01 freq_not_set
Dec 16 15:38:47 rac12pri systemd[1]: Started Network Time Service.
Dec 16 15:38:48 rac12pri ntpd[5180]: 0.0.0.0 c614 04 freq_mode
[root@rac12pri ~]#

The call to “systemctl enable” replaces an invocation of chkconfig to automatically start ntpd as a service (chkconfig ntpd on). Starting the service does not produce any output, hence the need to check the status.
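The -x flag mentioned earlier ends up on the ntpd command line via /etc/sysconfig/ntpd, which the unit file references as $OPTIONS (see the ExecStart line in the status output above). A minimal sketch of the file:

[root@rac12pri ~]# cat /etc/sysconfig/ntpd
# -g allows a large initial correction, -x makes ntpd slew rather than step the clock
OPTIONS="-g -x"

This matches the -g -x you can see on the running process above.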

There is a slight caveat with the use of NTP: it is not the default time keeping service. Another tool, named chronyd, is used instead.

This causes a problem after the next reboot: chronyd will be started, NTPd won’t be. The Red Hat documentation therefore has a section on how to switch:

[root@rac12pri ~]# systemctl stop chronyd
[root@rac12pri ~]# systemctl disable chronyd
[root@rac12pri ~]# systemctl status chronyd

Storage

Shared storage is provided by KVM. I am using my SSDs in the lab, from which I create a few "LUNs". These must explicitly be made "shareable" to be accessible by more than one guest; I'll show what that looks like in the domain XML right after the following list. Since 12.1.0.2.0 Oracle installs a database for the cluster health monitor by default. Currently I use the following setup for my lab 12.1.0.2 clusters:

  1. +CHM (external redundancy) – 1x 15GB
  2. +OCR (normal redundancy) – 3x 2 GB
  3. +DATA (external redundancy) – 1 x 15GB
  4. +RECO (external redundancy) – 1 x 10 GB
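As mentioned above, here is the shareable flag in the guest definition. A hypothetical disk element as it might appear in virsh edit; the file name and target device are made up, the <shareable/> element and cache='none' are the important bits:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/path/to/fast/disk/rac12pri-shared01.img'/>
  <target dev='sdb' bus='scsi'/>
  <shareable/>
</disk>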

If you use the guided installation of Grid Infrastructure the installer will prompt you for a single disk group only. This means that the CHM database as well as the OCR and voting files will be installed in that disk group. I prefer to separate them though, which is why I create a second disk group, OCR, after the installation has completed and move the voting files and OCR out of +CHM.

DATA and RECO are standard Exadata disk groups and I like to keep things consistent for myself.

I use fdisk to partition the future ASM disks with 1 partition spanning the whole LUN.
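If you prefer something scriptable over the interactive fdisk dialogue, parted can create the same single partition non-interactively; a sketch, with /dev/sdc standing in for one of the future ASM disks:

[root@rac12pri1 ~]# parted -s /dev/sdc mklabel msdos
[root@rac12pri1 ~]# parted -s /dev/sdc mkpart primary 1MiB 100%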

Other tasks

A lot of the other pre-installation tasks can actually be performed during the kickstart installation. I still like to run SELinux in permissive mode, even though, according to Requirements for Installing Oracle Database 12.1 on RHEL6 or OL6 64-bit (x86-64) (Doc ID 1529864.1), SELinux can be set to "enforcing". The directive in the kickstart file is:

selinux --permissive

You shouldn't have to install additional packages by hand: all packages to be installed should go into the %packages section of the kickstart file. Simply copy the package names from the official documentation and paste them below the last package in the section. There is one exception to the rule: cvuqdisk must be installed from the Oracle installation media.

Settings for /etc/sysctl.conf and /etc/security/limits.conf can also be made in the kickstart file as shown in the first part of this series.

Storage to be made available to RAC must have its permissions set. Since there isn't an ASMLib for Oracle Linux 7 to my knowledge, udev will have to be used; my udev configuration file, too, is in the first part.

To make sure my user and group IDs for the oracle and grid accounts are the same on all nodes, I create the accounts in the kickstart file as well. Passwords are deliberately not set: they may evolve and I can't possibly remember them all :)

User equivalence can be set up using a technique I have already described in an earlier blog post. Although the user equivalence setup can be deferred to when you install Grid Infrastructure I still perform it before to allow me to run the cluster verification tool with the -fixup option.

Posted in 12c Release 1, KVM, Linux, RAC | 3 Comments »

Upgrading clustered Grid Infrastructure to 12.1.0.2 from 12.1.0.1.3

Posted by Martin Bach on July 29, 2014

Oracle 12.1.0.2 is out, after lots of announcements the product has finally been released. I had just extended my 12.1.0.1.3 cluster to 3 nodes and was about to apply the July PSU when I saw the news. So why not try and upgrade to the brand new thing?

What struck me at first was the list of new features… Oracle's patching strategy has really changed over time. I remember the days when Oracle didn't usually add features into point releases. Have a look at the new 12.1.0.2 features and you might agree it could qualify as 12c Release 2…

In summary the upgrade process is actually remarkably simple, and hasn’t changed much since earlier versions of the software. Here are the steps in chronological order.

./runInstaller

I don't know how often I have typed ./ruinInstaller instead of runInstaller, but here you go. This is the first wizard screen after the splash screen has disappeared.

GI 12.1.0.2-001

Naturally I went for the upgrade of my cluster. Before launching the installer though I made sure that everything was in working order by means of cluvfy.
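A pre-upgrade check with cluvfy looks roughly like this; a sketch, with the source home path being an assumption (check cluvfy stage -pre crsinst -help for the options available in your release):

[oracle@rac12node1 ~]$ cluvfy stage -pre crsinst -upgrade -rolling \
  -src_crshome /u01/app/12.1.0.1/grid \
  -dest_crshome /u01/app/12.1.0.2/grid \
  -dest_version 12.1.0.2.0 -fixup -verbose

On to the next screen: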

GI 002

I always install English only. Troubleshooting Oracle in a different language (especially one I don't speak or understand) is really hard, so I avoid it in the first place.

Over to the screen that follows and oops: my SYSDG disk group (containing OCR and voting files) is too small. Bugger. In the end I added 3 new 10GB LUNs and dropped the old ones, but it took me a couple of hours to do so. Worse: it wasn't even needed, but it proved to be a good learning exercise. The requirement for that much free space is most likely caused by the management repository and related infrastructure.

GI 003 error

Back on this screen everything is in order; the screenshot was taken just prior to moving on to the next one. Note the button to skip the updates on unreachable nodes. I am not sure I would want to do that though.

GI 003

I haven’t got OEM agents on the servers (yet) so I’m skipping the registration for now. You can always do that later.

GI 004

This screen is familiar; I am keeping my choices from the initial installation. Grid Infrastructure is owned by the oracle account despite the ASMDBA and ASMADMIN groups, by the way.

GI 005

On the screen below you define where on the file system you want to install Grid Infrastructure. Remember that for clustered deployments the ORACLE_HOME cannot be in the path of the ORACLE_BASE. For this to work you have to jump to the command line, create the directory on all servers, and grant ownership to the GI owner account (oracle in this case; it could be grid as well).
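Creating the new home is a quick job on each node; a sketch, assuming oracle:oinstall owns the Grid Infrastructure installation (adjust to your environment):

[root@rac12node1 ~]# mkdir -p /u01/app/12.1.0.2/grid
[root@rac12node1 ~]# chown oracle:oinstall /u01/app/12.1.0.2/grid

The same two commands need to be repeated on every node in the cluster.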

GI 006

Since I like to be in control I don’t allow Oracle to run the root scripts. I didn’t in 12.1.0.1 either:

GI 007

In that screen you notice the familiar checking of requirements.

GI 008

In my case there were only a few new ones shown here. This is a lab server so I don't plan on using swap, but the kernel parameter "panic_on_oops" is new. I also hadn't set reverse path filtering, which I corrected before continuing. Interestingly the installer points out that there is a change in the asm_diskstring, with its implications.
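For reference, both findings boil down to a couple of sysctl settings; a sketch of the /etc/sysctl.conf entries, with the interface names being examples (the installer tells you which private interconnect NICs it expects loose reverse path filtering on):

# flagged by the 12.1.0.2 installer; interface names are examples
kernel.panic_on_oops = 1
net.ipv4.conf.eth1.rp_filter = 2
net.ipv4.conf.eth2.rp_filter = 2

A sysctl -p afterwards applies the settings without a reboot.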

One thing I haven’t recorded here (because I am using Oracle Linux 6.5 with UEK3) is the requirement for using a 2.6.39 kernel – that sounds like UEK2 to me.

Update: my system is Oracle Linux 6.5, not Red Hat. See Sigrid’s comments below: for Red Hat Linux there doesn’t seem to be a similar requirement to use UEK 2, which matches the documentation (Installation guide for Grid Infrastructure/Linux).

Another interesting case was that the kernel.core_pattern setting wasn't identical on all nodes. It turned out that some nodes had the abrt package installed and the others didn't. Once the package was installed on all nodes, the warning went away.

GI 009

In case you wonder where the summary screen is: unfortunately I didn't take a screenshot of it. I went straight into the installation phase:

GI 010

At the end of the installation phase you are prompted to run the upgrade scripts. Remember to run them in screen and pay attention to the order you run them in.

GI 011

The output from the last node is shown here:

[root@rac12node3 ~]# /u01/app/12.1.0.2/grid/rootupgrade.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/12.1.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/12.1.0.2/grid/crs/install/crsconfig_params
2014/07/26 16:15:51 CLSRSC-4015: Performing install or upgrade action for Oracle
Trace File Analyzer (TFA) Collector.

2014/07/26 16:19:58 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2014/07/26 16:20:02 CLSRSC-464: Starting retrieval of the cluster configuration data

2014/07/26 16:20:51 CLSRSC-465: Retrieval of the cluster configuration data has successfully completed.

2014/07/26 16:20:51 CLSRSC-363: User ignored prerequisites during installation

ASM configuration upgraded in local node successfully.

2014/07/26 16:21:16 CLSRSC-466: Starting shutdown of the current Oracle Grid Infrastructure stack

2014/07/26 16:22:51 CLSRSC-467: Shutdown of the current Oracle Grid Infrastructure stack has successfully
completed.

OLR initialization - successful
2014/07/26 16:26:53 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.conf'

CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2014/07/26 16:34:34 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2014/07/26 16:35:55 CLSRSC-478: Setting Oracle Clusterware active version on the last node to be upgraded

2014/07/26 16:35:55 CLSRSC-482: Running command: '/u01/app/12.1.0.2/grid/bin/crsctl set crs activeversion'

Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
The CSS was successfully upgraded.
Started to upgrade Oracle ASM.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Successfully upgraded the Oracle Clusterware.
Oracle Clusterware operating version was successfully set to 12.1.0.2.0
2014/07/26 16:38:51 CLSRSC-479: Successfully set Oracle Clusterware active version

2014/07/26 16:39:13 CLSRSC-476: Finishing upgrade of resource types

2014/07/26 16:39:26 CLSRSC-482: Running command: 'upgrade model  -s 12.1.0.1.0 -d 12.1.0.2.0 -p last'

2014/07/26 16:39:26 CLSRSC-477: Successfully completed upgrade of resource types

2014/07/26 16:40:17 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded

Did you notice that TFA has been added? Trace File Analyzer is another of these cool things to play with; it was available with 11.2.0.4 and as an add-on to 12.1.0.1.

Result!

Back to OUI to complete the upgrade, after which cluvfy performs a final check and I'm done. Proof that it worked:

[oracle@rac12node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 26 17:13:02 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE	12.1.0.2.0	Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production

SQL>

In another post I'll detail the upgrade for my databases. I am particularly interested in the unplug/plug way of migrating…

Posted in 12c Release 1, KVM, Linux, RAC | Tagged: , | 6 Comments »

RAC 12c enhancements: adding an additional SCAN-part 1

Posted by Martin Bach on April 22, 2014

Based on customer request Oracle has added the functionality to add a second SCAN, completely independent of the SCAN defined/created during the cluster creation. Why would you want to use this feature? A few reasons that spring to mind are:

  • Consolidation: customers insist on using a different network
  • Separate network for Data Guard traffic

To demonstrate the concept I am going to show you in this blog post how I

  1. Add a new network resource
  2. Create new VIPs
  3. Add a new SCAN
  4. Add a new SCAN listener

It actually sounds more complex than it is, but I have a feeling I need to split this article in multiple parts as it’s far too long.

The lab setup

When you install RAC 11.2 and 12.1 you are prompted to specify a Single Client Access Name, or SCAN. This SCAN is usually defined in the corporate DNS server and resolves to 3 IP addresses. This allows for an easy way to implement client-side load balancing. The SCAN is explained in more detail in Pro Oracle Database 11g RAC on Linux for 11.2 and on OTN for 11.2 and 12.1. To spice the whole configuration up a little bit I decided to use RAC One Node on the clusters I am using for this demonstration.

I created 2 12.1.0.1.2 clusters for this Data Guard test. Hosts ron12cprinode1 and ron12cprinode2 form the primary cluster, ron12csbynode1 and ron12csbynode2 will form the standby cluster. The RAC One Node database is named RON:

[oracle@ron12cprinode1 ~]$ srvctl config database -db ron
Database unique name: ron
Database name: ron
Oracle home: /u01/app/oracle/product/12.1.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/ron/spfilepri.ora
Password file: +DATA/ron/orapwpri
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ron
Database instances:
Disk Groups: RECO
Mount point paths:
Services: ron12c
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: pri
Candidate servers: ron12cprinode1,ron12cprinode2
Database is administrator managed
[oracle@ron12cprinode1 ~]$

To make things even more interesting I defined my ORACLE_SID prefix on the primary to be “pri” and “sby” on the standby.

[oracle@ron12cprinode1 ~]$ ps -ef | grep smon
oracle    2553     1  0 Feb06 ?        00:00:09 asm_smon_+ASM1
oracle   15660 15578  0 05:05 pts/3    00:00:00 grep smon
oracle   28241     1  0 Feb07 ?        00:00:18 ora_smon_pri_1
[oracle@ron12cprinode1 ~]$

A quick check with gpnptool reveals the network usage before the addition of the second SCAN:

<gpnp:Network-Profile>
 <gpnp:HostNetwork id="gen" HostName="*">
  <gpnp:Network id="net1" IP="192.168.100.0" Adapter="eth0" Use="public"/>
  <gpnp:Network id="net2" IP="192.168.101.0" Adapter="eth1" Use="cluster_interconnect"/>
 </gpnp:HostNetwork>
</gpnp:Network-Profile>

There is the default network ("netnum 1"), created on the subnet defined as "public" during the installation. I have another spare network port (eth2) reserved for the new network and Data Guard traffic. Currently network 1 is the only one available.

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:

As you can see, RAC 12c now supports IPv6. I have another network that I want to make available for Data Guard traffic. For this purpose I added all nodes to DNS. I am a bit old-fashioned when it comes to DNS: I am still using bind most of the time. Here is an excerpt of my reverse name resolution file:

; hosts - primary cluster
50	PTR	ron12cprinode1.dg.example.com.
51	PTR	ron12cprinode1-vip.example.com.
52	PTR	ron12cprinode2.dg.example.com.
53	PTR	ron12cprinode2-vip.dg.example.com.
; Data Guard SCAN - primary cluster
54	PTR	ron12cpri-scan.dg.example.com.
55	PTR	ron12cpri-scan.dg.example.com.
56	PTR	ron12cpri-scan.dg.example.com.

; hosts - standby cluster
57	PTR	ron12csbynode1.dg.example.com.
58	PTR	ron12csbynode1-vip.dg.example.com.
59	PTR	ron12csbynode2.dg.example.com.
60	PTR	ron12csbynode2-vip.dg.example.com.
; Data Guard SCAN - standby cluster
61	PTR	ron12csby-scan.dg.example.com.
62	PTR	ron12csby-scan.dg.example.com.
63	PTR	ron12csby-scan.dg.example.com.

The domain is *.dg.example.com; the primary database client traffic will be routed through *.example.com.
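For completeness, the matching forward zone entries for dg.example.com would look something like this; a sketch, with the addresses inferred from the reverse file above and the 192.168.102.0/24 subnet added to Clusterware below:

; dg.example.com - forward entries (excerpt, primary cluster only)
ron12cprinode1-vip	IN	A	192.168.102.51
ron12cprinode2-vip	IN	A	192.168.102.53
ron12cpri-scan		IN	A	192.168.102.54
ron12cpri-scan		IN	A	192.168.102.55
ron12cpri-scan		IN	A	192.168.102.56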

Adding the new network

The first step to be performed is to make Clusterware aware of the second network. I am doing this on both clusters. Notice that the primary nodes are called *pri* whereas the standby nodes are called *sby*.

[root@ron12cprinode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

[root@ron12csbynode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

So this worked; now I have 2 networks:

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:
Network 2 exists
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2, static
Subnet IPv6:
[root@ron12cprinode1 ~]#

In the next step I have to add VIPs for the nodes on the new *.dg.example.com network. The VIPs must be added for all cluster nodes, 4 in my case.

[oracle@ron12cprinode2 ~]$ srvctl add vip -h

Adds a VIP to the Oracle Clusterware.

Usage: srvctl add vip -node <node_name> -netnum <network_number> -address {<name>|<ip>}/<netmask>[/if1[|if2...]] [-skip] [-verbose]
    -node     <node_name>          Node name
    -address                       <vip_name|ip>/<netmask>[/if1[|if2...]] VIP address specification for node applications
    -netnum   <net_num>            Network number (default number is 1)
    -skip                          Skip reachability check of VIP address
    -verbose                       Verbose output
    -help                          Print usage
[oracle@ron12cprinode2 ~]$

So I did this for each node in both clusters:

[root@ron12cprinode1 ~]# srvctl add vip -node ron12cprinode1 -netnum 2 -address 192.168.102.51/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12cprinode2 ~]# srvctl add vip -node ron12cprinode2 -netnum 2 -address 192.168.102.53/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

[root@ron12csbynode1 ~]# srvctl add vip -node ron12csbynode1 -netnum 2 -address 192.168.102.58/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12csbynode2 ~]# srvctl add vip -node ron12csbynode2 -netnum 2 -address 192.168.102.60/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

And I need to start the VIPs. They have some funny names, as you can see in crsctl status resource (the names can't be chosen; see the output of srvctl add vip -h above).

[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is not running
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode1_2
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode2_2
[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is running on node: ron12cprinode1
[root@ron12cprinode1 ~]#

Add the second SCAN

At this time you can add the second SCAN. The command syntax is shown here:

[oracle@ron12cprinode1 ~]$ srvctl add scan -h

Adds a SCAN VIP to the Oracle Clusterware.

Usage: srvctl add scan -scanname <scan_name> [-netnum <network_number>]
    -scanname <scan_name>          Domain name qualified SCAN name
    -netnum  <net_num>             Network number (default number is 1)
    -subnet                        <subnet>/<netmask>[/if1[|if2...]] NET address specification for network
    -help                          Print usage

Executed on my first cluster node, the command is easier to comprehend than the usage output suggests.

[root@ron12cprinode1 ~]# srvctl add scan -scanname ron12cpri-dgscan.dg.example.com -netnum 2

[root@ron12cprinode1 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node ron12cprinode2
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node ron12cprinode1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node ron12cprinode1

You need to create the SCAN on both clusters. On my primary cluster the SCAN has been created with the following configuration:

[root@ron12cprinode1 ~]# srvctl config scan -netnum 2
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 0 IPv4 VIP: 192.168.102.54
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 1 IPv4 VIP: 192.168.102.55
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 2 IPv4 VIP: 192.168.102.56
[root@ron12cprinode1 ~]#

You can see the new VIPs in the output of ifconfig, just as you would with the primary SCAN:

eth2      Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.50  Bcast:192.168.102.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fefe:e2d5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:523 errors:0 dropped:0 overruns:0 frame:0
          TX packets:339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:118147 (115.3 KiB)  TX bytes:72869 (71.1 KiB)

eth2:1    Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.55  Bcast:192.168.102.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

So there is nothing too surprising in the output; it's exactly the same as with the public SCAN created during the installation.
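To show where all this is heading, here is a hypothetical tnsnames.ora entry that routes traffic over the new network via the second SCAN. This is a sketch only: the SCAN and node listeners on this network are not created until the next part, so the port (1521) is an assumption; the service name ron12c is taken from the srvctl output above.

RON_DG =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ron12cpri-dgscan.dg.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ron12c)
    )
  )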

End of part 1

This already seems like a lot of text to me so I think it’s time to pause here. The next parts will demonstrate the addition of the SCAN listeners, the new node listeners on the *.dg.example.com network and finally the duplication of the primary RAC One Node database for use as a standby database.

Posted in 12c Release 1, KVM | Tagged: , | 4 Comments »

Interesting GNS anomaly in 12.1.0.1.2 RAC

Posted by Martin Bach on March 13, 2014

I was encountering an interesting anomaly with my 12.1.0.1.2 RAC cluster based on Grid Naming System. I have written about the setup here.

There is no need to go back to that post; instead let me give you some background in case you are not familiar with the Grid Naming System in RAC. Most RAC installations use a Domain Name Server (DNS) to resolve names to IP addresses. With GNS, name resolution is delegated from the main DNS server to a so-called subdomain. In my case my DNS server (bind version 9) is configured on aux.example.com (192.168.100.2), and it delegates name resolution for *.gns.example.com (my cluster) to the Grid Naming System. The GNS address in use is 192.168.100.37 and must be defined in the corporate DNS. The required IP addresses have to be supplied by a DHCP server in the same range as the public network. My public network is on 192.168.100/24, the private networks are on 192.168.101/24 and 192.168.102/24. Which component resolves what? Here is a summary (a sketch of the delegation configuration follows the list):

  • The GNS VIP is resolved by DNS. It’s not in the subdomain according to the documentation. I am using gns-vip.example.com (192.168.100.37) for it.
  • The node public IP is resolved via GNS. Oracle uses $(hostname) to assign the name. The IP address is fixed.
  • The node VIPs are resolved via GNS, the assigned name is $(hostname)-vip and the addresses are assigned by DHCP.
  • The private interconnect interface is resolved by GNS, usually as $(hostname)-priv. Addresses are either static or assigned by DHCP.
  • The SCAN VIPs are resolved via GNS and assigned by DHCP. And this is exactly where my problem was.
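For reference, the subdomain delegation itself is only a couple of records in the parent zone. A minimal sketch of what I mean, assuming the example.com zone file used in this post (record syntax only; SOA, TTL and serial housekeeping omitted):

; example.com zone file - delegate the gns.example.com subdomain to the cluster
gns-vip            IN A     192.168.100.37
gns.example.com.   IN NS    gns-vip.example.com.

Queries for anything under gns.example.com then end up at the GNS daemon listening on the GNS VIP, which is exactly the mechanism that misbehaved here.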

The Problem

All of a sudden I had connection problems with my tomcat server connecting to the database. The address I used in tnsnames.ora was rac12scan.gns.example.com and hadn't changed at all. It simply didn't point to a valid IP address half of the time. When I pinged the SCAN I was surprised to see that it listed addresses that weren't in the output of "srvctl config scan".

And sure enough, an nslookup against the SCAN returned 7 (!) addresses. There was obviously a discrepancy between what Oracle thought the SCAN was (3 IP addresses) and what GNS reported. I tried various cures, including restarting the DNS and DHCP servers (even moving the lease file on the DHCP server to a temporary location), and variations of restarting GNS and the whole cluster (crsctl stop cluster -all; crsctl start cluster -all). Nothing helped.

Troubleshooting

As you’d expect the name resolution only works with GNS started. If you try to “nslookup rac12scan.gns.example.com” while GNS is down, you get the NXDOMAIN result code which does not really help. The output of “srvctl status gns -list and -detail” is useful in troubleshooting the problem. I also ran “cluvfy comp gns” but that came back clean. Nevertheless, the system instisted that the SCAN was made up of 7 IP addresses, and only 4 of them had SCAN VIPs. No surprise I can’t connect half the time. Now the question is of course: why does the GNS daemon grab so many IP addresses? So I looked at the gnsd.log in $GRID_HOME/log/$(hostname -s)/gns to see what happens. Here is an excerpt from the log file:

2014-03-11 16:37:30.305: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01305:) #0 of 3: OCR error 4 (PROC-4: The cluster registry key to be operated on does not exist.) is not expected 32 - no retry will be done.
2014-03-11 16:37:30.305: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01307:) Pass #0: open of "SYSTEM.GNS.SUBDOMAIN.gns|dexample|dcom.rac12gns-scan1-vip.SRV" failed with error 4 Last: PROC-4: The cluster registry key to be operated on does not exist. (4) - unable to retry.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01305:) #0 of 3: OCR error 4 (PROC-4: The cluster registry key to be operated on does not exist.) is not expected 32 - no retry will be done.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgnocrOpenKeyInternal: (:CLSGN01307:) Pass #0: open of "SYSTEM.GNS.SUBDOMAIN.gns|dexample|dcom.rac12gns-scan1-vip.TXT" failed with error 4 Last: PROC-4: The cluster registry key to be operated on does not exist. (4) - unable to retry.
2014-03-11 16:37:30.306: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12node1-vip" Type: A 192.168.100.44 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.308: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12node2-vip" Type: A 192.168.100.39 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.308: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.36 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.35 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.40 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.42 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.41 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12scan" Type: A 192.168.100.31 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.309: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan2-vip" Type: A 192.168.100.41 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan2-vip" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan3-vip" Type: A 192.168.100.40 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan3-vip" Type: A 192.168.100.38 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [     GNS][1009862208]main::clsgndhaRegisterAddresses: Name: "rac12gns-scan1-vip" Type: A 192.168.100.31 Unique: TRUE Flags: ALLOCATED
2014-03-11 16:37:30.310: [   CLSNS][1009862208]Resolve::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.310: [   CLSNS][1009862208]Resolve::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #0::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #0::clsns_SetTraceLevel:trace level set to 1.
2014-03-11 16:37:30.311: [   CLSNS][1009862208]Resolver #1::clsns_SetTraceLevel:trace level set to 1.

You can see the address assignment here, and it corresponds to the DHCP leases I saw in the DHCP server host's /var/log/messages file. Notice that rac12scan grabs 7 IP addresses. This is surprising: the documentation normally states just 3 IP addresses for use with the SCAN. As you can further see, the SCAN VIPs resolve to only a subset of these; 192.168.100.36, for example, does not have a corresponding SCAN VIP, and neither do some of the others. Also, rac12gns-scan2-vip and rac12gns-scan3-vip appear twice. So if rac12scan now resolves to 192.168.100.36 you will get a TNS error in the form "Destination host does not exist". Annoying! But why does it do so? The messages before (PROC-4) gave me a clue.

The information about GNS VIPs seems to be stored in the OCR; an ocrdump confirmed this. The question remains: why does Oracle use more than 3 IP addresses for the SCAN? I can only speculate that newly obtained DHCP addresses are added to the OCR, while the existing entries don't seem to be cleared out or updated correctly.
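For anyone wanting to repeat the ocrdump check on their own cluster, this is roughly what it boils down to. A sketch only, to be run as root; the grep pattern is simply based on the key names that showed up in the gnsd.log above:

# dump the OCR into a text file and search for the GNS subdomain keys
$GRID_HOME/bin/ocrdump /tmp/ocr_gns.dmp
grep -i "SYSTEM.GNS" /tmp/ocr_gns.dmp | head -20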

In an effort to solve the problem I dropped the SCAN listeners and the SCAN VIPs and recreated them. But this seemed to confuse the cluster even more, and name resolution didn't work at all. I tried both the subdomain delegation (nslookup rac12scan.gns.example.com) and querying GNS directly (nslookup rac12scan.gns.example.com 192.168.100.37); neither resolved anything. A restart of Clusterware didn't change the situation, so I decided to reboot the cluster nodes hoping to bring everything back to normal (yes, I should know better, this isn't Windows). After waiting for the servers to come back up there was no change. Now the system was so confused that it didn't come up with a SCAN at all. Trying to restore the OCR failed, and I couldn't even run a rootcrs.pl -deconfig -force in preparation for a reinstallation of Clusterware. So I trashed the system. I don't believe there is a place for GNS in my lab after this experience.

Thanks to Marcin Przepiorowski (@pioro) who helped during the investigation.

References:

Grid Infrastructure Installation Guide 12c:

  • 1.1.3 Oracle Grid Infrastructure Network Checklist
  • 5.5.1 About Oracle Grid Infrastructure Name Resolution Options
  • 5.11 Grid Naming Service Standard Cluster Configuration Example

Posted in 12c Release 1, KVM, Linux | 1 Comment »

Applying GI PSU 12.1.0.1.2 in the lab

Posted by Martin Bach on January 15, 2014

In my previous posts about the first RAC Grid Infrastructure Patchset I documented a few issues I encountered that were worth noting. But where things work as advertised I am more than happy to document that too. In a way, the January 2014 GI PSU works as you'd hope it would (at least in my lab, on my 2 node cluster). Well, almost: if you have a non-12.1 database in your environment you might encounter this.

UPDATE: You might want to review some additional information with regard to datapatch.

Admittedly it’s taken from an Oracle Restart (i.e. non cluster) environment but I can’t see this not happening in RAC:

[root@server1 stage]# opatchauto apply /u01/stage/12.1.0.1.2/17735306 -ocmrf /u01/stage/ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /u01/app/grid/product/12.1.0.1/grid

opatchauto log file: /u01/app/grid/product/12.1.0.1/grid/cfgtoollogs/opatchauto/17735306/ -
opatch_gi_2014-02-17_20-04-54_deploy.log

Parameter Validation: Successful

System Configuration Collection failed: oracle.osysmodel.driver.crs.productdriver.ProductDriverException:
PRCD-1027 : Failed to retrieve database ora11
PRCD-1229 : An attempt to access configuration of database ora11 was rejected because its
 version 11.2.0.3.0 differs from the program version 12.1.0.1.0. Instead run the program
from /u01/app/oracle/product/11.2.0.3.

opatchauto failed with error code 2.
[root@server1 stage]#

The environment

I have a 2 node RAC cluster that hasn't previously been patched. It uses GNS for name resolution and is probably one of the more complex setups, so I was keen to test the new GI Patch Set Update. Note that a PSU will roll back previous patches; it's not a delta as you might know from RPM.

The cluster installation using GNS was slightly more challenging than the usual RAC system so I decided not to try separation of duties. In other words, the GRID and RDBMS owner are identical: oracle.

Staging

As always you have to update OPatch before you can apply the latest patch. I did this in all 4 homes: 2 RDBMS and 2 Grid homes. Download and stage the patch to a location you like by unzipping it.

Next you need the ocm response file. I simply copied it from the other cluster.
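For the record, staging boils down to a handful of commands per node. A sketch, assuming the paths used later in this post; the zip file names follow the usual MOS naming convention and may differ from what you actually download:

# as oracle (grid and RDBMS owner are identical here), repeat on every node
# 1. update OPatch in both homes (patch 6880880 for 12.1)
for h in /u01/app/12.1.0.1/grid /u01/app/oracle/product/12.1.0.1/dbhome_1; do
    unzip -oq /u01/temp/p6880880_121010_Linux-x86-64.zip -d $h
done

# 2. stage the GI PSU
unzip -q /u01/temp/p17735306_121010_Linux-x86-64.zip -d /u01/temp

# 3. generate the OCM response file (or copy an existing one, as I did)
/u01/app/12.1.0.1/grid/OPatch/ocm/bin/emocmrsp -no_banner -output /u01/temp/ocm.rsp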

Patch application

I started the installation of the patch on node 1 of my cluster in a screen session as always. Node 1 was responsible for the GNS resource at the time. Following the instructions and some previous experience I decided to let opatchauto do all of the work. And this time Oracle even included the “apply” keyword in the docs! And it didn’t bail out immediately with obscure error messages either…

[root@rac12node1 temp]# opatchauto apply /u01/temp/17735306 -ocmrf /u01/temp/ocm.rsp
OPatch Automation Tool
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

OPatchauto version : 12.1.0.1.2
OUI version        : 12.1.0.1.0
Running from       : /u01/app/12.1.0.1/grid

opatchauto log file: /u01/app/12.1.0.1/grid/cfgtoollogs/opatchauto/17735306/opatch_gi_2014-01-15_08-54-25_deploy.log

Parameter Validation: Successful

Grid Infrastructure home:
/u01/app/12.1.0.1/grid
RAC home(s):
/u01/app/oracle/product/12.1.0.1/dbhome_1

Configuration Validation: Successful

Patch Location: /u01/temp/17735306
Grid Infrastructure Patch(es): 17077442 17303297 17552800
RAC Patch(es): 17077442 17552800

Patch Validation: Successful

Stopping RAC (/u01/app/oracle/product/12.1.0.1/dbhome_1) ... Successful
Following database(s) were stopped and will be restarted later during the session: rac12c

Applying patch(es) to "/u01/app/oracle/product/12.1.0.1/dbhome_1" ...
Patch "/u01/temp/17735306/17077442" successfully applied to "/u01/app/oracle/product/12.1.0.1/dbhome_1".
Patch "/u01/temp/17735306/17552800" successfully applied to "/u01/app/oracle/product/12.1.0.1/dbhome_1".

Stopping CRS ... Successful

Applying patch(es) to "/u01/app/12.1.0.1/grid" ...
Patch "/u01/temp/17735306/17077442" successfully applied to "/u01/app/12.1.0.1/grid".
Patch "/u01/temp/17735306/17303297" successfully applied to "/u01/app/12.1.0.1/grid".
Patch "/u01/temp/17735306/17552800" successfully applied to "/u01/app/12.1.0.1/grid".

Starting CRS ... Successful

Starting RAC (/u01/app/oracle/product/12.1.0.1/dbhome_1) ... Successful

SQL changes, if any, are applied successfully on the following database(s): RAC12C

Apply Summary:
Following patch(es) are successfully installed:
GI Home: /u01/app/12.1.0.1/grid: 17077442, 17303297, 17552800
RAC Home: /u01/app/oracle/product/12.1.0.1/dbhome_1: 17077442, 17552800

opatchauto succeeded.

Wow, I could hardly believe my eyes. I "tail"ed the logfile in a different session to see what it was doing, but the output seems less verbose these days than it was initially when "opatch auto" came out. See further down in the article for useful log locations.

A few remarks. It took quite a while for the patch to advance past "Starting CRS…". I remember staring at the screen anxiously in previous patch exercises, having witnessed the CRS start sequence timing out. In this case there is a valid reason for the slowness, and it's documented in $GRID_HOME/cfgtoollogs/crsconfig/crspatch*.log: it validates ACFS and applies patches to the -MGMTDB database. In fact you should open a third session to tail the crspatch log file to see what is happening, as it provides a wealth of information about stopping and starting Clusterware resources.

With GI PSU 12.1.0.1.1 I ran into problems with CRS, which got confused about which node was mastering the OCR. With 12.1.0.1.2 this didn't happen: CRS started successfully on the last node, and even started the database instance.

Log locations

Information about what’s going on can be found in many locations…

  • $GRID_HOME/cfgtoollogs/opatchauto/17735306/opatch_gi_timestamp_deploy.log for the opatchauto process
  • $GRID_HOME/cfgtoollogs/crsconfig/crspatch_hostname_timestamp.log records stop, unlocking, start and locking of Clusterware

More log information can be found in:

  • $GRID_HOME/cfgtoollogs/opatch contains logs for the opatch commands issued against the GRID home
  • $ORACLE_HOME/cfgtoollogs/opatch/opatchtimestamp.log – records output from the individual opatch commands issued against the RDBMS home.

Note these logs are not written to in that sequence. Start with the opatchauto logs, then refer to the more specific log locations for individual troubleshooting. The complete log file names are referenced at the end of the opatchauto log.
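If you want to watch the progress live, two tail sessions cover most of it. A sketch using the locations above; the timestamp portions of the file names obviously differ per run:

# session 1: the opatchauto deployment log
tail -f $GRID_HOME/cfgtoollogs/opatchauto/17735306/opatch_gi_*_deploy.log

# session 2: the Clusterware stop/unlock/patch/start/lock activity
tail -f $GRID_HOME/cfgtoollogs/crsconfig/crspatch_$(hostname -s)_*.log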

SQLPatch

The post-installation steps require you to load SQL scripts into the database. opatchauto seems to have done that, as shown in its log file:

2014-01-15_09-32-13 :
Ignoring the dbconsole command.
2014-01-15_09-32-17 :
Executing command:
/bin/bash -c 'ORACLE_HOME=/u01/app/oracle/product/12.1.0.1/dbhome_1 \
ORACLE_SID=$(/u01/app/oracle/product/12.1.0.1/dbhome_1/bin/srvctl status instance -d RAC12C -n rac12node1 | cut -d " " -f 2) \
/u01/app/oracle/product/12.1.0.1/dbhome_1/OPatch/datapatch'
2014-01-15_09-33-45 :
Successfully executed the above command.

SQL changes, if any, are applied successfully on the following database(s): RAC12C

Except that I could not find a log file in the $ORACLE_HOME/sqlpatch/17552800 directory in the RDBMS home on the first node. The -MGMTDB had been patched; its log is in the $GRID_HOME. The registry$history view didn't show any sign of a patch either. Only after applying the PSU on the last node did Oracle patch the database.
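In 12.1 the view that datapatch actually populates is dba_registry_sqlpatch rather than registry$history, so that is the better place to check. A minimal sketch; the instance name rac12c1 is an assumption, adjust ORACLE_SID to whatever srvctl reports for your node:

export ORACLE_HOME=/u01/app/oracle/product/12.1.0.1/dbhome_1
export ORACLE_SID=rac12c1    # hypothetical - check srvctl status database -d rac12c

$ORACLE_HOME/bin/sqlplus -s / as sysdba <<'EOF'
set linesize 150 pagesize 100
select patch_id, action, status, action_time
from   dba_registry_sqlpatch
order  by action_time;
EOF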

Update

It appears that PSUs 1 and 2 have issues when running datapatch. I found this in the output of my patch log:

Executing command:
/bin/bash -c 'ORACLE_HOME=/u01/app/oracle/product/12.1.0.1/dbhome_2 ORACLE_SID=$(/u01/app/oracle/product/12.1.0.1/dbhome_2/bin/srvctl sta
tus instance -d RCDB1 -n rac12node3 | cut -d " " -f 2) /u01/app/oracle/product/12.1.0.1/dbhome_2/OPatch/datapatch'
2014-03-21_11-25-36 :

COMMAND EXECUTION FAILURE :
SQL Patching tool version 12.1.0.1.0 on Fri Mar 21 11:23:26 2014
Copyright (c) 2013, Oracle.  All rights reserved.

Connecting to database...OK
Determining current state...done
For the following PDBs: CDB$ROOT
  Nothing to roll back
  The following patches will be applied: 17552800
For the following PDBs: PDB$SEED
  Nothing to roll back
  The following patches will be applied: 17552800
For the following PDBs: ACDEMOPDB
  Nothing to roll back
  The following patches will be applied: 17552800
For the following PDBs: DUBLIN
  Nothing to roll back
  The following patches will be applied: 17552800
Adding patches to installation queue...done
Installing patches...done
Validating logfiles...

ERROR:
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
mkdir: cannot create directory `/u01/app/grid/cfgtoollogs': Permission denied
DBD::Oracle::st execute failed: ORA-03113: end-of-file on communication channel
Process ID: 20551
Session ID: 21 Serial number: 9 (DBD ERROR: OCIStmtExecute) [for Statement "ALTER PLUGGABLE DATABASE pdb$seed
                 CLOSE IMMEDIATE INSTANCES=ALL"] at /u01/app/oracle/product/12.1.0.1/dbhome_2/sqlpatch/sqlpatch.pm line 448, <LOGFILE> line 6129.

The corresponding bug number is 17421502, and a patch exists. I downloaded the patch and applied it on all of my nodes (rolling!). After everything had come up, I re-ran datapatch and this time it seemed to work without an "end-of-file on communication channel". See MOS note "Datapatch may fail while patching a RAC+PDB environment" (Doc ID 1632818.1) for more information.


Note that you have to set your ORACLE_SID to the database to be patched for the above command to work.
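For completeness, this is roughly what the manual re-run looks like. It is a sketch based on the command opatchauto itself issued in the log above, with the database and home names taken from that output:

# as oracle, one node at a time
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.1/dbhome_2
export ORACLE_SID=$($ORACLE_HOME/bin/srvctl status instance -d RCDB1 -n $(hostname -s) | cut -d " " -f 2)

cd $ORACLE_HOME/OPatch
./datapatch -verbose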

Summary

Applying the January PSU to my cluster worked flawlessly: definitely a huge improvement over the previous PSU, which was a bit of a disaster.

Posted in 12c Release 1, KVM, Linux | Tagged: | Leave a Comment »

Inside a RAC 12c GNS cluster

Posted by Martin Bach on November 26, 2013

Based on some reader feedback I started looking at GNS again, but this time for RAC 12c. According to the documentation GNS has been enhanced so that you can use it without subdomain delegation. I decided to try the "old-fashioned" way though: DHCP for VIPs and SCAN IPs, subdomain delegation and the like, as it is the most complex setup. I occasionally like complex.

The network setup is exactly the same as I used before in 11.2 and thankfully didn't require any changes. The cluster I am building is a 2 node system on Oracle Linux 6.4 with the Red Hat-compatible kernel. I have to use this kernel as the Unbreakable Kernel doesn't know about block devices made available to it via virtio-scsi. I use virtio-scsi for shared block devices very much in the same way I did for Xen.

The installation was rather rough: I had a few niggles around the DNS setup and reverse name resolution. I wish I had an updated checkip.sh available for the standard RAC installation; it would make it so much easier to detect problems in advance!
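In the absence of an updated checkip.sh, a quick loop over the statically defined names catches most of the reverse lookup niggles before the installer does. A minimal sketch, assuming the public node names and the GNS VIP live in example.com as in this post:

#!/bin/bash
# forward and reverse resolution sanity check - run before the installer
for name in rac12node1 rac12node2 gns-vip; do
    ip=$(dig +short ${name}.example.com)
    if [[ -z "$ip" ]]; then
        echo "forward lookup FAILED for ${name}.example.com"
        continue
    fi
    ptr=$(dig +short -x "$ip")
    echo "${name}.example.com -> ${ip} -> ${ptr:-<no PTR record>}"
done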

Once the networking problems were sorted out I could confirm that GNS was indeed working after the installation of both nodes:

[oracle@rac12node1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac12node1               STABLE
               ONLINE  ONLINE       rac12node2               STABLE
ora.SYSDG.dg
               ONLINE  ONLINE       rac12node1               STABLE
               ONLINE  ONLINE       rac12node2               STABLE
ora.asm
               ONLINE  ONLINE       rac12node1               Started,STABLE
               ONLINE  ONLINE       rac12node2               Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rac12node1               STABLE
               ONLINE  ONLINE       rac12node2               STABLE
ora.ons
               ONLINE  ONLINE       rac12node1               STABLE
               ONLINE  ONLINE       rac12node2               STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac12node2               STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac12node1               169.254.61.234 192.1
                                                             68.101.30 192.168.10
                                                             2.30,STABLE
ora.cvu
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.gns
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.gns.vip
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac12node1               Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.rac12node1.vip
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.rac12node2.vip
      1        ONLINE  ONLINE       rac12node2               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac12node2               STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac12node1               STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac12node1               STABLE
--------------------------------------------------------------------------------

As you can see the GNS processes (GNS + GNS VIP) are part of the upper stack, and both are available. If that is the case then some IP addresses must have been acquired via DHCP, and fair enough: checking my DHCP server I can see that leases were handed out:

...
Nov 15 14:24:22 aux dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth0
Nov 15 14:24:22 aux dhcpd: Abandoning IP address 192.168.100.32: pinged before offer
Nov 15 14:24:22 aux dhcpd: Wrote 0 class decls to leases file.
Nov 15 14:24:22 aux dhcpd: Wrote 2 leases to leases file.
Nov 15 14:24:27 aux dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth0
Nov 15 14:24:28 aux dhcpd: DHCPOFFER on 192.168.100.33 to 00:00:00:00:00:00 via eth0
Nov 15 14:24:28 aux dhcpd: DHCPREQUEST for 192.168.100.33 (192.168.100.2) from
 00:00:00:00:00:00 via eth0
Nov 15 14:24:28 aux dhcpd: DHCPACK on 192.168.100.33 to 00:00:00:00:00:00 via eth0
Nov 15 14:24:46 aux dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth0
Nov 15 14:24:47 aux dhcpd: DHCPOFFER on 192.168.100.34 to 00:00:00:00:00:00 via eth0
Nov 15 14:24:47 aux dhcpd: DHCPREQUEST for 192.168.100.34 (192.168.100.2) from 00:00:00:00:00:00 via eth0
Nov 15 14:24:47 aux dhcpd: DHCPACK on 192.168.100.34 to 00:00:00:00:00:00 via eth0
Nov 15 14:24:52 aux dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth0
Nov 15 14:24:53 aux dhcpd: DHCPOFFER on 192.168.100.35 to 00:00:00:00:00:00 via eth0
Nov 15 14:24:53 aux dhcpd: DHCPREQUEST for 192.168.100.35 (192.168.100.2) from
00:00:00:00:00:00 via eth0
Nov 15 14:24:53 aux dhcpd: DHCPACK on 192.168.100.35 to 00:00:00:00:00:00 via eth0
Nov 15 14:25:11 aux dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth0
Nov 15 14:25:12 aux dhcpd: DHCPOFFER on 192.168.100.36 to 00:00:00:00:00:00 via eth0
Nov 15 14:25:12 aux dhcpd: DHCPREQUEST for 192.168.100.36 (192.168.100.2) from
00:00:00:00:00:00 via eth0
Nov 15 14:25:12 aux dhcpd: DHCPACK on 192.168.100.36 to 00:00:00:00:00:00 via eth0
...

Further information about the network address acquisition can be found in the GNS log file, $GRID_HOME/log/$(hostname -s)/gnsd/gnsd.log. After a lot of verbose messaging about its own initialisation you can find some more interesting information.

2013-11-15 14:24:16.892: [     GNS][827389472]main::clsgndGetInstanceInfo: version: 12.1.0.1.0
 address "tcp://192.168.100.37:33912" port: 33912 process ID: "11081" state: "Initializing".
2013-11-15 14:24:16.892: [     GNS][827389472]main::clsgndadvAdvertise: Listening for commands
 on tcp://192.168.100.37:33912 port 33912.

So it’s ready to take commands. And soon thereafter they come in:

2013-11-15 14:24:21.309: [     GNS][690910976]Resolver #0::clsgndnsAddAnswer: name:
 "Oracle-GNS" adding Name: "Oracle-GNS" Type: SRV service: Oracle-GNS instance: rac12gns
  target: Oracle-GNS protocol: tcp service name: gns.example.com sub-subdomain: (NULL)
  port: 33912 weight: 0 priority: 0 Unique: FALSE Flags: ALLOCATED NON-PERSISTENT
  NO-EXPIRATION Creation: Fri Nov 15 14:24:16 2013  Expiration Fri Nov 15 14:24:31 2013
...
2013-11-15 14:24:38.947: [     GNS][680404736]Command #1::clsgndcpHostAdvertise:
 advertising name "rac12node1-vip" address: "192.168.100.33" time to live (seconds): 0
 lease expiration (seconds) 0.
2013-11-15 14:24:43.601: [     GNS][680404736]Command #1::clsgnctrInsertHMAC: Connection
 ID: length: 48 7169...f7d85556125a
2013-11-15 14:24:51.707: [     GNS][678303488]Command #2::clsgnctrCreateReceivePacket:
 connection version: 12.1.0.0.0 (0xc100000)
2013-11-15 14:24:51.707: [     GNS][678303488]Command #2::clsgnctrCreateReceivePacket:
 Connection ID: length: 48 b720f6f1ab0...30f2fe49811677520829bb71b74fbce
2013-11-15 14:24:51.708: [     GNS][678303488]Command #2::clsgndcpWait: Running command
 "advertise" (4933)
2013-11-15 14:24:51.708: [     GNS][678303488]Command #2::clsgndcpHostAdvertise:
 advertising name "rac12gns-scan1-vip" address: "192.168.100.34" time to
 live (seconds): 0 lease expiration (seconds) 0.
2013-11-15 14:24:57.286: [     GNS][678303488]Command #2::clsgnctrInsertHMAC: Connection
 ID: length: 48 77d111eb90a30...4f6da783e4e2fd033529
2013-11-15 14:24:57.288: [     GNS][680404736]Command #1::clsgnctrCreateReceivePacket:
  connection version: 12.1.0.0.0 (0xc100000)
2013-11-15 14:24:57.288: [     GNS][680404736]Command #1::clsgnctrCreateReceivePacket:
 Connection ID: length: 48 bddc...930aa35aef01bcd880ab041c0406681958f1fb3c08d8ab51
2013-11-15 14:24:57.289: [     GNS][680404736]Command #1::clsgndcpWait: Running command
 "advertise" (4933)
2013-11-15 14:24:57.289: [     GNS][680404736]Command #1::clsgndcpHostAdvertise:
 advertising name "rac12scan" address: "192.168.100.34" time to live (seconds): 0
 lease expiration (seconds) 0.
2013-11-15 14:25:11.489: [     GNS][676202240]Command #3::clsgnctrCreateReceivePacket:
 connection version: 12.1.0.0.0 (0xc100000)
2013-11-15 14:25:11.489: [     GNS][676202240]Command #3::clsgnctrCreateReceivePacket:
 Connection ID: length: 48 1898551a83efa80ee41b50dd68ea...7dba2099a097
2013-11-15 14:25:11.489: [     GNS][676202240]Command #3::clsgndcpWait: Running command
 "advertise" (4933)
2013-11-15 14:25:11.489: [     GNS][676202240]Command #3::clsgndcpHostAdvertise:
 advertising name "rac12gns-scan2-vip" address: "192.168.100.35" time to live (seconds):
 0 lease expiration (seconds) 0.
...

But instead of having to go through this file you can get the same information from Clusterware:

[oracle@rac12node1 ~]$ srvctl config scan
SCAN name: rac12scan.gns.example.com, Network: 1
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 0 IPv4 VIP: -/scan1-vip/192.168.100.34
SCAN name: rac12scan.gns.example.com, Network: 1
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 1 IPv4 VIP: -/scan2-vip/192.168.100.35
SCAN name: rac12scan.gns.example.com, Network: 1
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0
Subnet IPv6:
SCAN 2 IPv4 VIP: -/scan3-vip/192.168.100.36

[oracle@rac12node1 ~]$ srvctl config nodeapps
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, dhcp
Subnet IPv6:
VIP exists: network number 1, hosting node rac12node1
VIP IPv4 Address: -/rac12node1-vip/192.168.100.33
VIP IPv6 Address:
VIP exists: network number 1, hosting node rac12node2
VIP IPv4 Address: -/rac12node2-vip/192.168.100.39
VIP IPv6 Address:
ONS exists: Local port 6100, remote port 6200, EM port 2016

[oracle@rac12node1 ~]$ srvctl config gns -list
Oracle-GNS A 192.168.100.37 Unique Flags: 0x15
rac12gns-scan1-vip A 192.168.100.34 Unique Flags: 0x1
rac12gns-scan2-vip A 192.168.100.35 Unique Flags: 0x1
rac12gns-scan3-vip A 192.168.100.36 Unique Flags: 0x1
rac12gns.Oracle-GNS SRV Target: Oracle-GNS Protocol: tcp Port: 33912 Weight: 0 Priority: 0
  Flags: 0x15
rac12gns.Oracle-GNS TXT CLUSTER_NAME="rac12gns", CLUSTER_GUID="16aaef3df3d8efd6bffb1f37101d84e3",
  NODE_ADDRESS="192.168.100.37", SERVER_STATE="RUNNING", VERSION="12.1.0.1.0",
  DOMAIN="gns.example.com" Flags: 0x15
rac12node1-vip A 192.168.100.33 Unique Flags: 0x1
rac12node2-vip A 192.168.100.39 Unique Flags: 0x1
rac12scan A 192.168.100.34 Unique Flags: 0x1
rac12scan A 192.168.100.35 Unique Flags: 0x1
rac12scan A 192.168.100.36 Unique Flags: 0x1

[root@rac12node1 ~]# srvctl config gns -detail
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5,353 to connect to mDNS
GNS status: OK
Domain served by GNS: gns.example.com
GNS version: 12.1.0.1.0
Globally unique identifier of the cluster where GNS is running: 16aaef3df3d8efd6bffb1f37101d84e3
Name of the cluster where GNS is running: rac12gns
Cluster type: server.
GNS log level: 1.
GNS listening addresses: tcp://192.168.100.37:16934.

[oracle@rac12node1 ~]$ cluvfy comp gns -postcrsinst -verbose

Verifying GNS integrity

Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "gns.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
Public network subnets "192.168.100.0, 192.168.100.0, 192.168.100.0, 192.168.100.0,
  192.168.100.0" match with the GNS VIP "192.168.100.0, 192.168.100.0, 192.168.100.0,
  192.168.100.0, 192.168.100.0"
Checking if the GNS VIP is a valid address...
GNS VIP "gns-vip.example.com" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "gns.example.com" are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable
Checking status of GNS resource...
  Node          Running?                  Enabled?
  ------------  ------------------------  ------------------------
  rac12node1    yes                       yes
  rac12node2    no                        yes

GNS resource configuration check passed
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?
  ------------  ------------------------  ------------------------
  rac12node1    yes                       yes
  rac12node2    no                        yes

GNS VIP resource configuration check passed.

GNS integrity check passed

Verification of GNS integrity was successful.

And finally you can use nslookup to perform name resolution:

[oracle@rac12node1 ~]$ nslookup rac12scan.gns.example.com 192.168.100.37
Server:         192.168.100.37
Address:        192.168.100.37#53

Name:   rac12scan.gns.example.com
Address: 192.168.100.34
Name:   rac12scan.gns.example.com
Address: 192.168.100.36
Name:   rac12scan.gns.example.com
Address: 192.168.100.35

This is reflected in the GNS log file:

2013-11-15 15:06:21.190: [     GNS][682505984]Resolver #3::clsgndnsProcessNameQuestion:
 Query received for name: "rac12scan.gns.example.com." type: "A"
2013-11-15 15:06:21.190: [     GNS][682505984]Resolver #3::clsgndnsProcessNameQuestion:
 Query received for name: "rac12scan.gns.example.com." type: "A"
2013-11-15 15:06:21.190: [     GNS][682505984]Resolver #3::clsgndnsAddAnswer: name:
 "rac12scan" adding Name: "rac12scan" Type: A 192.168.100.34 Unique: TRUE Flags: ALLOCATED
2013-11-15 15:06:21.190: [     GNS][682505984]Resolver #3::clsgndnsAddAnswer: name:
 "rac12scan" adding Name: "rac12scan" Type: A 192.168.100.36 Unique: TRUE Flags: ALLOCATED
2013-11-15 15:06:21.190: [     GNS][682505984]Resolver #3::clsgndnsAddAnswer: name:
 "rac12scan" adding Name: "rac12scan" Type: A 192.168.100.35 Unique: TRUE Flags: ALLOCATED
2013-11-15 15:06:21.190: [     GNS][682505984]Resolver #3::clsgndnsResolve: name
 "rac12scan.gns.example.com." Type: Addr Resolved: TRUE

So that’s Clusterware using GNS installed. A few points to notice

  • The GNS VIP is in the parent domain, not in the GNS subdomain; in this case it is gns-vip.example.com
  • The public node names have to be defined in DNS or the hosts file (or both)
  • Only the VIPs (node VIPs and SCAN VIPs) are assigned by DHCP
  • Have a large enough range of addresses in /etc/dhcp/dhcpd.conf (a sketch follows this list)
  • Run cluvfy stage -pre dbinst -n node1,node2 to check if the prerequisites for the RDBMS installation are met
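Regarding the DHCP range mentioned above: a minimal sketch of the relevant dhcpd.conf section, sized for a two node cluster (2 node VIPs + 3 SCAN VIPs plus headroom). The range matches the leases shown earlier in this post; the lease times are assumptions:

# /etc/dhcp/dhcpd.conf (excerpt - a sketch)
subnet 192.168.100.0 netmask 255.255.255.0 {
    # 5 addresses are the bare minimum (2 node VIPs + 3 SCAN VIPs);
    # leave headroom for re-acquisition after restarts
    range 192.168.100.30 192.168.100.49;
    default-lease-time 43200;
    max-lease-time 86400;
}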

Happy installations!

Posted in 12c Release 1, KVM, Linux | 4 Comments »

RAC in KVM is possible without iSCSI

Posted by Martin Bach on September 3, 2013

(This post is for Jerry. He will know when he reads it)

I have been a great supporter of many flavours of virtualisation, and my earliest experience with Xen goes back to Oracle VM 2, which was based on RHEL 4 and an early version of Xen. Why am I saying this? Because Xen is (was?) simple and elegant, especially for building RAC systems: paravirtualised Linux and a dual-core machine were all you needed. Xen is very lightweight, even though recent achievements in processor architecture (nested page tables, single root IO virtualisation and others) make it more desirable to use hardware virtualisation with paravirtualised drivers. This is what this post is about!

Shared storage in Xen

As you know you need shared block devices for RAC: voting disks, OCR, data files, redo logs, the lot. In Xen that's as straightforward as it gets:

disk = [
  'file:/var/lib/xen/images/node1/root.img,xvda,w',
   ...
  'file:/var/lib/xen/images/node1/ocr001.img,xvdd,w!'
]

The “w!” means the block device is shared. Simple. Make the file available to the second RAC node and you are done.
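In KVM/libvirt there is no one-liner quite like "w!", but the domain XML offers something comparable. Here is a sketch of a shared raw disk attached through the virtio-scsi controller; the file path and target device are made up, and this is not necessarily what the full article ends up using:

<disk type='file' device='disk'>
  <!-- raw image with host caching disabled, so both guests see consistent writes -->
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/shared/ocr001.img'/>
  <!-- appears as a SCSI disk inside the guest -->
  <target dev='sdd' bus='scsi'/>
  <!-- allow the image to be attached to more than one domain -->
  <shareable/>
</disk>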

In 2009 (yikes!) I asked whether you could have RAC on KVM using the same technique. Who would have known then that Karl would be a colleague one day? Fast forward a few years and I have implemented such a solution. And it works in the lab. Read the rest of this entry »

Posted in KVM, Linux, Xen | 3 Comments »

Adding a node to a 12c RAC cluster

Posted by Martin Bach on August 15, 2013

This post is not in chronological order; I probably should have written something about installing RAC 12c first. I didn't want to write the tenth installation guide for Clusterware, so I'm focusing on extending my two node cluster to three nodes to test the new Flex ASM feature. If you care about installing RAC 12c, head over to RAC Attack for instructions, or Tim Hall's site. The RAC Attack instructions are currently being worked on for a 12c upgrade; you can follow or participate in the work on this free mailing list.

The cluster I installed is based on KVM on my lab server. I have used Oracle Linux 6.4 with UEK2 for the host OS. It is a standard cluster, i.e. not a Flex Cluster, but with Flex ASM configured. My network configuration is as shown:

  • public network on 192.168.100/24
  • private network on 192.168.101/24
  • ASM network on 192.168.1/24

As you can see here:

[grid@rac12node1 ~]$ oifcfg getif
eth0  192.168.100.0  global  public
eth1  192.168.101.0  global  cluster_interconnect
eth2  192.168.1.0    global  asm
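For reference, this classification is what oifcfg setif produces. A sketch using the subnets above, should you ever need to add the ASM network type after the fact (run as the grid software owner):

# register eth2 as the dedicated ASM network, then verify
oifcfg setif -global eth2/192.168.1.0:asm
oifcfg getif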

Grid and oracle are the respective software owners, and I am making use of the new separation of duties on the RDBMS layer. Read the rest of this entry »

Posted in 12c Release 1, Automatic Storage Management, KVM, Linux, RAC | 2 Comments »