Silent installation: Oracle Restart 19c, ASM Filter Driver, UEK 5 edition

As promised in an earlier post here are my notes about installing Oracle Restart with ASM Filter Driver (ASMFD) 19c on Oracle Linux 7 using UEK 5.

Since the approach you are about to read isn’t explicitly covered in the documentation I suggest you ask Oracle Support whether it is supported before using this outside a playground/lab environment.

I had also forgotten about this post waiting to be published in my drafts folder; it should have gone out in early April. Some of the components I used to put the post together aren’t the latest and greatest anymore, so please adjust accordingly.

My environment

The environment hasn’t changed massively compared to the RHCK edition of this post, except of course for the kernel used:

  • My lab environment consists of a KVM VM using the virtio driver
  • Oracle Linux 7.7
  • Kernel UEK 5 (patched to 4.14.35-1902.300.11.el7uek.x86_64)
  • All other packages up to date as of April 1st 2020
  • The first partitions of /dev/vd[c-f] are to be used as ASM disks for +DATA
  • This is a fresh installation, no upgrade, no ASMLib has ever been in use

The challenge of using ASMFD with Oracle Restart and UEK5

The challenge when installing Oracle Restart 19c together with ASMFD is the lack of support in the base release:

[root@server5 bin]# ./acfsdriverstate supported
ACFS-9459: ADVM/ACFS is not supported on this OS version: '4.14.35-1902.300.11.el7uek.x86_64'
ACFS-9201: Not Supported
ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
[root@server5 bin]# uname -r
4.14.35-1902.300.11.el7uek.x86_64
[root@server5 bin]#  

This is easy to run into since gridSetup.sh as shipped with 19.3 doesn’t validate this for you when running in silent mode; the GUI version of the installer protects you from the mistake though. Upgrading to the latest UEK 5 doesn’t change this message, as you just saw, but it was a necessity in my case anyway, as you will see later. As per the My Oracle Support (MOS) certification matrix, DocID 1369107.1, Oracle 19.4.0 is the first release to support ASM Filter Driver (ASMFD) on UEK 5. The base release, 19.3.0, does not support ASMFD (or ACFS for that matter) out of the box.

I’d like to reiterate that this post isn’t an endorsement for ASM Filter Driver, but since the documentation was a little unclear I thought I’d write up how I got to a working installation. It is up to you to ensure that ASMFD is a workable solution for your environment by following industry best known methods.

So how do I get to a working Oracle Restart 19c/ASMFD installation when I’m using UEK 5?

As I said in the first part of the series there are 2 options available for installing Oracle Restart 19c using ASMFD, at least in theory: the first one is to use UDEV to prepare ASM block devices, the second one is to label the ASM disks using asmcmd afd_label.

UDEV

Huh, UDEV? That hasn’t really been blogged about at all in the context of ASMFD, or at least I didn’t find anyone who did. I’m inferring the possibility of using UDEV from “Configuring Oracle ASM Filter Driver During Installation” (link to documentation):

“If you do not use udev on the system where the Oracle Grid Infrastructure is installed, then you can also complete the following procedure to provision disks for Oracle ASMFD before the installer is launched”

The docs then continue with an explanation of using disk labeling. I was very interested in the first part of the statement quoted above (… not using UDEV …) as it implies using UDEV is a viable option.

So I went ahead and tried to use ASMFD with UDEV rules. I have previously used UDEV rules without ASMFD when installing Oracle products so that wasn’t too hard to do.
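
I’m not reproducing my exact rules file here, but a minimal sketch along these lines sets the ownership and mode for the DATA disks (the rules file name is arbitrary; add the remaining devices in the same way):

# /etc/udev/rules.d/99-oracle-asm.rules
KERNEL=="vdc1", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="vdd1", OWNER="grid", GROUP="asmadmin", MODE="0660"

# activate the rules without rebooting
udevadm control --reload-rules
udevadm trigger --type=devices --action=change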

After changing ownership of the relevant block devices to grid:asmadmin and their mode to 0660 via UDEV, here’s the result of the changes. Permissions on my block devices to be used for ASM are now correct:

[root@server5 ~]# ls -l /dev/vd[cd]*
brw-rw----. 1 root disk     251, 32 Apr  1 15:49 /dev/vdc
brw-rw----. 1 grid asmadmin 251, 33 Apr  1 15:49 /dev/vdc1
brw-rw----. 1 root disk     251, 48 Apr  1 15:49 /dev/vdd
brw-rw----. 1 grid asmadmin 251, 49 Apr  1 15:49 /dev/vdd1
[root@server5 ~]#  

Preparing for patching and installing

Due to the issue of not being able to install Oracle Restart 19c/ASMFD with the base release I am using a slightly different approach this time. It’s rather subtle, but effective. There is only 1 difference in the call to gridSetup.sh compared to the last post. The goal is to merge the latest RU into the unzipped binaries prior to invoking the installer to create a supported configuration.

Since the release of Oracle Grid Infrastructure 12.2 it has been possible to merge a Release Update (RU) into the unzipped installation image. This process has changed quite a bit over time, as reported by Ludovico Caldara for example. You can’t simply execute gridSetup.sh -applyRU to merge the RU into the installation image; in silent mode you also need to pass the parameters for an installation to avoid an error. In GUI mode, the graphical user interface starts after the patch has been merged into the binaries instead.

Combining the -applyRU flag and the installation options used in the previous post, I can both patch and install the software. My system has the Oracle 19.6.0 RU available and I’m going to apply it. In preparation I have to update OPatch as the grid owner. Once that’s done I need to unzip the RU to a staging location, still logged in as the grid owner:

[grid@server5 ~]$ unzip -q /mnt/19.6.0/p6880880_190000_Linux-x86-64.zip -d /u01/app/grid/product/19.0.0/grid
replace /u01/app/grid/product/19.0.0/grid/OPatch/emdpatch.pl? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
[grid@server5 ~]$
[grid@server5 ~]$ unzip -q /mnt/19.6.0/p30501910_190000_Linux-x86-64.zip -d /u01/stage
[grid@server5 ~]$ 

Patch and install

With all preparations in place, it’s time to call gridSetup.sh with the -applyRU flag necessary for UEK5:

[grid@server5 ~]$ cd /u01/app/grid/product/19.0.0/grid/
[grid@server5 grid]$ ./gridSetup.sh -silent \
> INVENTORY_LOCATION=/u01/app/oraInventory \
> SELECTED_LANGUAGES=en \
> ORACLE_BASE=/u01/app/grid \
> ORACLE_HOME_NAME=RESTART_ASMFD_UEK \
> -waitforcompletion -ignorePrereqFailure -lenientInstallMode \
> -applyRU /u01/stage/30501910 \
> oracle.install.option=HA_CONFIG \
> oracle.install.asm.OSDBA=asmdba \
> oracle.install.asm.OSASM=asmadmin \
> oracle.install.asm.diskGroup.name=DATA \
> oracle.install.asm.diskGroup.disks=/dev/vdc1,/dev/vdd1 \
> oracle.install.asm.diskGroup.diskDiscoveryString=/dev/vd* \
> oracle.install.asm.diskGroup.redundancy=EXTERNAL \
> oracle.install.asm.diskGroup.AUSize=4 \
> oracle.install.asm.configureAFD=true \
> oracle.install.crs.rootconfig.executeRootScript=false \
> oracle.install.asm.SYSASMPassword=thinkOfASuperSecretPassword \
> oracle.install.asm.monitorPassword=thinkOfASuperSecretPassword
Preparing the home to patch...
Applying the patch /u01/stage/30501910...
Successfully applied the patch.
The log can be found at: /tmp/GridSetupActions2020-04-01_04-03-20PM/installerPatchActions_2020-04-01_04-03-20PM.log
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:
 /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-04-01_04-03-20PM.rsp

You can find the log of this install session at:
 /tmp/GridSetupActions2020-04-01_04-03-20PM/gridSetupActions2020-04-01_04-03-20PM.log

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/grid/product/19.0.0/grid/root.sh

Execute /u01/app/grid/product/19.0.0/grid/root.sh on the following nodes:
[server5]



Successfully Setup Software.
As install user, execute the following command to complete the configuration.
        /u01/app/grid/product/19.0.0/grid/gridSetup.sh -executeConfigTools -responseFile /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-04-01_04-03-20PM.rsp [-silent]
Note: The required passwords need to be included in the response file.


Moved the install session logs to:
 /u01/app/oraInventory/logs/GridSetupActions2020-04-01_04-03-20PM
[grid@server5 grid]$ 

“Successfully Setup Software”, the message I was waiting for ;) From then on it’s the same as described in the previous blog post. Head over there to read more about the execution of orainstRoot.sh, root.sh and the configuration tools step that finishes the installation.

Verification

As with the previous post I’d like to share some of the results of the installation. First, what about the afddriverstate?

[grid@server5 ~]$ afddriverstate installed
AFD-9203: AFD device driver installed status: 'true'
[grid@server5 ~]$ afddriverstate loaded
AFD-9205: AFD device driver loaded status: 'true'
[grid@server5 ~]$ afddriverstate version
AFD-9325:     Driver OS kernel version = 4.14.35-1902.0.9.el7uek.x86_64.
AFD-9326:     Driver build number = 191219.
AFD-9212:     Driver build version = 19.0.0.0.0.
AFD-9547:     Driver available build number = 191219.
AFD-9548:     Driver available build version = 19.0.0.0.0.
[grid@server5 ~]$  

This looks all right. What about the patch level?

[grid@server5 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
30655595;TOMCAT RELEASE UPDATE 19.0.0.0.0 (30655595)
30557433;Database Release Update : 19.6.0.0.200114 (30557433)
30489632;ACFS RELEASE UPDATE 19.6.0.0.0 (30489632)
30489227;OCW RELEASE UPDATE 19.6.0.0.0 (30489227)

OPatch succeeded. 

And finally, what about ASM? Is it using ASM Filter Driver for its disks?

SQL> col name for a20
SQL> col path for a10
SQL> col library for a50
SQL> set lines 120
SQL> select name, path, library from v$asm_disk where group_number <> 0;

NAME                 PATH       LIBRARY
-------------------- ---------- --------------------------------------------------
DATA1                AFD:DATA1  AFD Library - Generic , version 3 (KABI_V3)
DATA2                AFD:DATA2  AFD Library - Generic , version 3 (KABI_V3)

SQL> 
SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/vd*, AFD:* 

So all in all, the status isn’t different from the previous post, except I have already patched my environment, saving me time.

Failed attempts

I am keeping these as a reminder to myself. Maybe they are of use to you as well when troubleshooting.

When trying to install Oracle Restart after labeling the ASM disks as demonstrated in the previous post, the installation of the 19.3.0 base release fails:

[grid@server5 grid]$ ./gridSetup.sh -silent \
> INVENTORY_LOCATION=/u01/app/oraInventory \
> SELECTED_LANGUAGES=en \
> ORACLE_BASE=/u01/app/grid \
> ORACLE_HOME_NAME=RESTART_ASMFD_UEK \
> -waitforcompletion -ignorePrereqFailure -lenientInstallMode \
> -applyRU /u01/stage/30501910 \
> oracle.install.option=HA_CONFIG \
> oracle.install.asm.OSDBA=asmdba \
> oracle.install.asm.OSASM=asmadmin \
> oracle.install.asm.diskGroup.name=DATA \
> oracle.install.asm.diskGroup.disks=/dev/vdc1,/dev/vdd1 \
> oracle.install.asm.diskGroup.diskDiscoveryString=/dev/vd* \
> oracle.install.asm.diskGroup.redundancy=EXTERNAL \
> oracle.install.asm.diskGroup.AUSize=4 \
> oracle.install.asm.configureAFD=true \
> oracle.install.crs.rootconfig.executeRootScript=false \
> oracle.install.asm.SYSASMPassword=thinkOfASuperSecretPassword \
> oracle.install.asm.monitorPassword=thinkOfASuperSecretPassword
Preparing the home to patch...
Applying the patch /u01/stage/30501910...
Successfully applied the patch.
The log can be found at: /tmp/GridSetupActions2020-04-01_03-25-52PM/installerPatchActions_2020-04-01_03-25-52PM.log
Launching Oracle Grid Infrastructure Setup Wizard...

[FATAL] [INS-30508] Invalid ASM disks.
   CAUSE: The disks [/dev/vdd1, /dev/vdc1] were not valid.
   ACTION: Please choose or enter valid ASM disks.
[FATAL] [INS-30515] Insufficient space available in the selected disks.
   CAUSE: Insufficient space available in the selected Disks. At least, 32 MB of free space is required.
   ACTION: Choose additional disks such that the total size should be at least 32 MB.
Moved the install session logs to:
 /u01/app/oraInventory/logs/GridSetupActions2020-04-01_03-25-52PM
[grid@server5 grid]$  

Similarly, if you don’t run with a very recent UEK 5 kernel patch level (4.14.35-1902.300.11.el7uek.x86_64 and later), the call to gridSetup.sh also fails, even when specifying -applyRU:

[grid@server5 grid]$ ./gridSetup.sh -silent \
> INVENTORY_LOCATION=/u01/app/oraInventory \
> SELECTED_LANGUAGES=en \
> ORACLE_BASE=/u01/app/grid \
> ORACLE_HOME_NAME=RESTART_ASMFD_UEK \
> -waitforcompletion -ignorePrereqFailure -lenientInstallMode \
> -applyRU /u01/stage/30501910 \
> oracle.install.option=HA_CONFIG \
> oracle.install.asm.OSDBA=asmdba \
> oracle.install.asm.OSASM=asmadmin \
> oracle.install.asm.diskGroup.name=DATA \
> oracle.install.asm.diskGroup.disks=/dev/vdc1,/dev/vdd1 \
> oracle.install.asm.diskGroup.diskDiscoveryString=/dev/vd* \
> oracle.install.asm.diskGroup.redundancy=EXTERNAL \
> oracle.install.asm.diskGroup.AUSize=4 \
> oracle.install.asm.configureAFD=true \
> oracle.install.crs.rootconfig.executeRootScript=false \
> oracle.install.asm.SYSASMPassword=thinkOfASuperSecretPassword \
> oracle.install.asm.monitorPassword=thinkOfASuperSecretPassword
Preparing the home to patch...
Applying the patch /u01/stage/30501910...
Successfully applied the patch.
The log can be found at: /tmp/GridSetupActions2020-04-01_03-55-32PM/installerPatchActions_2020-04-01_03-55-32PM.log
Launching Oracle Grid Infrastructure Setup Wizard...

[FATAL] [INS-41223] ASM Filter Driver is not supported on this platform.
   ACTION: To proceed, do not specify or select the Oracle ASM Filter Driver option.
*ADDITIONAL INFORMATION:*
 - AFD-620: AFD is not supported on this operating system version: 'EL7'
 - AFD-9999: Cannot change to Not valid path:
 - 

Moved the install session logs to:
 /u01/app/oraInventory/logs/GridSetupActions2020-04-01_03-55-32PM 

The kernel I used in this case was the stock Oracle Linux 7.7 kernel:

[root@server5 ~]# uname -r
4.14.35-1818.3.3.el7uek.x86_64 

The nice touch is that it’s telling me the use of ASMFD on my old kernel isn’t supported.

Happy installing!

Creating a new disk group for use with ASM Filter Driver on the command line in Oracle 19c

In my previous post I shared my surprise when I learned that calling gridSetup.sh 19c for use with Oracle ASM Filter Driver (ASMFD) required me to specify the names of the native block devices. This is definitely different from installing ASM with ASMLib where you pass ASM disks as “ORCL:diskname” to the installer.

Um, that’s great, but why did I write this post? Well, once the installation/configuration steps are completed you most likely need to create at least a second disk group. In my case that’s going to be RECO, for use with the Fast Recovery Area (FRA). This post details the necessary steps to get there, as they are different compared to the initial call to gridSetup.sh.

And while I might sound like a broken record, I would like to remind you that I’m not endorsing ASM Filter Driver. I merely found the documentation unclear in some places, and this post hopes to clarify certain aspects around the use of ASMFD. Please remember that ASMFD is new-ish technology and it’s up to every user to apply industry best known methods to ensure everything works as expected.

My environment

The lab environment hasn’t changed; I’m still using the same Oracle Linux 7.7 KVM VM I prepared for use with the last post. Storage is still made accessible via the virtio driver. The VM boots into the Red Hat Compatible Kernel.

Previously I installed the base release, Oracle Restart 19.3.0. Since the base release was made available, quite a few issues have been addressed in later Release Updates (RUs). To keep up with the latest fixes my system has since been patched to 19.6.0, the current RU at the time of writing.

Creating a new disk group

Since I’m using Ansible for most things these days I had to come up with a straightforward method of creating a disk group. ASM has shipped with the ASM Configuration Assistant (asmca) for quite a while now, and it can be used to create a disk group in a simple, elegant call (link to documentation). I could of course have created the disk group in sqlplus but this would have required a lot more typing, and I’m inherently lazy.

Unlike the initial call to gridSetup.sh where you pass native block devices along with a request to configure ASMFD, the steps for creating the disk group require you to label the disks beforehand. This is pretty trivial, and more importantly, easy to automate with Ansible.

Labeling the disks

As per my earlier post, I’m planning on using /dev/vde1 and /dev/vdf1 for RECO. The first step is to label the disks. The call is similar to the one you read about earlier:

[root@server4 ~]# . oraenv
ORACLE_SID = [+ASM] ? +ASM
The Oracle base remains unchanged with value /u01/app/grid
[root@server4 ~]# asmcmd afd_label RECO1 /dev/vde1
[root@server4 ~]# asmcmd afd_label RECO2 /dev/vdf1
[root@server4 ~]# asmcmd afd_lslbl
--------------------------------------------------------------------------------
Label                     Duplicate  Path
================================================================================
DATA1                                 /dev/vdc1
DATA2                                 /dev/vdd1
RECO1                                 /dev/vde1
RECO2                                 /dev/vdf1
[root@server4 ~]#  

Note the absence of the “--init” flag when invoking asmcmd afd_label … The way I understand it, this flag is used only during the initial installation.

Creating the disk group

Once the disks are labeled, you can create the disk group. Using the documentation reference I shared earlier I ended up with this call to asmca:

[grid@server4 ~]$ asmca -silent \
> -createDiskGroup -diskGroupName RECO \
> -disk 'AFD:RECO*' -redundancy EXTERNAL \
> -au_size 4 -compatible.asm 19.0.0 -compatible.rdbms 19.0.0

[DBT-30001] Disk groups created successfully. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200402AM115509.log for details.

[grid@server4 ~]$  

Thanks to ASMFD I don’t have to specify individual disks; I can simply tell it to use all disks that go by the name of RECO* – RECO1 and RECO2 in this example. The actual number of ASM disks doesn’t matter using this call, again helping me automate the process.
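
For the record, the roughly equivalent SQL, had I not been so lazy, would have looked something like the following sketch, executed as the grid owner with the +ASM environment set; note that every labeled disk has to be named explicitly:

sqlplus / as sysasm <<'EOF'
-- create the RECO disk group on the two disks labeled earlier
CREATE DISKGROUP RECO EXTERNAL REDUNDANCY
  DISK 'AFD:RECO1', 'AFD:RECO2'
  ATTRIBUTE 'compatible.asm'   = '19.0.0',
            'compatible.rdbms' = '19.0.0',
            'au_size'          = '4M';
EOF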

Since this environment is exclusively used for Oracle 19c I can safely set the compatibility to 19c for both ASM and the database. Refer to the ASM documentation for further information about the disk group compatibility properties.

Verification

The output of the command indicates success, so let’s have a look at the ASM configuration:

SQL> select d.name, d.path, d.library, dg.name
  2  from v$asm_disk d left join v$asm_diskgroup dg on (dg.group_number = d.group_number)
  3  where dg.name = 'RECO';

NAME       PATH            LIBRARY                                                      NAME
---------- --------------- ------------------------------------------------------------ ----------
RECO1      AFD:RECO1       AFD Library - Generic , version 3 (KABI_V3)                  RECO
RECO2      AFD:RECO2       AFD Library - Generic , version 3 (KABI_V3)                  RECO

SQL> select d.name, dg.name, dg.compatibility, dg.database_compatibility
  2  from v$asm_disk d left join v$asm_diskgroup dg on (dg.group_number = d.group_number)
  3  where dg.name = 'RECO';

NAME       NAME       COMPATIBILITY        DATABASE_COMPATIBILI
---------- ---------- -------------------- --------------------
RECO1      RECO       19.0.0.0.0           19.0.0.0.0
RECO2      RECO       19.0.0.0.0           19.0.0.0.0 

This seems to have worked. I can also see the disk groups registered in Clusterware:

[grid@server4 ~]$ crsctl stat res -t -w "TYPE == ora.diskgroup.type"
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       server4                  STABLE
ora.RECO.dg
               ONLINE  ONLINE       server4                  STABLE
--------------------------------------------------------------------------------
[grid@server4 ~]$ 

Silent installation: Oracle Restart 19c, ASM Filter Driver, RHCK edition

As promised in the earlier post here are my notes about installing Oracle Restart 19c on Oracle Linux 7.7 using the Red Hat Compatible Kernel (RHCK). Please consult the ACFS/ASMFD compatibility matrix, My Oracle Support DocID 1369107.1, for the latest information about ASMFD compatibility with various kernels as well.

Why am I starting the series with a seemingly “odd” kernel, at least from the point of view of Oracle Linux? If you try to install the Oracle Restart base release with UEK 5, you get strange error messages back from gridSetup telling you about invalid ASM disks. While that’s probably true, it’s a secondary error. The main cause of the problem is this:

[root@server5 bin]# ./afddriverstate supported
AFD-620: AFD is not supported on this operating system version: '4.14.35-1902.300.11.el7uek.x86_64'
AFD-9201: Not Supported
AFD-9294: updating file /etc/sysconfig/oracledrivers.conf 

This is easy to run into since gridSetup.sh doesn’t validate this for you when running in silent mode; the GUI version of the installer protects you from the mistake though. Upgrading to the latest UEK 5 doesn’t change this message: you need to check the certification matrix to learn that Oracle Restart 19.4.0 and later are required for UEK 5 if you’d like to use ASMFD (or ACFS for that matter). This scenario will be covered in a later post.

Using the Red Hat Compatible Kernel alleviates this problem for me. Just be aware of the usual caveats when using the Red Hat kernel on Oracle Linux, such as YUM changing the default kernel during yum upgrade etc. I’d also like to reiterate that this post isn’t an endorsement for ASM Filter Driver, but since the documentation was a little unclear I thought I’d write up how I got to a working installation. It is up to you to ensure that ASMFD is a workable solution for your environment by following industry best known practices.

Configuration Options

In the post introducing this series I claimed to have identified 2 options for installing Oracle Restart 19c using ASMFD: the first one is to use UDEV to prepare ASM block devices, the second one is to label the ASM disks using asmcmd afd_label.

Huh, UDEV? That hasn’t really been blogged about at all in the context of ASMFD, or at least I didn’t find anyone who did. I’m inferring the possibility of using UDEV from “Configuring Oracle ASM Filter Driver During Installation” (link to documentation):

“If you do not use udev on the system where the Oracle Grid Infrastructure is installed, then you can also complete the following procedure to provision disks for Oracle ASMFD before the installer is launched”

You actually only have to choose one of them. Let’s start with the more frequently covered approach of labelling disks using asmcmd.

My environment

I have applied all patches available up to March 26th to my lab environment. The Oracle Linux release I’m using is 7.7:

[root@server4 ~]# cat /etc/oracle-release
Oracle Linux Server release 7.7 

The KVM VM I’m using for this blog post uses the latest Red Hat Compatible Kernel at the time of writing (kernel-3.10.0-1062.18.1.el7.x86_64). You will notice that I’m using the virtio driver, leading to “strange” device names. Instead of /dev/sd it’s /dev/vd. My first two block devices are reserved for the O/S and Oracle, the remaining ones are going to be used for ASM. I have an old (bad?) habit of partitioning block devices for ASM as you might notice. Most of the Oracle setup is done by the 19c preinstall RPM, which I used.
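
For completeness, preparing the system boiled down to something along these lines; a sketch, run as root, with the parted call repeated for each ASM block device:

# kernel parameters, limits, the oracle user etc. are taken care of by the preinstall RPM
yum install -y oracle-database-preinstall-19c

# create a single partition spanning the whole disk, here for /dev/vdc
parted -s /dev/vdc mklabel gpt mkpart primary 1MiB 100%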

I created a grid owner – grid – to own the Oracle Restart installation. Quite a few blog posts I came across referenced group membership, and I’d like to do the same:

[root@server4 ~]# id -a grid 
uid=54322(grid) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54328(asmadmin),54327(asmdba) 
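
The preinstall RPM does not create a separate grid owner or the ASM groups, so those have to be added manually. A sketch using the numeric IDs from the output above:

# groups and grid user matching the id output shown above
groupadd -g 54327 asmdba
groupadd -g 54328 asmadmin
useradd -u 54322 -g oinstall -G dba,asmdba,asmadmin grid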

The block devices I’m intending to use for ASM are /dev/vdc to /dev/vdf – the first 2 are intended for +DATA, the other 2 will become part of +RECO. As you can see they are partitioned:

[root@server4 ~]# lsblk --ascii
NAME                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdf                   251:80   0   10G  0 disk 
`-vdf1                251:81   0   10G  0 part 
vdd                   251:48   0   10G  0 disk 
`-vdd1                251:49   0   10G  0 part 
vdb                   251:16   0   50G  0 disk 
`-vdb1                251:17   0   50G  0 part 
  `-oraclevg-orabinlv 252:2    0   50G  0 lvm  /u01
sr0                    11:0    1 1024M  0 rom  
vde                   251:64   0   10G  0 disk 
`-vde1                251:65   0   10G  0 part 
vdc                   251:32   0   10G  0 disk 
`-vdc1                251:33   0   10G  0 part 
vda                   251:0    0   12G  0 disk 
|-vda2                251:2    0 11.5G  0 part 
| |-rootvg-swaplv     252:1    0  768M  0 lvm  [SWAP]
| `-rootvg-rootlv     252:0    0 10.8G  0 lvm  /
`-vda1                251:1    0  500M  0 part /boot  

With all that out of the way it is time to cover the installation.

Labeling disks

I’m following the procedure documented in the 19c Administrator’s Guide chapter 20, section “Configuring Oracle ASM Filter Driver During Installation”. I have prepared my environment up to the step where I’d have to launch gridSetup.sh. This is a fairly well known process, and I won’t repeat it here.

Once the 19c install image has been extracted to my future Grid Home, the first step is to check if my system is supported:

[root@server4 ~]# cd /u01/app/grid/product/19.0.0/grid/bin
[root@server4 bin]# ./afddriverstate supported
AFD-9200: Supported 
[root@server4 bin]# uname -r
3.10.0-1062.18.1.el7.x86_64 

“AFD-9200: Supported” tells me that I can start labeling disks. This requires me to be root, and I have to set ORACLE_HOME and ORACLE_BASE. For some reason, the documentation suggests using /tmp as ORACLE_BASE, which I’ll use as well:

[root@server4 bin]# pwd
/u01/app/grid/product/19.0.0/grid/bin
[root@server4 bin]# export ORACLE_BASE=/tmp
[root@server4 bin]# export ORACLE_HOME=/u01/app/grid/product/19.0.0/grid
[root@server4 bin]# ./asmcmd afd_label DATA1 /dev/vdc1 --init
[root@server4 bin]# ./asmcmd afd_label DATA2 /dev/vdd1 --init 

[root@server4 bin]# ./asmcmd afd_lslbl /dev/vdc1
--------------------------------------------------------------------------------
Label                     Duplicate  Path
================================================================================
DATA1                                 /dev/vdc1

[root@server4 bin]# ./asmcmd afd_lslbl /dev/vdd1
--------------------------------------------------------------------------------
Label                     Duplicate  Path
================================================================================
DATA2                                 /dev/vdd1  

Note the use of the --init flag. This is only needed if Grid Infrastructure isn’t installed yet.

Labeling the disks did not have an effect on the block devices’ permissions. Right after finishing the 2 calls to label my 2 block devices, this is the output from my file system:

[root@server4 bin]# ls -l /dev/vd[c-d]*
brw-rw----. 1 root disk 252, 32 Mar 27 09:46 /dev/vdc
brw-rw----. 1 root disk 252, 33 Mar 27 12:55 /dev/vdc1
brw-rw----. 1 root disk 252, 48 Mar 27 09:46 /dev/vdd
brw-rw----. 1 root disk 252, 49 Mar 27 12:58 /dev/vdd1
[root@server4 bin]#  

The output of afd_lslbl indicated that both of my disks are ready to become part of an ASM disk group, so let’s start the installer.

Call gridSetup.sh

I wasn’t able to make sense of the options in the response file until I started the installer in GUI mode and created a response file based on my choices. To cut a long story short, here is my call to gridSetup.sh:

[grid@server4 ~]$ /u01/app/grid/product/19.0.0/grid/gridSetup.sh -silent \
> INVENTORY_LOCATION=/u01/app/oraInventory \
> SELECTED_LANGUAGES=en \
> ORACLE_BASE=/u01/app/grid \
> ORACLE_HOME_NAME=ASMFD_RHCK \
> -waitforcompletion -ignorePrereqFailure -lenientInstallMode \
> oracle.install.option=HA_CONFIG \
> oracle.install.asm.OSDBA=asmdba \
> oracle.install.asm.OSASM=asmadmin \
> oracle.install.asm.diskGroup.name=DATA \
> oracle.install.asm.diskGroup.disks=/dev/vdc1,/dev/vdd1 \
> oracle.install.asm.diskGroup.diskDiscoveryString=/dev/vd* \
> oracle.install.asm.diskGroup.redundancy=EXTERNAL \
> oracle.install.asm.diskGroup.AUSize=4 \
> oracle.install.asm.configureAFD=true \
> oracle.install.crs.rootconfig.executeRootScript=false \
> oracle.install.asm.SYSASMPassword=thinkOfASuperSecretPassword \
> oracle.install.asm.monitorPassword=thinkOfASuperSecretPassword
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:
 /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp

You can find the log of this install session at:
 /tmp/GridSetupActions2020-03-27_01-06-14PM/gridSetupActions2020-03-27_01-06-14PM.log

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/grid/product/19.0.0/grid/root.sh

Execute /u01/app/grid/product/19.0.0/grid/root.sh on the following nodes:
[server4]

Successfully Setup Software.
As install user, execute the following command to complete the configuration.
/u01/app/grid/product/19.0.0/grid/gridSetup.sh -executeConfigTools -responseFile /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp [-silent]
Note: The required passwords need to be included in the response file.
Moved the install session logs to:
/u01/app/oraInventory/logs/GridSetupActions2020-03-27_01-06-14PM
[grid@server4 ~]$

It took a little while to work out that, despite labeling the disks for ASMFD, I didn’t have to put any reference to AFD into the call to gridSetup.sh. Have a look at the ASM disk string and the block devices: that’s what I’d use if I were using UDEV rules for device name persistence. The syntax might appear counter-intuitive; the key is the “configureAFD” flag you need to set to true.

Since this is a lab environment I’m ok with external redundancy. Make sure you pick a redundancy level appropriate for your use case.

Running the configuration tools

The remaining steps are identical to a non-ASMFD setup. First you run orainstRoot.sh followed by root.sh. The output of the latter showed this for me, indicating success:

[root@server4 ~]# /u01/app/grid/product/19.0.0/grid/root.sh
Check /u01/app/grid/product/19.0.0/grid/install/root_server4_2020-03-27_13-11-05-865019723.log for the output of root script

[root@server4 ~]#
[root@server4 ~]# cat /u01/app/grid/product/19.0.0/grid/install/root_server4_2020-03-27_13-11-05-865019723.log
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/grid/product/19.0.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/grid/product/19.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/server4/crsconfig/roothas_2020-03-27_01-11-06PM.log
2020/03/27 13:11:13 CLSRSC-363: User ignored prerequisites during installation
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node server4 successfully pinned.
2020/03/27 13:13:55 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

server4     2020/03/27 13:16:59     /u01/app/grid/crsdata/server4/olr/backup_20200327_131659.olr     724960844
2020/03/27 13:17:54 CLSRSC-327: Successfully configured Oracle Restart for a standalone server
[root@server4 ~]# 

Well, that looks ok; now on to the final step, configuration! As indicated in the output, you need to update the response file (/u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp) with the required passwords. For me that was oracle.install.asm.monitorPassword and oracle.install.asm.SYSASMPassword. Once the response file was updated, I called gridSetup.sh once again:

[grid@server4 ~]$ /u01/app/grid/product/19.0.0/grid/gridSetup.sh -executeConfigTools -responseFile /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp -silent
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the logs of this session at:
/u01/app/oraInventory/logs/GridSetupActions2020-03-27_01-20-47PM

You can find the log of this install session at:
 /u01/app/oraInventory/logs/UpdateNodeList2020-03-27_01-20-47PM.log
Successfully Configured Software. 

And that’s it! The software has been configured successfully. Don’t forget to remove the passwords from the response file!
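
If you script this, blanking the two password properties can be as simple as the following sketch, using the response file name from this run:

# clear the SYSASM and ASMSNMP passwords from the response file
sed -i \
  -e 's/^oracle.install.asm.SYSASMPassword=.*/oracle.install.asm.SYSASMPassword=/' \
  -e 's/^oracle.install.asm.monitorPassword=.*/oracle.install.asm.monitorPassword=/' \
  /u01/app/grid/product/19.0.0/grid/install/response/grid_2020-03-27_01-06-14PM.rsp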

Verification

After a little while I have been able to configure Oracle Restart 19c/ASMFD on Oracle Linux 7.7/RHCK. Let’s check what this implies.

I’ll first look at the status of ASM Filter Driver:

[grid@server4 ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM
The Oracle base has been set to /u01/app/grid
[grid@server4 ~]$ afddriverstate installed
AFD-9203: AFD device driver installed status: 'true'
[grid@server4 ~]$ afddriverstate loaded
AFD-9205: AFD device driver loaded status: 'true'
[grid@server4 ~]$ afddriverstate version
AFD-9325:     Driver OS kernel version = 3.10.0-862.el7.x86_64.
AFD-9326:     Driver build number = 190222.
AFD-9212:     Driver build version = 19.0.0.0.0.
AFD-9547:     Driver available build number = 190222.
AFD-9548:     Driver available build version = 19.0.0.0.0.
[grid@server4 ~]$  

That’s encouraging: ASMFD is loaded and works on top of kernel 3.10 (RHCK).

I am indeed using the base release (and have to patch now!)

[grid@server4 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
29585399;OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
29517247;ACFS RELEASE UPDATE 19.3.0.0.0 (29517247)
29517242;Database Release Update : 19.3.0.0.190416 (29517242)
29401763;TOMCAT RELEASE UPDATE 19.0.0.0.0 (29401763)

OPatch succeeded. 

And … I’m also using ASMFD:

SQL> col name for a20
SQL> col path for a10
SQL> col library for a50
SQL> set lines 120
SQL> select name, path, library from v$asm_disk where group_number <> 0;

NAME                 PATH       LIBRARY
-------------------- ---------- --------------------------------------------------
DATA1                AFD:DATA1  AFD Library - Generic , version 3 (KABI_V3)
DATA2                AFD:DATA2  AFD Library - Generic , version 3 (KABI_V3)

SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/vd*, AFD:*
SQL>  

This concludes the setup of my lab environment.

Using virt-install/KVM and kickstart to create lots of identical VMs quickly

In preparation for an upcoming talk about Oracle 12.2 I am planning on getting my feet wet with the Oracle 12.2 Sharding option. It seems to be a very interesting subject to get into, but like all new subjects it can be intimidating at first due to its complexity. So why not use the instructions someone has already written down?

Oracle has a nice series of articles about Sharding. I found the Sharding landing page in the Maximum Availability Architecture section on OTN. For the purpose of this blog post I’m intending to follow the “Oracle Sharded Database Deployment with Active Data Guard using CREATE SHARD method for On-Premises – Cookbook”, available from the aforementioned landing page.

About those hardware requirements

Well, that’s what I initially had planned. What I hadn’t expected was the number of VMs to be created. On page 7 of the aforementioned PDF, the author explains the prerequisites. Copying verbatim from the document:

You must acquire brand new 7 VMs which do not have any pre-existing Oracle database or listeners running on them.

Well, that’s 7 VMs. Not an awful lot of fun if you have to install all of them manually. Thankfully I don’t have to! In my lab environments I run KVM plus libvirt to make my life easier. There are many useful aspects of this combination, and one of them is named virt-install. The tool allows you to create a VM plus storage, networking etc. entirely on the command line. But that would still require the administrator to navigate the anaconda screens, something I’d like to avoid. For quite some time administrators have been able to deploy systems using an automated installation mechanism; on Red Hat Linux and clones that mechanism is called “Kickstart”. Kickstart – documented in the Red Hat Installation Guide, chapter 23 – is an immensely powerful way of deploying systems in a consistent, and always correct (or wrong :) way. In this article I’d like to share how you could run virt-install to install VMs with minimum effort.

Say hello to virt-install

virt-install is beautifully documented making it easy to create VMs. The syntax however takes a little while to understand. I have previously written about virt-install but those posts are now showing their age, and a new example seems in order.

Before presenting the command to create the VM let’s think about the machine’s capabilities first.

  • VM to host a single instance Oracle database and some auxiliary software
  • Should probably have at least two vcpus
  • Eight GB of DRAM should be the minimum
  • I like to separate the root file system from the application data, e.g. I want two volume groups: rootvg and oraclevg

Let’s translate that to virt-install; this works on virt-install 1.4.0 which is the current version on my Oracle Linux 7.3 host.

# virt-install --name shard0 --memory 8192 --memorybacking hugepages=yes \
--vcpus 2 --metadata description=shard0_vm --location /path/to/OL7.3.iso \
--initrd-inject=/tmp/shard0.ks --os-variant=rhel7.3 \
--extra-args="ks=file:/shard0.ks console=tty0 console=ttyS0,115200n8"  \
--disk size=12,pool=capacitypool --disk size=50,pool=latencypool \
--network network=public --nographics

This worked for me: the virt-install command (sourcing the kickstart file) creates and configures a fresh VM with all the required packages for Oracle 12c. I couldn’t reference the 12.2 preinstall RPM because it’s not part of the Oracle Linux 7.3 ISO. That’s ok for me; I can wait for the 7.4 ISO, where this might be part of the standard distribution. As a workaround you could install the 12.2 preinstall RPM once the new VM is up; the RPM can be found in the OL7 “latest” yum channel.
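
With the new VM registered against the ol7_latest channel, that workaround is a one-liner; the package name below is the one Oracle published for 12.2 at the time:

yum install -y oracle-database-server-12cR2-preinstall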

virt-install what?

The virt-install command might require an explanation or two. All of these parameters are explained in the man page; the ones that stick out a bit are covered here.

name
This is the name the VM gets. You can manage it using virsh, the libvirt command line interface, or its GUI cousin, virt-manager
memorybacking
Using this command I instruct the VM to use large pages in the hope that it simplifies the hypervisor’s job of memory management. Obviously this requires large pages to be set up and enough of them available
location
As explained in the docs you can make an ISO available to virt-install that serves as the installation source. Doing so requires you to run virt-install as root
initrd-inject and extra-args
Using these together in combination with the location flag allows me to stick a kickstart file through to the VM. In the kickstart file I configured partitioning, networking, firewall/selinux settings as well as packages. I can even have a %post section in which I can run arbitrary shell commands. The other parameters are taken from the Red Hat documentation and allow me to install the VM in console mode. Requires the “text” setting in the kickstart file, and the nographics flag
disk
Using the disk directive I request 2 qcow2 virtual disk images to be created. I place the O/S on the capacity pool (backed by hard disk – a lot of capacity but dog slow) and the Oracle related data on a virtual disk backed by the SSDs in the machine.
network
In a similar way I am instructing the VM to connect to the network I defined beforehand, named public.
nographics
If you want to make sure you don’t need X11 at all to install the VM, you can use nographics. You have to provide a “text” directive in your kickstart file and most likely set the console to something sensible, see above for an example

I have said it before and I say it again: libvirt is a fantastic tool, and so easy to use. If you spend a little bit of time preparing your environment, it gets even simpler. Here are a couple of examples:

Before creating VMs on my host I created a couple of storage pools, namely capacity and latency. That’s where I instruct the storage to be used. This way I can simply instruct libvirt to create a volume of x GB in the pool and assign it to the VM. The order matters: the 12 GB disk will become /dev/vda (KVM uses virtio drivers), the second one /dev/vdb. The partitioning scheme is defined in the kickstart file.
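
To give you an idea, the partitioning section of such a kickstart file could look roughly like this; a sketch only, with sizes and volume group names chosen to match the layout I described:

# /dev/vda holds the root volume group, /dev/vdb the Oracle volume group
zerombr
clearpart --all --initlabel
part /boot --fstype=xfs --size=500 --ondisk=vda
part pv.01 --size=1 --grow --ondisk=vda
part pv.02 --size=1 --grow --ondisk=vdb
volgroup rootvg pv.01
volgroup oraclevg pv.02
logvol swap --vgname=rootvg --name=swaplv --size=768
logvol /    --fstype=xfs --vgname=rootvg   --name=rootlv   --size=1 --grow
logvol /u01 --fstype=xfs --vgname=oraclevg --name=orabinlv --size=1 --grow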

Likewise, I can create a network that all my VMs can connect to. On the back-end, libvirt created iptables rules, and a network bridge:

# virsh net-info public
Name:        public
UUID:        some long identifier
Active:      yes
Persistent:  yes
Autostart:   yes
Bridge:      virbr1

VMs that are part of this network – including the host (x.x.x.1) – can communicate with each other.

I hope I could get you a little interested in libvirt. Have a look at the documentation, it is very good and explains these topics in more detail.

If you haven’t used virt-install previously this might look uninspiring. However using the commands above I don’t need to set up the installation source via NFS, SMB or HTTP. Neither do I need to provide the kickstart file via a web server, keeping setup requirements very low.

What about the kickstart file? Where do I get one?

I had the same question when I first started with the tool. The documentation is quite good, but a working example would be nice. Turns out you already have one: it’s created by default in /root/anaconda-ks.cfg and, just like an Oracle response file, contains a recording of the options you entered in the GUI. Have a look at it, and adjust it to your needs.

New Events for Data Guard and Synchronous Redo Transport in 12c (2)

After the baseline was established in the first part of this series, it’s time to measure the effect of the network. The second test will introduce an interesting feature: using Linux’s own Traffic Shaper/Quality of Service module I will add a delay of 100 ms to the Data Guard network interface card (NIC) to slow things down a little.

WARNING: this is of course a lab or VM-only situation. I can’t exert control over wire quality in my (own) switches, hence some software magic is needed on my virtual ones. This post is intended to be merely for educational purposes, not for use at work.

I am continuing to use the 2 node RAC 12.1.0.2.170117 primary database on Oracle Linux 7 with UEK 4 and an identical RAC to host my standby database.

All database traffic still uses eth0 – my public network (192.168.100.0/24). In order not to affect my SSH session with the planned network changes I created a new out-of-band management interface (eth2) on the Data Guard standby database servers. I love KVM and virtio drivers for the NICs: a few commands in virsh later and every RAC node has a new NIC, and of course without rebooting. Adding some nmcli magic, I can configure the new interface in a single line.
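
For the curious, the two steps looked roughly like this; a sketch with assumed names, since the VM name, the libvirt network and the IP address are specific to my lab:

# on the KVM host: hot-add a virtio NIC to the running VM
virsh attach-interface --domain rac12sec1 --type network --source management \
  --model virtio --config --live

# inside the VM: create the connection for the new interface in one line
nmcli connection add type ethernet ifname eth2 con-name mgmt ip4 192.168.150.20/24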

Here’s what it looks like from connection manager (the system is Oracle Linux 7.3)

[root@rac12sec1 ~]# nmcli c show
NAME  UUID                                  TYPE            DEVICE 
eth0  3e8915f1-10ce-4da7-99b8-d2a6947f9ff7  802-3-ethernet  eth0   
eth1  0ebe1279-40e1-4faf-a0be-49588d347fcf  802-3-ethernet  eth1   
mgmt  51054729-19ae-49e7-ba18-c6b41475ec1b  802-3-ethernet  eth2   

My eth0 device maps to the public network, i.e. all database traffic including Data Guard. The private interconnect uses eth1, with eth2 serving as the management interface. I have deliberately kept this simple; usually I’d request a dedicated set of interfaces to route Data Guard traffic (I blogged about that earlier in the context of RAC):

Please read past part I of the “Adding an additional SCAN” series to get the full story.

Change protection mode

Following the scenario I outlined in part I, I first have to change the protection mode to Maximum Availability. Again, please see part I of this series for an explanation of protection levels, my choice of protection level, and why “Maximum Availability” might or might not work for you.

Since I’m using the Data Guard Broker, I need to head over and change the mode in dgmgrl.

DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 40 seconds ago)

As with many other maintenance tasks, you use the edit command to change the configuration.

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no standby databases would 
remain to support protection mode

Failed.

I seem to have gotten a bit ahead of myself – changing to the more serious protection modes requires changing the transport mode from async to sync. Let’s do that, and then change the protection mode:

DGMGRL> edit database 'NCDBA' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'NCDBB' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxAvailability
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 19 seconds ago)

DGMGRL> 

That looks better now; the first half of the job is done. The second half of this article details how to make the network look a lot worse.

Net Emulator

Using the Linux netem kernel module I can simulate many complex network situations – I found https://wiki.linuxfoundation.org/networking/netem to be a good reference.

I should probably say at this point that no one would use a link with 100 ms delay between sites for (sync) redo transport. That simply doesn’t make sense. I used 100 ms to have a factor large enough to be sure that any deviation in the performance figures is actually a result of my fiddling with the network latency. Large numbers make spotting outliers easier.

How do you mess with the network? That is simple when you are on Linux! On my standby cluster I use the tc command, part of the iproute package. This is the “normal” state:

[root@rac12sec1 ~]# tc qdisc
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec1 ~]# 

I am merely using these tools; if you want to understand them, have a look at the Linux Advanced Routing & Traffic Control HOWTO, chapter 9, and refer to the link to the Linux Foundation wiki I mentioned earlier. Then take it from there.

The syntax of tc qdisc is complex at first sight:

[root@rac12sec1 ~]# tc qdisc help
Usage: tc qdisc [ add | del | replace | change | show ] dev STRING
       [ handle QHANDLE ] [ root | ingress | parent CLASSID ]
       [ estimator INTERVAL TIME_CONSTANT ]
       [ stab [ help | STAB_OPTIONS] ]
       [ [ QDISC_KIND ] [ help | OPTIONS ] ]

       tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
STAB_OPTIONS := ... try tc qdisc add stab help

The above allows us to introduce NetEM to the mix. On my Linux 7.3/UEK4 system it offers the following options:

[root@rac12sec1 ~]# tc qdisc add netem help
Usage: ... netem [ limit PACKETS ] 
                 [ delay TIME [ JITTER [CORRELATION]]]
                 [ distribution {uniform|normal|pareto|paretonormal} ]
                 [ corrupt PERCENT [CORRELATION]] 
                 [ duplicate PERCENT [CORRELATION]]
                 [ loss random PERCENT [CORRELATION]]
                 [ loss state P13 [P31 [P32 [P23 P14]]]
                 [ loss gemodel PERCENT [R [1-H [1-K]]]
                 [ ecn ]
                 [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]
                 [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]

That’s a wealth of options. I can simulate somewhat unreliable WAN links, even with variations in delay! Refer to the Linux Foundation wiki for more details on how to simulate networks more realistically.
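
For instance, a slightly more realistic WAN link with jitter and occasional packet loss could be emulated along these lines (not something I am using for this test):

# 100ms +/- 20ms normally distributed delay plus 0.5% random packet loss
tc qdisc add dev eth0 root netem delay 100ms 20ms distribution normal loss 0.5%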

Introducing Delay

I prefer to keep it simple for this post, so I’ll stick with the most basic setting for network delay. I am introducing this to the Data Guard network on both of my standby nodes while connected to the management interface eth2:

[root@rac12sec1 ~]# tc qdisc add dev eth0 root netem delay 100ms

[root@rac12sec1 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

[root@rac12sec2 ~]# tc qdisc show dev eth0 
qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms

The effect can be seen immediately by pinging one of the standby nodes:

[oracle@rac12pri1 ~]$ ping rac12sec1
PING rac12sec1.example.com (192.168.100.20) 56(84) bytes of data.
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=1 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=2 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=3 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=4 ttl=64 time=100 ms
^C
--- rac12sec1.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 100.368/100.415/100.478/0.229 ms
[oracle@rac12pri1 ~]$ 

So 100ms delay it is!
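
For reference, once testing is done the delay can be removed again by deleting the queueing discipline:

tc qdisc del dev eth0 root netem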

Starting the Benchmark

With the scene set, it’s time to start the benchmark. The command to start it resembles the one I used for part I, except for the name of the results file.

[oracle@loadgen1 bin]$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxavail_netem.xml -rt 00:10
Author  :	 Dominic Giles
Version :	 2.5.0.971

Results will be written to results_20_maxavail_netem.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS

1:08:56 PM      20      5188    94          

What effect does this have on the database? Let’s check the Data Guard broker first:

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          10 seconds (computed 0 seconds ago)
  Average Apply Rate: 207.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 211.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

So it looks like there isn’t any lag visible with the benchmark underway; however, it took a little while for it to start, as visible in the first example. Here is the output from Tanel Poder’s ashtop script again (which requires you to have the Diagnostic Pack license as it accesses v$active_session_history):

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 08:10','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 08:15','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
      298     1.0   24% | oracle@rac12pri1 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      298     1.0   24% | oracle@rac12pri1 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      276      .9   23% | oracle@rac12pri2 (LGWR)                          SYNC Remote Write                        Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
      274      .9   23% | oracle@rac12pri2 (NSS2)                          Redo Transport MISC                      Other           WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59                 1
        8      .0    1% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 08:10:16 2017-02-17 08:14:59                 1
        7      .0    1% | oracle@rac12pri2 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:43 2017-02-17 08:14:51                 1
        5      .0    0% | oracle@rac12pri1 (LMS0)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:10:23 2017-02-17 08:13:48                 1
        5      .0    0% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 08:10:00 2017-02-17 08:13:24                 1
        5      .0    0% | oracle@rac12pri2 (NSV1)                                                                                   ON CPU  2017-02-17 08:12:43 2017-02-17 08:14:44                 1
        4      .0    0% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 08:11:16 2017-02-17 08:13:35                 1
        3      .0    0% | oracle@rac12pri2 (CJQ0)                                                                                   ON CPU  2017-02-17 08:12:39 2017-02-17 08:12:51                 2
        3      .0    0% | oracle@rac12pri2 (RSM0)                                                                                   ON CPU  2017-02-17 08:11:44 2017-02-17 08:14:44                 3
        2      .0    0% | oracle@rac12pri1 (LMON)                          control file sequential read             System I/O      WAITING 2017-02-17 08:11:51 2017-02-17 08:14:01                 1
        2      .0    0% | oracle@rac12pri1 (LMS1)                          gcs log flush sync                       Other           WAITING 2017-02-17 08:11:14 2017-02-17 08:11:44                 1
        2      .0    0% | oracle@rac12pri2 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:12:14 2017-02-17 08:14:24                 1
        2      .0    0% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 08:12:59 2017-02-17 08:14:42                 1
        1      .0    0% | oracle@rac12pri1 (CKPT)                          control file sequential read             System I/O      WAITING 2017-02-17 08:13:46 2017-02-17 08:13:46                 1
        1      .0    0% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 08:10:28 2017-02-17 08:10:28                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          Redo Transport Close                     Other           WAITING 2017-02-17 08:12:22 2017-02-17 08:12:22                 1
        1      .0    0% | oracle@rac12pri1 (LGWR)                          log file sequential read                 System I/O      WAITING 2017-02-17 08:14:16 2017-02-17 08:14:16                 1

20 rows selected.

As you can see, the top waits are in wait class “Other”: “SYNC Remote Write” and “Redo Transport MISC”. I have also pulled an AWR diff report that paints an interesting picture. The first snapshot set represents the unimpeded network performance, while the second one shows the effect of the 100 ms network delay. Only relevant information is shown.

WORKLOAD REPOSITORY RAC Compare Period Report for

Database Summary
~~~~~~~~~~~~~~~~
                            Database                   Snapshot Ids     Number of Instances    Number of Hosts    Report Total (minutes)
               ----------------------------------   -----------------   -----------------   -----------------   ---------------------
Snapshot Set            Id Name      RAC Block Si      Begin      End   In Repor    Total   In Repor    Total      DB time Elapsed ti
------------   ----------- --------- --- --------   -------- --------   -------- --------   -------- --------   ---------- ----------
First (1st)      481582958 NCDBA     YES     8192        303      304          2        2          2        2         22.1       10.2
Second (2nd)     481582958 NCDBA     YES     8192        301      302          2        2          2        2        141.9       10.6
Database Instances Included In Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                                                             Elapsed              Instance Avg Activ
Set       DB Id Inst Instance  Release         Host       Startup         Begin Snap Time End Snap Time   Time (min) DB time (m Up Time (h  Sessions Platform
--- ----------- ---- --------- --------------- ---------- --------------- --------------- --------------- ---------- ---------- ---------- --------- ---------------
1st   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0        2.0        5.2       0.2 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09       10.0       20.1        5.2       2.0 Linux x86 64-bi

2nd   481582958    1 NCDBA1    12.1.0.2.0      rac12pri1  17-Feb-17 04:00 17-Feb-17 08:06 17-Feb-17 08:17       10.5      134.0        4.3      12.8 Linux x86 64-bi
                   2 NCDBA2    12.1.0.2.0      rac12pri2  17-Feb-17 04:00 17-Feb-17 08:07 17-Feb-17 08:17       10.5        7.9        4.3       0.8 Linux x86 64-bi

...

Workload Comparison
~~~~~~~~~~~~~~~~~~~                      1st Per Sec          2nd Per Sec      %Diff              1st Per Txn          2nd Per Txn      %Diff
                                     ---------------      ---------------     ------          ---------------      ---------------     ------
                       DB time:                  2.2                 13.3      514.3                      0.0                  0.2    1,600.0
                      CPU time:                  1.5                  0.5      -66.0                      0.0                  0.0        0.0
           Background CPU time:                  0.5                  0.2      -56.9                      0.0                  0.0        0.0
             Redo size (bytes):            983,488.1            322,026.1      -67.3                  4,164.1              4,192.4        0.7
         Logical read (blocks):             30,773.7              8,299.3      -73.0                    130.3                108.0      -17.1
                 Block changes:              5,665.8              1,886.0      -66.7                     24.0                 24.6        2.3
        Physical read (blocks):                 10.5                 18.3       74.4                      0.0                  0.2      500.0
       Physical write (blocks):                654.9                196.8      -69.9                      2.8                  2.6       -7.6
              Read IO requests:                 10.1                 12.9       27.3                      0.0                  0.2      325.0
             Write IO requests:                503.8                110.4      -78.1                      2.1                  1.4      -32.4
                  Read IO (MB):                  0.1                  0.1       75.0                      0.0                  0.0        0.0
                 Write IO (MB):                  5.1                  1.5      -69.9                      0.0                  0.0        0.0
                  IM scan rows:                  0.0                  0.0        0.0                      0.0                  0.0        0.0
       Session Logical Read IM:
                    User calls:                515.8                188.2      -63.5                      2.2                  2.5       12.4
                  Parses (SQL):                 24.9                 17.7      -28.8                      0.1                  0.2      109.1
             Hard parses (SQL):                  0.1                  0.2       45.5                      0.0                  0.0        0.0
            SQL Work Area (MB):                  0.1                  0.5      253.8                      0.0                  0.0      253.8
                        Logons:                  1.3                  3.9      191.7                      0.0                  0.1      400.0
                Executes (SQL):              2,587.7                853.0      -67.0                     11.0                 11.1        1.4
                  Transactions:                236.2                 76.8      -67.5

...

Top Timed Events   First DB/Inst: NCDBA/NCDBA1 Snaps: 303-304 (Elapsed time: 612.266 sec  DB time: 1326.09 sec),  Second DB/Inst: NCDBA/NCDBA1 Snaps: 301-302 (Elapsed time: 638.691 sec  DB time: 8514.42 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes
-> Summary for all instances

                                               1st                                                                                                2nd
------------------------------------------------------------------------------------------------   ------------------------------------------------------------------------------------------------
Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time   Event                          Wait Class           Waits      Time(s)  Avg Time(ms)    %DB time
------------------------------ ------------- ------------ ------------ ------------- -----------   ------------------------------ ------------- ------------ ------------ ------------- -----------
 CPU time                                             N/A        919.5           N/A        69.3    log file sync                 Commit              49,010      7,976.7         162.8        93.7
 log file sync                 Commit             144,284        270.4           1.9        20.4    Redo Transport MISC           Other               11,392      1,192.0         104.6        14.0
 log file parallel write       System I/O         141,581        156.6           1.1        11.8    SYNC Remote Write             Other               11,392      1,179.1         103.5        13.8
 gc current block 2-way        Cluster             98,483         71.9           0.7         5.4    CPU time                                             N/A        328.7           N/A         3.9
 gc cr block 2-way             Cluster             71,469         47.9           0.7         3.6    gcs log flush sync            Other                4,480         56.0          12.5         0.7
 target log write size         Other               22,284         26.5           1.2         2.0    gc current block 2-way        Cluster             62,843         43.3           0.7         0.5
 gc current grant busy         Cluster             22,561         16.8           0.7         1.3    gc current block busy         Cluster                284         39.3         138.4         0.5
 control file sequential read  System I/O           8,497          6.0           0.7         0.5    log file switch (checkpoint i Configuration           75         37.1         495.1         0.4
 db file sequential read       User I/O             5,862          5.4           0.9         0.4    rdbms ipc reply               Other                   32         27.1         845.4         0.3
 log file sequential read      System I/O          14,895          4.8           0.3         0.4    gc cr block 2-way             Cluster             39,015         26.0           0.7         0.3
-log file switch (checkpoint i Configuration          212          3.6          17.2         0.3   -log file parallel write       System I/O          11,267         20.3           1.8         0.2
-gcs log flush sync            Other                  461          0.4           0.9         0.0   -gc current grant busy         Cluster             12,427          8.2           0.7         0.1
-gc current block busy         Cluster                126          0.3           2.7         0.0   -control file sequential read  System I/O           7,326          5.0           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -db file sequential read       User I/O             6,615          4.7           0.7         0.1
-                                                     N/A          N/A           N/A         N/A   -log file sequential read      System I/O             174          0.4           2.4         0.0
                          --------------------------------------------------------------------------------------------------------------------

This doesn’t look too good, does it? Throughput is down substantially, which is visible in Swingbench itself: in the first example I achieved 255 TPS while I’m averaging 90-95 TPS in the sync + delay example. The numbers from the AWR excerpt shown above confirm this and allow you to work out where the problem is.
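
If you want to see how the added delay shifts the latency distribution of the redo transport waits, a quick look at v$event_histogram can help. This is just a sketch I might run on the primary (gv$event_histogram would cover both RAC instances); the event names are the ones surfaced above:

select event, wait_time_milli, wait_count
from v$event_histogram
where event in ('SYNC Remote Write', 'Redo Transport MISC')
order by event, wait_time_milli;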

Here is a screenshot taken from OEM (from a different benchmark execution) showing the same situation. Make sure you are licensed appropriately to use this view.

[Figure: oemexpress_with_latency (OEM screenshot showing the impact of the added network latency)]

Summary Part II

So yes, there is a very noticeable effect of the network delay, and the AWR report shows the relevant wait events in 12c indicating where time is spent.

In my opinion it’s a bit unfortunate to have the new waits in the “Other” wait class, but that’s how it is.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf

New Events for Data Guard and Synchronous Redo Transport in 12c (1)

I may have said it before, but I consider presenting and teaching a great way to expand one’s knowledge: first of all, it requires me to really understand a subject. Secondly, when presenting, you get lots of interesting questions that can turn into blog posts like this one.

Lately I have been asked about the impact of synchronous log shipping to a physical standby database. I was sure there was an effect to be observed, most likely depending on the network latency between the systems, but I didn’t have any evidence I could pull out of the hat to back up my thoughts. So what better way to find out than to try it! I also read that some of the events have changed in 12c, and wanted to make them visible. My environment is based on the 2 node RAC primary/2 node RAC standby configuration I wrote about in my previous posts.

Since their initial setup I upgraded the cluster to 12.1.0.2.170117 for Clusterware and RDBMS.

I am planning to share my findings in two posts: the first one (the one you are reading) will present the baseline, the second one to follow shows how performance can be impacted by poor network performance. I guess the second one will be a bit longer …

Background

Data Guard allows the administrator to operate the databases in a Data Guard configuration in one of three protection modes. The first one is Maximum Performance, the second Maximum Availability, and finally Maximum Protection. All three modes are covered in the official documentation set and on MOS. Without putting it very scientifically:

  • Maximum Performance uses asynchronous log shipping and is the default. The performance of the standby database (or rather lack thereof) should not impact the performance of the primary. At a cost: it is not guaranteed that redo is actually written to the standby when the commit command returns.
  • Maximum Protection allows you to enforce a Zero Data Loss (ZDL) strategy but also at a cost: The primary will shut down if it can’t transmit redo information to the standby. That is how you prevent data loss!
  • Maximum Availability is the middle ground and I’ll focus on that protection mode here.

Make sure to read the documentation and understand the implications of each protection mode when assessing the most suitable mode for your environment!
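
To double-check which mode a configuration is actually running in, a quick query against v$database on the primary is usually enough (a sketch; the broker’s “show configuration” provides the same information):

select protection_mode, protection_level
from v$database;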

Background: New Events

Before going into details I should probably point out that there are some new Data Guard events. Hurray! More events and more meaningful names, great. Just one caveat: have a look at the output from my 12c database:

select name, wait_class from v$event_name
where name in (
'ARCH Remote Write','ASYNC Remote Write','Redo Transport Attach',
'Redo Transport Close','Redo Transport Detach','Redo Transport Open',
'Redo Transport Ping','Redo Transport Slave Shutdown','Redo Transport Slave Startup',
'Redo Writer Remote Sync Complete', 'Redo Writer Remote Sync Notify',
'Remote SYNC Ping','SYNC Remote Write');

NAME                                               WAIT_CLASS
-------------------------------------------------- ----------------------------------------
ASYNC Remote Write                                 Other
SYNC Remote Write                                  Other
ARCH Remote Write                                  Other
Redo Transport Attach                              Other
Redo Transport Detach                              Other
Redo Transport Open                                Other
Redo Transport Close                               Other
Redo Transport Ping                                Other
Remote SYNC Ping                                   Other
Redo Transport Slave Startup                       Other
Redo Transport Slave Shutdown                      Other
Redo Writer Remote Sync Notify                     Other
Redo Writer Remote Sync Complete                   Other

13 rows selected.

Compare this to the pre-12c events (taken from an 11.2.0.3 database since they are mostly gone in 12c):

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                                               WAIT_CLASS
-------------------------------------------------- --------------------
ARCH wait on ATTACH                                Network
LNS wait on ATTACH                                 Network
LNS wait on SENDREQ                                Network
LNS wait on DETACH                                 Network
ARCH wait on SENDREQ                               Network
ARCH wait on DETACH                                Network
LNS wait on LGWR                                   Network
LGWR wait on LNS                                   Network
LGWR-LNS wait on channel                           Other

9 rows selected.

For the sake of completeness here are those events that remained in 12c:

select name, wait_class from v$event_name
where name in (
'ARCH wait on ATTACH','ARCH wait on SENDREQ','ARCH wait on DETACH',
'LNS wait on ATTACH', 'LNS wait on SENDREQ','LNS wait on DETACH',
'LGWR wait on LNS','LNS wait on LGWR','LGWR-LNS wait on channel');

NAME                           WAIT_CLASS
------------------------------ ------------------------------
LNS wait on LGWR               Network
LGWR wait on LNS               Network
LGWR-LNS wait on channel       Other

Bugger. The new events are all in the “Other” wait class. My initial attempts at filtering information from v$event_histogram based on events in the “Network” class had little success. I guess friends of the OEM performance pages might also find that interesting. If you wonder where I pulled these events from: I used a white paper named “Best Practices for Synchronous Redo Transport” as a reference.
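
Filtering v$event_histogram on the event names (or a name pattern) rather than on the wait class worked better for me; a minimal sketch:

select event, wait_time_milli, wait_count
from v$event_histogram
where event like 'Redo Transport%'
   or event like '%Remote Write%'
order by event, wait_time_milli;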

Testing: The Environment

As you read in the introduction, this environment is based on a two node RAC primary -> two node RAC standby configuration:

DGMGRL> show configuration verbose

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS
DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBA1
    NCDBA2

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 177.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBA' LogXptMode
  LogXptMode = 'async'
DGMGRL> show database 'NCDBB' LogXptMode
  LogXptMode = 'async'

The configuration is currently set to “Maximum Performance”; the databases ship redo asynchronously.
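
Apart from the broker properties, the transport mode can also be verified on the primary. A sketch, assuming log_archive_dest_2 is the destination pointing at the standby:

select dest_id, status, transmit_mode, affirm
from v$archive_dest
where dest_id = 2;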

All systems are patched to the January 2017 Proactive Bundle Patch:

[oracle@rac12pri1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
24917972;Database PSU 12.1.0.2.170117, Oracle JavaVM Component (JAN2017)
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)

OPatch succeeded.

Testing part I: The Baseline

I am using Swingbench’s Order Entry benchmark to generate a little bit of load. For the avoidance of doubt: I am not interested in a comparison between asynchronous and synchronous log shipping. I would like to see the effect caused by the difference in latency in the form of hugely increased waits. In my first test I am running with the defaults of my VM (the NIC uses virtio drivers), no change to SDU or send/receive buffers.

$ date
Fri Feb 17 13:58:58 UTC 2017
$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxperf.xml -rt 00:10
Author  :        Dominic Giles
Version :        2.5.0.971

Results will be written to results_20_maxperf.xml.
Hit Return to Terminate Run...

Time            Users   TPM     TPS

1:59:55 PM      20      3852    255

On the apply side this translates to the following figures (the average apply rate is misleading and should be ignored; it captures all apply activity since MRP0 was started):

DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 713.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
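
The broker output above illustrates the point: the average apply rate is a cumulative figure. For something closer to the current rate I tend to query v$recovery_progress on the standby instance performing recovery; a sketch:

select item, units, sofar, timestamp
from v$recovery_progress
where item in ('Active Apply Rate', 'Average Apply Rate', 'Redo Applied');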

There is no transport lag, and no apply lag either. When checking for waits on background processes (using Tanel Poder’s ashtop, which requires the Diagnostic Pack license as it accesses v$active_session_history) I noticed the following:

SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 09:02','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 09:07','dd.mm.yyyy hh24:mi')"

    Total
  Seconds     AAS %This   PROGRAM                                          EVENT                                    WAIT_CLASS      SESSION FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
       53      .2   29% | oracle@rac12pri2 (LGWR)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:42 2017-02-17 09:06:53                 1
       23      .1   13% | oracle@rac12pri2 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:02:03 2017-02-17 09:03:26                 1
       19      .1   10% | oracle@rac12pri2 (DBW0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:39                 1
       12      .0    7% | oracle@rac12pri2 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:02:03 2017-02-17 09:03:27                 1
        9      .0    5% | oracle@rac12pri1 (LMS1)                                                                                   ON CPU  2017-02-17 09:04:37 2017-02-17 09:06:58                 1
        8      .0    4% | oracle@rac12pri2 (TT00)                                                                                   ON CPU  2017-02-17 09:02:15 2017-02-17 09:06:56                 1
        6      .0    3% | oracle@rac12pri1 (LG00)                          log file parallel write                  System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:06:44                 1
        6      .0    3% | oracle@rac12pri2 (LGWR)                                                                                   ON CPU  2017-02-17 09:04:05 2017-02-17 09:06:41                 1
        5      .0    3% | oracle@rac12pri2 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:03:27                 1
        5      .0    3% | oracle@rac12pri2 (LMS0)                                                                                   ON CPU  2017-02-17 09:02:07 2017-02-17 09:06:14                 1
        5      .0    3% | oracle@rac12pri2 (LMS1)                                                                                   ON CPU  2017-02-17 09:02:24 2017-02-17 09:06:28                 1
        4      .0    2% | oracle@rac12pri1 (LMS0)                                                                                   ON CPU  2017-02-17 09:04:21 2017-02-17 09:06:05                 1
        3      .0    2% | oracle@rac12pri1 (LGWR)                          LGWR all worker groups                   Other           WAITING 2017-02-17 09:03:08 2017-02-17 09:06:13                 1
        2      .0    1% | oracle@rac12pri1 (DBW0)                          db file parallel write                   System I/O      WAITING 2017-02-17 09:03:08 2017-02-17 09:03:42                 1
        2      .0    1% | oracle@rac12pri2 (CKPT)                                                                                   ON CPU  2017-02-17 09:05:10 2017-02-17 09:05:13                 1
        2      .0    1% | oracle@rac12pri2 (TT00)                          log file sequential read                 System I/O      WAITING 2017-02-17 09:03:17 2017-02-17 09:03:46                 1
        1      .0    1% | oracle@rac12pri1 (CJQ0)                                                                                   ON CPU  2017-02-17 09:04:44 2017-02-17 09:04:44                 1
        1      .0    1% | oracle@rac12pri1 (DBW0)                                                                                   ON CPU  2017-02-17 09:03:05 2017-02-17 09:03:05                 1
        1      .0    1% | oracle@rac12pri1 (LG00)                                                                                   ON CPU  2017-02-17 09:02:02 2017-02-17 09:02:02                 1
        1      .0    1% | oracle@rac12pri1 (LGWR)                          target log write size                    Other           WAITING 2017-02-17 09:05:47 2017-02-17 09:05:47                 1

The time difference between charbench and the database can be explained by time zones: my load generator is set to use UTC whereas the database is on EST; the result is a 6 hour time difference. This is a minor oversight on my part and has nothing to do with the location of the machines. In fact, they are all virtualised.

Summary Part I

In this part I tried to lay the foundation for the next one, where I’m using my lab environment to simulate an artificially high network latency. I have to admit again that my setup is somewhat over-simplified: using Data Guard with asynchronous log shipping and Maximum Performance mode is not something I’d like to support for anything more important than my toy databases in the lab.

References

http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf

Creating a RAC 12.1 Data Guard Physical Standby environment (4)

In the previous three parts of this series a lot of preparation work, needed for the configuration of Data Guard, was performed. In this part of the mini-series they all come to fruition. Using the Data Guard broker a switchover operation will be performed. A couple of new features in 12c make this easier. According to the “Changes in This Release for Oracle Data Guard Concepts and Administration” chapter of the 12.1 Data Guard Concepts and Administration guide:

When [you, ed.] perform a switchover from an Oracle RAC primary database to a physical standby database, it is no longer necessary to shut down all but one primary database instance.

I have always wanted to test that in a quiet moment…

I have previously blogged about another useful change that should make my life easier: the static registration of the *_DGMGRL services in the listener.ora file is no longer needed. Have a look at my Data Guard Broker Setup Changes post for more details and reference to the documentation.

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Now let’s get to it.

Step 1: Check the status of the configuration

In the first step I always check the configuration and make sure I can switch over. Data Guard 12c has a nifty automatic check that helps, but I always have a list of tasks I perform prior to a switchover (not shown in this blog post).

The following commands are somewhat sensitive to network availability: you should protect your sessions against any type of network failure! I am using screen(1) for that purpose; there are other tools out there doing similar things. Network glitches are too common to ignore, and I have come to appreciate the ability to resume work without too many problems after having seen the dreaded “broken pipe” message in my terminal window…

[oracle@rac12sec1 ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys@ncdbb
Password:
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 55 seconds ago)

DGMGRL> validate database 'NCDBB'
...

The command to check for switchover readiness is new to 12c as well and is called “validate database”. I don’t have screen output from the situation at this point, so just take my word that I was ready :) Don’t switch over if you have any concerns or doubts that the operation will succeed! “Validate database” does not relieve you from your duty to check for switchover readiness: follow your procedures.
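
Part of my own checklist is a quick look at the transport and apply lag, for example via v$dataguard_stats on the standby; a sketch:

select name, value, time_computed
from v$dataguard_stats
where name in ('transport lag', 'apply lag');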

Step 2: Switch Over

Finally, the big moment has come! It takes just one line to perform the switchover:

DGMGRL> switchover to 'NCDBB'
Performing switchover NOW, please wait...
New primary database "NCDBB" is opening...
Oracle Clusterware is restarting database "NCDBA" ...
Switchover succeeded, new primary is "NCDBB"
DGMGRL> 

DGMGRL> show database 'NCDBA';

Database - NCDBA

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 9.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    NCDBA1
    NCDBA2 (apply instance)

Database Status:
SUCCESS

DGMGRL> show database 'NCDBB';

Database - NCDBB

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    NCDBB1
    NCDBB2

Database Status:
SUCCESS

DGMGRL> 

Well, that was easy! Did you notice Data Guard Broker telling us that ‘Oracle Clusterware is restarting database “NCDBA” …’? I like it.

If you get stuck at this point something has gone wrong with the database registration in the OCR. You shouldn’t run into problems though, because you tested every aspect of the RAC system before handing the system over to its intended users, didn’t you?

Validating the new standby database shows no issues. I haven’t noticed it before but “validate database” allows you to get more verbose output:

DGMGRL> validate database verbose 'NCDBA';

  Database Role:     Physical standby database
  Primary Database:  NCDBB

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Capacity Information:
    Database  Instances        Threads
    NCDBB     2                2
    NCDBA     2                2

  Temporary Tablespace File Information:
    NCDBB TEMP Files:  1
    NCDBA TEMP Files:  1

  Flashback Database Status:
    NCDBB:  On
    NCDBA:  Off

  Data file Online Move in Progress:
    NCDBB:  No
    NCDBA:  No

  Standby Apply-Related Information:
    Apply State:      Running
    Apply Lag:        0 seconds (computed 1 second ago)
    Apply Delay:      0 minutes

  Transport-Related Information:
    Transport On:      Yes
    Gap Status:        No Gap
    Transport Lag:     0 seconds (computed 1 second ago)
    Transport Status:  Success

  Log Files Cleared:
    NCDBB Standby Redo Log Files:  Cleared
    NCDBA Online Redo Log Files:   Cleared
    NCDBA Standby Redo Log Files:  Available

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBB)                 (NCDBA)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (NCDBA)                 (NCDBB)
    1         2                       3                       Sufficient SRLs
    2         2                       3                       Sufficient SRLs

  Current Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBB)                   (NCDBA)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Future Configuration Log File Sizes:
    Thread #   Smallest Online Redo      Smallest Standby Redo
               Log File Size             Log File Size
               (NCDBA)                   (NCDBB)
    1          50 MBytes                 50 MBytes
    2          50 MBytes                 50 MBytes

  Apply-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    DelayMins                       0                        0
    ApplyParallel                   AUTO                     AUTO

  Transport-Related Property Settings:
    Property                        NCDBB Value              NCDBA Value
    LogXptMode                      ASYNC                    ASYNC
    RedoRoutes                      <empty>                  <empty>
    Dependency                      <empty>                  <empty>
    DelayMins                       0                        0
    Binding                         optional                 optional
    MaxFailure                      0                        0
    MaxConnections                  1                        1
    ReopenSecs                      300                      300
    NetTimeout                      30                       30
    RedoCompression                 DISABLE                  DISABLE
    LogShipping                     ON                       ON

  Automatic Diagnostic Repository Errors:
    Error                       NCDBB    NCDBA
    No logging operation        NO       NO
    Control file corruptions    NO       NO
    SRL Group Unavailable       NO       NO
    System data file missing    NO       NO
    System data file corrupted  NO       NO
    System data file offline    NO       NO
    User data file missing      NO       NO
    User data file corrupted    NO       NO
    User data file offline      NO       NO
    Block Corruptions found     NO       NO

DGMGRL> 

Isn’t that cool? That’s more information at my fingertips than I can shake a stick at! It’s also a lot more than I could think of (e.g. online datafile move!).

Interestingly the Broker reports that I have “Sufficient SRLs”. I have seen it complain about the number of Standby Redo Logs in the past and blogged about it in “Interesting observation about standby redo logs in Data Guard”.
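
If you want to verify the standby redo log configuration yourself, v$standby_log is a good starting point. A minimal sketch, to be run on each database of the configuration; the usual recommendation is one more SRL group per thread than there are online redo log groups, all of the same size:

select thread#, group#, bytes/1024/1024 as size_mb, status
from v$standby_log
order by thread#, group#;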

Summary

After 4 (!) posts about the matter I have finally been able to perform a switchover operation. Role reversals are a much neglected operation a DBA should be comfortable with. In a crisis situation everyone needs to be clear about what needs to be done to restore service to the users. The database is usually the easier part … Success of Data Guard switchover operations also depends on the quality of change management: it is easy to “forget” applying configuration changes on the DR site.

In today’s busy times only few of us are lucky enough to intimately know each and every database we look after. What’s more common (sadly!) is that a DBA looks after 42 or more databases. This really only works without too many issues if procedures and standards are rock solid, and enforced.

Creating a RAC 12.1 Data Guard Physical Standby environment (3)

In the previous two parts of this series you read about my lab environment and the preparations on the network side as well as the database. In this part I’ll cover the database duplication. Again, this won’t be a short post …

NOTE: As always, this is just a demonstration using VMs in my lab, based on my notes. Your system is most likely different, so in real-life you might take a different approach. The techniques I am using here were suitable for me, and my own small scale testing. I tried to make sure they are valid, but you may want to allocate more resources in your environment. Test, test, test on your own environment on test kit first!

Step 1: Create an initialisation file

The first step is the preparation of an initialisation file. I am taking NCDBA as the sample and transferring it over to rac12sec1:

SQL> create pfile='/tmp/initNCDBA1.ora' from spfile;

File created.

[oracle@rac12pri1 ~]$ scp /tmp/initNCDBA1.ora rac12sec1:/tmp
oracle@rac12sec1's password: 
initNCDBA1.ora                                            100% 1515     1.5KB/s   00:00    
[oracle@rac12pri1 ~]$ 

The file is a bit verbose and needs some editing. I appreciate that RMAN replaces the file with a copy anyway, but I like to keep it tidy, at least at first… The end result is shown here; as always, make sure that your version of the pfile matches your site’s requirements and standards. The important part is to set the directory paths correctly, and to ensure that db_name and db_unique_name are set correctly.

*.audit_file_dest='/u01/app/oracle/admin/NCDBB/adump'
*.audit_trail='db'
*.cluster_database=true
*.compatible='12.1.0.2.0'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='NCDBA'
*.db_unique_name='NCDBB'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4785m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=NCDBBXDB)'
NCDBB1.instance_number=1
NCDBB2.instance_number=2
*.open_cursors=300
*.pga_aggregate_target=1024m
*.processes=300
*.remote_login_passwordfile='exclusive'
*.sga_target=3072m
NCDBB2.thread=2
NCDBB1.thread=1
NCDBB1.undo_tablespace='UNDOTBS1'
NCDBB2.undo_tablespace='UNDOTBS2'

The file needs to be transferred to $ORACLE_HOME/dbs as initNCDBB1.ora.

Step 2: Create a temporary password file

You undoubtedly spotted that remote_login_passwordfile is set, which is a prerequisite for connecting to the database as SYS via the network. A password file is also required for active database duplication. Since Oracle takes care of the password file during the RMAN duplicate, it appears to be sufficient to simply create a password file in $ORACLE_HOME/dbs on rac12sec1:

[oracle@rac12sec1 dbs]$ orapwd file=orapwNCDBB1

Enter password for SYS:
[oracle@rac12sec1 dbs]$

During my testing this was enough to proceed to the next step.

Step 3: Duplicate the database “for standby”

The system should now be ready for the standby database’s creation. It’s been a bit of a task to get there, but from now on things will be easier… Using the *_DUP entries in the system’s tnsnames.ora file I connect to the target and auxiliary database, followed by the command to create the standby. In my case the database is tiny and the RMAN defaults are enough. Larger databases would probably need different settings and commands.


[oracle@rac12sec1 dbs]$ rman target sys@ncdba_dup auxiliary sys@ncdbb_dup

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Dec 14 10:25:47 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

target database Password: 
connected to target database: NCDBA (DBID=479005270)
auxiliary database Password: 
connected to auxiliary database: NCDBA (DBID=479005270, not open)

RMAN> startup clone nomount

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    3221225472 bytes

Fixed Size                     2929552 bytes
Variable Size                838863984 bytes
Database Buffers            2365587456 bytes
Redo Buffers                  13844480 bytes

RMAN> duplicate target database for standby from active database;

Starting Duplicate Db at 14-DEC-16
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=240 instance=NCDBB1 device type=DISK

contents of Memory Script:
{
   backup as copy reuse
   targetfile  '+DATA/NCDBA/PASSWORD/pwdncdba.256.930561891' auxiliary format
 '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwNCDBB1'   ;
}
executing Memory Script

Starting backup at 14-DEC-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=276 instance=NCDBA1 device type=DISK
Finished backup at 14-DEC-16

contents of Memory Script:
{
   sql clone "create spfile from memory";
   shutdown clone immediate;
   startup clone nomount;
   restore clone from service  'ncdba_dup' standby controlfile;
}  
executing Memory Script

sql statement: create spfile from memory

Oracle instance shut down

sql statement: alter system archive log current

contents of Memory Script:
{  
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=7 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/system.259.930565213
datafile 2 switched to datafile copy
input datafile copy RECID=8 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/undotbs2.260.930565221
datafile 3 switched to datafile copy
input datafile copy RECID=9 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/sysaux.261.930565223
datafile 4 switched to datafile copy
input datafile copy RECID=10 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/undotbs1.262.930565231
datafile 5 switched to datafile copy
input datafile copy RECID=11 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/example.263.930565233
datafile 6 switched to datafile copy
input datafile copy RECID=12 STAMP=930565247 file name=+DATA/NCDBB/DATAFILE/users.264.930565241
Finished Duplicate Db at 14-DEC-16

RMAN>

Perfect: the duplicate finished OK and all six datafiles have been transferred to my standby cluster. There are a few things worth noting here:

  • The password file of the primary (residing in ASM) has been copied to ‘/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwNCDBB1’ (not in ASM)
  • The standby database’s spfile has been created from memory – a lot of unnecessary and some obsolete parameters are now in the standby database’s spfile

Step 4: Finishing Touches

There are a few things still outstanding before declaring the environment ready. First of all you need to register the standby database in the Cluster Registry (OCR). Use srvctl for this task. One way to do this is shown here:

[oracle@rac12sec1 dbs]$ srvctl add database -db NCDBB -oraclehome $ORACLE_HOME -dbtype RAC \
> -role PHYSICAL_STANDBY -startoption MOUNT -dbname NCDBA -diskgroup "DATA,RECO"

[oracle@rac12sec1 dbs]$ srvctl add instance -db NCDBB -instance NCDBB1 -node rac12sec1
[oracle@rac12sec1 dbs]$ srvctl add instance -db NCDBB -instance NCDBB2 -node rac12sec2

When registering your standby database make sure you are license compliant!

Next I move the password file to ASM. The file currently resides on rac12sec1 in the $ORACLE_HOME/dbs directory; you can use the orapwd command to move it. If you registered the database in the OCR as I did in the previous step, you will get an updated configuration as a result: the password file location is changed automatically.

[oracle@rac12sec1 dbs]$ srvctl config database -db NCDBB | grep -i password
Password file: +DATA/NCDBB/orapwncdbb

Since I like error messages at database startup even less than unnecessary parameters in my spfile, I tend to remove them from the standby database. And besides, in its current form the spfile wouldn’t allow me to mount both RAC instances anyway: for some odd reason the mapping between SIDs and instances is lost during the RMAN duplicate.

So I start off by creating a pfile from the spfile and remove all unneeded underscore parameters as well as deprecated ones. In the next step I ensure that all the necessary RAC parameters are in. The minimalistic end result looks like this for my database:

*.audit_file_dest='/u01/app/oracle/admin/NCDBB/adump'
*.audit_trail='DB'
*.cluster_database=TRUE
*.compatible='12.1.0.2.0'
*.control_files='+DATA/NCDBB/CONTROLFILE/current.258.930565201',
   '+RECO/NCDBB/CONTROLFILE/current.258.930565201'#Restore Controlfile
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_name='NCDBA'
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=4785M
*.db_unique_name='NCDBB'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=NCDBBXDB)'
NCDBB1.instance_number=1
NCDBB2.instance_number=2
*.open_cursors=300
*.pga_aggregate_target=1G
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=3G
NCDBB2.thread=2
NCDBB1.thread=1
NCDBB1.undo_tablespace='UNDOTBS1'
NCDBB2.undo_tablespace='UNDOTBS2'

Next I can move the spfile to ASM and update the OCR profile. After these steps, the OCR profile reflects the new location of the spfile. Be sure to check all $ORACLE_HOME/dbs directories on the cluster nodes to prevent local pfiles/spfiles taking precedence over the one referred to in the OCR.

[oracle@rac12sec1 dbs]$ srvctl config database -d NCDBB | grep spfile
Spfile: +DATA/NCDBB/spfileNCDBB
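
For reference, the move itself boils down to something like the following sketch. The ASM path matches the output above, while the pfile location is an assumption from my notes; the OCR profile is then updated with srvctl modify database.

-- run on one standby instance; remove or rename any local spfiles in $ORACLE_HOME/dbs afterwards
create spfile='+DATA/NCDBB/spfileNCDBB' from pfile='/tmp/initNCDBB.ora';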

Everything is awesome!

It’s been a bit of a journey, but it was worth it. Before continuing with the remaining configuration steps, here is a little bit of detail about the system as it is:

[oracle@rac12sec1 dbs]$ srvctl status database -db NCDBB -verbose
Instance NCDBB1 is running on node rac12sec1. Instance status: Mounted (Closed).
Instance NCDBB2 is running on node rac12sec2. Instance status: Mounted (Closed).
[oracle@rac12sec1 dbs]$ 

SQL> select name,name,db_unique_name,database_role,log_mode,force_logging,flashback_on from v$database;

NAME      NAME      DB_UNIQUE_NAME   DATABASE_ROLE    LOG_MODE     FORCE_LOGGING   FLASHBACK_ON
--------- --------- ---------------- ---------------- ------------ --------------- ------------------
NCDBA     NCDBA     NCDBB            PHYSICAL STANDBY ARCHIVELOG   YES             NO

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME                                                        CON_ID
----------- ------------------------------------------------------------ ----------
          1 rac12sec1:NCDBB1                                                      0
          2 rac12sec2:NCDBB2                                                      0

In my initial drafts I intended to continue this article with the Data Guard configuration, but having looked at the WordPress-formatted part 3 up to here I decided to split it into 2 pieces. Next up is the Data Guard Broker configuration, followed by more posts about switching over and application design considerations. Stay tuned!

back to part 2 | on to part 3b

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and how to recover from it. This post takes the example a step further: I am simulating another disk failure, but this time I won’t pretend I can fix the issue and put the disk back. In other words, I simulate the effect of the disk_repair_time hitting zero.
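
If you want to check (or extend) the grace period on your own system, the disk_repair_time disk group attribute is the one to look at. A quick sketch, run against the ASM instance and assuming a disk group named DATA:

select dg.name as diskgroup, a.name as attribute, a.value
from v$asm_diskgroup dg
join v$asm_attribute a on a.group_number = dg.group_number
where dg.name = 'DATA'
and a.name = 'disk_repair_time';

-- extend the window if needed, e.g. to 8 hours
alter diskgroup DATA set attribute 'disk_repair_time' = '8h';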

Most of what I am covering here is an extension of the previous post. I’ll mention the main details here for your benefit, but would like to invite you to refer to the previous post for more detail.

The idea is to show you the output in the ASM alert.log and the effect of the lost disk on the V$ views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest, and uses Oracle Linux 7.2 as well. There are 9 ASM disks: 8 for +DATA featuring normal redundancy in 2 failure groups. I added +RECO for the sake of completeness with external redundancy. This post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.

Removing the disk

As with the previous example I am using libvirt to temporarily remove a “LUN” from the guest. And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f:

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported

In the previous post I keep referring to, I copied the part of the ASM instance’s alert.log that showed the disk repair timer ticking down. This time I am simulating the case where – for whatever reason – the transient failure could not be fixed. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1

The last line in the above output is definitely NOT what you want to see. If at all possible, you should fix the problem before the disk (DATA_0001 in this example) is dropped. If you see the message about the disk being dropped, you are facing an inevitable rebalance operation. Here is the continued output from the ASM alert.log:

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, so the rebalance operation took almost no time. But indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Space problems are the last thing you want to have when rebalancing. I did some research about this subject earlier and documented it in a blog post.
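
Checking the available headroom before (and monitoring progress during) a rebalance is straightforward; a sketch against the ASM instance:

-- is there enough usable space left in the disk group?
select name, type, total_mb, free_mb, usable_file_mb, required_mirror_free_mb
from v$asm_diskgroup
where name = 'DATA';

-- is a rebalance currently running, and how long is it estimated to take?
select group_number, operation, state, power, sofar, est_work, est_minutes
from v$asm_operation;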

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Little things worth knowing: transient ASM disk failures in 12c

For quite some time we have been treated nicely by ASM when it comes to transient disk failures. Since 11.1 (if memory serves me right), transient failures won’t cause an ASM disk to be dropped immediately. This is good, because it can potentially save a lot of time! When a disk is dropped from an ASM disk group, a rebalance operation is unavoidable. And there is more than a chance of another one following it, because ultimately you want to add the disk back to the disk group. Well, to be fair, this is only applicable to normal or high redundancy disk groups.

The feature I just described very briefly is referred to as ASM Fast Mirror Resync, and documented in the ASM Administrator’s Guide in section Administering Oracle ASM Disk Groups.

By the way, there is another new feature you might want to look at, called FAILGROUP_REPAIR_TIME, but I’ll keep that for another post.

In this post I’d like to demonstrate how easy it is to simulate a transient failure of an ASM disk using libvirt and KVM on a host running Oracle Linux 7.2. I also wanted to have the output from the O/S and ASM alert.log as personal reference.

As usual, this procedure is strictly for YOUR HOME LAB USE ONLY and for educational purposes.

The setup

When creating the VM – named “asmtest” – I created a boot volume, and a 50 GB volume for the Oracle binaries. To keep it manageable I went for an Oracle 12.1 Restart configuration, patched to the July PSU. There are 9 ASM disks, presented as SCSI devices (not virtio); I’ll explain why in a bit. The guest O/S is Oracle Linux 7.2, as is my lab server. All virtual disks on the VM are partitioned, although that is not strictly necessary.

Using the lsscsi tool (part of the sg3_utils), I created udev rules following the excellent description I found on oracle-base.com.

[oracle@asmtest ~]$ cat /etc/udev/rules.d/61-asm.rules 

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-0", SYMLINK+="asm-disk01", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-1", SYMLINK+="asm-disk02", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-2", SYMLINK+="asm-disk03", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi1-0-0-3", SYMLINK+="asm-disk04", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-0", SYMLINK+="asm-disk05", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-1", SYMLINK+="asm-disk06", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-2", SYMLINK+="asm-disk07", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi2-0-0-3", SYMLINK+="asm-disk08", OWNER="oracle", GROUP="asmdba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="0QEMU_QEMU_HARDDISK_drive-scsi3-0-0-0", SYMLINK+="asm-disk09", OWNER="oracle", GROUP="asmdba", MODE="0660"

This is slightly more complicated than my usual setup, but it was needed for device name persistence. Unfortunately libvirt does not report the same block device names as my guest VM: whilst the host “sees” sd[h-v], my guest reported these devices as sd[a-i]. Persistent device names to the rescue! The only issue I had initially was that disks presented to the VM via virtio don’t report the SCSI IDs I need for my udev rules; in other words, lsscsi returned no data. Presenting the block devices via the SCSI interface did the trick.
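In case you want to reproduce the rules: the RESULT== strings are simply what scsi_id reports for each SCSI disk, and the rules can be activated without a reboot. A sketch, run as root inside the guest; device names will obviously differ on your system:

[root@asmtest ~]# for disk in /dev/sd[a-i]; do
>   echo "${disk}: $(/usr/lib/udev/scsi_id -g -u -d ${disk})"
> done

# after editing /etc/udev/rules.d/61-asm.rules, reload and re-trigger the rules
[root@asmtest ~]# udevadm control --reload-rules
[root@asmtest ~]# udevadm trigger --type=devices --action=change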

The hypothesis

This snippet from the documentation (referenced above) is exactly what I wanted to test:

After you repair the disk, run the SQL statement ALTER DISKGROUP ONLINE DISK. This statement brings a repaired disk group back online to enable writes so that no new writes are missed. This statement also starts a procedure to copy all of the extents that are marked as stale on their redundant copies.

Here is the current configuration I am using for my DATA disk group, as seen by ASM:

SQL> select name,state,type,offline_disks from v$asm_diskgroup
  2  where name = 'DATA';

NAME                           STATE       TYPE   OFFLINE_DISKS
------------------------------ ----------- ------ -------------
DATA                           MOUNTED     NORMAL             0



SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number;

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP        REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- ---------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                      0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                      0

You can see 2 failure groups, BATCH1 and BATCH2, and all disks are available and online. The operating system mapping works perfectly thanks to my udev rules:

[oracle@asmtest ~]$ ls -l /dev/asm-disk0*
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk01 -> sda1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk02 -> sdd1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk03 -> sdc1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk04 -> sdb1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk05 -> sde1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk06 -> sdh1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk07 -> sdg1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk08 -> sdf1
lrwxrwxrwx. 1 root root 4 Sep 29 13:38 /dev/asm-disk09 -> sdi1

Let’s detach a disk and cause a transient failure

Libvirt is really cool: it allows me to modify my VM configuration at runtime. One example is removing disks using the detach-disk command:

[root@host ~]# virsh help detach-disk
  NAME
    detach-disk - detach disk device

  SYNOPSIS
    detach-disk <domain> <target> [--persistent] [--config] [--live] [--current]

  DESCRIPTION
    Detach disk device.

  OPTIONS
    [--domain] <string>  domain name, id or uuid
    [--target] <string>  target of disk device
    --persistent     make live change persistent
    --config         affect next boot
    --live           affect running domain
    --current        affect current domain

But which disk to remove? The disks attached to a VM can be listed using the domblklist command:

[root@host ~]# virsh domblklist asmtest
Target     Source
------------------------------------------------
vda        /path/to/slow/disk/asmtest
vdb        /path/to/slow/disk/asmtest-1
sdh        /path/to/fast/disk/asmtest-asm01.img
sdi        /path/to/fast/disk/asmtest-asm02.img
sdj        /path/to/fast/disk/asmtest-asm03.img
sdk        /path/to/fast/disk/asmtest-asm04.img
sdo        /path/to/fast/disk/asmtest-asm05.img
sdp        /path/to/fast/disk/asmtest-asm06.img
sdq        /path/to/fast/disk/asmtest-asm07.img
sdr        /path/to/fast/disk/asmtest-asm08.img
sdv        /path/to/fast/disk/asmtest-asm09.img

I mentioned previously that the information provided by the host does not necessarily match that of the guest. Since I have persistent device names in my VM, though, there shouldn’t be an issue. Let’s take a disk away, but only temporarily! Since my O/S and Oracle binaries reside on disks presented to the guest as virtio devices, I know they are named vd[a-z] and are better left alone. That leaves sdh as the first disk to remove.
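If you want to be extra sure which image file backs the target you are about to detach, the domain XML ties them together. A quick sketch (the exact element layout depends on your libvirt version):

[root@host ~]# virsh dumpxml asmtest | grep -A5 "asmtest-asm01.img"

The target dev and drive address returned should line up with the QEMU drive id the guest reports in its SCSI serial (drive-scsi1-0-0-0 in the udev rules above), which is exactly what the rules key on.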

[root@host ~]# virsh detach-disk asmtest sdh --live
Disk detached successfully

And the disk is gone.

And what is happening?

The disk removal has immediate effect on the ASM instance. Switching over to my VM I can see the following information recorded in the ASM alert.log:

2016-09-29 13:45:13.936000 +01:00
WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096
path:/dev/asm-disk01
 incarnation:0xe969ccd4 asynchronous result:'I/O error'
 subsys:System krq:0x7fda4f0b8210 bufp:0x7fda4f20b000 osderr1:0x69b5 osderr2:0x0
 IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: Hbeat write to PST disk 1.3916025044 in group 1 failed. [2]
NOTE: process _b000_+asm (4346) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 13 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message (844741558) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x7f is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 14 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 15 for pid 22, osid 4346
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: cache closing disk 1 of grp 1: DATA_0001
NOTE: PST update grp = 1 completed successfully
2016-09-29 13:46:53.300000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1

I also had a journalctl -f running in a different session. Unsurprisingly, the O/S also noticed the disk disappearing:

Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 13:45:13 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 13:45:13 asmtest kernel: Add. Sense: Logical unit not supported

A transient failure

In yet another session I can see that the disk is about to go: the repair timer has started ticking down and is decremented as time passes. This did not happen straight away; there was a delay of maybe a couple of minutes after the disk was removed.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   600
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> r
  1  select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1                 MISSING UNKNOWN      OFFLINE NORMAL   BATCH1                   416
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.
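The 600 seconds you see as the starting value of REPAIR_TIMER come from the disk group’s disk_repair_time attribute; evidently it was set to 10 minutes on this disk group rather than the 3.6 hour default. A quick way to inspect and change it with asmcmd, a sketch assuming the grid environment is set for the oracle user:

[oracle@asmtest ~]$ asmcmd lsattr -l -G DATA '%repair%'
[oracle@asmtest ~]$ asmcmd setattr -G DATA disk_repair_time '10m'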

Another day I would like to run an additional test to see what happens when the repair_timer actually reaches 0, but for now let’s add the disk back in. On the host, I use libvirt again for this:

[root@host ~]# virsh attach-disk asmtest /path/to/fast/disk/asmtest-asm01.img sdh --live
Disk attached successfully

As soon as the prompt returns, I can see the guest O/S picking the disk up:

Sep 29 13:54:09 asmtest kernel: scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.5. PQ: 0 ANSI: 5
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write Protect is off
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 29 13:54:09 asmtest kernel:  sda: sda1
Sep 29 13:54:09 asmtest kernel: sd 2:0:0:0: [sda] Attached SCSI disk

The documentation states that I should now be able to online the disk again, which I tried. The result is shown below:

SQL> alter diskgroup data online all;

Diskgroup altered.

I am showing you the complete ASM alert.log output here; feel free to skip to the bottom:

2016-09-29 13:54:44.774000 +01:00
SQL> alter diskgroup data online all
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: initiating resync of disk group 1 disks
DATA_0001 (1)
NOTE: process _user3750_+asm (3750) initiating offline of disk 1.3916025044 (DATA_0001) with mask 0x7e in group 1 (DATA) without client assisting
NOTE: sending set offline flag message (4172832432) to 1 disk(s) in group 1
WARNING: Disk 1 (DATA_0001) in group 1 mode 0x1 is now being offlined
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 17 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0xe969ccd4, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 18 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance membership refresh for group=1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x11, op = assign
GMON updating disk modes for group 1 at 19 for pid 20, osid 3750
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: requesting all-instance disk validation for group=1
NOTE: disk validation pending for 1 disk in group 1/0xceb93c35 (DATA)
NOTE: Found /dev/asm-disk01 for disk DATA_0001
NOTE: completed disk validation for 1/0xceb93c35 (DATA)
NOTE: discarding redo for group 1 disk 1
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x19, op = assign
GMON updating disk modes for group 1 at 20 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 21 for pid 16, osid 2535
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/asm-disk01
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x5d, op = assign
GMON updating disk modes for group 1 at 22 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7d, op = assign
GMON updating disk modes for group 1 at 23 for pid 20, osid 3750
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
SUCCESS: alter diskgroup data online all
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xceb93c35 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=23, OS id=4986
NOTE: assigning ARB0 to group 1/0xceb93c35 (DATA) with 1 parallel I/O
2016-09-29 13:54:48.265000 +01:00
NOTE: initiating PST update: grp 1 (DATA), dsk = 1/0x0, mask = 0x7f, op = assign
GMON updating disk modes for group 1 at 24 for pid 23, osid 4986
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: PST update grp = 1 completed successfully
NOTE: reset timers for disk: 1
NOTE: completed online of disk group 1 disks
DATA_0001 (1)
2016-09-29 13:54:49.580000 +01:00
NOTE: stopping process ARB0
NOTE: requesting all-instance membership refresh for group=1
SUCCESS: rebalance completed for group 1/0xceb93c35 (DATA)
NOTE: membership refresh pending for group 1/0xceb93c35 (DATA)
GMON querying group 1 at 25 for pid 16, osid 2535
SUCCESS: refreshed membership for 1/0xceb93c35 (DATA)
2016-09-29 13:54:51.286000 +01:00
NOTE: Attempting voting file refresh on diskgroup DATA

This looks promising, but what about V$ASM_DISK?

SQL> select name,disk_number,path,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2  from v$asm_disk where group_number = 1 order by disk_number
  3  /

NAME            DISK_NUMBER PATH            MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP       REPAIR_TIMER
--------------- ----------- --------------- ------- ------------ ------- -------- --------------- ------------
DATA_0000                 0 /dev/asm-disk02 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0001                 1 /dev/asm-disk01 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0002                 2 /dev/asm-disk03 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0003                 3 /dev/asm-disk06 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0004                 4 /dev/asm-disk07 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0005                 5 /dev/asm-disk08 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0
DATA_0006                 6 /dev/asm-disk04 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                     0
DATA_0007                 7 /dev/asm-disk05 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                     0

8 rows selected.

SQL> 

Hurray, I have my disk back! And I did not have to sit through a full rebalance: the brief ARB0 activity you can see in the alert.log is the fast mirror resync copying only the extents marked stale while the disk was offline. Thanks, ASM!
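If you want to watch the resync while it is in flight, v$asm_operation is the place to look. A minimal sketch, assuming the shell environment points at the ASM instance; the PASS column distinguishes the resync from a full rebalance and the compact phase:

[oracle@asmtest ~]$ sqlplus -s / as sysasm <<'EOF'
set lines 120
-- shows any ongoing resync/rebalance/compact work per disk group
select group_number, operation, pass, state, power, est_minutes
from v$asm_operation;
EOF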