Category Archives: Automatic Storage Management

Little things worth knowing: when a transient ASM disk failure cannot be fixed in time

In the previous blog post I used libvirt and KVM in my lab environment to simulate a transient disk failure and showed how to recover from it. This post takes that example a step further: I am simulating another disk failure, but this time I won’t pretend I can fix the issue and put the disk back. In other words, I simulate the effect of disk_repair_time hitting zero.

Most of what I am covering here is an extension of the previous post. I’ll mention the main details here for your benefit, but would like to invite you to refer to the previous post for more detail.

The idea is to show you the output in the ASM alert.log and the effect of the lost disk on the V$ views.

As with the previous post, the code examples in this one are for demonstration purposes only!

The setup in a nutshell

I am using Oracle Linux 7.2 with UEK4 as the host system; KVM and libvirt are responsible for the guests (virtual machines). The guest VM used for this example is named asmtest and runs Oracle Linux 7.2 as well. There are 9 ASM disks: 8 for +DATA, which features normal redundancy across 2 failure groups. I added +RECO with external redundancy for the sake of completeness, but this post is about +DATA. To keep it simple I used an Oracle Restart configuration patched to the July 2016 PSU.

Removing the disk

As with the previous example I am using libvirt to remove a “LUN” temporarily from the guest. And sure enough, the VM picks this up. This is the relevant output obtained via journalctl -f:

Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Sense Key : Illegal Request [current] 
Sep 29 15:34:36 asmtest kernel: sd 2:0:0:0: [sda]
Sep 29 15:34:36 asmtest kernel: Add. Sense: Logical unit not supported
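For reference, removing the disk on the KVM host is a single virsh call. This is a sketch only: the domain name asmtest matches my lab, but the target device name is an assumption that depends on how the disk was attached to the guest.

# on the KVM host: detach the disk from the running guest (--live)
# "asmtest" is the domain name used in this lab; "sda" is an assumption
virsh detach-disk asmtest sda --live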

In the previous post I copied the part of the ASM instance’s alert.log that showed the disk repair timer ticking down. This time I am simulating the case where – for whatever reason – the transient failure could not be fixed. In that case, this is what you would see in the alert.log:

2016-09-29 15:38:21.752000 +01:00
WARNING: Started Drop Disk Timeout for Disk 1 (DATA_0001) in group 1 with a value 600
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (600) secs on ASM inst 1
2016-09-29 15:41:25.379000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (416) secs on ASM inst 1
2016-09-29 15:44:29.012000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (232) secs on ASM inst 1
2016-09-29 15:47:32.643000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (48) secs on ASM inst 1
2016-09-29 15:50:36.259000 +01:00
WARNING: Disk 1 (DATA_0001) in group 1 will be dropped in: (0) secs on ASM inst 1
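
While the timer counts down you can also watch it from inside the ASM instance. Here is a quick sketch using the same V$ASM_DISK columns as the query later in this post, followed by the command you would use to bring the disk back online if you manage to fix the problem in time:

-- watch the countdown
select name, mode_status, state, repair_timer
from v$asm_disk
where group_number = 1;

-- and, if the transient failure is fixed before the timer expires:
alter diskgroup data online disk DATA_0001;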

A repair timer of (0) seconds is definitely NOT what you want to see. If at all possible, you should fix the problem before DATA_0001 (in this example) is dropped. Once you see the message about the disk being dropped, you are facing the inevitable rebalance operation. Here is the continued output from the ASM alert.log:

WARNING: PST-initiated drop of 1 disk(s) in group 1(.4232658126))
SQL> alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 40 for pid 23, osid 3216
NOTE: cache closing disk 1 of grp 1: (not open) DATA_0001
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 41 for pid 16, osid 2701
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
SUCCESS: alter diskgroup DATA drop disk DATA_0001 force /* ASM SERVER */
SUCCESS: PST-initiated drop disk in group 1(4232658126))
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: starting rebalance of group 1/0xfc493cce (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=3655 
NOTE: assigning ARB0 to group 1/0xfc493cce (DATA) with 1 parallel I/O
2016-09-29 15:50:40.257000 +01:00
NOTE: restored redundancy of control and online logs in DATA
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: restored redundancy of control and online logs in DATA
2016-09-29 15:50:51.655000 +01:00
NOTE: requesting all-instance membership refresh for group=1
GMON updating for reconfiguration, group 1 at 42 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:54.647000 +01:00
GMON updating for reconfiguration, group 1 at 43 for pid 21, osid 3719
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0003 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0000 (PST copy 1)
NOTE: group 1 PST updated.
2016-09-29 15:50:57.571000 +01:00
NOTE: membership refresh pending for group 1/0xfc493cce (DATA)
GMON querying group 1 at 44 for pid 16, osid 2701
GMON querying group 1 at 45 for pid 16, osid 2701
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0xfc493cce (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
2016-09-29 15:51:03.589000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xfc493cce (DATA)

The last line indicates that the rebalance operation is complete. My disk group was almost empty, so the rebalance took almost no time. And indeed, when you check V$ASM_DISK, the disk is gone:

SQL> r
  1  select name,path,disk_number,mount_status,header_status,mode_status,state,failgroup,repair_timer
  2* from v$asm_disk where group_number = 1 order by disk_number

NAME       PATH            DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    FAILGROUP  REPAIR_TIMER
---------- --------------- ----------- ------- ------------ ------- -------- ---------- ------------
DATA_0000  /dev/asm-disk02           0 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0002  /dev/asm-disk03           2 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0003  /dev/asm-disk06           3 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0004  /dev/asm-disk07           4 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0005  /dev/asm-disk08           5 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0
DATA_0006  /dev/asm-disk04           6 CACHED  MEMBER       ONLINE  NORMAL   BATCH1                0
DATA_0007  /dev/asm-disk05           7 CACHED  MEMBER       ONLINE  NORMAL   BATCH2                0

7 rows selected.

SQL> select count(*), failgroup from v$asm_disk where name like 'DATA%' group by failgroup;

  COUNT(*) FAILGROUP
---------- ----------
         3 BATCH1
         4 BATCH2

My system is fully operational, and the rebalance did not run into any space problems. Space problems are the last thing you want when rebalancing. I did some research on this subject earlier and documented it in a blog post.
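If you want to keep an eye on a rebalance and the space situation while it runs, these queries help. This is a sketch; USABLE_FILE_MB dropping towards (or below) zero is the classic warning sign:

-- progress of the rebalance
select group_number, operation, state, power, est_minutes
from v$asm_operation;

-- space situation per disk group
select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
from v$asm_diskgroup;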

In a future blog post I am going to investigate what happens when a partner disk of DATA_0001 fails while the repair timer is ticking down.

Fixing a problem with the ASM spfile preventing RAC 12c from starting

This is a little note to myself on how to fix a corrupt spfile in clustered ASM. I hope you find it useful, too.

Let’s assume you made a change to the ASM (server) parameter file that causes an issue. You are most likely to notice this once CRS is restarted but parts of the stack fail to come up. If “crsctl check crs” mentions any component not started you can try to find out where in the bootstrap process you are stuck. Here is the output from my system.

[root@rac12pri1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE      rac12pri1                STARTING
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac12pri1                OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE rac12pri1                STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
[root@rac12pri1 ~]#

I noticed that lots of components are not started. If you are interested in the startup order and the dependencies between processes, you can find this documented in the Clusterware Administration and Deployment Guide – Chapter 1, Figure 1-2.

Another useful piece of information is the Clusterware alert.log. Unlike Oracle Clusterware 11.2, where log information resided in $GRID_HOME, the 12c CRS logs have moved to the ADR. A quick look at the alert.log showed:


2015-07-28 09:16:51.247 [OCSSD(11611)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 11611
2015-07-28 09:16:52.347 [OCSSD(11611)]CRS-1713: CSSD daemon is started in hub mode
2015-07-28 09:16:57.974 [OCSSD(11611)]CRS-1707: Lease acquisition for node rac12pri1 number 1 completed
2015-07-28 09:16:59.076 [OCSSD(11611)]CRS-1605: CSSD voting file is online: /dev/vdc1; details in /u01/app/oracle/diag/crs/rac12pri1/crs/trace/ocssd.trc.
2015-07-28 09:16:59.089 [OCSSD(11611)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2015-07-28 09:17:08.198 [OCSSD(11611)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac12pri1 .
2015-07-28 09:17:10.276 [OCTSSD(11694)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 11694
2015-07-28 09:17:11.261 [OCTSSD(11694)]CRS-2403: The Cluster Time Synchronization Service on host rac12pri1 is in observer mode.
2015-07-28 09:17:11.469 [OCTSSD(11694)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac12pri1.
2015-07-28 09:17:11.469 [OCTSSD(11694)]CRS-2401: The Cluster Time Synchronization Service started on host rac12pri1.
2015-07-28 09:17:43.016 [ORAROOTAGENT(11376)]CRS-5019: All OCR locations are on ASM disk groups [CHM], and none of these disk groups are
   mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/rac12pri1/crs/trace/ohasd_orarootagent_root.trc".
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri2, number 2, was shut down
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri3, number 3, was shut down
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri4, number 4, was shut down

In other words, CSSD has found the block device I use for the voting files and concluded its initial work. However, the Oracle root agent (orarootagent) cannot proceed since none of the OCR locations on ASM can be opened. Checking the trace file at that particular time I can see where the problem is:

2015-07-28 09:17:42.989946*:kgfo.c@2846: kgfoCheckMount dg=CHM ok=0
2015-07-28 09:17:42.990045 : USRTHRD:3741497088: {0:9:3} -- trace dump on error exit --

2015-07-28 09:17:42.990057 : USRTHRD:3741497088: {0:9:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:2850

2015-07-28 09:17:42.990067 : USRTHRD:3741497088: {0:9:3} ORA-15077: could not locate ASM instance serving a
  required diskgroup

2015-07-28 09:17:42.990077 : USRTHRD:3741497088: {0:9:3} Category: 7

2015-07-28 09:17:42.990115 : USRTHRD:3741497088: {0:9:3} DepInfo: 15077

2015-07-28 09:17:42.990382 : USRTHRD:3741497088: {0:9:3} -- trace dump end --

2015-07-28 09:17:42.990408 :CLSDYNAM:3741497088: [ora.storage]{0:9:3} [start] retcode = 7, kgfoCheckMount(CHM)
2015-07-28 09:17:42.990423 :CLSDYNAM:3741497088: [ora.storage]{0:9:3} [start] (null) category: 7, operation:
 kgfoAl06, loc: kgfokge, OS error: 15077,
 other: ORA-15077: could not locate ASM instance serving a required diskgroup

So there is not a single ASM instance that could serve the required disk group. Hmmm… maybe I have to back out the change I just made. I have developed a habit of creating backups (pfiles) of spfiles prior to implementing changes. But even if there is no backup of the spfile I can still get the system back, and here are the steps I used. Just as with the database, I need to:

  1. Create a temporary pfile on the file system
  2. Start ASM using this temporary pfile
  3. Create a backup of my (bad) spfile from the ASM disk group
  4. Extract all parameters
  5. Create a proper pfile that I use to start the cluster with
  6. Convert that to a spfile in ASM

Fixing the problem

The first step is to create a temporary pfile. Using the ASM instance’s alert.log I can scroll back to a point in time before the change to check which parameters are needed. The following are just an example; your settings will be different!

...
Using parameter settings in server-side spfile +CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/vd*1"
  asm_diskgroups           = "DATA"
  asm_diskgroups           = "RECO"
  asm_power_limit          = 1
NOTE: remote asm mode is remote (mode 0x202; from cluster type)
Cluster communication is configured to use the following interface(s) for this instance
  169.254.106.70
  169.254.184.41
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
...

The new pfile, /tmp/init+ASM1.ora, has the following contents:

  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/vd*1"
  asm_diskgroups           = "DATA"
  asm_diskgroups           = "RECO"
  asm_power_limit          = 1

I can now start the first ASM instance:

[oracle@rac12pri1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Tue Jul 28 09:23:23 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile='/tmp/init+ASM1.ora'
ASM instance started

Total System Global Area 1140850688 bytes
Fixed Size                  2933400 bytes
Variable Size            1112751464 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL>

The alert.log also records the location of the spfile; you should back this up now (using asmcmd or any other tool). Using the backup you should be able to reconstruct your spfile, but make sure to take the offending parameter out.
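With asmcmd this could look as follows. This is a sketch: the spfile path is the one from the alert.log excerpt above, and the backup destination is my choice:

[oracle@rac12pri1 ~]$ asmcmd spget
+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125
[oracle@rac12pri1 ~]$ asmcmd cp +CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125 /tmp/spfileASM.bak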

I decided to create the spfile as spfileASM.ora in ASM. I amended my temporary pfile with the settings from the recovered spfile and put it back into the cluster.


SQL> create spfile='+CHM/rac12pri/spfileASM.ora' from pfile='/tmp/init+ASM1.ora';

File created.

Why the name change? You cannot create files in ASM that have OMF names. Trying to create the spfile with the original name will cause an error:


SQL> create spfile='+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125' from pfile='/tmp/init+ASM1.ora';
create spfile='+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125' from pfile='/tmp/init+ASM1.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file
+CHM/rac12pri/ASMPARAMETERFILE/registry.253.885820125
ORA-15177: cannot operate on system aliases

The really nice thing is that this is reflected in the Grid Plug and Play (GPnP) profile immediately. The ASM alert.log showed:

2015-07-28 09:25:01.323000 +01:00
NOTE: updated gpnp profile ASM SPFILE to
NOTE: header on disk 0 advanced to format #2 using fcn 0.0
2015-07-28 09:25:58.332000 +01:00
NOTE: updated gpnp profile ASM diskstring: /dev/vd*1
NOTE: updated gpnp profile ASM diskstring: /dev/vd*1
NOTE: updated gpnp profile ASM SPFILE to +CHM/rac12pri/spfileASM.ora

And the XML profile is updated too (reformatted for better readability)


[oracle@rac12pri1 ~]$ gpnptool get -o-

<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile"
  xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
  xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd"
  ProfileSequence="7" ClusterUId="886a0e42a...5d805357c76a"
  ClusterName="rac12pri" PALocation="">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
     <gpnp:Network id="net1" IP="192.168.100.0" Adapter="eth0" Use="public"/>
     <gpnp:Network id="net2" IP="192.168.101.0" Adapter="eth1" Use="cluster_interconnect"/>
     <gpnp:Network id="net3" IP="192.168.102.0" Adapter="eth2" Use="asm,cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/vd*1" SPFile="+CHM/rac12pri/spfileASM.ora" Mode="remote"/>
  <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">...</ds:Signature>
</gpnp:GPnP-Profile>

This should be it: the correct values have been restored, the spfile is back on shared storage, and I should be able to start the stack with this combination.
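The stop/start itself is the standard sequence, run as root on the node; a quick sketch:

[root@rac12pri1 ~]# crsctl stop crs
[root@rac12pri1 ~]# crsctl start crs

After having issued the stop/start commands to CRS it was indeed all well: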

[root@rac12pri1 ~]# crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online   

[root@rac12pri1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac12pri1                STABLE
ora.CHM.dg
               ONLINE  ONLINE       rac12pri1                STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac12pri1                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac12pri1                STABLE
ora.RECO.dg
               ONLINE  ONLINE       rac12pri1                STABLE
ora.net1.network
               ONLINE  ONLINE       rac12pri1                STABLE
ora.ons
               ONLINE  ONLINE       rac12pri1                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac12pri1                169.254.1.137 192.16
                                                             8.101.10 192.168.102
                                                             .10,STABLE
ora.asm
      1        ONLINE  ONLINE       rac12pri1                STABLE
      2        ONLINE  OFFLINE                               STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.cdb.db
      1        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        OFFLINE OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
      4        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       rac12pri1                Open,STABLE
ora.ncdb.db
      1        ONLINE  ONLINE       rac12pri1                Open,STABLE
      2        ONLINE  OFFLINE                               STABLE
      3        ONLINE  OFFLINE                               STABLE
      4        ONLINE  OFFLINE                               STABLE
ora.ncdb.fotest.svc
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       rac12pri1                STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.rac12pri1.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.rac12pri2.vip
      1        ONLINE  INTERMEDIATE rac12pri1                FAILED OVER,STABLE
ora.rac12pri3.vip
      1        ONLINE  INTERMEDIATE rac12pri1                FAILED OVER,STABLE
ora.rac12pri4.vip
      1        ONLINE  INTERMEDIATE rac12pri1                FAILED OVER,STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rac12pri1                STABLE
--------------------------------------------------------------------------------

Time to start Clusterware on the other nodes and to report “We are back and running” :)

UKOUG post conference geek update part 1 – ACFS for Oracle databases

One of the many interesting things I heard at the conference this time around was that Oracle’s future direction includes the use of database files on ACFS. When ACFS came out this was strictly ruled out, but it has been possible for a little while now, I believe since 12.1.0.1.0. With the Oracle Database Appliance (ODA) using this deployment option and hearing about it at the conference, a little further investigation was in order. During one of the presentations @OracleRACPM (Markus Michalewicz) referenced a script on his slides that I didn’t know: gDBClone. I wanted to see how it works. The idea is that the script can be used to create a snap clone of a database, provided the source is on ACFS and in archivelog mode.

As it turned out there were a few hurdles along the way and I will point them out so you don’t run into the same issues.

UPDATE: these hurdles might exist because Oracle databases aren’t supported on ACFS in Oracle Restart: https://docs.oracle.com/database/121/OSTMG/asmfilesystem.htm#OSTMG95961 Please consider yourself warned! For the rest of the article let’s pretend that this is a clustered environment. The article is therefore purely educational and not an encouragement to do this in real life.

The script and associated white paper assume that you have two systems: production and dev/test/uat. The goal of the clone procedure is to be able to create a copy of your live database on your dev/test/uat cluster. This database can then be used as the source for ACFS snapshots. This can be represented in ASCII art:


+------+   clone   +------------------+
| PROD |   ---->   |      MASTER      |
+------+           |      /     \     |  snap #1
                   |     /       \    |  snap #2
                   | CLONE1    CLONE2 |
                   +------------------+

It is by no means required to follow this approach; if nothing else you can use the clone script to run an RMAN duplication in a single command. I once wrote a script to do the same, and it was a truly complex thing to do.

For this article I’ll assume that you clone PROD (named ORCL) to MASTER, and snap MASTER to CLONE1.

The setup

Since my “travel lab” is not a RAC cluster I opted for an installation of 12.1.0.2.1 for Oracle Restart and the database, to keep it simple.

ACFS storage will be provided by an ADVM volume from disk group DATA. In my case this was quite easy to accomplish. Since this was a 12.1 system anyway, I created my DATA disk group with ASM, RDBMS and ADVM compatibility set to 12.1.0.2.0.

You create the ASM Dynamic Volume Manager (ADVM) volume on top of the ASM disk group, in my case on DATA. This is a simple task and can be performed by a variety of tools; I opted for a call to asmcmd:

ASMCMD> volcreate
usage: volcreate -G <diskgroup> -s <size> [ --column <number> ] [ --width <stripe_width> ]
[--redundancy {high|mirror|unprotected} ] [--primary {hot|cold}] [--secondary {hot|cold}] <volume>
help:  help volcreate
ASMCMD> volcreate -G data -s 10G volume1
ASMCMD> volinfo --all
Diskgroup Name: DATA

         Volume Name: VOLUME1
         Volume Device: /dev/asm/volume1-162
         State: ENABLED
         Size (MB): 10240
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

Initially I got a message that ASM could not communicate with the (ASM) Volume driver.

SQL> /* ASMCMD */alter diskgroup data add volume 'volume1' size 10G
ORA-15032: not all alterations performed
ORA-15477: cannot communicate with the volume driver
ERROR: /* ASMCMD */alter diskgroup data add volume 'volume1' size 10G

This was solved by modprobe calls to the oracle kernel modules.

[root@server5 ~]# modprobe  oracleacfs
[root@server5 ~]# modprobe  oracleadvm
[root@server5 ~]# modprobe  oracleoks
[root@server5 ~]# lsmod | grep oracle
oracleadvm            507006  7
oracleacfs           3307457  1
oracleoks             505749  2 oracleadvm,oracleacfs

There is a known issue with the execution of udev rules (/etc/udev/rules.d/55-usm.rules) that might delay the setting of permissions. On my system a udevadm trigger solved it. Still odd (Oracle Linux 6.6 / UEK3 3.8.13-44.1.1.el6uek.x86_64), especially since a call to “acfsdriverstate supported” stated the driver was supported.
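In case you run into the same symptoms, the two commands in question are these (a sketch; run as root, output omitted):

[root@server5 ~]# udevadm trigger
[root@server5 ~]# acfsdriverstate supported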

Once the volume is created it needs to be formatted using ACFS. This can be done in this way:

[oracle@server5 ~]$  mkfs -t acfs /dev/asm/volume1-162
mkfs.acfs: version                   = 12.1.0.2.0
mkfs.acfs: on-disk version           = 39.0
mkfs.acfs: volume                    = /dev/asm/volume1-162
mkfs.acfs: volume size               = 10737418240  (  10.00 GB )
mkfs.acfs: Format complete.

Trying to register the file system in the ACFS registry pointed me to the first problem with the procedure on Oracle Restart:

[root@server5 ~]# acfsutil registry -a /dev/asm/volume1-162 /u01/oradata
Usage: srvctl <command> <object> [<options>]
    commands: enable|disable|start|stop|status|add|remove|modify|update|getenv|setenv|
       unsetenv|config|upgrade|downgrade
    objects: database|service|asm|diskgroup|listener|home|ons
For detailed help on each command and object and its options use:
  srvctl <command> -help [-compatible] or
  srvctl <command> <object> -help [-compatible]
PRKO-2012 : filesystem object is not supported in Oracle Restart
acfsutil registry: ACFS-03111: unable to add ACFS mount /u01/oradata within Oracle Registry

Interesting, but not a problem in the lab. I usually mount file systems where, in my opinion, they logically belong. In this case I mounted the file system to /u01/oradata. Spoiler alert: this is not what you are supposed to do if you want to use the gDBClone script.

To cut a long story short, the mount point is assumed to be /acfs for a snapped or cloned database. The script also assumes that your system is a RAC environment, and I found it did not work well in 12.1 at all due to the way it tries to get the database version from the OCR (OLR) profile. First the new ACFS file system is mounted, then made accessible to the oracle user:

[root@server5 ~]# mount -t acfs /dev/asm/volume1-162 /acfs
[root@server5 ~]# chown -R oracle:dba /acfs
[root@server5 ~]# mount | grep acfs
/dev/asm/volume1-162 on /acfs type acfs (rw)

Creating the MASTER database

My source system resides in ASM, and there is no way of creating copy-on-write (COW) clones in ASM. As a result, the MASTER database must first be moved to ACFS, from where you can take storage snapshots.

A quick hack was required since I only had one machine, so I created the source database (PROD in the ASCII art example) as “orcl” using the following call to dbca, and set it to archivelog mode:

dbca -silent -createDatabase -templateName General_Purpose.dbc  -gdbname orcl \
> -sid orcl -sysPassword pwd1 -systemPassword pwd2  -emConfiguration none  \
> -storageType ASM  -asmsnmpPassword pwd3 -diskGroupName data -recoveryGroupName reco  \
> -totalMemory 2048

The next step is to create the MASTER database. The gDBClone script checks if a database is on ACFS in function checkIfACFS(). If a database is found to be on ASM (by checking the SPFILE location in the Clusterware profile) it requires it to be CLONED as opposed to SNAPped. Here is the command to clone ORCL to MASTER. You must set your environment to an RDBMS home before executing the script.

# ./gDBClone clone -sdbname orcl -sdbhost server5 -sdbport 1521 -tdbname master -acfs /acfs -debug

It uses an RMAN duplicate under the covers. The arguments are almost self-explanatory: they specify the location of the source database (which can be remote) and where you want to store the clone. Since I desperately want to store the clone on ACFS I specified it in the command. The -debug flag prints more verbose output; a lot of information is also found in /var/log/gDBName/. Note that the script is to be run as root :(

To get there a few tricks were necessary in Oracle Restart environments; I’ll feed them back to Oracle to see if they can be added to the script. You probably won’t encounter problems when using a clustered 11.2 installation.

At one point the script checks the VERSION flag in the database resource profile (crsctl stat res ora..db -p | grep ‘^VERSION’), and since that field no longer shows up in 12.1 the variable is undefined in Perl and the execution fails. The problem with Oracle Restart is related to setting the database type to SINGLE on the local host (srvctl add database … -c SINGLE -x …): Oracle Restart doesn’t understand those switches. Also, in 12c you can’t set the db_unique_name to the name of a database already registered in the OCR. Moving the step that registers the database further down in the execution helped.

Please don’t get me wrong: the script is very neat in that it allows you to run an RMAN duplicate command over the network, potentially creating the backup on the fly. In 12c RMAN will pull existing backup pieces instead of creating a backup on the fly, to reduce the impact on the production database. Oh, and I almost forgot: you can even convert the clone to a clustered database if it is not one already.

Creating an ACFS clone database

Next up is the creation of the CLONE database. My first attempts were unsuccessful. The main trick seems to be to mount the ACFS file system directly underneath / (the root file system); mounting it elsewhere caused the script to fail. If you can read Perl, check the $acfs variable and the checkIfACFS() routine to understand why. Also, the ADVM compatibility of the ASM disk group containing the volume has to be at least 11.2.0.3.0 or you get this error:

[root@server5 ~]# ./gDBClone snap -sdbname master -tdbname clone1
2014-12-12 11:03:53: I Getting host info...
2014-12-12 11:03:53: I Starting.....
2014-12-12 11:03:53: I Validating environment.....

Enter the source MASTER SYSDBA password:
2014-12-12 11:03:56: I Getting OH version...
2014-12-12 11:04:04: I Checking SCAN listener server5.example.com:1521...
2014-12-12 11:04:05: I Checking database CLONE1 existence...
2014-12-12 11:04:05: I Checking registered instance CLONE1 ...
2014-12-12 11:04:10: I Checking if the Source Database MASTER it's on ASM
2014-12-12 11:04:10: I Source Database MASTER it's on ACFS
2014-12-12 11:04:10: I Checking snapshot CLONE1 existence
2014-12-12 11:04:10: I Setting up clone environment....
2014-12-12 11:04:10: I Starting auxiliary listener....
2014-12-12 11:05:10: I Creating ACFS snapshot.....
2014-12-12 11:05:10: I Start/stop MASTER to check consistency.
2014-12-12 11:05:42: I Checking if the source database MASTER is stored on an ACFS snapshot
acfsutil snap create: ACFS-03048: Snapshot operation could not complete.
acfsutil snap create: ACFS-03174: The Oracle ASM Dynamic Volume Manager (Oracle ADVM) compatibility
    attribute for the disk group is less than 11.2.0.3.0.
2014-12-12 11:05:42: E Error getting ACFS snapshot

This can be fixed quite easily, provided that you don’t break anything; remember that compatibility can be raised but never lowered.
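Raising the ADVM compatibility is a one-liner from the ASM instance; a sketch, and remember there is no way back once raised:

SQL> alter diskgroup data set attribute 'compatible.advm' = '12.1.0.2.0';

With everything in place, the clone is quick: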

[root@server5 ~]# ./gDBClone snap -sdbname master -tdbname clone1
2014-12-12 11:20:17: I Getting host info...
2014-12-12 11:20:17: I Starting.....
2014-12-12 11:20:17: I Validating environment.....

Enter the source MASTER SYSDBA password:
2014-12-12 11:20:19: I Getting OH version...
2014-12-12 11:20:27: I Checking SCAN listener server5.example.com:1521...
2014-12-12 11:20:27: I Checking database CLONE1 existence...
2014-12-12 11:20:28: I Checking registered instance CLONE1 ...
2014-12-12 11:20:31: I Checking if the Source Database MASTER it's on ASM
2014-12-12 11:20:31: I Source Database MASTER it's on ACFS
2014-12-12 11:20:31: I Checking snapshot CLONE1 existence
2014-12-12 11:20:31: I Setting up clone environment....
2014-12-12 11:20:31: I Starting auxiliary listener....
2014-12-12 11:21:31: I Creating ACFS snapshot.....
2014-12-12 11:21:31: I Start/stop MASTER to check consistency.
2014-12-12 11:21:58: I Checking if the source database MASTER is stored on an ACFS snapshot
2014-12-12 11:21:58: I Setting up snapshot database.....
2014-12-12 11:21:58: I Creating Clone parameter files
2014-12-12 11:22:01: I Activating clone database.....
PRKO-2002 : Invalid command line option: -j
2014-12-12 11:23:04: I Successfully created clone "CLONE1" database

Now what does this database look like? I remember a conversation with Andy Colvin about this since he saw it on our ODA first, but consider this:

[oracle@server5 ~]$ sq

SQL*Plus: Release 12.1.0.2.0 Production on Fri Dec 12 11:51:22 2014

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_system_b8p3pxs7_.dbf
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_sysaux_b8p3pz4f_.dbf
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_undotbs1_b8p3q2q0_.dbf
/acfs/.ACFS/snaps/CLONE1/MASTER/MASTER/datafile/o1_mf_users_b8p3qo2o_.dbf

Now the interesting thing is when you look into that top-level ACFS directory:

SQL> !ls -la /acfs/
total 76
drwxr-xr-x.  5 oracle dba       4096 Dec 12 10:35 .
dr-xr-xr-x. 27 root   root      4096 Dec 12 10:30 ..
drwx------.  2 oracle dba      65536 Dec 12 10:31 lost+found
drwxr-xr-x.  3 oracle oinstall  4096 Dec 12 10:52 MASTER

No .ACFS entry! Even more interestingly, you can actually see what’s in the .ACFS directory when referring to it directly:

SQL> !ls -l /acfs/.ACFS
total 8
drwxrwx---. 6 root root 4096 Dec 12 10:31 repl
drwxr-xr-x. 3 root root 4096 Dec 12 11:21 snaps

What I want to do next is to run a few performance benchmarks on a database in ACFS to see how it holds up. But that’s for another day…

Can you have high redundancy files in a normal redundancy diskgroup?

One of the perks of teaching classes is that I get to research the questions asked. In the last Exadata Administration Class I taught, someone asked: can you have your disk groups in Exadata on normal redundancy yet have certain databases use high redundancy? This would be a good interview question …

The answer is yes, which I remembered from researching material for the 11g RAC book, but I wanted to prove that it is the case.

Update: I had planned a second blog post where I wanted to test the effect, but Alex Fatkulin was quicker, and I promise I didn’t see his post when I wrote mine; otherwise there probably wouldn’t have been one :) In summary, you aren’t really any better protected. The disk group remains at normal redundancy, even with the data files in high. Looking at Alex’s results (and I encourage you to do so) I concur with his summary: although you have a third copy of each extent protecting you from corruption, you don’t have higher resilience.

This is not Exadata specific, by the way. When I face a question about ASM internals my first idea is to use an Internet search engine and look up the ASM work by Luca Canali. This page is almost always relevant when looking at ASM file structures: https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals.

ASM Templates

I am not aware of many ASM users who know about ASM templates, although we all make use of them. The templates define the striping and mirroring of files and are explained in the ASM Admin Guide, chapter 5. You are not limited to the Oracle-provided templates; you can create your own as well, which is key to this post. I headed over to the ASM instance on my cluster and created the template:

SQL> alter diskgroup data add template adminclassHigh attributes(high);

Diskgroup altered.
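
The new template can be verified in V$ASM_TEMPLATE; a quick sketch (note that template names are stored in upper case):

SQL> select name, redundancy, stripe
  2  from v$asm_template
  3  where group_number = 1 and name = 'ADMINCLASSHIGH';

The REDUNDANCY column should show HIGH for the new template.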

With the template in place I can create a high redundancy tablespace in my normal redundancy disk group. All my disk groups use normal redundancy:

SQL> select name, group_number, type from v$asm_diskgroup;

NAME                           GROUP_NUMBER TYPE
------------------------------ ------------ ------
RECO                                      3 NORMAL
DBFS_DG                                   2 NORMAL
DATA                                      1 NORMAL

Unlike what the documentation suggests you do not need to change db_create_file_dest for this to work.

SQL> show parameter db_create_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest                  string      +DATA

Let’s put that template to use.

SQL> create tablespace testhigh datafile '+data(adminclassHigh)' size 10m;

Tablespace created.

SQL> select file_name from dba_data_Files;

FILE_NAME
-----------------------------------------------------------------------------------------------
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/system.363.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/sysaux.364.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/users.365.857664253
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/testhigh.357.857880753

Note that I specify the newly created template in parentheses in the datafile clause. The tablespace has been created; next I need the file number and the incarnation to look up how the extents are distributed. Both are part of the OMF file name: in testhigh.357.857880753 the file number is 357 and the incarnation 857880753.

SQL> select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP 
  2   from x$kffxp  where number_kffxp = 357 and incarn_kffxp = 857880753;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
         4      73706          0          0          0
        23      73767          1          0          1
        24      28895          2          0          2
         1      73679          3          1          0
        27      30015          4          1          1
        22      73717          5          1          2
        21      73725          6          2          0
         7      73698          7          2          1
        33      73707          8          2          2

9 rows selected.

What looks a little cryptic can be translated using Luca’s notes as:

  • DISK is the ASM disk where the extent is located (as in V$ASM_DISK)
  • AU indicates the relative position of the allocation unit from the beginning of the disk
  • PXN is the progressive extent number ( = actual extent)
  • XNUM is the ASM file extent number (mirrored extent pairs have the same extent value)
  • LXN indicates the type of extent: 0 = primary extent, 1 = first mirror copy, 2 = second mirror copy

So you can see that each primary extent has 2 mirror copies, also known as high redundancy.
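
The same fact can be condensed into a per-extent copy count; a sketch using the columns from the query above:

select xnum_kffxp, count(*) as copies
from x$kffxp
where number_kffxp = 357 and incarn_kffxp = 857880753
group by xnum_kffxp
order by xnum_kffxp;

Each of the three extents should report three copies: one primary plus two mirrors.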

If you omit the template in the datafile clause you get what you’d expect: normal redundancy.

SQL> drop tablespace testhigh including contents and datafiles;

Tablespace dropped.

SQL> create tablespace testhigh datafile '+data' size 10m;

Tablespace created.

SQL>  select file_name from dba_data_Files;

FILE_NAME
-----------------------------------------------------------------------------------------------
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/system.363.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/sysaux.364.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/users.365.857664253
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/testhigh.357.857880927

And on ASM:

SQL> select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP
  2  from x$kffxp  where number_kffxp = 357 and incarn_kffxp = 857880927;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
        30      73724          0          0          0
        16      73749          1          0          1
         0      73690          2          1          0
        28      73682          3          1          1
        17      73722          4          2          0
        10      73697          5          2          1

6 rows selected.

As you can see each extent has 1 mirror copy, not 2.

DBMS_FILE_TRANSFER potentially cool but then it is not

This post is interesting for all those of you who plan to transfer data files between database instances. Why would you consider this? Here’s an excerpt from the official 12.1 package documentation:

The DBMS_FILE_TRANSFER package provides procedures to copy a binary file within a database or to transfer a binary file between databases.

But it gets better:

The destination database converts each block when it receives a file from a platform with different endianness. Datafiles can be imported after they are moved to the destination database as part of a transportable operation without RMAN conversion.

So that’s a way not only to copy data files from one database to another, but it also allows me to take a file from SPARC and make it available on Linux!

Before you get too excited like I did, here’s the catch: if the source file is in ASM you are kind of stuffed. Consider the below (11.2.0.3.0) example, created with a bog-standard dbca “General Purpose” database. I have created the directories and the database link (sketched below) and now want to get the files from database “source” to database “dbmsft”.
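For completeness, the supporting objects were created along these lines. This is a sketch: the destination directory path matches the error message below, while the source path and the credentials are assumptions, and dbmsft_src2/dbmsft_dst2 were created analogously for the non-OMF file.

-- on the source database "source":
create directory dbmsft_src1 as '+DATA/source/datafile';

-- on the destination database "dbmsft":
create directory dbmsft_dst1 as '+DATA/dbmsft/datafile';
create database link source connect to system identified by secret using 'source';

With those in place, the transfer attempt looks like this: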

begin
 dbms_file_transfer.get_file('dbmsft_src1','system.256.823861409','source','dbmsft_dst1','system.256.823861409');
 dbms_file_transfer.get_file('dbmsft_src1','sysaux.257.823861409','source','dbmsft_dst1','sysaux.257.823861409');
 dbms_file_transfer.get_file('dbmsft_src1','undotbs1.258.823861411','source','dbmsft_dst1','undotbs1.258.823861411');
 dbms_file_transfer.get_file('dbmsft_src1','users.259.823861411','source','dbmsft_dst1','users.259.823861411');
 dbms_file_transfer.get_file('dbmsft_src2','nonomf_01.dbf','source','dbmsft_dst2','nonomf_01.dbf');
end;
/
begin
*
ERROR at line 1:
ORA-19504: failed to create file
"+DATA/dbmsft/datafile/system.256.823861409"
ORA-17502: ksfdcre:4 Failed to create file
+DATA/dbmsft/datafile/system.256.823861409
ORA-15046: ASM file name '+DATA/dbmsft/datafile/system.256.823861409' is
not in single-file creation form
ORA-06512: at "SYS.DBMS_FILE_TRANSFER", line 37
ORA-06512: at "SYS.DBMS_FILE_TRANSFER", line 132
ORA-06512: at line 2

And yes, that’s not a bug, it’s a feature, the same as with creating tablespaces: you simply cannot specify a fully qualified ASM file name when creating a file. You can of course use an alias name, but that’s not pretty, is it? I hate mixing non-OMF and OMF file names in ASM.
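To illustrate the alias workaround: if the destination file name is not a fully qualified ASM name, the file is created as OMF with an alias pointing to it. A sketch; the parameter names are those documented for DBMS_FILE_TRANSFER.GET_FILE, and the alias name is made up:

begin
  dbms_file_transfer.get_file(
    source_directory_object      => 'dbmsft_src1',
    source_file_name             => 'system.256.823861409',
    source_database              => 'source',
    destination_directory_object => 'dbmsft_dst1',
    destination_file_name        => 'system_01.dbf');  -- alias, not an OMF name
end;
/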

Need to check if that’s possible in 12c…

Adding a node to a 12c RAC cluster

This post is not in chronological order; I probably should have written something about installing RAC 12c first. I didn’t want to write the tenth installation guide for Clusterware, so I’m focusing on extending my two-node cluster to three nodes to test the new Flex ASM feature. If you care about installing RAC 12c, head over to RAC Attack for instructions, or to Tim Hall’s site. The RAC Attack instructions are currently being reworked for a 12c upgrade; you can follow/participate in the work on this free mailing list.

The cluster I installed is based on KVM on my lab server. I have used Oracle Linux 6.4 with UEK2 for the host OS. It is a standard cluster, i.e. not a Flex Cluster, but with Flex ASM configured. My network configuration is as follows:

  • public network on 192.168.100/24
  • private network on 192.168.101/24
  • ASM network on 192.168.1/24

As you can see here:

[grid@rac12node1 ~]$ oifcfg getif
eth0  192.168.100.0  global  public
eth1  192.168.101.0  global  cluster_interconnect
eth2  192.168.1.0    global  asm

Grid and oracle are the respective software owners, and I am making use of the new separation of duties on the RDBMS layer.

Performance testing with Virident PCIe SCM

Thanks to the kind introduction from Kevin Closson I was given the opportunity to benchmark Virident PCIe flash cards. I have written a little review of the testing conducted, mainly using SLOB. To my great surprise Virident gave me access to a top-of-the-line Westmere-EP system: two sockets, 12 cores, 24 threads (2s12c24t), with lots of memory.

In summary the testing shows that the “flash revolution” is happening, and that there are lots of vendors out there building solutions for HPC and Oracle database workloads alike. Have a look at the attached PDF for the full story if you are interested. When looking at the numbers please bear in mind it was a two-socket system! I’m confident the server could not max out the cards.

Full article:

Virident testing martin bach consulting