Deploying I/O intensive workloads in the cloud: Oracle Automatic Storage Management (ASM)

Over the past month I wrote a few posts about deploying I/O intensive workloads in the cloud. Using standard Linux tools, mainly Logical Volume Manager (LVM) I tried to prevent certain pitfalls from occurring. Although I’m a great fan of LVM and RAID (and their combination), there are situations where LVM/Software RAID aren’t part the best solution. This is especially true when it comes to extending a VM’s storage configuration for an Oracle Database.

Striping, Mirroring and Risk

With LVM RAID (or LVM on top of Software RAID) it is possible to stripe an Oracle database-or any other I/O intensive workload-across multiple disks. At the risk of losing the RAID device (remember that RAID 0 offers exactly zero protection from disk failure) you can gain a performance advantage. The risk can be partially mitigated by using a proven, tested, and most importantly, rehearsed technique to still meet the RTO and RPO of the database.

The trouble with LVM RAID can potentially start as soon as you add more storage to the VM. I hope I managed to demonstrate the risk of I/O hotspots in my earlier posts.

Oracle’s ASM is different from stock-Linux tools, and it’s much less of a general purpose solution. Being an Oracle product it is also subject to a different license model. Which rules it out for most generic use cases, or at least that’s my experience. If, however, you want to deploy an Oracle database in the cloud, it is well worth considering ASM. I don’t want to say it’s free of drawbacks (no piece of software is) but in my opinion its benefits outweigh the disadvantages deploying a database.

For the sake of argument I’ll treat Oracle Restart and Grid Infrastructure as synonyms in this article. Oracle Restart is made up of ASM as well as a trimmed version of Oracle’s Clusterware as used in Real Application Clusters. Oracle Restart is installed into a separate Oracle Home, you usually install one database software home in addition. More on that later.

ASM vs LVM: a Question of Concepts

ASM has been around for quite some time and I like to think of it as a mature technology. In a way it is similar to LVM as you aggregate block devices (Physical Volumes in LVM) into Disk Groups (Volume Groups in LVM). Rather than creating another layer of abstraction on top of the ASM Disk Group as you do with LVM you simply point the database at a couple of Disk Groups and you are done. There is no need to maintain an equivalent of a Logical Volume or file system. A shorter code path to traverse tends to be less work. And it’s common knowledge that the fastest way to do something is not to do it in the first place. I should also point out that ASM does not perform I/O. It’s always the database session that does; otherwise ASM would never scale.

But what about protection from failure? Put very simply, in ASM you have a choice between striping and striping + mirroring. There are multiple so-called redundancy levels each with their own implications. If you are interested you can find the relevant details in Oracle’s Automatic Storage Management Administration Guide.

My Test Environment’s Setup

To keep things consistent with my previous posts I am installing Oracle Restart on my VM.Standard.E4.Flex VM in Oracle Cloud Infrastructure. Both Grid Infrastructure and database software are patched to 19.12.0, the current release at the time of writing. The underlying Linux version is 8.4 with kernel 5.4.17-2102.203.6.el8uek.x86_64. I decided to use UDEV rules for device name persistence and setting permissions rather than ASMLib or ASM Filter Driver. To keep things simple and also to follow the path I chose with my previous LVM/RAID posts I’m going to create the +DATA and +RECO Disk Groups with EXTERNAL redundancy. With external redundancy failure of a single block device in an ASM Disk Group will bring the entire Disk Group down, taking the database with it: game over. This is the same as with a RAID 0 configuration.

Again, and in line with the other posts about the topic, this article doesn’t concern itself with the durability of block devices in the cloud. External Redundancy should only be considered if approved in your organisation. You are most likely also required to put additional means in place to guarantee the database’s RTO and RPO. See my earlier comments and posts for details.

My +DATA disk group is currently made up of 2 block devices, +RECO consists of just 1 device. The database lives in +DATA with the Fast Recovery Area (FRA) located on +RECO.

SQL> select dg.name dg_name, dg.type, d.name disk_name, d.os_mb, d.path
  2   from v$asm_disk d join v$asm_diskgroup dg on (d.group_number = dg.group_number);

DG_NAME    TYPE   DISK_NAME       OS_MB PATH
---------- ------ ---------- ---------- ------------------------------
RECO       EXTERN RECO_0000      511998 /dev/oracleoci/oraclevde1
DATA       EXTERN DATA_0001      511998 /dev/oracleoci/oraclevdd1
DATA       EXTERN DATA_0000      511998 /dev/oracleoci/oraclevdc1

You can see from the volume sizes this is a lab/playground environment. The concepts however are independent of disk size. Just make sure the disks you use are of the same size and performance characteristics. Terraform is the most convenient way in the cloud to ensure they are.

Performance

Just as before I’ll start the familiar Swingbench workload. It isn’t meant to benchmark the system but to see which disks are in use. As in the previous examples I gave, Online Redo Logs aren’t multiplexed. This really is acceptable only in this scenario and shouldn’t be done with any serious deployments of the database. It helps me isolate I/O though, hence it’s why I did it.

Before getting detailed I/O performance figures I need to check the current device mapping:

SQL> !ls -l /dev/oracleoci/oraclevd{c,d}1
lrwxrwxrwx. 1 root root 7 Sep  1 15:21 /dev/oracleoci/oraclevdc1 -> ../sdc1
lrwxrwxrwx. 1 root root 7 Sep  1 15:21 /dev/oracleoci/oraclevdd1 -> ../sdd1

Looking at the iostat output I can see both /dev/sdc and /dev/sdd actively used:

[oracle@oracle-19c-asm ~]$ iostat -xmz 5 3
Linux 5.4.17-2102.203.6.el8uek.x86_64 (oracle-19c-asm)  09/01/2021      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.19    0.00    0.26    0.12    0.01   98.43

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  ... %util
sda              1.12    1.03      0.04      0.03     0.01     0.54  ...  0.10
dm-0             1.03    0.95      0.03      0.03     0.00     0.00  ...  0.08
dm-1             0.02    0.60      0.00      0.01     0.00     0.00  ...  0.01
sdb              0.87    0.51      0.04      0.00     0.00     0.12  ...  0.09
dm-2             0.86    0.63      0.04      0.00     0.00     0.00  ...  0.09
sdc            291.58    4.87     54.15      0.05     3.51     0.01  ... 22.92
sdd            289.95    4.05     53.63      0.04     3.37     0.01  ... 19.01
sde              0.13    0.00      0.00      0.00     0.00     0.00  ...  0.01
sdf              0.10    0.72      0.00      0.01     0.00     0.00  ...  0.13

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.23    0.00    7.77   23.90    0.33   63.78

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  ... %util
sda              0.00    2.40      0.00      0.05     0.00     1.20  ...  0.12
dm-0             0.00    0.60      0.00      0.00     0.00     0.00  ...  0.08
dm-1             0.00    3.00      0.00      0.05     0.00     0.00  ...  0.04
sdb              0.00    0.40      0.00      0.00     0.00     0.00  ...  0.02
dm-2             0.00    0.40      0.00      0.00     0.00     0.00  ...  0.02
sdc           24786.60   67.40    211.80      0.57  2319.60     0.00 ... 100.00
sdd           24575.40   72.00    210.01      0.55  2302.80     0.00 ...  97.70
sdf              0.00    0.40      0.00      0.00     0.00     0.00  ...  0.06

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.74    0.00    7.65   24.38    0.31   62.93

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  ... %util
sda              0.00    1.80      0.00      0.02     0.00     0.20  ...  0.04
dm-0             0.00    1.20      0.00      0.02     0.00     0.00  ...  0.02
dm-1             0.00    0.80      0.00      0.01     0.00     0.00  ...  0.02
sdc           24684.20   61.60    215.14      0.50  2844.40     0.40 ... 100.00
sdd           24399.80   68.40    212.41      0.55  2787.20     0.60 ...  95.74
sdf              0.00    0.80      0.00      0.01     0.00     0.00  ...  0.10

This should demonstrate the fact ASM stripes data across disks. Up to this point there isn’t any visible difference in the iostat output compared to my previous posts.

Extending Storage

The main difference between LVM/RAID and ASM is yet to come: what happens if I have to add storage to the +DATA disk group? Remember that with LVM you had to add as many additional devices as you had in use. In other words, if you used a RAID 0 consisting of 2 block devices, you need to add another 2. With ASM you don’t have the same restriction as you can see in a minute.

I have added another block device to the VM, named /dev/oracleoci/oraclevdf with the exact same size and performance characteristics as the existing 2 devices. After partitioning it and checking for device permissions I can add the device to the Disk Group. There are many ways to do so, I’m showing you the SQL interface.

[grid@oracle-19c-asm ~]$ sqlplus / as sysasm

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 2 06:21:08 2021
Version 19.12.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.12.0.0.0

SQL> alter diskgroup data add disk '/dev/oracleoci/oraclevdf1' ; 

Diskgroup altered.

SQL>

The prompt returns immediately, however there is an asynchronous operation started in the background, a so-called re-balance task:

SQL> select dg.name, o.operation, o.state,o.sofar,o.est_work,o.est_minutes, o.error_code
  2   from v$asm_diskgroup dg join v$asm_operation o using (group_number)
  3  /

NAME                           OPERA STAT      SOFAR   EST_WORK EST_MINUTES ERROR_CODE
------------------------------ ----- ---- ---------- ---------- ----------- --------------------------------------------
DATA                           REBAL RUN       14608          0           0
DATA                           REBAL DONE          0          0           0
DATA                           REBAL DONE      33308      33308           0

Once completed, another disk has been added to the +DATA disk group:

SQL> select dg.name dg_name, dg.type, d.name disk_name, d.os_mb, d.path
  2   from v$asm_disk d join v$asm_diskgroup dg on (d.group_number = dg.group_number)
  3  where dg.name = 'DATA'
  4  /

DG_NAME    TYPE   DISK_NAME	  OS_MB PATH
---------- ------ ---------- ---------- ------------------------------
DATA	   EXTERN DATA_0002	 511998 /dev/oracleoci/oraclevdf1
DATA	   EXTERN DATA_0000	 511998 /dev/oracleoci/oraclevdc1
DATA	   EXTERN DATA_0001	 511998 /dev/oracleoci/oraclevdd1

SQL> 

The disk rebalance operation is an online operation by the way with a few tunables such as the so-called power limit: you can trade off completion time vs effect it has on ongoing I/O operations. For some time the maximum value of ASM’s power limit was 11 ;)

What does that mean for our Swingbench workload? Let’s have a look at iostat while the same workload is running. Please remember that /dev/oracleoci/oraclevd[cdf]1 are part of the ASM +DATA Disk Group:

[grid@oracle-19c-asm ~]$ ls -l /dev/oracleoci/oraclevd[cdf]1
lrwxrwxrwx. 1 root root 7 Sep  2 06:30 /dev/oracleoci/oraclevdc1 -> ../sdd1
lrwxrwxrwx. 1 root root 7 Sep  2 06:30 /dev/oracleoci/oraclevdd1 -> ../sdb1
lrwxrwxrwx. 1 root root 7 Sep  2 06:35 /dev/oracleoci/oraclevdf1 -> ../sdf1

Please bear this in mind when looking at the iostat output:

[grid@oracle-19c-asm ~]$ iostat -xmz 5 3
Linux 5.4.17-2102.203.6.el8uek.x86_64 (oracle-19c-asm) 	09/02/2021 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.27    0.03    0.37    0.40    0.03   98.90

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   ...  %util
sda              4.92    1.21      0.14      0.08     0.03   ...   0.26
dm-0             4.53    0.68      0.13      0.07     0.00   ...   0.23
dm-1             0.12    0.75      0.00      0.01     0.00   ...   0.02
sdb            391.83    7.36     12.15      3.60    27.41   ...   6.90
sdc              0.15    0.71      0.00      0.01     0.00   ...   0.14
sdd            396.92    8.48     12.20      3.61    28.23   ...   6.85
sdf            383.58   13.97      3.22     10.71    27.53   ...   5.92
sde              3.74    0.85      0.19      0.01     0.00   ...   0.28
dm-2             3.75    1.02      0.19      0.01     0.00   ...   0.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.60    0.00   12.18   26.38    1.61   52.24

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   ...  %util
sda              0.00    0.40      0.00      0.00     0.00   ...   0.06
dm-0             0.00    0.40      0.00      0.00     0.00   ...   0.06
sdb           24375.60  176.80    203.25      1.39  1635.40  ...   97.62
sdc              0.00    0.80      0.00      0.01     0.00   ...   0.14
sdd           24654.60  172.40    205.89      1.45  1689.80  ...   99.96
sdf           24807.40  201.20    207.31      1.51  1718.20  ...   97.86
sde              0.00    1.00      0.00      0.01     0.00   ...   0.04
dm-2             0.00    1.20      0.00      0.01     0.00   ...   0.04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.22    0.00   13.05   23.61    1.55   54.57

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   ...  %util
sda              0.00    0.60      0.00      0.00     0.00   ...   0.10
dm-0             0.00    0.40      0.00      0.00     0.00   ...   0.04
dm-1             0.00    0.20      0.00      0.00     0.00   ...   0.06
sdb           24783.40  145.40    212.17      1.15  2363.20  ...   97.48
sdc              0.00    0.60      0.00      0.00     0.00   ...   0.14
sdd           24795.40  113.60    213.19      1.00  2470.80  ...   99.90
sdf           24871.00  106.00    213.34      0.97  2426.00  ...   97.00
sde              0.00    2.40      0.00      0.02     0.00   ...   0.08
dm-2             0.00    2.60      0.00      0.02     0.00   ...   0.08

You can see that all 3 disks are more or less evenly used. This is the main difference to the use of LVM RAID. Thanks to the rebalance operation all data on the disk group is redistributed across the disks in the group.

Summary

When it comes to deploying an Oracle database in an Infrastructure as a Service (IaaS) scenario Oracle’s ASM offers lots of advantages over stock Linux tools. For example, it is possible to add storage to an ASM Disk Group as and when it’s needed without over-provisioning. ASM furthermore rebalances all data in the Disk Group across all disks as part of a configuration change as you just saw. That way it is much harder to create I/O hotspots I often see when ASM is not in use.

In addition to ASM you also get other amenities as a side effect. For example, Oracle Restart allows you to start databases and database services automatically when the system boots up. There is no need to write systemd unit files as it’s all done behind the covers. Should your database crash for some reason, provided it can, Oracle Restart automatically brings it up again without your intervention. It also works beautifully in conjunction with Oracle’s Universal Connection Pool (UCP) and Data Guard.

The use of ASM implies direct I/O. I said earlier that ASM doesn’t maintain a file system layer when used for the Oracle database (that’s not entirely correct but true for all the databases I saw) and as a result Linux can’t cache I/O. This is considered a good thing in the community by most. Oracle has its own buffer cache after all, as long as it’s sized appropriately for your workload, double-buffering isn’t the best use of precious DRAM.

So much for the plus side, but what about the implications of using Oracle Restart? First of all, it’s another Oracle software home you need to maintain. Given the high degree of automation possible these days that shouldn’t be an issue. An Ansible playbook is easy enough to write, patching all Oracle Restart components.

If your organisation mandates a separation of duties between database and storage/Linux administration your respective administrator might need to learn a new technology.

I’m sure you can think of additional downsides to using ASM, and I admit I won’t delve into the subject deeper as I’m quite biased. ASM has been one of the truly outstanding innovations for running Oracle in my opinion. The human aspect of introducing a new technology however isn’t to be under-estimated and the best technology doesn’t always win the race.