Over the past month I wrote a few posts about deploying I/O intensive workloads in the cloud. Using standard Linux tools, mainly the Logical Volume Manager (LVM), I tried to prevent certain pitfalls from occurring. Although I’m a great fan of LVM and RAID (and their combination), there are situations where LVM/Software RAID aren’t the best solution. This is especially true when it comes to extending a VM’s storage configuration for an Oracle Database.
Striping, Mirroring and Risk
With LVM RAID (or LVM on top of Software RAID) it is possible to stripe an Oracle database, or any other I/O intensive workload, across multiple disks. At the risk of losing the RAID device (remember that RAID 0 offers exactly zero protection from disk failure) you can gain a performance advantage. The risk can be partially mitigated by using a proven, tested and, most importantly, rehearsed technique to still meet the RTO and RPO of the database.
The trouble with LVM RAID can potentially start as soon as you add more storage to the VM. I hope I managed to demonstrate the risk of I/O hotspots in my earlier posts.
Oracle’s ASM is different from stock Linux tools, and it’s much less of a general-purpose solution. Being an Oracle product it is also subject to a different license model, which rules it out for most generic use cases, or at least that’s my experience. If, however, you want to deploy an Oracle database in the cloud, ASM is well worth considering. I don’t want to say it’s free of drawbacks (no piece of software is), but in my opinion its benefits outweigh the disadvantages when deploying a database.
For the sake of argument I’ll treat Oracle Restart and Grid Infrastructure as synonyms in this article. Oracle Restart is made up of ASM as well as a trimmed version of Oracle’s Clusterware as used in Real Application Clusters. Oracle Restart is installed into a separate Oracle Home; you usually install one database software home in addition. More on that later.
ASM vs LVM: a Question of Concepts
ASM has been around for quite some time and I like to think of it as a mature technology. In a way it is similar to LVM: you aggregate block devices (Physical Volumes in LVM) into Disk Groups (Volume Groups in LVM). But rather than creating another layer of abstraction on top of the ASM Disk Group, as you do with LVM, you simply point the database at a couple of Disk Groups and you are done. There is no need to maintain an equivalent of a Logical Volume or file system. A shorter code path to traverse tends to be less work, and it’s common knowledge that the fastest way to do something is not to do it in the first place. I should also point out that ASM does not perform I/O itself. It’s always the database session that does; otherwise ASM would never scale.
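To illustrate how little plumbing is involved: once the Disk Groups exist, the database only needs their names. Here is a minimal sketch; the tablespace name is a made-up example:

SQL> alter system set db_create_file_dest = '+DATA';
SQL> create tablespace demo_data;

With db_create_file_dest pointing at the Disk Group, Oracle Managed Files takes care of creating and naming the datafile inside +DATA; there is no logical volume or mount point to prepare first.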
But what about protection from failure? Put very simply, in ASM you have a choice between striping and striping plus mirroring. There are multiple so-called redundancy levels, each with their own implications. If you are interested you can find the relevant details in Oracle’s Automatic Storage Management Administration Guide.
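For the mirrored variant, here is a hedged example of what creating a NORMAL redundancy Disk Group might look like; the Disk Group name, failure group names and device paths are purely illustrative and not part of my lab setup:

SQL> create diskgroup data_mir normal redundancy
  2  failgroup fg1 disk '/dev/oracleoci/oraclevdg1'
  3  failgroup fg2 disk '/dev/oracleoci/oraclevdh1';

With NORMAL redundancy ASM keeps two copies of each extent in different failure groups, so the loss of a single disk no longer takes the Disk Group down. As you’ll see below, I’m not using this in my environment.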
My Test Environment’s Setup
To keep things consistent with my previous posts I am installing Oracle Restart on my VM.Standard.E4.Flex VM in Oracle Cloud Infrastructure. Both Grid Infrastructure and database software are patched to 19.12.0, the current release at the time of writing. The underlying operating system is Oracle Linux 8.4 with kernel 5.4.17-2102.203.6.el8uek.x86_64. I decided to use UDEV rules for device name persistence and setting permissions rather than ASMLib or ASM Filter Driver. To keep things simple, and also to follow the path I chose with my previous LVM/RAID posts, I’m going to create the +DATA and +RECO Disk Groups with EXTERNAL redundancy. With external redundancy the failure of a single block device in an ASM Disk Group will bring the entire Disk Group down, taking the database with it: game over. This is the same as with a RAID 0 configuration.
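For reference, a permissions rule of the kind I have in mind might look like the sketch below. It is illustrative only: the match on the OCI-provided symlinks and the grid:asmadmin ownership are assumptions you would need to adapt to your own environment.

# /etc/udev/rules.d/99-asm-permissions.rules (illustrative sketch only)
# hand the ASM candidate partitions to the Grid Infrastructure owner
KERNEL=="sd?1", ENV{DEVTYPE}=="partition", SYMLINK=="oracleoci/oraclevd[c-f]1", OWNER="grid", GROUP="asmadmin", MODE="0660"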
Again, and in line with the other posts about the topic, this article doesn’t concern itself with the durability of block devices in the cloud. External Redundancy should only be considered if approved in your organisation. You are most likely also required to put additional means in place to guarantee the database’s RTO and RPO. See my earlier comments and posts for details.
My +DATA disk group is currently made up of 2 block devices; +RECO consists of just 1 device. The database lives in +DATA, with the Fast Recovery Area (FRA) located on +RECO:
SQL> select dg.name dg_name, dg.type, d.name disk_name, d.os_mb, d.path
  2  from v$asm_disk d join v$asm_diskgroup dg on (d.group_number = dg.group_number);

DG_NAME    TYPE   DISK_NAME       OS_MB PATH
---------- ------ ---------- ---------- ------------------------------
RECO       EXTERN RECO_0000      511998 /dev/oracleoci/oraclevde1
DATA       EXTERN DATA_0001      511998 /dev/oracleoci/oraclevdd1
DATA       EXTERN DATA_0000      511998 /dev/oracleoci/oraclevdc1
You can see from the volume sizes that this is a lab/playground environment. The concepts, however, are independent of disk size. Just make sure the disks you use are of the same size and performance characteristics. In the cloud, Terraform is the most convenient way to ensure they are.
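If you are curious what that could look like, here is a minimal sketch using the OCI Terraform provider. All resource names, variables and values are assumptions made for illustration rather than my actual configuration:

resource "oci_core_volume" "asm_data" {
  count               = 2
  availability_domain = var.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "asm-data-${count.index}"
  size_in_gbs         = 500
  vpus_per_gb         = 10
}

resource "oci_core_volume_attachment" "asm_data" {
  count           = 2
  attachment_type = "paravirtualized"
  instance_id     = oci_core_instance.db_vm.id
  volume_id       = oci_core_volume.asm_data[count.index].id
}

Because both volumes come from the same resource block, they are guaranteed to have identical size and performance settings.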
Performance
Just as before I’ll start the familiar Swingbench workload. It isn’t meant to benchmark the system but to see which disks are in use. As in the previous examples, Online Redo Logs aren’t multiplexed. This really is acceptable only in this scenario and shouldn’t be done in any serious deployment of the database. It helps me isolate I/O though, which is why I did it.
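Should you want to verify the redo log layout in your own environment, a quick check is to list the log members; a single member per group means the logs aren’t multiplexed:

SQL> select group#, member from v$logfile order by group#;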
Before getting detailed I/O performance figures I need to check the current device mapping:
SQL> !ls -l /dev/oracleoci/oraclevd{c,d}1
lrwxrwxrwx. 1 root root 7 Sep  1 15:21 /dev/oracleoci/oraclevdc1 -> ../sdc1
lrwxrwxrwx. 1 root root 7 Sep  1 15:21 /dev/oracleoci/oraclevdd1 -> ../sdd1
Looking at the iostat output I can see both /dev/sdc and /dev/sdd actively used:
[oracle@oracle-19c-asm ~]$ iostat -xmz 5 3
Linux 5.4.17-2102.203.6.el8uek.x86_64 (oracle-19c-asm)   09/01/2021   _x86_64_   (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.19    0.00    0.26    0.12    0.01   98.43

Device         r/s     w/s   rMB/s   wMB/s   rrqm/s  wrqm/s  ...  %util
sda           1.12    1.03    0.04    0.03     0.01    0.54  ...   0.10
dm-0          1.03    0.95    0.03    0.03     0.00    0.00  ...   0.08
dm-1          0.02    0.60    0.00    0.01     0.00    0.00  ...   0.01
sdb           0.87    0.51    0.04    0.00     0.00    0.12  ...   0.09
dm-2          0.86    0.63    0.04    0.00     0.00    0.00  ...   0.09
sdc         291.58    4.87   54.15    0.05     3.51    0.01  ...  22.92
sdd         289.95    4.05   53.63    0.04     3.37    0.01  ...  19.01
sde           0.13    0.00    0.00    0.00     0.00    0.00  ...   0.01
sdf           0.10    0.72    0.00    0.01     0.00    0.00  ...   0.13

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.23    0.00    7.77   23.90    0.33   63.78

Device         r/s     w/s   rMB/s   wMB/s   rrqm/s  wrqm/s  ...  %util
sda           0.00    2.40    0.00    0.05     0.00    1.20  ...   0.12
dm-0          0.00    0.60    0.00    0.00     0.00    0.00  ...   0.08
dm-1          0.00    3.00    0.00    0.05     0.00    0.00  ...   0.04
sdb           0.00    0.40    0.00    0.00     0.00    0.00  ...   0.02
dm-2          0.00    0.40    0.00    0.00     0.00    0.00  ...   0.02
sdc       24786.60   67.40  211.80    0.57  2319.60    0.00  ... 100.00
sdd       24575.40   72.00  210.01    0.55  2302.80    0.00  ...  97.70
sdf           0.00    0.40    0.00    0.00     0.00    0.00  ...   0.06

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.74    0.00    7.65   24.38    0.31   62.93

Device         r/s     w/s   rMB/s   wMB/s   rrqm/s  wrqm/s  ...  %util
sda           0.00    1.80    0.00    0.02     0.00    0.20  ...   0.04
dm-0          0.00    1.20    0.00    0.02     0.00    0.00  ...   0.02
dm-1          0.00    0.80    0.00    0.01     0.00    0.00  ...   0.02
sdc       24684.20   61.60  215.14    0.50  2844.40    0.40  ... 100.00
sdd       24399.80   68.40  212.41    0.55  2787.20    0.60  ...  95.74
sdf           0.00    0.80    0.00    0.01     0.00    0.00  ...   0.10
This should demonstrate the fact that ASM stripes data across disks. Up to this point there isn’t any visible difference in the iostat output compared to my previous posts.
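You can cross-check the same picture from inside ASM: v$asm_disk keeps cumulative per-disk I/O counters, so a query along the following lines, run against the ASM instance, should show the reads spread roughly evenly across both +DATA disks:

SQL> select dg.name dg_name, d.name disk_name, d.reads, d.writes, d.bytes_read, d.bytes_written
  2  from v$asm_disk d join v$asm_diskgroup dg on (d.group_number = dg.group_number)
  3  where dg.name = 'DATA';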
Extending Storage
The main difference between LVM/RAID and ASM is yet to come: what happens if I have to add storage to the +DATA disk group? Remember that with LVM you had to add as many additional devices as you already had in use. In other words, if you used a RAID 0 consisting of 2 block devices, you need to add another 2. With ASM you don’t have the same restriction, as you will see in a minute.
I have added another block device to the VM, named /dev/oracleoci/oraclevdf, with the exact same size and performance characteristics as the existing 2 devices. After partitioning it and checking the device permissions I can add it to the Disk Group. There are many ways to do so; I’m showing you the SQL interface.
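For completeness, the preparation might look roughly like the sketch below. Treat it as an illustration: the partitioning tool and partition bounds are a matter of preference, and the resulting owner and mode must match whatever your udev rules enforce.

# create a single partition spanning the new device
$ sudo parted -s /dev/oracleoci/oraclevdf mklabel gpt mkpart primary 1MiB 100%

# confirm the partition appears with the expected ownership
$ ls -l /dev/oracleoci/oraclevdf1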
[grid@oracle-19c-asm ~]$ sqlplus / as sysasm

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 2 06:21:08 2021
Version 19.12.0.0.0

Copyright (c) 1982, 2021, Oracle. All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.12.0.0.0

SQL> alter diskgroup data add disk '/dev/oracleoci/oraclevdf1' ;

Diskgroup altered.

SQL>
The prompt returns immediately; however, an asynchronous operation has been started in the background, a so-called rebalance task:
SQL> select dg.name, o.operation, o.state, o.sofar, o.est_work, o.est_minutes, o.error_code
  2  from v$asm_diskgroup dg join v$asm_operation o using (group_number)
  3  /

NAME                           OPERA STAT      SOFAR   EST_WORK EST_MINUTES ERROR_CODE
------------------------------ ----- ---- ---------- ---------- ----------- --------------------------------------------
DATA                           REBAL RUN       14608          0           0
DATA                           REBAL DONE          0          0           0
DATA                           REBAL DONE      33308      33308           0
Once completed, another disk has been added to the +DATA disk group:
SQL> select dg.name dg_name, dg.type, d.name disk_name, d.os_mb, d.path
  2  from v$asm_disk d join v$asm_diskgroup dg on (d.group_number = dg.group_number)
  3  where dg.name = 'DATA'
  4  /

DG_NAME    TYPE   DISK_NAME       OS_MB PATH
---------- ------ ---------- ---------- ------------------------------
DATA       EXTERN DATA_0002      511998 /dev/oracleoci/oraclevdf1
DATA       EXTERN DATA_0000      511998 /dev/oracleoci/oraclevdc1
DATA       EXTERN DATA_0001      511998 /dev/oracleoci/oraclevdd1

SQL>
The disk rebalance, by the way, is an online operation with a few tunables, such as the so-called power limit: you can trade off completion time against the effect the rebalance has on ongoing I/O operations. For some time the maximum value of ASM’s power limit was 11 ;)
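If you want to influence the rebalance speed, the power can be specified for an individual operation; the value 8 below is arbitrary and chosen purely for illustration:

SQL> alter diskgroup data rebalance power 8;

Without the clause, the instance-wide ASM_POWER_LIMIT parameter applies.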
What does that mean for our Swingbench workload? Let’s have a look at iostat while the same workload is running. Please remember that /dev/oracleoci/oraclevd[cdf]1 are part of the ASM +DATA Disk Group:
[grid@oracle-19c-asm ~]$ ls -l /dev/oracleoci/oraclevd[cdf]1
lrwxrwxrwx. 1 root root 7 Sep  2 06:30 /dev/oracleoci/oraclevdc1 -> ../sdd1
lrwxrwxrwx. 1 root root 7 Sep  2 06:30 /dev/oracleoci/oraclevdd1 -> ../sdb1
lrwxrwxrwx. 1 root root 7 Sep  2 06:35 /dev/oracleoci/oraclevdf1 -> ../sdf1
Please bear this in mind when looking at the iostat output:
[grid@oracle-19c-asm ~]$ iostat -xmz 5 3
Linux 5.4.17-2102.203.6.el8uek.x86_64 (oracle-19c-asm)   09/02/2021   _x86_64_   (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.27    0.03    0.37    0.40    0.03   98.90

Device         r/s     w/s   rMB/s   wMB/s   rrqm/s  ...  %util
sda           4.92    1.21    0.14    0.08     0.03  ...   0.26
dm-0          4.53    0.68    0.13    0.07     0.00  ...   0.23
dm-1          0.12    0.75    0.00    0.01     0.00  ...   0.02
sdb         391.83    7.36   12.15    3.60    27.41  ...   6.90
sdc           0.15    0.71    0.00    0.01     0.00  ...   0.14
sdd         396.92    8.48   12.20    3.61    28.23  ...   6.85
sdf         383.58   13.97    3.22   10.71    27.53  ...   5.92
sde           3.74    0.85    0.19    0.01     0.00  ...   0.28
dm-2          3.75    1.02    0.19    0.01     0.00  ...   0.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.60    0.00   12.18   26.38    1.61   52.24

Device         r/s     w/s   rMB/s   wMB/s   rrqm/s  ...  %util
sda           0.00    0.40    0.00    0.00     0.00  ...   0.06
dm-0          0.00    0.40    0.00    0.00     0.00  ...   0.06
sdb       24375.60  176.80  203.25    1.39  1635.40  ...  97.62
sdc           0.00    0.80    0.00    0.01     0.00  ...   0.14
sdd       24654.60  172.40  205.89    1.45  1689.80  ...  99.96
sdf       24807.40  201.20  207.31    1.51  1718.20  ...  97.86
sde           0.00    1.00    0.00    0.01     0.00  ...   0.04
dm-2          0.00    1.20    0.00    0.01     0.00  ...   0.04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.22    0.00   13.05   23.61    1.55   54.57

Device         r/s     w/s   rMB/s   wMB/s   rrqm/s  ...  %util
sda           0.00    0.60    0.00    0.00     0.00  ...   0.10
dm-0          0.00    0.40    0.00    0.00     0.00  ...   0.04
dm-1          0.00    0.20    0.00    0.00     0.00  ...   0.06
sdb       24783.40  145.40  212.17    1.15  2363.20  ...  97.48
sdc           0.00    0.60    0.00    0.00     0.00  ...   0.14
sdd       24795.40  113.60  213.19    1.00  2470.80  ...  99.90
sdf       24871.00  106.00  213.34    0.97  2426.00  ...  97.00
sde           0.00    2.40    0.00    0.02     0.00  ...   0.08
dm-2          0.00    2.60    0.00    0.02     0.00  ...   0.08
You can see that all 3 disks are more or less evenly used. This is the main difference from LVM RAID: thanks to the rebalance operation, the data in the disk group is redistributed across all disks in the group.
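Space usage tells the same story once the rebalance has finished; a query such as the following, again against the ASM instance, should report comparable free space on every disk in the group:

SQL> select d.name disk_name, d.total_mb, d.free_mb
  2  from v$asm_disk d join v$asm_diskgroup dg on (d.group_number = dg.group_number)
  3  where dg.name = 'DATA';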
Summary
When it comes to deploying an Oracle database in an Infrastructure as a Service (IaaS) scenario, Oracle’s ASM offers lots of advantages over stock Linux tools. For example, it is possible to add storage to an ASM Disk Group as and when it’s needed, without over-provisioning. Furthermore, ASM rebalances all data in the Disk Group across all disks as part of a configuration change, as you just saw. That way it is much harder to create the I/O hotspots I often see when ASM is not in use.
In addition to ASM you also get other amenities as a side effect. For example, Oracle Restart allows you to start databases and database services automatically when the system boots up. There is no need to write systemd unit files as it’s all done behind the covers. Should your database crash for some reason, Oracle Restart automatically brings it up again (provided it can) without your intervention. It also works beautifully in conjunction with Oracle’s Universal Connection Pool (UCP) and Data Guard.
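Resources under Oracle Restart’s control are managed with srvctl rather than systemd. As a rough illustration, with ORCL standing in as a placeholder database name:

$ srvctl config database -db ORCL
$ srvctl status database -db ORCL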
The use of ASM implies direct I/O. I said earlier that ASM doesn’t maintain a file system layer when used for the Oracle database (that’s not entirely correct, but it’s true for all the databases I have seen), and as a result Linux can’t cache I/O. Most in the community consider this a good thing: Oracle has its own buffer cache after all, and as long as it’s sized appropriately for your workload, double-buffering isn’t the best use of precious DRAM.
So much for the plus side, but what about the implications of using Oracle Restart? First of all, it’s another Oracle software home you need to maintain. Given the high degree of automation possible these days that shouldn’t be an issue; an Ansible playbook that patches all Oracle Restart components is easy enough to write.
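Such a playbook essentially wraps the standard patching tooling. The core step, run with root privileges, might look something like the sketch below; the patch location and Grid home are assumed paths, and the exact invocation always depends on the patch’s readme:

# illustrative sketch: apply a Grid Infrastructure Release Update to the Oracle Restart home
# (/u01/app/19.0.0/grid and /u01/patches/ru_19_12 are assumed paths)
$ sudo /u01/app/19.0.0/grid/OPatch/opatchauto apply /u01/patches/ru_19_12 -oh /u01/app/19.0.0/grid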
If your organisation mandates a separation of duties between database and storage/Linux administration, your respective administrators might need to learn a new technology.
I’m sure you can think of additional downsides to using ASM, and I admit I won’t delve deeper into the subject as I’m quite biased; ASM has been one of the truly outstanding innovations for running Oracle in my opinion. The human aspect of introducing a new technology, however, isn’t to be underestimated, and the best technology doesn’t always win the race.