12.2 New Feature: the FLEX ASM disk group part 3

In the previous two parts of this mini-series I introduced the Flex ASM disk group and two related concepts, the Quota Group and the File Group. In what should have been the final part (but isn’t) I am interested in checking whether quotas are actually enforced.

(Un)fortunately I have uncovered a few more things that are worth investigating and blogging about, which is why a) this isn’t the last post and b) it is a bit shorter than the previous two. Had I combined parts 3 and 4 the result would certainly have been too long … BTW, you can navigate all posts using the links at the very bottom of the page.

Are quotas enforced?

The purpose of the Quota Group is … to enforce quotas within a disk group, much like on a file system. This is quite interesting, because you now have a hard limit on how far databases can grow within a disk group, even for non-CDBs.

The question I set out to answer in this part is whether quotas are enforced. As you read in the previous post’s interlude, I created two databases on the FLEX ASM disk group: ORCL and CDB, both container databases.

The current state of affairs for my File Groups is this:

ASMCMD> lsfg
File Group         Disk Group  Quota Group  Used Quota MB  Client Name    Client Type  
DEFAULT_FILEGROUP  FLEX        GENERIC      2488                                       
CDB_CDB$ROOT       FLEX        QG_CDB       6744           CDB_CDB$ROOT   DATABASE     
CDB_PDB$SEED       FLEX        QG_CDB       1656           CDB_PDB$SEED   DATABASE     
PDB1               FLEX        QG_CDB       1784           PDB1           DATABASE     
ORCL_CDB$ROOT      FLEX        GENERIC      9104           ORCL_CDB$ROOT  DATABASE     
ORCL_PDB$SEED      FLEX        GENERIC      1616           ORCL_PDB$SEED  DATABASE     
PDB1_0001          FLEX        GENERIC      9424           PDB1           DATABASE   

Database CDB is nicely tucked away in QG_CDB, but none of the ORCL database’s components are assigned to a Quota Group yet. I wanted to have another Quota Group QG_ORCL, for my second CDB. Somehow I think that a Quota Group per database makes sense.

ASMCMD> mkqg -G FLEX QG_ORCL quota 20G
Diskgroup altered.
ASMCMD> lsqg
Group_Num  Quotagroup_Num  Quotagroup_Name  Incarnation  Used_Quota_MB  Quota_Limit_MB  
2          1               GENERIC          1            22632          0               
2          2               QG_ORCL          7            0              20480           
2          3               QG_CDB           1            10184          20480           
ASMCMD> 

If you followed parts 1 and 2 you may have noticed that my FLEX disk group’s group number changed from 5 to 2 after a server reboot.

SQL> select group_number, name, state, type from v$asm_diskgroup
  2  where group_number = 2;

GROUP_NUMBER NAME                           STATE       TYPE
------------ ------------------------------ ----------- ------
           2 FLEX                           CONNECTED   FLEX

Don’t let the change in numbers confuse you.

Back to the example: File Groups ORCL_CDB$ROOT, ORCL_PDB$SEED and PDB1_0001 are not yet within QG_ORCL. This can be rectified using three simple mvfg commands in ASMCMD, or the corresponding SQL commands. After the move commands completed, the Quota Group still has some space left, although it is very limited:

ASMCMD> mvfg -G flex --filegroup PDB1_0001 QG_ORCL
Diskgroup altered.
ASMCMD> mvfg -G flex --filegroup ORCL_PDB$SEED  QG_ORCL
Diskgroup altered.
ASMCMD> mvfg -G flex --filegroup ORCL_CDB$ROOT  QG_ORCL
Diskgroup altered.
ASMCMD> lsfg
File Group         Disk Group  Quota Group  Used Quota MB  Client Name    Client Type  
DEFAULT_FILEGROUP  FLEX        GENERIC      2488                                       
CDB_CDB$ROOT       FLEX        QG_CDB       6744           CDB_CDB$ROOT   DATABASE     
CDB_PDB$SEED       FLEX        QG_CDB       1656           CDB_PDB$SEED   DATABASE     
PDB1               FLEX        QG_CDB       1784           PDB1           DATABASE     
ORCL_CDB$ROOT      FLEX        QG_ORCL      9104           ORCL_CDB$ROOT  DATABASE     
ORCL_PDB$SEED      FLEX        QG_ORCL      1616           ORCL_PDB$SEED  DATABASE     
PDB1_0001          FLEX        QG_ORCL      9424           PDB1           DATABASE     
ASMCMD> lsqg
Group_Num  Quotagroup_Num  Quotagroup_Name  Incarnation  Used_Quota_MB  Quota_Limit_MB  
2          1               GENERIC          1            2488           0               
2          2               QG_ORCL          7            20144          20480           
2          3               QG_CDB           1            10184          20480           
ASMCMD> 
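
The three mvfg commands map to plain SQL against the ASM instance; a sketch of the equivalent statements (the same syntax shows up in the ASM alert.log in part 2) would be:

```sql
-- run while connected to the ASM instance with the sysasm privilege
ALTER DISKGROUP FLEX MOVE FILEGROUP PDB1_0001 TO QG_ORCL;
ALTER DISKGROUP FLEX MOVE FILEGROUP ORCL_PDB$SEED TO QG_ORCL;
ALTER DISKGROUP FLEX MOVE FILEGROUP ORCL_CDB$ROOT TO QG_ORCL;
```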

Now I just need to try and push it over the edge to see if the quota has any effect. This is quite simple: all I need to do is create another tablespace in ORCL:PDB1.

SQL> show con_name

CON_NAME
------------------------------
PDB1
SQL> create tablespace userdata datafile size 500m;
create tablespace userdata datafile size 500m
*
ERROR at line 1:
ORA-01119: error in creating database file '+FLEX'
ORA-17502: ksfdcre:4 Failed to create file +FLEX
ORA-15437: Not enough quota available in quota group QG_ORCL.

SQL> 

This is similar to what we could already enforce using the storage clause in the create pluggable database command. But what about ASM file types that aren’t data files? Querying the database I can find quite a few of these:

SQL> get /tmp/non_data_files
  1  SELECT
  2      f.group_number as DG_NUMBER,
  3      f.file_number,
  4      f.incarnation as file_incarnation,
  5      f.type,
  6      fg.name as filegroup_name,
  7      a.name as file_name
  8  FROM
  9      v$asm_alias a,
 10      v$asm_file f,
 11      v$asm_filegroup fg
 12  WHERE
 13          a.group_number = f.group_number
 14      AND
 15          a.file_number = f.file_number
 16      AND
 17          a.file_incarnation = f.incarnation
 18      AND
 19          fg.group_number = f.group_number
 20      AND
 21          fg.filegroup_number = f.filegroup_number
 22      AND
 23*         f.type <> 'DATAFILE';
SQL> start /tmp/non_data_files

 DG_NUMBER FILE_NUMBER FILE_INCARNATION TYPE            FILEGROUP_NAME       FILE_NAME
---------- ----------- ---------------- --------------- -------------------- ------------------------------
         2         282        948453843 PASSWORD        DEFAULT_FILEGROUP    pwdorcl.282.948453843
         2         287        948462715 PASSWORD        DEFAULT_FILEGROUP    pwdorcl.287.948462715
         2         293        948462849 CONTROLFILE     ORCL_CDB$ROOT        Current.293.948462849
         2         292        948462849 CONTROLFILE     ORCL_CDB$ROOT        Current.292.948462849
         2         294        948462855 ONLINELOG       ORCL_CDB$ROOT        group_2.294.948462855
         2         295        948462855 ONLINELOG       ORCL_CDB$ROOT        group_1.295.948462855
         2         296        948462861 ONLINELOG       ORCL_CDB$ROOT        group_1.296.948462861
         2         297        948462861 ONLINELOG       ORCL_CDB$ROOT        group_2.297.948462861
         2         304        948463227 ONLINELOG       ORCL_CDB$ROOT        group_3.304.948463227
         2         305        948463231 ONLINELOG       ORCL_CDB$ROOT        group_3.305.948463231
         2         306        948463239 ONLINELOG       ORCL_CDB$ROOT        group_4.306.948463239
         2         307        948463243 ONLINELOG       ORCL_CDB$ROOT        group_4.307.948463243
         2         298        948462891 TEMPFILE        ORCL_CDB$ROOT        TEMP.298.948462891
         2         302        948462937 TEMPFILE        ORCL_PDB$SEED        TEMP.302.948462937
         2         308        948463249 PARAMETERFILE   ORCL_CDB$ROOT        spfile.308.948463249
         2         312        948464283 TEMPFILE        PDB1_0001            TEMP.312.948464283

16 rows selected.

Let’s take the online redo logs and try to add another log file group to thread 1:

SQL> alter database add logfile thread 1 size 1g;
alter database add logfile thread 1 size 1g
*
ERROR at line 1:
ORA-00301: error in adding log file '+FLEX' - file cannot be created
ORA-17502: ksfdcre:4 Failed to create file +FLEX
ORA-15437: Not enough quota available in quota group QG_ORCL.


SQL> 

This proves that quotas are enforced, at least for a couple of very simple examples.
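
Should a quota prove too small, it can be raised online. Here is a hedged sketch using SQL syntax analogous to the add quotagroup command from part 2 (the asmcmd equivalent should be chqg, listed in the mvfg help’s See Also section):

```sql
-- sketch: raise QG_ORCL's quota from 20G to 30G (sysasm privilege)
ALTER DISKGROUP FLEX MODIFY QUOTAGROUP QG_ORCL SET QUOTA = 30G;
```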

Summary Part 3

There is certainly a lot more to discover about Quota Groups: data files set to autoextend, archive log growth, temp and undo tablespaces, and so on. At first glance, it looks good. The key again is to have suitable monitoring in place that warns DBAs about File Groups running out of space. I wonder if that’s part of Enterprise Manager; otherwise it’s easy to write such checks yourself, either as custom metrics in OEM or in tools such as Nagios.
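
Such a check could be as simple as comparing used space against the limit in v$asm_quotagroup; a minimal sketch (the 90% threshold is an arbitrary choice):

```sql
-- Quota Groups more than 90% full; quota_limit_mb = 0 means "no limit"
SELECT g.name AS disk_group, q.name AS quota_group,
       q.used_quota_mb, q.quota_limit_mb,
       ROUND(100 * q.used_quota_mb / q.quota_limit_mb, 1) AS pct_used
FROM   v$asm_quotagroup q
JOIN   v$asm_diskgroup g ON g.group_number = q.group_number
WHERE  q.quota_limit_mb > 0
AND    q.used_quota_mb > 0.9 * q.quota_limit_mb;
```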

12.2 New Feature: the FLEX ASM disk group part 2

In the first part of this series I explained the basics and some potential motivation behind the use of ASM Flex disk groups. In this part I would like to complete the description of new concepts.

New Concepts related to FLEX ASM Disk Groups

With the Flex disk group mounted, the next steps are to create a few new entities. First, I want to create a Quota Group. The Quota Group – as the name implies – enforces quotas for entities residing within it. It is optional to add one yourself; Oracle creates a default Quota Group for you that does not enforce storage limits. As you will see later, the default Quota Group is assigned to all new databases in the Flex ASM disk group.

The entity to be stored inside the Quota Group is named a File Group, and serves to logically group files such as those belonging to a database. According to the SQL Language Reference version 12.2, the File Group can be created for a

  • Database (non-CDB, CDB, and PDB)
  • Cluster or
  • ASM Volume

Remember that the compatible.rdbms and compatible.asm parameters must be set to at least 12.2.0.1 on the Flex ASM disk group, ruling out pre-12.2 databases. For this post I intend to store database-related files in the File Group. And since I like CDBs, I’m using this database type.

Database Creation

With the FLEX disk group in place I fired up dbca in silent mode to create a database for me. Before submitting the command, however, I created a Quota Group while connected to the ASM instance, like this:

SQL> alter diskgroup flex add quotagroup QG_CDB set quota = 20g;

Diskgroup altered.

SQL> select QUOTAGROUP_NUMBER,NAME,USED_QUOTA_MB,QUOTA_LIMIT_MB from v$asm_quotagroup;

QUOTAGROUP_NUMBER NAME                           USED_QUOTA_MB QUOTA_LIMIT_MB
----------------- ------------------------------ ------------- --------------
                1 GENERIC                                    0              0
                3 QG_CDB                                     0          20480
SQL> 

Oracle provides a GENERIC Quota Group without storage limitations when the Flex ASM disk group is created. QG_CDB is the Quota Group I just created. In hindsight I don’t think that creating the Quota Group at this stage was necessary because it wasn’t used straight away… but I’m getting ahead of myself.

Here is the command to create the two-node RAC database on my FLEX diskgroup, which I ran next:

[oracle@rac122pri1 ~]$ dbca -silent -createDatabase -templateName martin_cdb12cr2_001.dbc \
> -gdbName CDB -sysPassword secretpwd1 -systemPassword secretpwd2 -storageType ASM \
> -diskGroupName FLEX -recoveryGroupName FLEX -sampleSchema true \
> -totalMemory 2048 -dbsnmpPassword secretpwd3 -nodeinfo rac122pri1,rac122pri2 \
> -createAsContainerDatabase true -databaseConfigType RAC

I should probably create a disk group +FLEXRECO for my Fast Recovery Area, but since this is a lab system I don’t quite care enough to justify the extra space usage. Your mileage will most definitely vary; this isn’t supposed to be a blueprint, it’s just me dabbling around with new technology :)

It takes a little time for the dbca command to return. It appears the database automatically created File Groups for the CDB’s components, mapped to the GENERIC Quota Group (quotagroup_number 1 from the listing above). Querying the ASM instance, I can find these:

SQL> select FILEGROUP_NUMBER, NAME, CLIENT_NAME, USED_QUOTA_MB, QUOTAGROUP_NUMBER from v$asm_filegroup
  2  /

FILEGROUP_NUMBER NAME                 CLIENT_NAME          USED_QUOTA_MB QUOTAGROUP_NUMBER
---------------- -------------------- -------------------- ------------- -----------------
               0 DEFAULT_FILEGROUP                                     0                 1
               1 CDB_CDB$ROOT         CDB_CDB$ROOT                  6704                 1
               2 CDB_PDB$SEED         CDB_PDB$SEED                  1656                 1

Just like with Quota Groups, there is a default entity, named DEFAULT_FILEGROUP. The CDB_CDB$ROOT and CDB_PDB$SEED seem to map to the newly created database. As you can see a little later in this post, the first “CDB” in CDB_CDB$ROOT maps to the database name. CDB$ROOT and PDB$SEED should sound familiar if you have previously worked with Container Databases.

Oracle creating File Groups for databases on its own is not too bad, because it takes half the work away from me. To test whether this holds true for newly created PDBs, I added a PDB named PDB1 to my CDB. Sure enough, after the create pluggable database command returned, there was a new File Group:

SQL> select FILEGROUP_NUMBER, NAME, CLIENT_NAME from v$asm_filegroup;

FILEGROUP_NUMBER NAME                 CLIENT_NAME
---------------- -------------------- --------------------
               0 DEFAULT_FILEGROUP
               1 CDB_CDB$ROOT         CDB_CDB$ROOT
               2 CDB_PDB$SEED         CDB_PDB$SEED
               3 PDB1                 PDB1

The output made me think – somehow name and client_name no longer look that useful for establishing a relation between CDB and PDB1. This might not be necessary from a technical point of view, but somehow I need to understand which PDB1 is meant in the view in case there is more than one. I can think of a PDB1 in CDB as super important, whereas PDB1 in some other CDB is less important. One way would be to name the PDB differently, allowing a human to map the PDB name to a CDB. Having said that, I certainly don’t want references to the CDB in my PDB name; that defeats the purpose of pluggable databases and wouldn’t hold true anyway when I unplug/plug the PDB into a different CDB.

Interlude
If you wonder what happens when you create another PDB1 in a different CDB, I did so, too. After a quick change to my dbca command and the creation of another CDB I named ORCL, I created a new PDB1 within it. This is the result:

SQL> select filegroup_number, name, client_name, guid from v$asm_filegroup;

FILEGROUP_NUMBER NAME                 CLIENT_NAME          GUID
---------------- -------------------- -------------------- --------------------------------
               0 DEFAULT_FILEGROUP
               1 CDB_CDB$ROOT         CDB_CDB$ROOT         4700A987085A3DFAE05387E5E50A8C7B
               2 CDB_PDB$SEED         CDB_PDB$SEED         536DF51E8E28221BE0534764A8C0FD81
               3 PDB1                 PDB1                 537B677EF8DA0F1AE0534764A8C05729
               4 ORCL_CDB$ROOT        ORCL_CDB$ROOT        4700A987085A3DFAE05387E5E50A8C7B
               5 ORCL_PDB$SEED        ORCL_PDB$SEED        537E63B952183748E0534764A8C09A7F
               6 PDB1_0001            PDB1                 537EB5B87E62586EE0534764A8C05530

7 rows selected.

The good news is there is no clash, or even a failure of the “create pluggable database” command. The new database’s CDB$ROOT and PDB$SEED namespaces don’t collide because of the database-name prefix. CDB.PDB1 and ORCL.PDB1 don’t clash because Oracle appends a number to the File Group’s name.

The not-so-good news is that the (File Group) name and client_name have become ambiguous. But there is a solution: playing around a bit I found that the GUID column in v$asm_filegroup maps to the GUID in v$pdbs/v$containers.

SQL> select sys_context('USERENV','CDB_NAME') cdb_name, guid 
  2  from v$pdbs where guid = '537EB5B87E62586EE0534764A8C05530';

CDB_NAME                       GUID
------------------------------ --------------------------------
ORCL                           537EB5B87E62586EE0534764A8C05530

Looks like it’s time to understand those GUIDs better, hopefully there will be more to come soon.

Quotas

As you can see from the previous example, it is not necessary to enforce a quota; however, it is possible. I’ll do this for the sake of completeness (and in preparation for part 3).

NOTE: This section of the blog post was actually written before I had created the second CDB (“ORCL”), which is why you don’t see it in the command output.

Many of the administrative commands related to File Group and Quota Groups that I entered in SQL can also be issued via asmcmd, as shown here:

ASMCMD> lsqg
Group_Num  Quotagroup_Num  Quotagroup_Name  Incarnation  Used_Quota_MB  Quota_Limit_MB  
5          1               GENERIC          1            10016          0               
5          3               QG_CDB           1            0              20480           

ASMCMD> lsfg
File Group         Disk Group  Quota Group  Used Quota MB  Client Name   Client Type  
DEFAULT_FILEGROUP  FLEX        GENERIC      0                                         
CDB_CDB$ROOT       FLEX        GENERIC      6704           CDB_CDB$ROOT  DATABASE     
CDB_PDB$SEED       FLEX        GENERIC      1656           CDB_PDB$SEED  DATABASE     
PDB1               FLEX        GENERIC      1656           PDB1          DATABASE     

ASMCMD> help mvfg
mvfg
        Moves a file group in a disk group to the specified Quota Group.

Synopsis
        mvfg -G diskgroup --filegroup filegroup quotagroup

Description
        The options for the mvfg command are described below.

        -G diskgroup     - Disk group name.
        --filegroup      - File group name.

Examples
        The following is an example of the mvfg command. The file group
        FG1 in the DATA disk group is moved to the Quota Group QG1.

        ASMCMD [+] > mvfg -G DATA --filegroup FG1 QG1

See Also
       mkqg rmqg chqg lsqg

ASMCMD> 

The first two should be self-explanatory: lsqg is short for list Quota Groups, and lsfg is the equivalent for File Groups. The mvfg command takes a couple of parameters but should be straightforward. With the commands introduced it’s time to perform the action; I need to move the File Groups using mvfg:

ASMCMD> mvfg -G flex --filegroup CDB_CDB$ROOT QG_CDB
Diskgroup altered.
ASMCMD> mvfg -G flex --filegroup CDB_PDB$SEED QG_CDB
Diskgroup altered.
ASMCMD> mvfg -G flex --filegroup PDB1 QG_CDB
Diskgroup altered.
ASMCMD> lsfg
File Group         Disk Group  Quota Group  Used Quota MB  Client Name   Client Type  
DEFAULT_FILEGROUP  FLEX        GENERIC      0                                         
CDB_CDB$ROOT       FLEX        QG_CDB       6704           CDB_CDB$ROOT  DATABASE     
CDB_PDB$SEED       FLEX        QG_CDB       1656           CDB_PDB$SEED  DATABASE     
PDB1               FLEX        QG_CDB       1656           PDB1          DATABASE     

ASMCMD> lsqg
Group_Num  Quotagroup_Num  Quotagroup_Name  Incarnation  Used_Quota_MB  Quota_Limit_MB  
5          1               GENERIC          1            0              0               
5          3               QG_CDB           1            10016          20480           
ASMCMD> 

The command completes pretty much instantaneously, so it’s not actually “moving” data; it appears that all that happens is a metadata update. Moving the File Group is translated to the following SQL commands, found in the ASM instance’s alert.log:

2017-07-04 11:01:53.492000 +01:00
SQL> /* ASMCMD */ALTER DISKGROUP FLEX MOVE FILEGROUP CDB_CDB$ROOT TO QG_CDB
SUCCESS: /* ASMCMD */ALTER DISKGROUP FLEX MOVE FILEGROUP CDB_CDB$ROOT TO QG_CDB
2017-07-04 11:02:08.645000 +01:00
SQL> /* ASMCMD */ALTER DISKGROUP FLEX MOVE FILEGROUP CDB_PDB$SEED TO QG_CDB
SUCCESS: /* ASMCMD */ALTER DISKGROUP FLEX MOVE FILEGROUP CDB_PDB$SEED TO QG_CDB
SQL> /* ASMCMD */ALTER DISKGROUP FLEX MOVE FILEGROUP PDB1 TO QG_CDB
SUCCESS: /* ASMCMD */ALTER DISKGROUP FLEX MOVE FILEGROUP PDB1 TO QG_CDB

Summary Part 2

In part 2 of the article series I played around with Quota Groups and File Groups. Getting to grips with these concepts is necessary for a better understanding of how a Flex ASM disk group works. In part 3 I’ll investigate the effect of changing File Group properties, and whether quotas are enforced.

12.2 New Feature: the FLEX ASM disk group part 1

I knew about the 12.2 FLEX ASM disk group type from other presenters but until now – when researching the feature for the upcoming DOAG HA Day – I haven’t been able to appreciate how cool this is. And I really think it is pretty cool and worth sharing! There is a lot to be said about the feature and these tests, which is why I am splitting it into multiple parts.

Please be aware that this post is about my lab experiments, I have no production experience with FLEX ASM disk groups. As with all new features it might take a while to mature, so test, test, test…

Background

In previous versions of Oracle Grid Infrastructure/Automatic Storage Management (ASM), especially in consolidation environments, certain operations I would have liked to perform were not easily possible. Most properties of the disk group – such as redundancy levels – are valid for all files that reside within it. For example, if you wanted to have normal redundancy for some databases and high redundancy for others, you typically ended up with two “data” disk groups. The same goes for +RECO. I can think of scenarios where “filesystem-like” attributes within a disk group would have been nice. In the build-up to ASM 12.2, Oracle has steadily increased ASM limits to allow for more customisation, and in 12.1 you really could go a little over the top: you could have 63 ASM disk groups in 11.2 and up to 511 in 12.1. Although this should allow for plenty of customisation, it adds a little maintenance overhead.

A more granular option to manage storage came with 12.1 and the introduction of Container Databases (CDBs). As part of the Pluggable Database’s creation, the administrator could specify a pdb_storage_clause as in this very basic example:

SQL> create pluggable database pdb1 
  2   admin user someone identified by somepassword
  3    ...
  4   storage (maxsize 200G);

Pluggable database created.

However, if the database was created within a disk group with high redundancy, all files residing in that disk group inherited that property. I couldn’t define a PDB with normal redundancy in a high redundancy disk group, at least I wasn’t aware of a way to do so in 12.1.

Flex Disk Group

A Flex Disk Group promises fine-grained management of data within the disk group. You can also enforce quotas in a disk group (probably most useful at database level), and you can define properties such as redundancy settings per file type (also per database or Pluggable Database). In other words you can now have a disk group containing two databases, for example, of which database 1 uses normal redundancy and database 2 uses high redundancy. If database 2 is a Container Database (CDB), you can even manage settings as far down as the PDB level. A few new concepts need introducing before that can happen. Let’s begin with the most essential part: the new Flex ASM disk group. There is a variation of the theme for extended distance clusters, which is not in scope for this article series.
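
Per-file-type properties are set on the File Group; here is a hedged sketch of what changing a File Group’s data file redundancy might look like (FG1 is a hypothetical File Group name; the syntax follows the 12.2 documentation):

```sql
-- sketch: triple-mirror the data files belonging to File Group FG1
ALTER DISKGROUP FLEX MODIFY FILEGROUP FG1
  SET 'datafile.redundancy' = 'HIGH';
```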

In my lab system, I have created a new ASM disk group of flex redundancy. The system I am using is a two-node RAC running 12.2.0.1.170620 on Oracle Linux 7.3 with UEK4. I called the disk group FLEX, here is the command used for its creation:

CREATE DISKGROUP FLEX FLEX REDUNDANCY  
 DISK 'AFD:FLEX1' 
 DISK 'AFD:FLEX2'   
 DISK 'AFD:FLEX3' 
 DISK 'AFD:FLEX4'  
 DISK 'AFD:FLEX5'  
ATTRIBUTE 
 'compatible.asm'='12.2.0.1',
 'compatible.rdbms'='12.2.0.1',
 'compatible.advm'='12.2.0.1',
 'au_size'='4M';

Note the use of the ASM Filter Driver, which I am testing as part of my lab setup. It’s also enabled by default when you install ASM 12.2. Looking at the code example I do realise now that the disk group name is maybe not ideal … The important bits in the example are the use of “FLEX REDUNDANCY”, the five implicitly created failure groups, and the compatibility settings that need to be 12.2.

The documentation (Automatic Storage Management Administrator’s Guide, chapter “Managing Oracle ASM Flex Disk Groups”) states that a Flex disk group generally tolerates the loss of two failure groups (FGs). The same bullet point then elaborates that at least five failure groups are needed to absorb the loss of two FGs. The minimum number of FGs within a Flex disk group is three.

If you now get all excited about this new feature, there is one Big Caveat: you need a 12.2 RDBMS instance to use the feature.

This command results in a flurry of activity in the ASM instance; I have captured the output from command initiation to completion because it’s quite interesting to see what happens in ASM 12.2 when you create a new disk group. Feel free to scroll down past the listing if you aren’t interested in the finer details.

SQL> CREATE DISKGROUP FLEX FLEX REDUNDANCY  DISK 'AFD:FLEX1' SIZE 10239M
 DISK 'AFD:FLEX2' SIZE 10239M
 DISK 'AFD:FLEX3' SIZE 10239M
 DISK 'AFD:FLEX4' SIZE 10239M
 DISK 'AFD:FLEX5' SIZE 10239M
 ATTRIBUTE 'compatible.asm'='12.2.0.1','compatible.rdbms'='12.2.0.1','compatible.advm'='12.2.0.1','au_size'='4M'
NOTE: Assigning number (5,0) to disk (AFD:FLEX1)
NOTE: Assigning number (5,1) to disk (AFD:FLEX2)
NOTE: Assigning number (5,2) to disk (AFD:FLEX3)
NOTE: Assigning number (5,3) to disk (AFD:FLEX4)
NOTE: Assigning number (5,4) to disk (AFD:FLEX5)
2017-07-03 10:38:53.811000 +01:00
NOTE: initializing header (replicated) on grp 5 disk FLEX1
NOTE: initializing header (replicated) on grp 5 disk FLEX2
NOTE: initializing header (replicated) on grp 5 disk FLEX3
NOTE: initializing header (replicated) on grp 5 disk FLEX4
NOTE: initializing header (replicated) on grp 5 disk FLEX5
NOTE: initializing header on grp 5 disk FLEX1
NOTE: initializing header on grp 5 disk FLEX2
NOTE: initializing header on grp 5 disk FLEX3
NOTE: initializing header on grp 5 disk FLEX4
NOTE: initializing header on grp 5 disk FLEX5
NOTE: Disk 0 in group 5 is assigned fgnum=1
NOTE: Disk 1 in group 5 is assigned fgnum=2
NOTE: Disk 2 in group 5 is assigned fgnum=3
NOTE: Disk 3 in group 5 is assigned fgnum=4
NOTE: Disk 4 in group 5 is assigned fgnum=5
GMON updating for reconfiguration, group 5 at 657 for pid 45, osid 25857
NOTE: group 5 PST updated.
NOTE: initiating PST update: grp = 5
GMON updating group 5 at 658 for pid 45, osid 25857
NOTE: set version 0 for asmCompat 12.2.0.1.0 for group 5
NOTE: group FLEX: initial PST location: disks 0000 0001 0002 0003 0004
NOTE: PST update grp = 5 completed successfully
NOTE: cache registered group FLEX 5/0x0A58F009
NOTE: cache began mount (first) of group FLEX 5/0x0A58F009
NOTE: cache is mounting group FLEX created on 2017/07/03 10:38:52
NOTE: cache opening disk 0 of grp 5: FLEX1 label:FLEX1
NOTE: cache opening disk 1 of grp 5: FLEX2 label:FLEX2
NOTE: cache opening disk 2 of grp 5: FLEX3 label:FLEX3
NOTE: cache opening disk 3 of grp 5: FLEX4 label:FLEX4
NOTE: cache opening disk 4 of grp 5: FLEX5 label:FLEX5
* allocate domain 5, valid ? 0
kjbdomatt send to inst 2
NOTE: attached to recovery domain 5
NOTE: cache creating group 5/0x0A58F009 (FLEX)
NOTE: cache mounting group 5/0x0A58F009 (FLEX) succeeded
NOTE: allocating F1X0 (replicated) on grp 5 disk FLEX1
NOTE: allocating F1X0 (replicated) on grp 5 disk FLEX2
NOTE: allocating F1X0 (replicated) on grp 5 disk FLEX3
NOTE: allocating F1X0 on grp 5 disk FLEX1
NOTE: allocating F1X0 on grp 5 disk FLEX2
NOTE: allocating F1X0 on grp 5 disk FLEX3
2017-07-03 10:38:56.621000 +01:00
NOTE: Created Used Space Directory for 1 threads
NOTE: Created Virtual Allocation Locator (1 extents) and Table (5 extents) directories for group 5/0x0A58F009 (FLEX)
2017-07-03 10:39:00.153000 +01:00
NOTE: VAM migration has completed for group 5/0x0A58F009 (FLEX)
NOTE: diskgroup must now be re-mounted prior to first use
NOTE: cache dismounting (clean) group 5/0x0A58F009 (FLEX)
NOTE: messaging CKPT to quiesce pins Unix process pid: 25857, image: oracle@rac122pri1 (TNS V1-V3)
2017-07-03 10:39:01.805000 +01:00
NOTE: LGWR not being messaged to dismount
kjbdomdet send to inst 2
detach from dom 5, sending detach message to inst 2
freeing rdom 5
NOTE: detached from domain 5
NOTE: cache dismounted group 5/0x0A58F009 (FLEX)
GMON dismounting group 5 at 659 for pid 45, osid 25857
GMON dismounting group 5 at 660 for pid 45, osid 25857
NOTE: Disk FLEX1 in mode 0x7f marked for de-assignment
NOTE: Disk FLEX2 in mode 0x7f marked for de-assignment
NOTE: Disk FLEX3 in mode 0x7f marked for de-assignment
NOTE: Disk FLEX4 in mode 0x7f marked for de-assignment
NOTE: Disk FLEX5 in mode 0x7f marked for de-assignment
SUCCESS: diskgroup FLEX was created
NOTE: cache deleting context for group FLEX 5/0x0a58f009
NOTE: cache registered group FLEX 5/0x4718F00C
NOTE: cache began mount (first) of group FLEX 5/0x4718F00C
NOTE: Assigning number (5,0) to disk (AFD:FLEX1)
NOTE: Assigning number (5,1) to disk (AFD:FLEX2)
NOTE: Assigning number (5,2) to disk (AFD:FLEX3)
NOTE: Assigning number (5,3) to disk (AFD:FLEX4)
NOTE: Assigning number (5,4) to disk (AFD:FLEX5)
2017-07-03 10:39:08.161000 +01:00
NOTE: GMON heartbeating for grp 5 (FLEX)
GMON querying group 5 at 663 for pid 45, osid 25857
NOTE: cache is mounting group FLEX created on 2017/07/03 10:38:52
NOTE: cache opening disk 0 of grp 5: FLEX1 label:FLEX1
NOTE: 07/03/17 10:39:07 FLEX.F1X0 found on disk 0 au 10 fcn 0.0 datfmt 1
NOTE: cache opening disk 1 of grp 5: FLEX2 label:FLEX2
NOTE: 07/03/17 10:39:07 FLEX.F1X0 found on disk 1 au 10 fcn 0.0 datfmt 1
NOTE: cache opening disk 2 of grp 5: FLEX3 label:FLEX3
NOTE: 07/03/17 10:39:07 FLEX.F1X0 found on disk 2 au 10 fcn 0.0 datfmt 1
NOTE: cache opening disk 3 of grp 5: FLEX4 label:FLEX4
NOTE: cache opening disk 4 of grp 5: FLEX5 label:FLEX5
NOTE: cache mounting (first) flex redundancy group 5/0x4718F00C (FLEX)
* allocate domain 5, valid ? 0
kjbdomatt send to inst 2
NOTE: attached to recovery domain 5
start recovery: pdb 5, passed in flags x4 (domain enable 0)
validate pdb 5, flags x4, valid 0, pdb flags x204
* validated domain 5, flags = 0x200
NOTE: cache recovered group 5 to fcn 0.0
NOTE: redo buffer size is 512 blocks (2105344 bytes)
NOTE: LGWR attempting to mount thread 1 for diskgroup 5 (FLEX)
NOTE: LGWR found thread 1 closed at ABA 0.11262 lock domain=0 inc#=0 instnum=0
NOTE: LGWR mounted thread 1 for diskgroup 5 (FLEX)
NOTE: setting 11.2 start ABA for group FLEX thread 1 to 2.0
NOTE: LGWR opened thread 1 (FLEX) at fcn 0.0 ABA 2.0 lock domain=5 inc#=12 instnum=1 gx.incarn=1192816652 mntstmp=2017/07/03 10:39:08.437000
NOTE: cache mounting group 5/0x4718F00C (FLEX) succeeded
NOTE: cache ending mount (success) of group FLEX number=5 incarn=0x4718f00c
NOTE: Instance updated compatible.asm to 12.2.0.1.0 for grp 5 (FLEX).
NOTE: Instance updated compatible.asm to 12.2.0.1.0 for grp 5 (FLEX).
NOTE: Instance updated compatible.rdbms to 12.2.0.1.0 for grp 5 (FLEX).
NOTE: Instance updated compatible.rdbms to 12.2.0.1.0 for grp 5 (FLEX).
SUCCESS: diskgroup FLEX was mounted
NOTE: diskgroup resource ora.FLEX.dg is online
SUCCESS: CREATE DISKGROUP FLEX FLEX REDUNDANCY  DISK 'AFD:FLEX1' SIZE 10239M
 DISK 'AFD:FLEX2' SIZE 10239M
 DISK 'AFD:FLEX3' SIZE 10239M
 DISK 'AFD:FLEX4' SIZE 10239M
 DISK 'AFD:FLEX5' SIZE 10239M
 ATTRIBUTE 'compatible.asm'='12.2.0.1','compatible.rdbms'='12.2.0.1','compatible.advm'='12.2.0.1','au_size'='4M'
2017-07-03 10:39:09.429000 +01:00
NOTE: enlarging ACD to 2 threads for group 5/0x4718f00c (FLEX)
2017-07-03 10:39:11.438000 +01:00
SUCCESS: ACD enlarged for group 5/0x4718f00c (FLEX)
NOTE: Physical metadata for diskgroup 5 (FLEX) was replicated.
adrci> 

Quite a bit of activity for just one command… I checked the attributes of the disk group; they don't seem too alien to me:

SQL> select name, value from v$asm_attribute 
  2   where group_number = 5 
  3   and name not like 'template%';

NAME                           VALUE
------------------------------ ------------------------------
idp.type                       dynamic
idp.boundary                   auto
disk_repair_time               3.6h
phys_meta_replicated           true
failgroup_repair_time          24.0h
thin_provisioned               FALSE
preferred_read.enabled         FALSE
sector_size                    512
logical_sector_size            512
content.type                   data
content.check                  FALSE
au_size                        4194304
appliance._partnering_type     GENERIC
compatible.asm                 12.2.0.1.0
compatible.rdbms               12.2.0.1.0
compatible.advm                12.2.0.1.0
cell.smart_scan_capable        FALSE
cell.sparse_dg                 allnonsparse
access_control.enabled         FALSE
access_control.umask           066
scrub_async_limit              1
scrub_metadata.enabled         FALSE

22 rows selected.

I didn’t specify anything about failure groups in the create disk group command, hence I am getting 5 of them:

SQL> select name, os_mb, failgroup, path from v$asm_disk where group_number = 5;

NAME            OS_MB FAILGROUP                      PATH
---------- ---------- ------------------------------ --------------------
FLEX1           10239 FLEX1                          AFD:FLEX1
FLEX2           10239 FLEX2                          AFD:FLEX2
FLEX3           10239 FLEX3                          AFD:FLEX3
FLEX4           10239 FLEX4                          AFD:FLEX4
FLEX5           10239 FLEX5                          AFD:FLEX5

The result is the new disk group, a new part of my existing lab setup. As you can see in the following output I went with a separate disk group for the Grid Infrastructure Management Repository (GIMR), named +MGMT. In addition I have a disk group named +OCR which (surprise!) I use for OCR and voting files, plus the usual suspects, +DATA and +RECO. Except for +FLEX, all of these are disk group types that have been available for a long time.

[oracle@rac122pri1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  4194304     20476    17068                0           17068              0             N  DATA/
MOUNTED  FLEX    N         512             512   4096  4194304     51180    50676                0               0              0             N  FLEX/
MOUNTED  EXTERN  N         512             512   4096  4194304     40956     6560                0            6560              0             N  MGMT/
MOUNTED  NORMAL  N         512             512   4096  4194304     15348    14480             5116            4682              0             Y  OCR/
MOUNTED  EXTERN  N         512             512   4096  4194304     15356    15224                0           15224              0             N  RECO/
[oracle@rac122pri1 ~]$ 

The values of 0 in required_mirror_free_mb and usable_file_mb for +FLEX aren't bugs; they are documented to be empty for this type of disk group. You need to consult other entities (to be covered in the next posts) to check the space usage of your databases.

Wrap Up Part 1

Flex ASM disk groups are hugely interesting and worth keeping an eye on when you are testing Oracle 12.2. I admit that 12.2 is still quite new, and the cautious person I am won't use it for production until the first major patch set comes out and I have tested it to death. I am also curious how the use of the ASM Filter Driver will change the way I work with ASM. It might be similar to ASMLib, but I have yet to work that out.

In the next part(s) I will cover additional concepts you need to understand in the context of Flex ASM disk groups.

Taming XML data in the v$cell% family of Exadata views

While researching Exadata 12.2 features introduced with Oracle 12.2.1.1.0 I ran into an old problem of mine: querying the v$cell% views, which are new with Exadata 12.1.1.1.0 and Oracle RDBMS 12c Release 1, in a meaningful way. In case you haven't seen them yet, these views expose cell metrics and other information in the database without having to connect to the cells and execute cellcli commands. They are very useful, and form the basis of the latter part of the Exadata AWR reports.

On our 12.2 Exadata system (using cellos 12.2.1.1.0 and RDBMS 12.2.0.1.0) the following views are exposed to the curious researcher:

SQL> select table_name from dict 
  where table_name like 'V%CELL%' 
  order by 1;
                                                  
TABLE_NAME
-------------------------------------
V$CELL
V$CELL_CONFIG
V$CELL_CONFIG_INFO
V$CELL_DB
V$CELL_DB_HISTORY
V$CELL_DISK
V$CELL_DISK_HISTORY
V$CELL_GLOBAL
V$CELL_GLOBAL_HISTORY
V$CELL_IOREASON
V$CELL_IOREASON_NAME
V$CELL_METRIC_DESC
V$CELL_OFL_THREAD_HISTORY
V$CELL_OPEN_ALERTS
V$CELL_REQUEST_TOTALS
V$CELL_STATE
V$CELL_THREAD_HISTORY

17 rows selected.

There is a lot of useful info to be found in these, and most of the views are explained in Expert Oracle Exadata (2nd edition), chapter 11. I don’t always have a copy of the book at hand which is why I am putting this post together. To add some value, I am going to try and explain how I get to the results as well :)

Querying the views

For example, if you are interested in a cell's configuration, you can query v$cell_config.

SQL> describe v$cell_config

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CELLNAME                                           VARCHAR2(1024)
 CELLHASH                                           NUMBER
 CONFTYPE                                           VARCHAR2(15)
 CONFVAL                                            CLOB
 CON_ID                                             NUMBER

SQL>

This view is a good representation of the way most of them are set up. There is a cell identifier (CELLNAME in the example), a configuration type (CONFTYPE), and the actual data stored as XML (in CONFVAL). Instead of an XMLTYPE, the XML is more commonly stored as a CLOB. The raw CLOB output is often difficult to read, which is why I usually cast it to an XMLTYPE, which formats much more nicely in SQLcl and SQL*Plus.

A small example of getting information from this view is shown here. I am interested in the IORM plan for a specific cell:

SQL> select xmltype(confval) from v$cell_config 
  where cellname = '192.168.10.9' 
    and conftype = 'IORM';

XMLTYPE(CONFVAL)
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="US-ASCII"?>
<cli-output>
  <version>1.0</version>
  <timestamp>1498120597819</timestamp>
  <context cell="someCell"/>
  <iormplan>
    <name>someCell_IORMPLAN</name>
    <catPlan/>
    <dbPlan/>
    <objective>basic</objective>
    <status>active</status>
  </iormplan>
</cli-output>

SQL>          
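If your client gives you the CONFVAL CLOB as one long line, the same readability trick works outside the database, too. A small sketch in Python using only the standard library; the sample document is the IORM plan output shown above, pasted in as a plain string:

```python
# Pretty-print an XML document fetched as plain text, e.g. the CONFVAL
# CLOB pulled from v$cell_config with a client that doesn't render
# XMLTYPE nicely. The sample is the IORM plan from the post.
import xml.dom.minidom

confval = (
    '<?xml version="1.0" encoding="US-ASCII"?><cli-output>'
    '<version>1.0</version><timestamp>1498120597819</timestamp>'
    '<context cell="someCell"/><iormplan><name>someCell_IORMPLAN</name>'
    '<catPlan/><dbPlan/><objective>basic</objective>'
    '<status>active</status></iormplan></cli-output>'
)

# parse the raw text and re-serialise it with indentation
pretty = xml.dom.minidom.parseString(confval.encode("ascii")).toprettyxml(indent="  ")
print(pretty)
```

Nothing Oracle-specific here; it is just a quick way to eyeball a blob of XML before deciding which parts to extract.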

One alternative way to do so is to query the cell directly, while logged in:

[root@someCell ~]# cellcli -e list iormplan detail
         name:                   someCell_IORMPLAN
         catPlan:                
         dbPlan:                 
         objective:              basic
         status:                 active
[root@someCell ~]#

Except that cells are most often not accessible to DBAs.

Transforming XML

Just looking at the XML query output is good enough for small chunks of XML; however, when investigating a larger XML document this quickly becomes impractical.

Back to the motivation for writing this post: the view I wanted to query for columnar cache statistics is v$cell_state. I have used the Smart-Scan Deep Dive blog as a source; it features an example of how to extract a specific statistic from the XML data. I have extended that example to translate all statistics pertaining to the columnar cache into the more familiar format.

The XML in question is rather deeply nested. I have shortened the other output to give you an idea of the data I care about:

<cell_stats>
  <stats type="scheduler">...</stats>
  <stats type="network_directory">...</stats>
  <stats type="cache">...</stats>
  <stats type="MemCache">...</stats>
  <stats type="flashcache">
    <stats type="columnarcache">
      <stat name="columnar_cache_size">26673152</stat>
      <stat name="columnar_cache_keep_size">0</stat>
      <stat name="columnar_cache_keep_read">0</stat>
  ...

I need all the information in the '<stats type="columnarcache">' part, lines 8 and following. The most useful way I have found to translate XML into what looks like a relational table is the rather complex XMLTABLE() function, explained in the XML DB Developer's Guide, chapter 4, "XQuery and Oracle XML DB". Using that function I can translate XML into a relational format.

It is probably easiest to show you the query first and then walk you through it. According to Oracle's blog post I can find the information about the columnar cache in v$cell_state. I don't have to worry about the different statistics_type values because the filtering will be done on the XML using an XPath expression. I would like to check a single cell only, to demonstrate that it is possible to combine the output of XMLTABLE with other views/tables and query them using "regular" SQL:

SQL> select b.*
  from v$cell_state a,
  xmltable (
      '/cell_stats/stats/stats[@type="columnarcache"]/stat'
      passing xmltype(a.statistics_value)
      columns
          name   varchar2(50) path '@name',
          value  number       path '.'
  ) b
  where a.cell_name = '192.168.10.9'
  and b.value <> 0
 order by b.value;

Unsurprisingly I need to pass the XML data to XMLTABLE in order to process it further. The XML data is found in v$cell_state.statistics_value; I access the view and hand the relevant column to the XMLTABLE function's "passing" clause. An XMLTYPE is expected here, which is why I have to convert the LOB.

The XMLTABLE function should only process data that belongs to the nodes below '<stats type="columnarcache">'. An XPath expression helps me define the scope. This is actually where I spent most of my time, because writing correct XPath expressions doesn't come naturally to me. Have a look at Wikipedia if XPath is completely alien to you.

Getting the XPath right can be challenging if you don't do it every day. I learned from Marco Gralike, who did something very similar to my investigation, that it is possible to return the pruned XML tree structure as an XMLTYPE. I wasn't quite sure, for example, which part of the XML document I wanted to use for the XMLTABLE's columns clause, and went through a couple of iterations to find "my" subset of data. Here is one way to do this, following Marco's blog post:

SQL> select b.cc
  from v$cell_state a,
  xmltable (
      '/cell_stats/stats/stats[@type="columnarcache"]'
      passing xmltype(a.statistics_value)
      columns cc XMLTYPE PATH '*'
  ) b
  where a.cell_name = '192.168.10.9';

You will get XML back (the column returned by XMLTABLE is cast to XMLTYPE in line 6), and then use that as the basis for further analysis.

Once the XML returned contains all the data you need, you can continue with writing the XMLTABLE's columns clause to extract information from the XML document. That is where I ran into the next problem: I had syntax issues trying to reference the "self" node (the dot you see in line 8 of the full example), which is why I experimented a bit. It turned out I had to specify the column name, data type, and path before referencing the element.

The tricky bit when working with XML data is finding the correct XPath, at least for me, because I am not an XML developer. I found it useful to start at the node containing the information I want and then work my way up the tree until I reach the root element. Playing around a bit more, I managed to shorten my XPath to just this:

'//stats[@type="columnarcache"]/stat'

I still prefer the "complete" path expression because it gives me a clue where in the document tree the information is stored.
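To get a feel for what an XPath expression selects before wiring it into XMLTABLE, it can help to experiment outside the database. Here is a hedged sketch using Python's standard library on the (shortened) sample document from above; it also demonstrates that the full and the shortened path select the same nodes:

```python
# Reproduce the XMLTABLE extraction outside the database: pull every
# <stat> under <stats type="columnarcache"> out of a statistics_value
# document. The sample mirrors the shortened XML shown in the post.
import xml.etree.ElementTree as ET

xml_doc = """
<cell_stats>
  <stats type="flashcache">
    <stats type="columnarcache">
      <stat name="columnar_cache_size">26673152</stat>
      <stat name="columnar_cache_keep_size">0</stat>
      <stat name="columnar_cache_keep_read">0</stat>
    </stats>
  </stats>
</cell_stats>
"""

root = ET.fromstring(xml_doc)

# the "full" path, anchored at the document root (/cell_stats in SQL) ...
full = root.findall('./stats/stats[@type="columnarcache"]/stat')
# ... and the shortened form: any matching node, anywhere in the tree
short = root.findall('.//stats[@type="columnarcache"]/stat')
assert full == short  # both expressions select the same nodes

# the equivalent of the columns clause: @name and the "self" node text
for stat in full:
    print(stat.get("name"), int(stat.text))
```

ElementTree only supports a subset of XPath, but that subset is enough to prototype the predicate and path before translating them back into the SQL query.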

Wrapping Up

If you want the complete picture for your entire Exadata cluster, simply add the cell name to the query:

SQL> select a.cell_name,b.name, b.value
from v$cell_state a,
xmltable (
    '/cell_stats/stats/stats[@type="columnarcache"]/stat'
    passing xmltype(a.statistics_value)
    columns 
        name   varchar2(50) path '@name',
        value  number       path '.'
) b
where b.value <> 0
order by a.cell_name,b.value;

Please have a look at Marco’s post where he explains a more complex scenario of nesting XMLTABLE functions, depending on what information you are after.

Hope this helps!

12.2 new features: a first encounter with the new Cluster Health Advisor

One of the many new manuals that showed up in Oracle Database 12.2 is the Oracle Autonomous Health Framework User’s Guide. This book contains a lot of information, some of which we already know from previous releases, and some new stuff as well. One of the new chapters caught my eye in particular:

“Proactively Detecting and Diagnosing Performance Issues for Oracle RAC”

Said chapter 5 introduces a new feature, called the Cluster Health Advisor (CHA), that should help you understand the performance of your cluster better. Really? Time to start the VMs to create a test.

My system is based on Oracle Linux 7.3 with UEK4. I created a 2 node 12.2.0.1.0 RAC; there weren't any patches available at the time of writing, so this functionality might change a little. My database, named CDB, features 2 PDBs: swingbench1 and swingbench2 (you can already guess where this is heading). It appears that my cluster is a "flex cluster" operating in standard mode. You noticed during configuration that you defined the nodes as "HUB nodes", didn't you? I have screenshots from my RAC installation that I'll put into a blog post when I find the time. In the meantime, head over to oracle-base.com where you can find a RAC 12.2 installation walkthrough for Oracle Linux on VirtualBox.

The plan for this post is to overload the system deliberately to provoke the health advisor to give me recommendations about the cluster and/or the database.

The official documentation is found here: https://docs.oracle.com/database/122/ATNMS/monitoring-rac-with-cha.htm

How is the Cluster Health Advisor implemented?

As per the documentation, Cluster Health Advisor daemon processes run on each node, and sure enough they do:

[oracle@rac122pri1 ~]$ crsctl stat res -t 
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.MGMT.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.OCR.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.RECO.dg
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.chad
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.net1.network
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.ons
               ONLINE  ONLINE       rac122pri1               STABLE
               ONLINE  ONLINE       rac122pri2               STABLE
ora.proxy_advm
               OFFLINE OFFLINE      rac122pri1               STABLE
               OFFLINE OFFLINE      rac122pri2               STABLE
...

The resource in question is "ora.chad", and its profile defines dependencies on the GIMR, the Grid Infrastructure Management Repository:

[oracle@rac122pri1 ~]$ crsctl stat res ora.chad -p | grep "DEPENDENCIES\b"
START_DEPENDENCIES=hard(global:ora.mgmtdb) pullup(global:ora.mgmtdb)
STOP_DEPENDENCIES=hard(global:intermediate:ora.mgmtdb)
[oracle@rac122pri1 ~]$ 

The documentation further states that the Cluster Health Advisor (CHA) daemon monitors cluster nodes automatically. Databases registered in Clusterware, however, must be explicitly registered with the monitoring daemon.

The main tool to interact with the CHA is – chactl (there’s a surprise). It can take quite a few options:

[oracle@rac122pri1 bin]$ chactl -h
CLSCH-3659 : invalid command line syntax
Usage:
chactl monitor cluster [-model <model_name> [-force]]
chactl monitor database -db <db_unique_name>
                        [-model <model_name> [-force]]
chactl unmonitor database -db <db_unique_name>
chactl status [cluster|database [-db <db_unique_name>]] [-verbose]
chactl config [cluster|database -db <db_unique_name>]
chactl calibrate {cluster|database -db <db_unique_name>}
                 -model <model_name> [-force]
                 [-timeranges 'start=<timestamp>,end=<timestamp>,...']
                 [-kpiset 'name=<kpi_name> min=<kpi_min> max=<kpi_max>,...' ]
    WHERE:
        -interval <interval> : interval is in hrs
        -timeranges 'start=<timestamp>,end=<timestamp>,...' :
             Timestamp must be in format 'YYYY-MM-DD HH24:MI:SS'
    KPI for db:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        DBTIMEPERCALL - Database time per user call - usec/call
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
    KPI for cluster:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
chactl query diagnosis [-cluster|-db <db_unique_name>]
                       [-start <timestamp> -end <timestamp>]
                       [-htmlfile <file_name>]
chactl query model [-name <model_name> [-verbose]]
chactl query repository
chactl query calibration {-cluster|-db <db_unique_name>}
        [-timeranges 'start=<timestamp>,end=<timestamp>,...']
        [-kpiset 'name=<kpi_name> min=<kpi_min> max=<kpi_max>,...' ]
        [-interval <interval>]
    WHERE:
        -interval <interval> : interval is in hrs
        -timeranges 'start=<timestamp>,end=<timestamp>,...' :
             Timestamp must be in format 'YYYY-MM-DD HH24:MI:SS'
    KPI for db:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        DBTIMEPERCALL - Database time per user call - usec/call
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
    KPI for cluster:
        CPUPERCENT - CPU utilization - Percent
        IOREAD - Disk read - Mbyte/sec
        IOWRITE - Disk write - Mbyte/sec
        IOTHROUGHPUT - Disk throughput - IO/sec
chactl remove model -name <model_name>
chactl rename model -from <model_name> -to <model_name>
chactl import model -name <model_name> -file <file_name> [-force]
chactl export model -name <model_name> -file <file_name>
chactl set maxretention -time <max_retention_time>
chactl resize repository -entities <total_number_of_hosts_and_instances>
   [-force | -eval]
[oracle@rac122pri1 bin]$

The output of the “help” command has been reformatted for readability.

And yes, the documentation is right: at first only the cluster resource was monitored.

[oracle@rac122pri1 ~]$ chactl status
monitoring nodes rac122pri2, rac122pri1
not monitoring databases
[oracle@rac122pri1 ~]$ chactl status -verbose
monitoring nodes rac122pri2, rac122pri1 using model DEFAULT_CLUSTER
not monitoring databases
[oracle@rac122pri1 ~]$

Monitoring my database

In the next step I wanted to include my database in the monitoring configuration. That is easy:

[oracle@rac122pri1 ~]$ chactl monitor database -db CDB
[oracle@rac122pri1 ~]$ chactl status -verbose
monitoring nodes rac122pri2, rac122pri1 using model DEFAULT_CLUSTER
monitoring database cdb, instances CDB2, CDB1 using model DEFAULT_DB

Creating load and causing trouble

With the monitoring in place it was time to create some trouble. I started a swingbench run deliberately overloading the CPUs of my cluster. To add to the fun, I rebooted a node during the swingbench execution. Although that didn't stop processing (I used the connection pool method to establish sessions), it caused all of my sessions to pile up on the surviving node. It held its head up, but performance took a nosedive.

Any recommendations?

The Cluster Health Advisor can be interrogated using the "query diagnosis" command. After causing sufficient trouble I asked for a first assessment. Interestingly enough, I hit the repository right after I rebooted node 2. I hadn't checked at the time, but node 2 ran the GIMR, hence this output:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:50" -end "2017-03-15 05:55:50"
CLSCH-2005 : Oracle Cluster Health Analysis Service (OCHAD) failed due to a Grid
Infrastructure Management Repository error.
Unable to start the Universal Connection Pool:
oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource:
java.sql.SQLRecoverableException: Listener refused the connection with the following
error:

ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

After a little while the failed resources were up again, and I could get some output:

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:50" -end "2017-03-15 05:55:50"
2017-03-15 05:54:40.0  Database cdb  DB Recovery Read IO Performance (CDB1) [detected]
2017-03-15 05:55:00.0  Database cdb  DB Recovery Read IO Performance (CDB1) [cleared]

Problem: DB Recovery Read IO Performance
Description: CHA detected that recovery reads are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that a database instance left the cluster
unexpectedly and instance recovery had a larger impact on performance than expected
because more database blocks than expected had to be recovered. The data blocks to which
recovery was applied were read from disk or from the global cache.
Action: Decrease the target for the mean time to recover ( MTTR target ) to increase the
incremental checkpointing rate.
[oracle@rac122pri1 bin]$

Spot on!

SQL> show parameter mttr

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_mttr_target               integer     0
SQL> 

This is typical for DBCA created databases based on the General_Purpose.dbc template. Good catch!

Now the system was still struggling under load:


SQL> select count(*), a.inst_id, b.name from gv$session a, v$pdbs b
  2  where a.con_id = b.con_id and a.username = 'SOE' group by a.inst_id, b.name;

  COUNT(*)    INST_ID NAME
---------- ---------- ------------------------------
        75          1 SWINGBENCH1

The pool was set to 75 users, but should have been split between both nodes. With the second node down, however, there was only one node left to run on.

The operating system didn’t look too happy either:

[root@rac122pri1 trace]# top -b  -n 1 | head -n 15
top - 06:05:08 up 21:14,  2 users,  load average: 15.47, 13.66, 8.69
Tasks: 1403 total,  10 running, 1393 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.0 us,  2.2 sy,  0.0 ni, 92.5 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16167836 total,  5510468 free,  4714544 used,  5942824 buff/cache
KiB Swap:   782332 total,   782332 free,        0 used.  7338672 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10339 root      20   0  158952   5516   3608 R  16.7  0.0   0:00.03 top
10341 root      20   0   28604   5428   3728 R  11.1  0.0   0:00.02 perl
20798 oracle    20   0 2545588 812828 805112 S  11.1  5.0   0:40.34 oracle_20798_cd
20824 oracle    20   0 2543552 817728 809924 S  11.1  5.1   0:40.01 oracle_20824_cd
20828 oracle    20   0 2545592 810800 803068 S  11.1  5.0   0:39.88 oracle_20828_cd
24365 oracle    20   0 2543524 588452 581008 S  11.1  3.6   0:22.72 oracle_24365_cd
24457 oracle    20   0 2543748 596728 588752 S  11.1  3.7   0:22.56 oracle_24457_cd
28569 oracle    20   0 2543524 552440 545016 S  11.1  3.4   0:20.18 oracle_28569_cd
[root@rac122pri1 trace]#

Although I guess that is not too bad for a VM with only 4 cores! Inevitably though there was trouble (but it seems to be over by now):

[oracle@rac122pri1 bin]$ chactl query diagnosis -db CDB \
> -start "2017-03-15 05:20:00" -end "2017-03-15 06:20:00"
2017-03-15 05:54:40.0  Database cdb  DB Recovery Read IO Performance (CDB1) [detected]
2017-03-15 05:55:00.0  Database cdb  DB Recovery Read IO Performance (CDB1) [cleared]
2017-03-15 05:58:55.0  Database cdb  Long Waits for Global Cache Message Requests (CDB1) [detected]
2017-03-15 05:58:55.0  Database cdb  Long Redo Log Sync Wait (CDB1) [detected]
2017-03-15 06:04:25.0  Database cdb  Long Waits for Global Cache Message Requests (CDB1) [cleared]
2017-03-15 06:04:25.0  Database cdb  Long Redo Log Sync Wait (CDB1) [cleared]

Problem: DB Recovery Read IO Performance
Description: CHA detected that recovery reads are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that a database instance left the cluster
unexpectedly and instance recovery had a larger impact on performance than expected because
more database blocks than expected had to be recovered. The data blocks to which recovery
was applied were read from disk or from the global cache.
Action: Decrease the target for the mean time to recover ( MTTR target ) to increase the
incremental checkpointing rate.

Problem: Long Waits for Global Cache Message Requests
Description: CHA detected that global cache messages are taking a long time.
Cause: The Cluster Health Advisor (CHA) detected that global cache messages have not
completed and are blocking database processes. It is likely that Global Cache Server
Processes (LMS) on another node are unresponsive or are waiting for CPU or Memory.
Action: Check whether CHA reports any issues related to severe performance degradation
caused by CPU or memory starvation on other nodes in the cluster and follow the suggestions
to resolve those problems.

Problem: Long Redo Log Sync Wait
Description: CHA detected that commits are blocked for several seconds.
Cause: The Cluster Health Advisor (CHA) detected that commits are blocked for several
seconds because cluster commit propagation messages are very slow. It is likely that a
Global Cache Server Process (LMS) on another node is unresponsive or is waiting for CPU
or Memory.
Action: Check whether CHA reports any issues related to severe performance degradation
on other nodes in the cluster. Resolve the problems found on other nodes and instances
of this database in the cluster.
[oracle@rac122pri1 bin]$

I really like the text representation because I like the command line. For those who prefer a more aesthetically pleasing representation of the data, you can also create an HTML version of the findings by specifying the -htmlfile flag.

What else?

Going over the list of options again I found a few more that look interesting. For instance, what’s the footprint of this information in the repository? Query it:

[oracle@rac122pri1 bin]$ chactl query repository

specified max retention time(hrs): 72
available retention time(hrs)    : 307
available number of entities     : 17
allocated number of entities     : 4
total repository size(gb)        : 15.00
allocated repository size(gb)    : 1.29
[oracle@rac122pri1 bin]$ 

And what are these models I see when querying the status? DEFAULT_DB and DEFAULT_CLUSTER exist:

[oracle@rac122pri1 bin]$ chactl query model
Models: DEFAULT_CLUSTER, DEFAULT_DB

[oracle@rac122pri1 bin]$ chactl query model -name DEFAULT_DB
Model: DEFAULT_DB
Target Type: DATABASE
Version: 12.2.0.1_0
OS Calibrated on: 
Calibration Target Name: 
Calibration Date: 
Calibration Time Ranges: 
Calibration KPIs: 
[oracle@rac122pri1 bin]$ 

The documentation elaborates: it is possible to calibrate the tool to better reflect your environment.

This is actually quite useful, although I doubt many clusters will deviate from the standard model. Anyway, if you want to calibrate your cluster to a specific workload, you start off by querying CHA to check whether it has sufficient data. In my case I could dream up a scenario called "swingbench run".

The cluster as such comes first:

[oracle@rac122pri1 bin]$ chactl query calibration -cluster -timeranges \
> 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'

Cluster name : rac122pri
Start time : 2017-03-15 05:20:00
End time : 2017-03-15 06:30:00
Total Samples : 1647
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
1.19      0.45      3.71      0.00      86.69   

<25       <50       <75       <100      >=100   
99.64%    0.18%     0.12%     0.06%     0.00%    

2) Disk write (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
3.14      0.42      4.67      0.00      61.67   

<50       <100      <150      <200      >=200   
99.94%    0.06%     0.00%     0.00%     0.00%    

3) Disk throughput (ASM) (IO/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
339.59    100.00    512.97    0.00      9700.00 

<5000     <10000    <15000    <20000    >=20000 
99.88%    0.12%     0.00%     0.00%     0.00%    

4) CPU utilization (total) (%)

MEAN      MEDIAN    STDDEV    MIN       MAX     
37.89     16.80     33.90     4.30      99.90   

<20       <40       <60       <80       >=80    
55.68%    7.35%     3.76%     4.74%     28.48%   

[oracle@rac122pri1 bin]$ 
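For the record, the per-KPI summaries chactl prints (MEAN, MEDIAN, STDDEV, MIN, MAX plus the bucket percentages) are plain descriptive statistics over the collected samples. A quick illustration with entirely made-up sample values; only the bucket boundaries mirror the "CPU utilization" output above:

```python
# Hypothetical sketch of how a chactl-style KPI summary is derived:
# summary statistics plus the share of samples per bucket.
import statistics

samples = [4.3, 12.0, 16.8, 35.5, 62.1, 85.0, 99.9]  # made-up CPU %
buckets = [20, 40, 60, 80]   # bucket edges: <20, <40, <60, <80, >=80

print(f"MEAN {statistics.mean(samples):.2f}  "
      f"MEDIAN {statistics.median(samples):.2f}  "
      f"STDDEV {statistics.pstdev(samples):.2f}  "
      f"MIN {min(samples):.2f}  MAX {max(samples):.2f}")

# count how many samples fall into each bucket, then express as percent
counts = [0] * (len(buckets) + 1)
for s in samples:
    idx = next((i for i, b in enumerate(buckets) if s < b), len(buckets))
    counts[idx] += 1
percentages = [100.0 * c / len(samples) for c in counts]
print(percentages)
```

Nothing more than that; the interesting part is CHA's decision whether the sample distribution is representative enough to build a model from.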

As per the command's help message you can limit the output to specific KPIs. Similarly, you can check whether you have enough data for the database.

[oracle@rac122pri1 bin]$ chactl query calibration -db CDB -timeranges \
> 'start=2017-03-15 05:20:00,end=2017-03-15 06:20:00'
Database name : cdb
Start time : 2017-03-15 05:20:00
End time : 2017-03-15 06:30:00
Total Samples : 1632
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
1.16      0.46      3.61      0.00      86.69   

<25       <50       <75       <100      >=100   
99.69%    0.12%     0.12%     0.06%     0.00%    

2) Disk write (ASM) (Mbyte/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
3.17      0.43      4.69      0.00      61.67   

<50       <100      <150      <200      >=200   
99.94%    0.06%     0.00%     0.00%     0.00%    

3) Disk throughput (ASM) (IO/sec)

MEAN      MEDIAN    STDDEV    MIN       MAX     
341.61    100.00    514.31    0.00      9700.00 

<5000     <10000    <15000    <20000    >=20000 
99.88%    0.12%     0.00%     0.00%     0.00%    

4) CPU utilization (total) (%)

MEAN      MEDIAN    STDDEV    MIN       MAX     
37.94     16.75     33.99     4.30      99.90   

<20       <40       <60       <80       >=80    
55.82%    7.17%     3.62%     4.66%     28.74%   

5) Database time (per user call) (usec/call)

MEAN      MEDIAN    STDDEV    MIN       MAX     
108878.33  6712.80   2778378.28  0.00      93077792.00

<10000000  <20000000  <30000000  <40000000  <50000000  <60000000  <70000000  >=70000000
99.88%    0.00%     0.00%     0.00%     0.00%     0.00%     0.06%     0.06%    

[oracle@rac122pri1 bin]$ 

Be careful though, sometimes you don’t have enough data! I saw a message like this when setting the window to 30 minutes:

[oracle@rac122pri1 ~]$ chactl query calibration -cluster \
> -timeranges 'start=2017-03-15 05:50:00,end=2017-03-15 06:20:00'

Cluster name : rac122pri
Start time : 2017-03-15 05:50:00
End time : 2017-03-15 06:20:00
Total Samples : 687
Percentage of filtered data : 100%
The number of data samples may not be sufficient for calibration.
...

It looks like you can leave it at that, or create a new, more accurate model with finer granularity. Interesting new features await!
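As the output above shows, each KPI is summarised the same way: mean, median, standard deviation, min, max, plus the percentage of samples falling into each band. The aggregation itself is easy to reproduce; here is a minimal Python sketch (the function name, band semantics and sample data are my own assumptions, not chactl internals):

```python
import statistics

def summarize(samples, buckets):
    """Summarise a KPI the way the calibration report does: descriptive
    statistics plus the share of samples per band."""
    stats = {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        # population stddev; chactl may well use the sample variant
        "stddev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }
    shares = []
    lower = float("-inf")
    for upper in buckets:
        count = sum(1 for s in samples if lower <= s < upper)
        shares.append((f"<{upper}", 100.0 * count / len(samples)))
        lower = upper
    # the last band is open-ended
    tail = sum(1 for s in samples if s >= buckets[-1])
    shares.append((f">={buckets[-1]}", 100.0 * tail / len(samples)))
    return stats, shares
```

With the CPU utilisation bands from the report above, `summarize(cpu_samples, [20, 40, 60, 80])` yields the five percentages printed by chactl.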

A quick look at Oracle 12.2 performance instrumentation

Thanks to the prep-work done by Andy Colvin, who continuously and boldly goes where no man has gone before, one of our Exadata systems in the lab is now fully upgraded to 12.2.0.1. It comes fully equipped with the matching cellos to support all the cool new features. Exciting times!

The reason for this post is simple: I have started working on our talk for @Enkitec’s E4 conference in June but thought I’d share this little article with you as a teaser :) There might be one or two more of these posts but if you want the full story make sure you catch us (online) during the conference.

More diagnostic information in 12.2

The Oracle database truly champions diagnosability in a way I have not seen with any other system, and it does so out of the box. Granted, some of that requires an investment in extra-cost options, but the value one gets in the form of Active Session History (ASH) and Automatic Workload Repository (AWR) is real. After I read the chapter on instrumentation in “Oracle Insights: Tales of the Oak Table” (Apress) a long time ago, I started to realise the inherent beauty of having insight into code. This code can be your own code, if you instrumented it properly, or the Oracle codepath externalised as wait events. Sadly most application developers do not adhere to the principle of instrumenting code (or maybe don’t even know about the possibility?) and therefore make troubleshooting unnecessarily hard. That matters less on platforms that have no equivalent of the Oracle Wait Interface and session counters anyway, but on Oracle it is a real wasted opportunity, as others have pointed out before me.

I’ll now take my idealist hat off and focus on the real world :) In the far more common case where the application isn’t instrumented by the developer, you can still draw conclusions by using the Wait Interface and session counters. In most scenarios I am involved in, the first step is to run Tanel Poder’s session snapper script, which gives me insight into both.

So what has changed in this respect with 12.2? The test environment I am using is an Exadata quarter rack as mentioned before. The findings should be comparable with other Oracle software offerings, in the cloud and on premises.

Wait Interface

The Wait Interface is one of the most useful features for the performance engineer, and one of the building blocks for Statspack, ASH and AWR. I was curious whether new events have been introduced in 12.2, hence this investigation. The sheer number of events tracked in the database prevents them from being listed individually in this post. If you want to, you can use Tanel’s “sed.sql” to find out more, or simply query v$event_name.

An interesting tidbit for 12.2 has been covered by @FranckPachot: some of the more misleading event names such as db file scattered read and db file sequential read have been clarified in 12.2. Search for events where the name is not equal to the display_name, or read Franck’s post on the DBI blog.

Wait events in 11.2.0.3

I like to start comparisons with a benchmark, and 11.2.0.3 seems to be a good candidate. Just looking at the wait_classes and counting events per class should be a good starting point:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         55 Administrative
         17 Application
         50 Cluster
          2 Commit
         33 Concurrency
         24 Configuration
         95 Idle
         35 Network
        745 Other
          9 Queueing
          8 Scheduler
         31 System I/O
         48 User I/O
       1152

14 rows selected.

So there are 1152 events in total in 11.2.0.3; keep that number in mind.

Wait events in 12.1.0.2

In my opinion 12.1 is a major step ahead, and I said it many times: I haven’t seen so many fundamental changes to the Oracle database for a long time. For various reasons though 12.1 hasn’t seen the traction in the field it probably deserved. Note how the diagnosability has been greatly enhanced:

SQL> select count(*), wait_class 
  2  from v$event_name 
  3  group by rollup(wait_class) 
  4  order by wait_class;

   COUNT(*) WAIT_CLASS
----------- ----------------------------------------------------------------
         57 Administrative
         17 Application
         64 Cluster
          4 Commit
         38 Concurrency
         26 Configuration
        121 Idle
         28 Network
       1186 Other
          9 Queueing
          9 Scheduler
         35 System I/O
         56 User I/O
       1650

14 rows selected.

A quick calculation reveals that 12.1 features 498 additional events compared to 11.2.0.3. Not bad at all in my opinion. Unfortunately most of these additional events ended up in the “Other” wait class. This is a real shame, and I have remarked on this before when blogging about the new Data Guard wait events: they probably should have ended up in the Network class instead. I am sure there are other examples like this.
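The per-class breakdown of those 498 events is a quick dictionary diff. A small sketch using the figures from the two listings above (no database connection needed):

```python
# Wait-event counts per class, taken from the v$event_name listings above
events_11203 = {"Administrative": 55, "Application": 17, "Cluster": 50, "Commit": 2,
                "Concurrency": 33, "Configuration": 24, "Idle": 95, "Network": 35,
                "Other": 745, "Queueing": 9, "Scheduler": 8, "System I/O": 31,
                "User I/O": 48}
events_12102 = {"Administrative": 57, "Application": 17, "Cluster": 64, "Commit": 4,
                "Concurrency": 38, "Configuration": 26, "Idle": 121, "Network": 28,
                "Other": 1186, "Queueing": 9, "Scheduler": 9, "System I/O": 35,
                "User I/O": 56}

# Per-class growth between the two releases
delta = {cls: events_12102[cls] - events_11203[cls] for cls in events_11203}

print(sum(delta.values()))        # 498 new events in total
print(max(delta, key=delta.get))  # Other
```

Interestingly, the diff also shows the Network class actually shrank by 7 events between the releases.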

Wait events in 12.2.0.1

Now what about 12.2? First let’s start with the list:

SQL> select count(*), wait_class
  2  from v$event_name
  3  group by rollup(wait_class)
  4  order by wait_class;

  COUNT(*) WAIT_CLASS
---------- ----------------------------------------------------------------
        57 Administrative
        17 Application
        68 Cluster
         4 Commit
        49 Concurrency
        27 Configuration
       135 Idle
        29 Network
      1314 Other
         9 Queueing
        10 Scheduler
        35 System I/O
        57 User I/O
      1811

14 rows selected.

There are indeed new events: 161 more than in 12.1.0.2, and again most of them can be found in the Other wait class. This is quite unfortunate, as it prevents the performance architect from identifying unknown wait events quickly.

I have decided to keep this post short-ish and will spend more time some other day to investigate the exact difference between 12.1.0.2 and 12.2.0.1. Most likely after E4 this year.

Session Counters

With the wait interface covered, it’s time to move on to the session counters. Continuing the approach I took with wait events, I will group all session counters by class. Instead of re-inventing the wheel I am using a slightly adapted version of Tanel Poder’s “mys.sql” script. Most counters fall into just one class, but for some more than one class applies. The 12.2 Reference Guide explains v$statname.class in more detail.

Session counters in 11.2.0.3

Before investigating 12.1 and 12.2 I’ll look at 11.2.0.3 first, as in the first section of the article. The SQL statement I used is this:

with classification as (
select name, TRIM(
  CASE WHEN BITAND(class,  1) =   1 THEN 'USER  ' END ||
  CASE WHEN BITAND(class,  2) =   2 THEN 'REDO  ' END ||
  CASE WHEN BITAND(class,  4) =   4 THEN 'ENQ   ' END ||
  CASE WHEN BITAND(class,  8) =   8 THEN 'CACHE ' END ||
  CASE WHEN BITAND(class, 16) =  16 THEN 'OS    ' END ||
  CASE WHEN BITAND(class, 32) =  32 THEN 'RAC   ' END ||
  CASE WHEN BITAND(class, 64) =  64 THEN 'SQL   ' END ||
  CASE WHEN BITAND(class,128) = 128 THEN 'DEBUG ' END
) class_name 
from v$statname
) select count(*), class_name from classification 
group by rollup(class_name)
order by class_name;

Executed on an 11.2.0.3 database this reveals the following numbers:

   COUNT(*) CLASS_NAME
----------- ------------------------------------------------
        121 CACHE
         27 CACHE RAC
         15 CACHE SQL
        188 DEBUG
          9 ENQ
         16 OS
         25 RAC
         32 REDO
         93 SQL
          2 SQL   DEBUG
        107 USER
          3 USER  RAC
        638

13 rows selected.

638 of them altogether. Keep the number in mind when moving to 12.1.
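The CASE/BITAND cascade in the SQL above is just a bitmask decode. The same logic as a Python sketch, using the bit values from that query (1 = USER through 128 = DEBUG, per v$statname.class):

```python
# v$statname.class is a bitmask; a counter can belong to several classes
STAT_CLASSES = {1: "USER", 2: "REDO", 4: "ENQ", 8: "CACHE",
                16: "OS", 32: "RAC", 64: "SQL", 128: "DEBUG"}

def decode_class(class_value):
    """Return the list of class names encoded in a v$statname.class value."""
    return [name for bit, name in STAT_CLASSES.items() if class_value & bit]

print(decode_class(40))   # ['CACHE', 'RAC']
```

A value of 40 (8 + 32) therefore corresponds to the “CACHE RAC” rows in the listings, just as the concatenated CASE expressions produce them.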

Session counters in 12.1.0.2

There was a major increase in the number of counters between 11.2.0.3 and 12.1.0.2. Consider the numbers:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       151 CACHE
        53 CACHE RAC
        15 CACHE SQL
       565 DEBUG
         9 ENQ
        16 OS
        35 RAC
        68 REDO
         1 REDO  RAC
       130 SQL
         2 SQL   DEBUG
       130 USER
         3 USER  RAC
      1178

14 rows selected.

That nearly doubles the number of available counters. Note that quite a few of the new counters fall into the DEBUG class; looking a bit more closely, they seem to be used by the In-Memory (IM) Option:

SQL>  SELECT
  2      regexp_substr(name,'\w+ \w+') AS short_name,
  3      COUNT(*)
  4  FROM
  5      v$statname
  6  WHERE
  7      class = 128
  8  GROUP BY
  9      regexp_substr(name,'\w+ \w+')
 10  ORDER BY
 11      2 DESC
 12  FETCH FIRST 5 ROWS ONLY;

SHORT_NAME                        COUNT(*)
------------------------------ -----------
IM repopulate                           49
IM scan                                 47
IM populate                             37
spare statistic                         35
IM space                                26

5 rows selected.

There are 198 session counters beginning with ‘IM %’. I can feel another post about DB In Memory coming …
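If you export the statistic names, the regexp_substr(name, '\w+ \w+') grouping is easy to reproduce client-side. A sketch with a handful of made-up sample names (not the real 12.1 list):

```python
import re
from collections import Counter

# A few illustrative statistic names (invented for this sketch)
stat_names = [
    "IM scan rows", "IM scan CUs columns accessed", "IM scan bytes in-memory",
    "IM populate segments requested", "IM populate bytes from storage",
    "spare statistic 1", "spare statistic 2",
]

def prefix(name):
    """Same idea as regexp_substr(name, '\\w+ \\w+'): keep the first two words."""
    return re.match(r"\w+ \w+", name).group()

top = Counter(prefix(n) for n in stat_names).most_common()
print(top)
```

Grouping on the first two words is crude, but it is enough to surface the dominant counter families at a glance.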

Session counters in 12.2.0.1

Finally, here is the list of statistics in 12.2.0.1:

  COUNT(*) CLASS_NAME
---------- ------------------------------------------------
       174 CACHE
        73 CACHE RAC
        15 CACHE SQL
      1067 DEBUG
         9 ENQ
        16 OS
        35 RAC
        75 REDO
         1 REDO  RAC
       190 SQL
         2 SQL   DEBUG
       144 USER
         3 USER  RAC
      1804

That is another 626 counters, almost as many as the 11.2.0.3 system had in total (638)! The DEBUG class alone grew from 565 to 1067 counters. Running my previous query again shows that IM-related statistics still dominate, but there are lots of other counters of interest.

As with the 12.2 wait events I don’t want to give too much away at this point (and the post is long enough anyway) so stay tuned for an update at a later time.

Summary

Oracle has been one of the best database engines around, and with 12.2.0.1 instrumentation is further improved. This post has again become too long, so I’ll stop here and defer the write-up of my investigation into those stats relevant for Exadata to a later point.

Happy troubleshooting!