Martins Blog

Trying to explain complex things in simple terms

Archive for September, 2014

Interesting observation about standby redo logs in Data Guard

Posted by Martin Bach on September 22, 2014

Some of you might have followed the discussion around the number of standby redo logs on Twitter, but since 140 characters are woefully short for the complete story, here's the writeup that prompted the question. This is a test with 12.1.0.2 on virtualised Linux, repeated on a proper platform with physical hardware.

First of all, here's my setup. I have a dbca-based database (a CDB, but that doesn't matter) that features 3 groups for its online redo logs. They are all 50 MB in size, which is important for this test but not realistic :) Following the Oracle documentation I created n + 1 groups (per thread) on the standby to stop the Data Guard broker from complaining about missing standby redo logs (SRLs).
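
For reference, the statements involved look like this; the group numbers 4 to 7 match the v$standby_log output further down (a minimal sketch assuming OMF file placement, not my exact script):

SQL> -- group numbers chosen to match the v$standby_log output below
SQL> alter database add standby logfile thread 1 group 4 size 50m;
SQL> alter database add standby logfile thread 1 group 5 size 50m;
SQL> alter database add standby logfile thread 1 group 6 size 50m;
SQL> alter database add standby logfile thread 1 group 7 size 50m;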

The end result was positive, here’s what the broker thinks:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Members:
  CDB1  - Primary database
    STDBY - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 23 seconds ago)

The broker likes to complain in case SRLs are missing on either the primary and/or the standby. So I'm all good to go! But wait a second: when I'm using one of the cool new 12c features to check failover readiness, something is odd:

DGMGRL> validate database "STDBY"

  Database Role:     Physical standby database
  Primary Database:  CDB1

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    CDB1:   Off
    STDBY:  Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (CDB1)                  (STDBY)
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (STDBY)                 (CDB1)
    1         3                       0                       Insufficient SRLs
    Warning: standby redo logs not configured for thread 1 on CDB1

DGMGRL>

Pardon me? I _do_ have 4 groups of SRLs:

SQL> r
  1* select group#,sequence#,bytes,used,status from v$standby_log

    GROUP#  SEQUENCE#      BYTES       USED STATUS
---------- ---------- ---------- ---------- ----------
         4         27   52428800       2048 ACTIVE
         5         29   52428800      11264 ACTIVE
         6          0   52428800          0 UNASSIGNED
         7          0   52428800          0 UNASSIGNED

And 3 online redo logs:

SQL> select group#,thread#,sequence#,bytes,status from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------------
         1          1         31   52428800 CURRENT
         2          1         29   52428800 INACTIVE
         3          1         30   52428800 INACTIVE

But the fun fact remains that out of my 4 standby redo logs, only 2 are ever used. @pioro suggested that the other two are simply never needed because there isn't enough redo to apply, and that sounds plausible. To validate this I created a test environment on physical hardware with proper storage, because my tests in the lab turned out to suffer from IO problems.

The setup is the same.

I created a couple of sessions that each start off by creating a 1,000,000-row table, then delete from it, only to roll back after the delete has completed. That's a lot of redo for 50 MB files (again, you wouldn't use 50 MB online redo logs in production; this is just a test). My observation remains: only groups 4 and 5 are used. When redo generation on the primary gets too far ahead, there will be RFS transfers of the archived redo logs. The workload is sketched below, followed by the output of a monitoring query taken while it was running.
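
Each session ran something along these lines (a minimal sketch; the table name and columns are made up, and every session would use its own table):

SQL> -- t_redo is a made-up name; each session had its own copy
SQL> create table t_redo as
  2  select rownum as id, rpad('x', 100, 'x') as pad
  3  from dual connect by level <= 1000000;
SQL> delete from t_redo;
SQL> rollback;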

PROCESS   STATUS       CLIENT_P GROUP#                                      THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS
--------- ------------ -------- ---------------------------------------- ---------- ---------- ---------- -------------
ARCH      CLOSING      ARCH     5                                                 1        149      83968             0
ARCH      CONNECTED    ARCH     N/A                                               0          0          0             0
ARCH      CLOSING      ARCH     5                                                 1        147      86016             0
ARCH      CLOSING      ARCH     4                                                 1        148      83968             0
MRP0      APPLYING_LOG N/A      N/A                                               1        147      85766            25
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       RECEIVING    LGWR     3                                                 1        150      85744             0
RFS       IDLE         ARCH     N/A                                               0          0          0             0
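
The output above is consistent with a query against v$managed_standby, along these lines (a sketch; column formatting omitted):

SQL> select process, status, client_process, group#,
  2         thread#, sequence#, block#, active_agents
  3    from v$managed_standby;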

Is this a problem? Not as far as I can tell, and maybe I did something incorrectly on my side, too. There wasn't a problem: my archiver process was quick enough to archive the SRLs, and as soon as it fell behind, the standby resorted to fetching archived redo logs from the primary. When it uses the FAL process you see lines like this in the alert.log:

2014-09-17 02:34:45.164000 -05:00
RFS[4]: Selected log 4 for thread 1 sequence 211 dbid 788205474 branch 857548261
Media Recovery Log +RECO/STDBY/ARCHIVELOG/2014_09_17/thread_1_seq_209.368.858479683
Archived Log entry 141 added for thread 1 sequence 210 ID 0x2efbcba0 dest 1:
2014-09-17 02:34:46.215000 -05:00

When Real-Time Apply was active (whenever redo transfer rates dropped enough for recovery to keep up), I saw this, alternating between groups 4 and 5:

2014-09-17 02:35:33.653000 -05:00
Media Recovery Waiting for thread 1 sequence 230 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 230 Reading mem 0
  Mem# 0: +DATA/STDBY/ONLINELOG/group_5.380.858076769
  Mem# 1: +RECO/STDBY/ONLINELOG/group_5.342.858076769

So for some reason, yet unknown to me, there are only ever 2 SRLs in use.

Posted in 12c Release 1, Linux | 8 Comments »

Can you have high redundancy files in a normal redundancy diskgroup?

Posted by Martin Bach on September 11, 2014

One of the perks of teaching classes is that I get to research the questions I am asked. In the last Exadata Administration Class I taught, someone asked: can you have your disk groups in Exadata on normal redundancy, yet have certain databases use high redundancy? This would be a good interview question …

The answer is yes, which I remembered from researching material for the 11g RAC book, but I wanted to prove that it is the case.

Update: I had planned a second blog post where I wanted to test the effect, but Alex Fatkulin was quicker, and I promise I didn't see his post when I wrote mine. Otherwise there probably wouldn't have been one :) In summary, you aren't really any better protected: the disk group remains at normal redundancy, even with the data files in high. Looking at Alex's results (and I encourage you to do so) I concur with his summary that although the 3rd copy of each extent protects you from corruption, you don't get higher resilience.

This is not Exadata specific by the way. When I face a question around ASM and internals my first idea is to use an Internet search engine and look up the ASM work by Luca Canali. This page is almost always of relevance when looking at ASM file structures: https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals.

ASM Templates

I am not aware of many ASM users who know about ASM templates, although we all make use of them. Templates define the striping and mirroring of files and are explained in chapter 5 of the ASM Admin Guide. You are not limited to the Oracle-provided templates; you can create your own as well, which is key to this post. I headed over to the ASM instance on my cluster and created the template:

SQL> alter diskgroup data add template adminclassHigh attributes(high);

Diskgroup altered.
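
The new template shows up in v$asm_template next to the Oracle-provided ones; a quick sanity check could look like this (a sketch, output omitted):

SQL> -- template names are stored in uppercase
SQL> select name, redundancy, stripe
  2  from v$asm_template
  3  where name = 'ADMINCLASSHIGH';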

With the template in place I can create, for example, a high redundancy tablespace in my normal redundancy disk group:

SQL> select name, group_number, type from v$asm_diskgroup;

NAME                           GROUP_NUMBER TYPE
------------------------------ ------------ ------
RECO                                      3 NORMAL
DBFS_DG                                   2 NORMAL
DATA                                      1 NORMAL

Contrary to what the documentation suggests, you do not need to change db_create_file_dest for this to work.

SQL> show parameter db_create_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest                  string      +DATA

Let’s put that template to use.

SQL> create tablespace testhigh datafile '+data(adminclassHigh)' size 10m;

Tablespace created.

SQL> select file_name from dba_data_Files;

FILE_NAME
-----------------------------------------------------------------------------------------------
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/system.363.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/sysaux.364.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/users.365.857664253
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/testhigh.357.857880753

Note that I specify the newly created template in parentheses in the datafile clause. The tablespace has been created; next I need the file number and the incarnation to look up how the extents are distributed.
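
Both are encoded in the OMF file name: testhigh.357.857880753 translates to ASM file number 357, incarnation 857880753. They can also be looked up in the ASM instance (a sketch, output omitted):

SQL> select file_number, file_incarnation
  2  from v$asm_alias
  3  where upper(name) like 'TESTHIGH%';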

SQL> select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP 
  2   from x$kffxp  where number_kffxp = 357 and incarn_kffxp = 857880753;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
         4      73706          0          0          0
        23      73767          1          0          1
        24      28895          2          0          2
         1      73679          3          1          0
        27      30015          4          1          1
        22      73717          5          1          2
        21      73725          6          2          0
         7      73698          7          2          1
        33      73707          8          2          2

9 rows selected.

What looks a little cryptic can be translated using Luca’s notes as:

  • DISK is the ASM disk where the extent is located (as in V$ASM_DISK)
  • AU indicates the relative position of the allocation unit from the beginning of the disk
  • PXN is the progressive extent number (= the actual extent)
  • XNUM is the ASM file extent number (mirrored extent pairs have the same extent value)
  • LXN indicates the type of extent: 0 = primary extent, 1 = first mirror copy, 2 = second mirror copy

So you can see that each primary extent has 2 mirror copies, also known as high redundancy.
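
A quick aggregate makes the same point: every extent number should show up three times, one primary plus two mirror copies (a sketch, output omitted):

SQL> select xnum_kffxp, count(*) as copies
  2  from x$kffxp
  3  where number_kffxp = 357 and incarn_kffxp = 857880753
  4  group by xnum_kffxp;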

If you omit the template in the datafile clause you get what you’d expect: normal redundancy.

SQL> drop tablespace testhigh including contents and datafiles;

Tablespace dropped.

SQL> create tablespace testhigh datafile '+data' size 10m;

Tablespace created.

SQL>  select file_name from dba_data_Files;

FILE_NAME
-----------------------------------------------------------------------------------------------
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/system.363.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/sysaux.364.857664247
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/users.365.857664253
+DATA/PRI/02810A3FB7F65856E053D908A8C003E1/DATAFILE/testhigh.357.857880927

And on ASM:

SQL> select DISK_KFFXP,AU_KFFXP,PXN_KFFXP,XNUM_KFFXP,LXN_KFFXP
  2  from x$kffxp  where number_kffxp = 357 and incarn_kffxp = 857880927;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
        30      73724          0          0          0
        16      73749          1          0          1
         0      73690          2          1          0
        28      73682          3          1          1
        17      73722          4          2          0
        10      73697          5          2          1

6 rows selected.

As you can see, each extent has 1 mirror copy this time, not 2.

Posted in Automatic Storage Management | 1 Comment »

Intra-Database IORM in action

Posted by Martin Bach on September 10, 2014

I have been teaching the Enkitec Exadata Administration Class this week and made an interesting observation about IO Resource Management on Exadata that I thought was worth sharing.

I have created a Database Resource Manager (DBRM) plan that specifically puts one resource consumer group at a disadvantage. Actually, quite severely so, but then the following isn't meant to be a realistic example in the first place: I wanted to prove a point. Hang on, I hear you say: you created a DBRM plan, but the post has IORM in the subject: what gives? Please allow me to explain.

Exadata offers the keen engineer 3 different ways to implement IORM:

  • Intra-Database IORM
  • Inter-Database IORM
  • Category IORM

The latter 2 need to be implemented on the cells using the cellcli "alter IORMPLAN" directive, documented in chapter 6 of the Storage Server User's Guide. The first one, though, is pushed down to the cells when activated on the database, an activity that is visible in the cell alert.log. So when you create and enable a DBRM plan, it automatically becomes an IORM plan as well.
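
You can verify what the cells received using CellCLI (a quick check, output omitted):

CellCLI> list iormplan detail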

Setup

The code to create the DBRM plan is quite simple. I create a plan named ENKITEC_DBRM and two new consumer groups:

  • LOWPRIO_GROUP
  • HIGHPRIO_GROUP

HIGHPRIO_GROUP gets 100% of mgmt_p2, while LOWPRIO_GROUP gets 25% of the remaining resources on level mgmt_p3 and has to share them with OTHER_GROUPS as well. Not that you would do that in real life… A sketch of the plan creation follows below. There was no change to cpu_count, which is 24 on an X2 compute node (2s12c24t Westmere Xeon processors).
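
Along these lines (a hedged sketch, not my exact script: the 75% share for OTHER_GROUPS and the mapping of the LOWPRIO/HIGHPRIO users to their groups are assumptions):

begin
  dbms_resource_manager.create_pending_area;
  dbms_resource_manager.create_consumer_group('LOWPRIO_GROUP',  'low priority sessions');
  dbms_resource_manager.create_consumer_group('HIGHPRIO_GROUP', 'high priority sessions');
  dbms_resource_manager.create_plan('ENKITEC_DBRM', 'deliberately starves LOWPRIO_GROUP');
  -- 100% on level 2 for the high priority group
  dbms_resource_manager.create_plan_directive('ENKITEC_DBRM', 'HIGHPRIO_GROUP',
      comment => 'all of mgmt_p2', mgmt_p2 => 100);
  -- the leftovers on level 3, shared with OTHER_GROUPS (75% is an assumption)
  dbms_resource_manager.create_plan_directive('ENKITEC_DBRM', 'LOWPRIO_GROUP',
      comment => '25% of mgmt_p3', mgmt_p3 => 25);
  dbms_resource_manager.create_plan_directive('ENKITEC_DBRM', 'OTHER_GROUPS',
      comment => 'rest of mgmt_p3', mgmt_p3 => 75);
  -- map the two test users to their consumer groups
  dbms_resource_manager.set_consumer_group_mapping(
      dbms_resource_manager.oracle_user, 'LOWPRIO',  'LOWPRIO_GROUP');
  dbms_resource_manager.set_consumer_group_mapping(
      dbms_resource_manager.oracle_user, 'HIGHPRIO', 'HIGHPRIO_GROUP');
  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
  -- the mappings only take effect if the users may switch to the groups
  dbms_resource_manager_privs.grant_switch_consumer_group('LOWPRIO',  'LOWPRIO_GROUP',  false);
  dbms_resource_manager_privs.grant_switch_consumer_group('HIGHPRIO', 'HIGHPRIO_GROUP', false);
end;
/

alter system set resource_manager_plan = 'ENKITEC_DBRM';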

After the setup was complete I created a SQL script that I planned to execute in 50 sessions: 25 in the low priority and 25 in the high priority consumer group. The script is a simple "select count(*) from reallybigtable", in this case against an 80 GB table with 256 million rows, uncompressed.

Running SQLPLUS concurrently

I wrote a small shell script capable of launching a user-configurable number of SQLPLUS sessions against the database, and included some timing information to measure the time to completion for all sessions. With that at hand I started 25 sessions for the low priority and 25 for the high priority consumer group. In a third SSH session I connected to the first cell in the quarter rack (it's an X2 by the way) and repeatedly executed metric_iorm.pl. This little gem was written for cellsrv 11.2.x and lists IORM metrics for each database in the system as well as a summary of IO activity against the cell. I explicitly mention cellsrv 11.2.x because the new IORM instrumentation for PDBs is missing from its output; so far you can only realistically monitor non-CDBs with it when using RDBMS 12c. The perl script is available from "Tool for Gathering I/O Resource Manager Metrics: metric_iorm.pl (Doc ID 1337265.1)". Deployed to the celladmin user's home directory, I was ready to start the test.
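
The driver was along these lines (a minimal sketch; the credentials and query.sql are placeholders, and my real script had a few more bells and whistles):

#!/bin/bash
# launch N concurrent SQL*Plus sessions running the same script and
# report the wall clock time until all of them have finished
SESSIONS=${1:-25}            # number of sessions to launch
CONNECT=${2:-lowprio/secret} # placeholder credentials
START=$(date +%s)
for i in $(seq 1 "$SESSIONS"); do
    sqlplus -s "$CONNECT" @query.sql > /dev/null &
done
wait                         # block until every background session is done
END=$(date +%s)
echo "$SESSIONS sessions completed in $((END - START)) seconds"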

Test result

Executing metric_iorm.pl immediately after initiating the heavy IO load against the cells does not report a lot of useful data; it appears as if the contents of "metriccurrent" aren't updated straight away but with a slight delay. Note that there is no typical resource manager wait event, such as "resmgr:cpu quantum", here:

SQL> select username,event,state,sql_id from v$session where username like '%PRIO';

USERNAME                       EVENT                          STATE               SQL_ID
------------------------------ ------------------------------ ------------------- -------------
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITED KNOWN TIME   6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITED SHORT TIME   6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
HIGHPRIO                       cell smart table scan          WAITING             6q73x4ufb26pz
LOWPRIO                        cell smart table scan          WAITING             6q73x4ufb26pz

50 rows selected.

Almost all of the sessions were happily scanning the table. I am not aware of a way of finding out about IORM throttling at the RDBMS level, but that doesn't mean there isn't one! In the end the sessions of LOWPRIO_GROUP took 320 seconds to scan the table, while the sessions in HIGHPRIO_GROUP needed only 199 seconds to finish their scans. So clearly IORM was at work here.

IO Performance

The end result was to be expected, but I wanted to know more. In this case the interesting bit is found on the cell. Please note that I only looked at the first cell in this X2 quarter rack; I didn't check the combined throughput across all cells. The query I executed wasn't making use of any Exadata features either: it was a brute-force scan across all rows of the table in the absence of a where clause. I just needed a lot of IO to demonstrate the use of IORM.

When all 50 sessions were active I could observe the following information from metric_iorm.pl for my database:

Database: DBM01
Utilization:     Small=0%    Large=97%
Flash Cache:     IOPS=45702
Disk Throughput: MBPS=0
Small I/O's:     IOPS=0.7    Avg qtime=8.8ms
Large I/O's:     IOPS=1115    Avg qtime=1941ms
        Consumer Group: HIGHPRIO_GROUP
        Utilization:     Small=0%    Large=94%
        Flash Cache:     IOPS=43474
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.0 
        Large I/O's:     IOPS=1064 
        Consumer Group: _ORACLE_BACKGROUND_GROUP_
        Utilization:     Small=0%    Large=0%
        Flash Cache:     IOPS=6.6
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.7
        Large I/O's:     IOPS=0.0
        Consumer Group: LOWPRIO_GROUP
        Utilization:     Small=0%    Large=3%
        Flash Cache:     IOPS=2222
        Disk Throughput: MBPS=52
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=50.4

You can see that IORM is in action: the low priority group is literally starved out, which is correct if you consider the plan I implemented. I had previously put my reallybigtable to good use and ended up with a lot of it in flash cache (thanks to cellsrv 11.2.3.3.x and later, you benefit from flash cache during smart scans without having to pin the segment to it). Interestingly, the disk throughput is 0 MB per second for the high priority group, which looks quite wrong to me; I need to debug the script. The low priority consumer group apparently makes use of spinning disk, although its combined throughput is lower.
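
If you want to check how much of a segment sits in flash cache, the cell can tell you; a hedged example (the database name DBM01 is taken from the output above, the attribute list from memory):

CellCLI> list flashcachecontent where dbUniqueName = 'DBM01' attributes objectNumber, cachedSize, hitCount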

After the 25 sessions from the high priority resource consumer group finished, it looked better for the sessions in the low priority group:

Database: DBM01
Utilization:     Small=0%    Large=88%
Flash Cache:     IOPS=44351
Disk Throughput: MBPS=0
Small I/O's:     IOPS=3.2    Avg qtime=0.0ms
Large I/O's:     IOPS=1080    Avg qtime=830ms
        Consumer Group: _ORACLE_BACKGROUND_GROUP_
        Utilization:     Small=0%    Large=0%
        Flash Cache:     IOPS=8.4
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=3.2  
        Large I/O's:     IOPS=0.0
        Consumer Group: LOWPRIO_GROUP
        Utilization:     Small=0%    Large=88%
        Flash Cache:     IOPS=44342
        Disk Throughput: MBPS=0
        Small I/O's:     IOPS=0.0
        Large I/O's:     IOPS=1080

You can no longer see any sessions active other than the background processes and LOWPRIO_GROUP. The number of IOPS from flash cache went up significantly, to the extent that disk isn't reported as being used any more. I need to verify that; it doesn't look quite right to me either.

Summary

IO Resource Manager sadly is a much underused tool, and few Exadata users seem to know that it is active as soon as you implement a DBRM plan (the objective parameter notwithstanding). I hope this post shows that using DBRM/IORM on Exadata is worth investing time and effort in, as it is a key enabler for database consolidation and fancier topics such as DBaaS. And you don't get anything comparable on any other platform running the Oracle software.

Posted in 12c Release 1, Exadata | Tagged: | 1 Comment »