Martins Blog

Trying to explain complex things in simple terms

Interesting observation about standby redo logs in Data Guard

Posted by Martin Bach on September 22, 2014

Some of you might have followed the discussion around the number of standby redo logs on Twitter, but since 140 characters are woefully short for the complete story, here’s the writeup that prompted the question. This is a test with 12.1.0.2 on virtualised Linux, repeated on a proper platform with physical hardware.

First of all, here’s my setup. I have a dbca-based database (a CDB, but that doesn’t matter) with 3 online redo log groups. They are all 50 MB in size, which is important for this test but not realistic :) Following the Oracle documentation, I created n + 1 standby redo log (SRL) groups per thread on the standby, where n is the number of online redo log groups, to stop the Data Guard broker from complaining about missing SRLs.
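For reference, creating the n + 1 = 4 SRL groups could look roughly like the sketch below. These are not the exact statements from my session: the group numbers 4 to 7 match the v$standby_log output further down, and the sketch assumes Oracle Managed Files are in use so that no file names have to be given.

```sql
-- Sketch only, run on the standby database.
-- Assumption: Oracle Managed Files (db_create_file_dest or similar) are
-- configured, so each SIZE clause is enough to create the log members.
-- 3 online redo log groups + 1 = 4 standby redo log groups for thread 1,
-- each the same size as the 50 MB online redo logs.
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 4 SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 5 SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 6 SIZE 50M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 7 SIZE 50M;
```

The THREAD clause is optional in the syntax; whether it is specified turns out to matter for the discussion in the comments.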

The end result was positive. Here’s what the broker thinks:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Members:
  CDB1  - Primary database
    STDBY - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 23 seconds ago)

The broker likes to complain when SRLs are missing on the primary, the standby, or both. So I’m all good to go! But wait a second: when I use one of the cool new 12c features to check the failover readiness, something is odd:

DGMGRL> validate database "STDBY"

  Database Role:     Physical standby database
  Primary Database:  CDB1

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    CDB1:   Off
    STDBY:  Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (CDB1)                  (STDBY)
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (STDBY)                 (CDB1)
    1         3                       0                       Insufficient SRLs
    Warning: standby redo logs not configured for thread 1 on CDB1

DGMGRL>

Pardon me? I _do_ have 4 groups of SRLs:

SQL> r
  1* select group#,sequence#,bytes,used,status from v$standby_log

    GROUP#  SEQUENCE#      BYTES       USED STATUS
---------- ---------- ---------- ---------- ----------
         4         27   52428800       2048 ACTIVE
         5         29   52428800      11264 ACTIVE
         6          0   52428800          0 UNASSIGNED
         7          0   52428800          0 UNASSIGNED

And 3 online redo logs:

SQL> select group#,thread#,sequence#,bytes,status from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------------
         1          1         31   52428800 CURRENT
         2          1         29   52428800 INACTIVE
         3          1         30   52428800 INACTIVE
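The validate output counts log groups per thread, while the two queries above list the groups one by one and the v$standby_log query does not select THREAD# at all. A small sketch that puts both counts side by side on the standby follows; the column aliases are made up for this example, and the query is not part of the original session.

```sql
-- Sketch only, run on the standby database.
-- Counts online and standby redo log groups per thread so the result can be
-- compared with the "n + 1 SRLs per thread" rule.
-- log_type and group_count are illustrative aliases.
SELECT 'online' AS log_type, thread#, COUNT(*) AS group_count
FROM   v$log
GROUP  BY thread#
UNION ALL
SELECT 'standby', thread#, COUNT(*)
FROM   v$standby_log
GROUP  BY thread#
ORDER  BY 1, 2;
```

If the standby rows are spread over more than one THREAD# value, that would be a first hint as to why the broker counts fewer SRLs for thread 1 than v$standby_log lists.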

But the fun fact remains that out of my 4 standby redo logs, only 2 are ever used. @pioro suggested that there is simply no need to use the other two because there is not enough redo to be applied, and that sounds plausible. To validate this I created a test environment on physical hardware with proper storage, because my tests in the lab turned out to suffer from IO problems.

The setup is the same.

I created a couple of sessions that each start off by creating a 1,000,000-row table, then delete from it, only to roll back after the delete has completed. That’s a lot of redo for 50 MB files (again, you wouldn’t use 50 MB online redo logs in production; this is just a test). My observation remains: only groups 4 and 5 are used. When the redo generation on the primary gets too far ahead, there will be RFS transfers of the archived redo logs. Here are a few queries with their output:

PROCESS   STATUS       CLIENT_P GROUP#                                      THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS
--------- ------------ -------- ---------------------------------------- ---------- ---------- ---------- -------------
ARCH      CLOSING      ARCH     5                                                 1        149      83968             0
ARCH      CONNECTED    ARCH     N/A                                               0          0          0             0
ARCH      CLOSING      ARCH     5                                                 1        147      86016             0
ARCH      CLOSING      ARCH     4                                                 1        148      83968             0
MRP0      APPLYING_LOG N/A      N/A                                               1        147      85766            25
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       RECEIVING    LGWR     3                                                 1        150      85744             0
RFS       IDLE         ARCH     N/A                                               0          0          0             0
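The statements behind this test are not reproduced in the post, so here is a rough sketch of what they could look like. The table name redo_test and its columns are hypothetical. The monitoring query assumes the output above comes from v$managed_standby, which is what the column names suggest (CLIENT_P being a truncated CLIENT_PROCESS).

```sql
-- Sketch only. Part 1 runs on the primary, once per session.
-- Build a 1,000,000-row table from two 1,000-row row generators.
-- redo_test, id and padding are hypothetical names.
CREATE TABLE redo_test AS
WITH gen AS (SELECT level AS n FROM dual CONNECT BY level <= 1000)
SELECT (a.n - 1) * 1000 + b.n AS id,
       RPAD('x', 100, 'x')    AS padding
FROM   gen a CROSS JOIN gen b;

-- Delete every row, then roll back: both steps generate redo.
DELETE FROM redo_test;
ROLLBACK;

-- Part 2 runs on the standby while the workload is active.
-- Shows which log group each RFS/ARCH/MRP process is working on.
SELECT process, status, client_process, group#, thread#,
       sequence#, block#, active_agents
FROM   v$managed_standby
ORDER  BY process;
```

Each session needs its own table name if several of them run at the same time in the same schema.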

Is this a problem? Not as far as I can tell. And maybe I did something incorrectly on my side too. There wasn’t a problem: my archiver process was quick enough to archive the SRLs, and as soon as it fell behind it resorted to fetching archived redo logs from the primary. When it uses the FAL process you see lines like these in the alert.log:

2014-09-17 02:34:45.164000 -05:00
RFS[4]: Selected log 4 for thread 1 sequence 211 dbid 788205474 branch 857548261
Media Recovery Log +RECO/STDBY/ARCHIVELOG/2014_09_17/thread_1_seq_209.368.858479683
Archived Log entry 141 added for thread 1 sequence 210 ID 0x2efbcba0 dest 1:
2014-09-17 02:34:46.215000 -05:00

When Real-Time Apply was active (whenever redo transfer rates dropped), I saw this, alternating between groups 4 and 5:

2014-09-17 02:35:33.653000 -05:00
Media Recovery Waiting for thread 1 sequence 230 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 230 Reading mem 0
  Mem# 0: +DATA/STDBY/ONLINELOG/group_5.380.858076769
  Mem# 1: +RECO/STDBY/ONLINELOG/group_5.342.858076769

So for some reason yet unknown to me there are only ever 2 SRLs in use.
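Several of the comments below suggest checking which thread each SRL belongs to and recreating the groups with an explicit thread. A minimal sketch of that idea follows. It is not a confirmed fix: one commenter reports that it solved the problem, while others still see the same behaviour afterwards. Groups 6 and 7 are the UNASSIGNED groups from the output above.

```sql
-- Sketch only, run on the standby database.
-- Step 1: check the thread of every SRL. Commenters report that SRLs
-- created without a THREAD clause show up under thread 0.
SELECT group#, thread#, sequence#, bytes, status
FROM   v$standby_log
ORDER  BY group#;

-- Step 2: recreate the groups that are not in use, naming the thread.
-- Assumptions: redo apply has been stopped first if the instance requires
-- it (in a broker configuration this would be done through the broker),
-- Oracle Managed Files supply the file names, and settings such as
-- STANDBY_FILE_MANAGEMENT permit the DROP.
ALTER DATABASE DROP STANDBY LOGFILE GROUP 6;
ALTER DATABASE ADD  STANDBY LOGFILE THREAD 1 GROUP 6 SIZE 50M;
ALTER DATABASE DROP STANDBY LOGFILE GROUP 7;
ALTER DATABASE ADD  STANDBY LOGFILE THREAD 1 GROUP 7 SIZE 50M;
```

Groups with status ACTIVE cannot be dropped this way, so groups 4 and 5 would have to wait until they are no longer in use.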

8 Responses to “Interesting observation about standby redo logs in Data Guard”

  1. I have seen this exact behaviour in RAC and non-RAC Data Guard setups since at least 11.2.0.3. I never tried any kind of in-depth testing of this “issue”, since I didn’t care either way (as long as recovery works as intended, I don’t care if Oracle uses all standby redo groups or only two of them).

    At least now I know I didn’t imagine things. :)

  2. Freek D'Hooge said

    When creating the standby logs, did you explicitly set the thread?
    I vaguely remember that if you did not do this, the standby logs were assigned at runtime to the threads as needed.
    This was normally only the case on RAC, but it could be that something changed in 12c.

    Regards,

    Freek

    • Martin Bach said

      Hi Freek,

      Initially I didn’t, but that was one of the suggestions I implemented. I specifically set the group and thread along with the size for the test.

  3. Lasse said

    Hi,
    I can confirm that recreating the standby redo logs with an explicit thread solved the problem. My setup was 12.1.0.2.1.

    Regards,

    Lasse

  4. Sumeet said

    Nice post, Martin.

    IMHO, the scenario described here is normal and has existed since 9i.
    This is mentioned in the published MOS Doc 219344.1.
    Snip from the note below:

    ## BeginSnip
    The RFS-Process always tries to allocate the next available Standby RedoLog, so
    it is possible that you encounter a Switch between only two Standby RedoLogs,
    although you created lots more of them. There’s no rota defined here like in
    Online RedoLogs.
    ## EndSnip

    -Sumeet

  5. salvo said

    Hello, I see the same behaviour on Oracle 11.2.0.2, and even after recreating the standby log files with thread and size I continue to have the problem.

  6. sunny narula said

    I have seen this behaviour. It happens because the SRLs which are UNUSED are in thread 0 instead of thread 1, and that is the reason they are always UNUSED, so you need to recreate the SRLs.

    • Igor Gorbatovsky said

      I’m here for the same reason. I’m trying to understand why my DB uses only two redo groups while there are 7 more. So what does the solution for this look like, Sunny?
