Martins Blog

Trying to explain complex things in simple terms

Archive for the ‘Cloud Control’ Category

Data Guard transport lag in OEM 12c

Posted by Martin Bach on January 30, 2014

I have come across this phenomenon a couple of times now so I thought it was worth writing up.

Consider a scenario where you get an alert because your standby database has an apply lag. The alert is generated by OEM, and when you log in and check, the standby does indeed have an apply lag. Even worse, the apply lag increases with every refresh of the page! I tagged this as an 11.2 problem, but it’s definitely not specific to that version.

Here is a screenshot of this misery:

[Screenshot: Lag in OEM]

Now there are of course a number of possible causes:

  • There is a lag
  • You are not using Real Time Apply

The first one is easy to check: look at the redo generation rate on the primary database to see if it’s any different. Maybe you are currently loading lots of data? Maybe a batch job has been initiated that goes over a lot of data… the possibilities are nearly endless.
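As a rough sketch of such a check, the archived log history on the primary gives an approximate redo volume per hour. The assumptions here are mine, not part of the original setup: local archiving goes to dest_id 1, and 24 hours of history is enough to spot a change.

```sql
-- Run on the primary.
-- Approximate redo volume per hour over the last day, derived from the
-- archived logs written to the local destination.
-- Assumption: dest_id = 1 is the local archive destination; adjust as needed.
select trunc(first_time, 'HH24')                     as hour_start,
       count(*)                                      as logs_archived,
       round(sum(blocks * block_size) / 1024 / 1024) as redo_mb
  from v$archived_log
 where dest_id = 1
   and first_time > sysdate - 1
 group by trunc(first_time, 'HH24')
 order by 1;
```

A sudden jump in redo_mb around the time the alert fired points to a genuine lag caused by load rather than a configuration issue.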

Another, more subtle interpretation could be that you are not using Real Time Apply. How can you check? In the broker command line interface for example:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Databases:
    pri - Primary database
      Warning: ORA-16789: standby redo logs not configured

    sby - Physical standby database
      Warning: ORA-16789: standby redo logs not configured

Fast-Start Failover: DISABLED

Configuration Status:
WARNING

The warnings about missing standby redo logs show that you cannot possibly use Real Time Apply (it needs standby redo logs). The other option is in the database itself:

SQL> select dest_id,status,database_mode,recovery_mode
  2  from v$archive_dest_status
  3  where status <> 'INACTIVE';

   DEST_ID STATUS    DATABASE_MODE   RECOVERY_MODE
---------- --------- --------------- -----------------------
         1 VALID     MOUNTED-STANDBY MANAGED
        32 VALID     UNKNOWN         IDLE

Did you notice dest_id of 32? That’s a bit of an unusual one, more on that later (since you can only set log_archive_dest_x where x ranges from 1 to 31).

So indeed we have managed recovery active, but not using Real Time Apply. This is expressed in the database status:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   28 seconds
  Apply Lag:       28 seconds
  Real Time Query: OFF
  Instance(s):
    sby

A few moments later when you query the database again the lag has increased:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   3 minutes 22 seconds
  Apply Lag:       3 minutes 22 seconds
  Real Time Query: OFF
  Instance(s):
    sby

This is to be expected: the primary is still happily processing user requests. The cure is to add standby redo logs (SRLs), as suggested in so many places and described in the Data Guard documentation. After the successful addition of SRLs the lag should disappear. A restart of managed recovery using the broker will show something along these lines on the standby:

2014-01-30 14:35:18.353000 +00:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (sby)
MRP0 started with pid=24, OS id=4854
MRP0: Background Managed Standby Recovery process started (sby)
2014-01-30 14:35:23.406000 +00:00
 started logmerger process
Managed Standby Recovery starting Real Time Apply
...
2014-01-30 14:37:12.595000 +00:00
Media Recovery Waiting for thread 1 sequence 20 (in transit)
2014-01-30 14:37:13.691000 +00:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 20 Reading mem 0
  Mem# 0: +DATA/sby/onlinelog/group_5.266.838218819

Two important bits of information are shown here: Managed Standby Recovery starting Real Time Apply and the fact that it is using the standby redo log. Sure enough, after the database is in sync with its primary and uses the log, the lag is gone:

DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    sby
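The same can be cross-checked from SQL*Plus on the standby. This is only a sketch: it reuses the v$archive_dest_status query from earlier and adds v$dataguard_stats, which was not part of the original investigation.

```sql
-- Run on the standby.
-- Lag figures as computed by the standby itself.
select name, value, time_computed
  from v$dataguard_stats
 where name in ('transport lag', 'apply lag');

-- Same query as before. With standby redo logs in place, recovery_mode
-- should now read MANAGED REAL TIME APPLY rather than plain MANAGED.
select dest_id, status, database_mode, recovery_mode
  from v$archive_dest_status
 where status <> 'INACTIVE';
```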

And also in the OEM view:

[Screenshot: OEM-lag-02]

Slight Variation

I have also seen this problem in OEM where the transport lag was near 0 and therefore hardly visible due to the scale of the graph. The apply lag was nevertheless there: the primary was busy and its current log had not yet shipped to the standby (this was, obviously, before standby redo logs were implemented). You saw a spike building up in the OEM view until the next log switch on the primary, when the apply lag dropped to 0 for a brief moment before increasing again.

Summary

Real Time Apply is a very useful feature, especially when used together with the maximum availability protection mode. The real risk of not using standby redo logs (and therefore no Real Time Apply) is that you lose data, since the current online redo log on the primary has not been copied across. If you need to activate your standby you will be some transactions short of the primary. The larger the online redo log, the larger the potential gap.
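For completeness, adding standby redo logs can be sketched roughly as follows. The group numbers and the 512 MB size are purely illustrative assumptions (a single thread with online groups 1 to 3, and Oracle Managed Files so that no file names are needed). The usual guidance is to make the SRLs the same size as the online redo logs and to create one more group per thread than there are online groups.

```sql
-- 1) On the primary: check size and number of online redo log groups per thread
select thread#, group#, bytes / 1024 / 1024 as size_mb
  from v$log
 order by thread#, group#;

-- 2) On the standby (and on the primary too, so it is ready for a role change):
--    add the standby redo logs. Depending on the version, managed recovery
--    may have to be stopped before this works, for example via the broker.
--    Assumption: 3 online groups of 512 MB in thread 1, hence 4 SRL groups.
alter database add standby logfile thread 1 group 4 size 512m;
alter database add standby logfile thread 1 group 5 size 512m;
alter database add standby logfile thread 1 group 6 size 512m;
alter database add standby logfile thread 1 group 7 size 512m;

-- 3) Verify the result
select group#, thread#, bytes / 1024 / 1024 as size_mb, status
  from v$standby_log
 order by group#;
```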

Posted in 11g Release 2, Cloud Control | 8 Comments »

AIM SIG and my talk about Enterprise Manager 12c

Posted by Martin Bach on March 15, 2012

Yesterday I presented at UKOUG’s Availability, Infrastructure and Management Special Interest Group (hey, say this 3 times in a row, quickly!) about Oracle Enterprise Manager 12c and my experience with it. As my good friend Piet de Visser pointed out, I had way too much to say for the 45 minute slot allocated. But then Piet always tells me that. Sadly he is also often right :) That’s why I like seeing him during my talks!

In summary I would have liked to do a different presentation, and that’s for two reasons: 1) I overran and 2) I didn’t manage to show the patching part, which is hugely interesting, at least to me.

Now here’s the reason for the blog post. I haven’t done online seminars yet, and was wondering if people were interested in a 1-1.5 hour UKOUG-like presentation from me, broadcast via GoToMeeting or similar to an audience. Would that be of interest? The topics to be covered are:

  • Thinking about the installation of OEM 12c including HA and other deployment options
  • Walking through the installation using screenshots (it takes too long and I would bore you showing a slow moving progress bar)
  • Demo time, i.e. logging on to the system and showing a few things

For the interactive part I’d

  • show the new user interface
  • walk through the agent push
  • demonstrate how to add a target
  • and finally guide you through the patching process

I prefer this to be interactive (technology permitting), with a timescale of 1 to 1.5 hours and a Q&A session at the end. I will try to set something up if there is sufficient interest.

WARNING & disclaimer

I’m not an expert in OEM 12c! I installed it, played with it and can show you what I know.

Posted in Cloud Control, Public Appearances | 4 Comments »

RAC One Node on Oracle Enterprise Manager 12c

Posted by Martin Bach on January 17, 2012

One of the promises from Oracle for OEM 12c was improved support for Oracle RAC One Node. I have spent quite a bit of time researching RON, and wrote a little article in 2 parts about it which you can find here:

One of my complaints with it was the limited support in OEM 11.1. At the time I was on a major consolidation project, which would have used OEM for management of the database.

OEM 11.1

Unfortunately OEM 11.1 didn’t have support for RAC One Node. Why? RON is a cluster database running on just one node. The interesting bit is that the ORACLE_SID is your normal ORACLE_SID with an underscore and a number. Under normal circumstances that suffix is _1, as in RON_1. But as soon as you relocate the database using srvctl relocate database -d, a second instance RON_2 is started and runs alongside the first until all sessions have failed over.

OEM obviously doesn’t know about RON_2: it was never discovered. Furthermore, the strict mapping of instance name to host is no longer true (the same applies for policy managed databases by the way!). A few weeks and a few switchover operations later you could be running RON_2 on racnode1.

As a consequence, the poor on-call DBA is paged about a database that has gone down when it hasn’t: it’s up and running. As a DBA, I wouldn’t want that. After discussions with Oracle they promised to fix that problem, but the fix hasn’t made it into 11.1, hence this blog post about OEM 12c.
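Independently of OEM, a quick way to see which instance is really up, and on which node, is to ask the database itself. This is just a sketch; the instance and host names it returns depend entirely on your environment.

```sql
-- Run against the RAC One Node database.
-- Lists every running instance and the host it currently lives on;
-- during a relocation both RON_1 and RON_2 may briefly show up.
select inst_id, instance_name, host_name, status
  from gv$instance
 order by inst_id;
```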

Posted in 11g Release 2, Cloud Control, Linux | Leave a Comment »

Installing OEM 12c agents in RPM format

Posted by Martin Bach on November 22, 2011

One of the questions I have always asked myself revolved around: “why doesn’t Oracle package certain software as an RPM on Linux?” Well, this question has recently been answered in the form of the Oracle 12c agent. It IS possible to use an RPM based installation, although it doesn’t make 100% use of RPM. I have written this post to give you an idea of what happens.

The procedure is described in the OEM 12 Cloud Control Advanced Installation and Configuration Guide, chapter 6. The process is very similar to the non-RPM based agent deployment. Let’s have a look at it in detail.

Posted in Cloud Control, Uncategorized | 2 Comments »