I have come across this phenomenon a couple of times now so I thought it was worth writing up.
Consider a scenario where you get an alert because your standby database has an apply lag. The alert is generated by OEM, and when you log in and check, there is indeed an apply lag. Even worse, the apply lag increases with every refresh of the page! I tagged this as an 11.2 problem, but it is not specific to that version.
Here is a screenshot of this misery:
Now there are of course a number of possible causes:
- There is a lag
- You are not using Real Time Apply
The first one is easy to check: look at the redo generation rate on the primary database to see if it’s any different. Maybe you are currently loading lots of data? Maybe a batch job has been initiated that goes over a lot of data… the possibilities are nearly endless.
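One way to look at the redo generation rate is to sum up the archived logs per hour on the primary. This is only a rough sketch: it assumes that destination 1 is the local archive destination (as in the output further down) and it ignores redo still sitting in the current online log.

```sql
-- Run on the primary: redo archived per hour over the last 24 hours.
-- Assumption: dest_id = 1 is the local archive destination.
select trunc(completion_time, 'HH24')                as hour_start,
       count(*)                                      as log_count,
       round(sum(blocks * block_size) / 1024 / 1024) as redo_mb
from   v$archived_log
where  dest_id = 1
and    completion_time > sysdate - 1
group  by trunc(completion_time, 'HH24')
order  by hour_start;
```

If the most recent hours show far more redo than usual, the standby may simply be busy catching up.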
Another, more subtle interpretation could be that you are not using Real Time Apply. How can you check? In the broker command line interface for example:
DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Databases:
    pri - Primary database
      Warning: ORA-16789: standby redo logs not configured

    sby - Physical standby database
      Warning: ORA-16789: standby redo logs not configured

Fast-Start Failover: DISABLED

Configuration Status:
WARNING
The warnings about missing standby redo logs show that you cannot possibly use Real Time Apply (it needs standby redo logs). The other option is in the database itself:
SQL> select dest_id,status,database_mode,recovery_mode
  2  from v$archive_dest_status
  3  where status <> 'INACTIVE';

   DEST_ID STATUS    DATABASE_MODE   RECOVERY_MODE
---------- --------- --------------- -----------------------
         1 VALID     MOUNTED-STANDBY MANAGED
        32 VALID     UNKNOWN         IDLE
Did you notice the dest_id of 32? That’s a bit of an unusual one, more on that later (since you can only set log_archive_dest_x where x is in the range 1 to 31).
So indeed we have managed recovery active, but not using Real Time Apply. This is expressed in the database status:
DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   28 seconds
  Apply Lag:       28 seconds
  Real Time Query: OFF
  Instance(s):
    sby
A few moments later when you query the database again the lag has increased:
DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   3 minutes 22 seconds
  Apply Lag:       3 minutes 22 seconds
  Real Time Query: OFF
  Instance(s):
    sby
This is to be expected: the primary is still happily processing user requests. The cure is to add standby redo logs, as suggested in so many places and described in the Data Guard documentation. After the successful addition of SRLs the lag should disappear. A restart of managed recovery using the broker will show something along these lines in the alert log on the standby:
2014-01-30 14:35:18.353000 +00:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (sby)
MRP0 started with pid=24, OS id=4854
MRP0: Background Managed Standby Recovery process started (sby)
2014-01-30 14:35:23.406000 +00:00
started logmerger process
Managed Standby Recovery starting Real Time Apply
...
2014-01-30 14:37:12.595000 +00:00
Media Recovery Waiting for thread 1 sequence 20 (in transit)
2014-01-30 14:37:13.691000 +00:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 20 Reading mem 0
  Mem# 0: +DATA/sby/onlinelog/group_5.266.838218819
Two important bits of information are shown here: the message "Managed Standby Recovery starting Real Time Apply" and the fact that recovery is reading from the standby redo log. Sure enough, after the database is in sync with its primary and uses the log, the lag is gone:
DGMGRL> show database verbose sby

Database - sby

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    sby
And also in the OEM view:
Slight Variation
I have also seen this problem in OEM where the transport lag was near 0 and therefore hardly visible due to the scale of the graph. The apply lag nevertheless built up because the primary was busy and the current log had not yet been shipped to the standby (this was, of course, before standby redo logs were implemented). The OEM view showed a spike mounting until the next log switch on the primary, when the apply lag dropped to 0 for a brief moment before increasing again.
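If the OEM graph is hard to read, the two lag figures can also be queried directly on the standby. A minimal sketch using v$dataguard_stats follows; the value column is reported as an interval string rather than a number, so treat it as something to eyeball, not to do arithmetic on.

```sql
-- Run on the standby: current transport and apply lag.
-- time_computed shows when the figure was calculated; if it stops
-- advancing, the numbers themselves are stale.
select name, value, time_computed
from   v$dataguard_stats
where  name in ('transport lag', 'apply lag');
```

Running this a few times across a log switch on the primary should show the same saw-tooth pattern described above.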
Summary
Real Time Apply is a very useful feature, especially when used together with the maximum availability protection mode. The real risk of not using standby redo logs, and therefore not using Real Time Apply, is that you lose data, since the current online redo log on the primary has not been copied across. If you need to activate your standby you will be some transactions short of the primary. The larger the online redo log, the larger the potential gap.
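For completeness, adding the standby redo logs could look roughly like the sketch below. The group numbers and the 50M size are placeholders, not values from the system shown above: the size should match what v$log reports for the online redo logs, and the usual recommendation is one more standby group per thread than there are online groups. The statements assume Oracle Managed Files (db_create_file_dest is set) and that redo apply on the standby has been stopped first, for example via the broker.

```sql
-- Check the size and number of the online redo logs first.
select thread#, group#, bytes / 1024 / 1024 as size_mb
from   v$log
order  by thread#, group#;

-- Add standby redo logs (placeholder group numbers and size;
-- assumes 3 online groups of 50M in thread 1, hence 4 standby groups).
alter database add standby logfile thread 1 group 4 size 50m;
alter database add standby logfile thread 1 group 5 size 50m;
alter database add standby logfile thread 1 group 6 size 50m;
alter database add standby logfile thread 1 group 7 size 50m;

-- Verify the result.
select thread#, group#, bytes / 1024 / 1024 as size_mb, status
from   v$standby_log
order  by thread#, group#;
```

Since the broker warned about both pri and sby, the same would be done on the primary so that it is ready after a role change.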