Little things worth knowing: redo transport in Data Guard 12.2 part 2

In the first part of this article I looked at a number of views and some netstat output to show how redo is transported from the primary database to its standby systems. The long story short is that TT02 (“async ORL multi”) was found sending redo to CDB3 asynchronously whilest NSS2 (“sync”) transferred redo to the synchronised target – CDB2. Unlike v$dataguard_process wanted me to believe, it really wasn’t LGWR sending redo over the network.

In this little article I would like to show you how the standby databases CDB2 and CDB3 receive redo and how you can map this back to the primary database, closing the loop.

How does CDB2 receive redo?

First I’m looking at CDB2, which receives redo via synchronous mode. I should be able to narrow the communication down between primary and standby by referring to the LGWR and TT02 process IDs in the CLIENT_PID column on the standby. As a quick reminder, 14986 is the PID for LGWR, 15029 belongs to NSS2, and 15252 maps to TT02. Let’s try:

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
CDB2                           PHYSICAL STANDBY

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process where client_pid in (14986,15029,15252);

NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   5517       RFS sync                IDLE              14986 log writer               95     174565          0

So it would appear the process responsible for shipping redo to “SYNC” destinations is the log writer. Actually, the output of v$dataguard_process is quite interesting, which is why I’m adding it here for the sake of completeness:

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process order by action;

NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   5517       RFS sync                IDLE              14986 log writer               95     229446          0
rfs   5350       RFS archive             IDLE              15224 archive gap               0          0          1
rfs   5346       RFS ping                IDLE              15124 gap manager              95          0          0
rfs   5354       RFS archive             IDLE              15233 archive gap               0          0          1
MRP0  5348       managed recovery        IDLE                  0 none                      0          0          0
rfs   5352       RFS archive             IDLE              15240 archive gap               0          0          1
LGWR  5207       log writer              IDLE                  0 none                      0          0          0
TT01  5259       redo transport timer    IDLE                  0 none                      0          0          0
TT00  5255       gap manager             IDLE                  0 none                      0          0          0
ARC1  5263       archive redo            IDLE                  0 none                      0          0          0
ARC2  5265       archive redo            IDLE                  0 none                      0          0          0
ARC3  5267       archive redo            IDLE                  0 none                      0          0          0
TMON  5242       redo transport monitor  IDLE                  0 none                      0          0          0
ARC0  5257       archive local           IDLE                  0 none                      0          0          0

14 rows selected.

This view tells me that LGWR is attached to the RFS sync proces. But now I know better than that, and it is similar to what I saw on the primary. Looking a little closer, I can see that strictly speaking, the RFS process is connected to NSS2:

[root@server2 ~]# netstat -tunalp | egrep 'Active|Proto|5517'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 192.168.100.22:1521     192.168.100.21:15515    ESTABLISHED 5517/oracleCDB2   

I am repeating the values for the primary here so you don’t have to go back to the previous article:

[root@server1 ~]# ps -ef | egrep 'lgwr|lg0|nss'
oracle   14986     1  0 09:58 ?        00:01:19 ora_lgwr_CDB1
oracle   14990     1  0 09:58 ?        00:00:00 ora_lg00_CDB1
oracle   14994     1  0 09:58 ?        00:00:00 ora_lg01_CDB1
oracle   15029     1  0 09:58 ?        00:00:43 ora_nss2_CDB1

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '14986|14990|14994|15029'
tcp        0      0 server1.example.com:15515 server2.example.com:1521 ESTABLISHED oracle     16400768   15029/ora_nss2_CDB1

You will notice that port 15515 on server1 belongs to ora_nss2_CDB1.

Going back a little to v$dataguard_process, it seems a bit weird to see MRP0 as “idle” when the database is in managed recovery mode using real time apply. Trying something else I am querying v$managed_standby and voila: MRP0 is said to apply logs:

SQL> select process,pid,status,client_process,client_pid,sequence#,block# 
  2  from v$managed_standby order by status;

PROCESS   PID        STATUS       CLIENT_P CLIENT_PID                                SEQUENCE#     BLOCK#
--------- ---------- ------------ -------- ---------------------------------------- ---------- ----------
DGRD      5255       ALLOCATED    N/A      N/A                                               0          0
DGRD      5259       ALLOCATED    N/A      N/A                                               0          0
MRP0      5348       APPLYING_LOG N/A      N/A                                              95     246625
ARCH      5257       CLOSING      ARCH     5257                                             92       4096
ARCH      5263       CLOSING      ARCH     5263                                             93       2048
ARCH      5265       CLOSING      ARCH     5265                                             94     342016
ARCH      5267       CONNECTED    ARCH     5267                                              0          0
RFS       5350       IDLE         UNKNOWN  15224                                             0          0
RFS       5354       IDLE         UNKNOWN  15233                                             0          0
RFS       5352       IDLE         UNKNOWN  15240                                             0          0
RFS       5517       IDLE         LGWR     14986                                            95     246626
RFS       5346       IDLE         Archival 15124                                             0          0

12 rows selected.

I guess that’s true, as the system is in constant recovery using the standby logfiles.

And what about CDB3?

On the other hand, CDB3 – to which redo is shipped asynchronously – lists TT02 as it’s counterpart:

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
CDB3                           PHYSICAL STANDBY

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id 
  2   from v$dataguard_process order by action;

NAME  PID                      ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ------------------------ ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   14803                    RFS ping                IDLE              15124 gap manager              96          0          0
rfs   14809                    RFS archive             IDLE              15233 archive gap               0          0          0
rfs   14811                    RFS async               IDLE              15252 async ORL multi          96      34674          0
MRP0  11825                    managed recovery        IDLE                  0 none                      0          0          0
ARC0  11776                    archive local           IDLE                  0 none                      0          0          0
ARC2  11786                    archive redo            IDLE                  0 none                      0          0          0
TT00  11774                    gap manager             IDLE                  0 none                      0          0          0
ARC3  11788                    archive redo            IDLE                  0 none                      0          0          0
TMON  11706                    redo transport monitor  IDLE                  0 none                      0          0          0
LGWR  11676                    log writer              IDLE                  0 none                      0          0          0
ARC1  11784                    archive redo            IDLE                  0 none                      0          0          0
TT01  11778                    redo transport timer    IDLE                  0 none                      0          0          0

12 rows selected.

Unlike the case with CDB2, the local RFS process is indeed connecting to TT02 on server1:

[root@server2 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '^Active|^Proto|14811'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp6       0      0 server2.example.com:1521 server1.example.com:12936 ESTABLISHED oracle     4658198    14811/oracleCDB3    

… and on server1:

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '^Active|^Proto|12936'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address            State       User       Inode      PID/Program name    
tcp        0      0 server1.example.com:12936 server2.example.com:1521 ESTABLISHED oracle     15820538   15252/ora_tt02_CDB1 

This should be enough evidence, I rest my case :)

Summary

So I guess that answers my question: On my small VMs in the lab, NSSn is responsible for shipping redo to targets in “SYNC” mode. The redo transport server processes TTnn ship redo to destinations that are defined for asynchronous transport.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s