After establishing the baseline in the first part of this series, it’s time to measure the effect of the network. The second test introduces an interesting feature: using Linux’s own Traffic Shaper/Quality of Service module, I will add a delay of 100 ms to the Data Guard network interface card (NIC) to slow things down a little.
WARNING: this is of course a lab or VM-only situation. I can’t exert control over wire quality in my (own) switches, hence some software magic is needed on my virtual ones. This post is intended to be merely for educational purposes, not for use at work.
I am continuing to use the two-node RAC 12.1.0.2.170117 primary database on Oracle Linux 7 with UEK 4, and an identical RAC system to host my standby database.
All database traffic still uses eth0, my public network (192.168.100.0/24). In order not to affect my SSH session with the planned network changes, I created a new out-of-band management interface (eth2) on the Data Guard standby database servers. I love KVM and the virtio drivers for the NICs: a few virsh commands later, every RAC node has a new NIC, and of course without rebooting. Add a bit of nmcli magic and each interface is configured in a single line!
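If you want to follow along, the hot-plug and the interface configuration can be scripted roughly like this. This is only a sketch: the domain name, the libvirt network and the IP address are assumptions for illustration, not the exact values from my lab.

# hot-add a virtio NIC to a running KVM guest (assumed domain name and libvirt network)
virsh attach-interface --domain rac12sec1 --type network --source default \
    --model virtio --live --config

# configure the new device with a single nmcli command (assumed address)
nmcli connection add type ethernet con-name mgmt ifname eth2 \
    ip4 192.168.102.20/24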
Here’s what it looks like from NetworkManager’s point of view (the system is Oracle Linux 7.3):
[root@rac12sec1 ~]# nmcli c show
NAME  UUID                                  TYPE            DEVICE
eth0  3e8915f1-10ce-4da7-99b8-d2a6947f9ff7  802-3-ethernet  eth0
eth1  0ebe1279-40e1-4faf-a0be-49588d347fcf  802-3-ethernet  eth1
mgmt  51054729-19ae-49e7-ba18-c6b41475ec1b  802-3-ethernet  eth2
My eth0 device maps to the public network, i.e. all database traffic including Data Guard. The private interconnect uses eth1, and eth2 serves as the management interface. I have deliberately kept this simple; normally I’d request a dedicated set of interfaces to route Data Guard traffic (I blogged about that earlier in the context of RAC).
Please read part I of the “Adding an additional SCAN” series, and the posts that follow it, to get the full story.
Change protection mode
Following the scenario I outlined in part I, I first have to change the protection mode to Maximum Availability. Again, please see part I of this series for an explanation of the protection levels, my choice of protection level, and why “Maximum Availability” might or might not work for you.
Since I’m using the Data Guard Broker, I need to head over and change the mode in dgmgrl.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxPerformance
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 40 seconds ago)
As with many other maintenance tasks, you use the edit command to change the configuration.
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no standby databases would
remain to support protection mode
Failed.
I seem to have gotten a bit ahead of myself: changing to one of the stricter protection modes requires switching the redo transport mode from ASYNC to SYNC first. Let’s do that, and then change the protection mode:
DGMGRL> edit database 'NCDBA' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'NCDBB' set property LogXptMode = 'SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ractest

  Protection Mode: MaxAvailability
  Members:
  NCDBA - Primary database
    NCDBB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 19 seconds ago)

DGMGRL>
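If you prefer to double-check the new protection mode outside the broker, a quick query against v$database on the primary will show it. A minimal sketch, assuming a local SYSDBA connection:

sqlplus -S / as sysdba <<'EOF'
-- protection_mode is the configured mode, protection_level the mode currently in force
select protection_mode, protection_level from v$database;
EOF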
That looks better: the first half of the job is done. The second half of this article details how to make the network look a lot worse.
Net Emulator
Using the Linux netem kernel module I can simulate many complex network situations – I found https://wiki.linuxfoundation.org/networking/netem to be a good reference.
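Since netem is implemented as a kernel module (sch_netem), it is worth making sure it is available before going any further; it should be present on Oracle Linux 7/UEK4, but a quick check doesn’t hurt:

# the netem queueing discipline is provided by the sch_netem kernel module
modinfo sch_netem

# the module is loaded automatically the first time a netem qdisc is added
lsmod | grep netem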
I should probably say at this point that no one would use a link with a 100 ms delay between sites for (sync) redo transport; that simply doesn’t make sense. I chose 100 ms as a factor large enough to be sure that any deviation in the performance figures really is the result of my fiddling with the network latency. Large numbers make spotting outliers easier.
How do you mess with the network? That is simple when you are on Linux! On my standby cluster I use the tc command, part of the iproute package. This is the “normal” state:
[root@rac12sec1 ~]# tc qdisc
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec1 ~]#
I am merely using these tools; if you want to understand them, have a look at chapter 9 of the Linux Advanced Routing & Traffic Control HOWTO and the Linux Foundation wiki I mentioned earlier, and take it from there.
The syntax of tc qdisc is complex at first sight:
[root@rac12sec1 ~]# tc qdisc help
Usage: tc qdisc [ add | del | replace | change | show ] dev STRING
[ handle QHANDLE ] [ root | ingress | parent CLASSID ]
[ estimator INTERVAL TIME_CONSTANT ]
[ stab [ help | STAB_OPTIONS] ]
[ [ QDISC_KIND ] [ help | OPTIONS ] ]
tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
STAB_OPTIONS := ... try tc qdisc add stab help
The above allows us to introduce netem into the mix. On my Oracle Linux 7.3/UEK4 system it offers the following options:
[root@rac12sec1 ~]# tc qdisc add netem help
Usage: ... netem [ limit PACKETS ]
[ delay TIME [ JITTER [CORRELATION]]]
[ distribution {uniform|normal|pareto|paretonormal} ]
[ corrupt PERCENT [CORRELATION]]
[ duplicate PERCENT [CORRELATION]]
[ loss random PERCENT [CORRELATION]]
[ loss state P13 [P31 [P32 [P23 P14]]]
[ loss gemodel PERCENT [R [1-H [1-K]]]
[ ecn ]
[ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]
[ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]
That’s a wealth of options. I can simulate somewhat unreliable WAN links, even with variations in delay! Refer to the Linux Foundation wiki for more details on how to simulate networks more realistically.
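To illustrate, a slightly more realistic WAN could be emulated with a variable delay and a little packet loss, along these lines. This is just a sketch of the syntax shown above, not the configuration I used for the benchmark:

# 50 ms base delay with 10 ms jitter (25% correlation) plus 0.1% random packet loss
tc qdisc add dev eth0 root netem delay 50ms 10ms 25% loss random 0.1%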
Introducing Delay
I prefer to keep it simple for this post, so I’ll stick with the most basic setting for network delay. I am introducing the delay on the Data Guard network (eth0) on both of my standby nodes, while connected via the management interface eth2:
[root@rac12sec1 ~]# tc qdisc add dev eth0 root netem delay 100ms
[root@rac12sec1 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec2 ~]# tc qdisc
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 100.0ms
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth2 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[root@rac12sec2 ~]# tc qdisc show dev eth0
qdisc netem 8001: root refcnt 2 limit 1000 delay 100.0ms
The effect can be seen immediately by pinging one of the standby nodes:
[oracle@rac12pri1 ~]$ ping rac12sec1
PING rac12sec1.example.com (192.168.100.20) 56(84) bytes of data.
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=1 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=2 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=3 ttl=64 time=100 ms
64 bytes from rac12sec1.example.com (192.168.100.20): icmp_seq=4 ttl=64 time=100 ms
^C
--- rac12sec1.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 100.368/100.415/100.478/0.229 ms
[oracle@rac12pri1 ~]$
So 100ms delay it is!
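For completeness: once testing is finished, the artificial delay is removed by deleting the root qdisc on each standby node, which reverts the device to the default pfifo_fast discipline:

tc qdisc del dev eth0 root netem
# verify that the default queueing discipline is back
tc qdisc show dev eth0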
Starting the Benchmark
With the scene set, it’s time to start the benchmark. The command to start it resembles the one I used for part I, except for the name of the results file.
[oracle@loadgen1 bin]$ ./charbench -cs //rac12pri-scan/swingbench_both \
> -uc 20 -r results_20_maxavail_netem.xml -rt 00:10
Author : Dominic Giles
Version : 2.5.0.971
Results will be written to results_20_maxavail_netem.xml.
Hit Return to Terminate Run...
Time Users TPM TPS
1:08:56 PM 20 5188 94
What effect does this have on the database? Let’s check the Data Guard broker first:
DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          10 seconds (computed 0 seconds ago)
  Average Apply Rate: 207.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
DGMGRL> show database 'NCDBB'

Database - NCDBB

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 211.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    NCDBB1
    NCDBB2 (apply instance)

Database Status:
SUCCESS
So it looks as if there isn’t any lag while the benchmark is underway; however, as the first of the two examples shows, it took a little while for apply to catch up. Here is the output from Tanel Poder’s ashtop script again (which requires a Diagnostics Pack license because it accesses v$active_session_history):
SQL> @ashtop program,event,wait_class,session_state session_type='BACKGROUND' "to_date('17.02.2017 08:10','dd.mm.yyyy hh24:mi')" "to_date('17.02.2017 08:15','dd.mm.yyyy hh24:mi')"
Total
Seconds AAS %This PROGRAM EVENT WAIT_CLASS SESSION FIRST_SEEN LAST_SEEN DIST_SQLEXEC_SEEN
--------- ------- ------- ------------------------------------------------ ---------------------------------------- --------------- ------- ------------------- ------------------- -----------------
298 1.0 24% | oracle@rac12pri1 (LGWR) SYNC Remote Write Other WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59 1
298 1.0 24% | oracle@rac12pri1 (NSS2) Redo Transport MISC Other WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59 1
276 .9 23% | oracle@rac12pri2 (LGWR) SYNC Remote Write Other WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59 1
274 .9 23% | oracle@rac12pri2 (NSS2) Redo Transport MISC Other WAITING 2017-02-17 08:10:00 2017-02-17 08:14:59 1
8 .0 1% | oracle@rac12pri2 (LMS1) ON CPU 2017-02-17 08:10:16 2017-02-17 08:14:59 1
7 .0 1% | oracle@rac12pri2 (LMS0) gcs log flush sync Other WAITING 2017-02-17 08:10:43 2017-02-17 08:14:51 1
5 .0 0% | oracle@rac12pri1 (LMS0) gcs log flush sync Other WAITING 2017-02-17 08:10:23 2017-02-17 08:13:48 1
5 .0 0% | oracle@rac12pri2 (LMS0) ON CPU 2017-02-17 08:10:00 2017-02-17 08:13:24 1
5 .0 0% | oracle@rac12pri2 (NSV1) ON CPU 2017-02-17 08:12:43 2017-02-17 08:14:44 1
4 .0 0% | oracle@rac12pri1 (DBW0) ON CPU 2017-02-17 08:11:16 2017-02-17 08:13:35 1
3 .0 0% | oracle@rac12pri2 (CJQ0) ON CPU 2017-02-17 08:12:39 2017-02-17 08:12:51 2
3 .0 0% | oracle@rac12pri2 (RSM0) ON CPU 2017-02-17 08:11:44 2017-02-17 08:14:44 3
2 .0 0% | oracle@rac12pri1 (LMON) control file sequential read System I/O WAITING 2017-02-17 08:11:51 2017-02-17 08:14:01 1
2 .0 0% | oracle@rac12pri1 (LMS1) gcs log flush sync Other WAITING 2017-02-17 08:11:14 2017-02-17 08:11:44 1
2 .0 0% | oracle@rac12pri2 (DBW0) db file parallel write System I/O WAITING 2017-02-17 08:12:14 2017-02-17 08:14:24 1
2 .0 0% | oracle@rac12pri2 (LGWR) log file parallel write System I/O WAITING 2017-02-17 08:12:59 2017-02-17 08:14:42 1
1 .0 0% | oracle@rac12pri1 (CKPT) control file sequential read System I/O WAITING 2017-02-17 08:13:46 2017-02-17 08:13:46 1
1 .0 0% | oracle@rac12pri1 (DBW0) db file parallel write System I/O WAITING 2017-02-17 08:10:28 2017-02-17 08:10:28 1
1 .0 0% | oracle@rac12pri1 (LGWR) Redo Transport Close Other WAITING 2017-02-17 08:12:22 2017-02-17 08:12:22 1
1 .0 0% | oracle@rac12pri1 (LGWR) log file sequential read System I/O WAITING 2017-02-17 08:14:16 2017-02-17 08:14:16 1
20 rows selected.
As you can see, the top waits are in wait class “Other”: “SYNC Remote Write” and “Redo Transport MISC”. I have pulled an AWR diff report that paints an interesting picture. The first snapshot set represents full network performance, while the second one shows the effect of the 100 ms network delay. Only the relevant sections are shown.
WORKLOAD REPOSITORY RAC Compare Period Report for
Database Summary
~~~~~~~~~~~~~~~~
Database Snapshot Ids Number of Instances Number of Hosts Report Total (minutes)
---------------------------------- ----------------- ----------------- ----------------- ---------------------
Snapshot Set Id Name RAC Block Si Begin End In Repor Total In Repor Total DB time Elapsed ti
------------ ----------- --------- --- -------- -------- -------- -------- -------- -------- -------- ---------- ----------
First (1st) 481582958 NCDBA YES 8192 303 304 2 2 2 2 22.1 10.2
Second (2nd) 481582958 NCDBA YES 8192 301 302 2 2 2 2 141.9 10.6
Database Instances Included In Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elapsed Instance Avg Activ
Set DB Id Inst Instance Release Host Startup Begin Snap Time End Snap Time Time (min) DB time (m Up Time (h Sessions Platform
--- ----------- ---- --------- --------------- ---------- --------------- --------------- --------------- ---------- ---------- ---------- --------- ---------------
1st 481582958 1 NCDBA1 12.1.0.2.0 rac12pri1 17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09 10.0 2.0 5.2 0.2 Linux x86 64-bi
2 NCDBA2 12.1.0.2.0 rac12pri2 17-Feb-17 04:00 17-Feb-17 08:59 17-Feb-17 09:09 10.0 20.1 5.2 2.0 Linux x86 64-bi
2nd 481582958 1 NCDBA1 12.1.0.2.0 rac12pri1 17-Feb-17 04:00 17-Feb-17 08:06 17-Feb-17 08:17 10.5 134.0 4.3 12.8 Linux x86 64-bi
2 NCDBA2 12.1.0.2.0 rac12pri2 17-Feb-17 04:00 17-Feb-17 08:07 17-Feb-17 08:17 10.5 7.9 4.3 0.8 Linux x86 64-bi
...
Workload Comparison
~~~~~~~~~~~~~~~~~~~ 1st Per Sec 2nd Per Sec %Diff 1st Per Txn 2nd Per Txn %Diff
--------------- --------------- ------ --------------- --------------- ------
DB time: 2.2 13.3 514.3 0.0 0.2 1,600.0
CPU time: 1.5 0.5 -66.0 0.0 0.0 0.0
Background CPU time: 0.5 0.2 -56.9 0.0 0.0 0.0
Redo size (bytes): 983,488.1 322,026.1 -67.3 4,164.1 4,192.4 0.7
Logical read (blocks): 30,773.7 8,299.3 -73.0 130.3 108.0 -17.1
Block changes: 5,665.8 1,886.0 -66.7 24.0 24.6 2.3
Physical read (blocks): 10.5 18.3 74.4 0.0 0.2 500.0
Physical write (blocks): 654.9 196.8 -69.9 2.8 2.6 -7.6
Read IO requests: 10.1 12.9 27.3 0.0 0.2 325.0
Write IO requests: 503.8 110.4 -78.1 2.1 1.4 -32.4
Read IO (MB): 0.1 0.1 75.0 0.0 0.0 0.0
Write IO (MB): 5.1 1.5 -69.9 0.0 0.0 0.0
IM scan rows: 0.0 0.0 0.0 0.0 0.0 0.0
Session Logical Read IM:
User calls: 515.8 188.2 -63.5 2.2 2.5 12.4
Parses (SQL): 24.9 17.7 -28.8 0.1 0.2 109.1
Hard parses (SQL): 0.1 0.2 45.5 0.0 0.0 0.0
SQL Work Area (MB): 0.1 0.5 253.8 0.0 0.0 253.8
Logons: 1.3 3.9 191.7 0.0 0.1 400.0
Executes (SQL): 2,587.7 853.0 -67.0 11.0 11.1 1.4
Transactions: 236.2 76.8 -67.5
...
Top Timed Events First DB/Inst: NCDBA/NCDBA1 Snaps: 303-304 (Elapsed time: 612.266 sec DB time: 1326.09 sec), Second DB/Inst: NCDBA/NCDBA1 Snaps: 301-302 (Elapsed time: 638.691 sec DB time: 8514.42 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes
-> Summary for all instances
1st 2nd
------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
Event Wait Class Waits Time(s) Avg Time(ms) %DB time Event Wait Class Waits Time(s) Avg Time(ms) %DB time
------------------------------ ------------- ------------ ------------ ------------- ----------- ------------------------------ ------------- ------------ ------------ ------------- -----------
CPU time N/A 919.5 N/A 69.3 log file sync Commit 49,010 7,976.7 162.8 93.7
log file sync Commit 144,284 270.4 1.9 20.4 Redo Transport MISC Other 11,392 1,192.0 104.6 14.0
log file parallel write System I/O 141,581 156.6 1.1 11.8 SYNC Remote Write Other 11,392 1,179.1 103.5 13.8
gc current block 2-way Cluster 98,483 71.9 0.7 5.4 CPU time N/A 328.7 N/A 3.9
gc cr block 2-way Cluster 71,469 47.9 0.7 3.6 gcs log flush sync Other 4,480 56.0 12.5 0.7
target log write size Other 22,284 26.5 1.2 2.0 gc current block 2-way Cluster 62,843 43.3 0.7 0.5
gc current grant busy Cluster 22,561 16.8 0.7 1.3 gc current block busy Cluster 284 39.3 138.4 0.5
control file sequential read System I/O 8,497 6.0 0.7 0.5 log file switch (checkpoint i Configuration 75 37.1 495.1 0.4
db file sequential read User I/O 5,862 5.4 0.9 0.4 rdbms ipc reply Other 32 27.1 845.4 0.3
log file sequential read System I/O 14,895 4.8 0.3 0.4 gc cr block 2-way Cluster 39,015 26.0 0.7 0.3
-log file switch (checkpoint i Configuration 212 3.6 17.2 0.3 -log file parallel write System I/O 11,267 20.3 1.8 0.2
-gcs log flush sync Other 461 0.4 0.9 0.0 -gc current grant busy Cluster 12,427 8.2 0.7 0.1
-gc current block busy Cluster 126 0.3 2.7 0.0 -control file sequential read System I/O 7,326 5.0 0.7 0.1
- N/A N/A N/A N/A -db file sequential read User I/O 6,615 4.7 0.7 0.1
- N/A N/A N/A N/A -log file sequential read System I/O 174 0.4 2.4 0.0
--------------------------------------------------------------------------------------------------------------------
This doesn’t look too good, does it? Throughput is down substantially, which is visible in Swingbench itself: in the first example I managed around 255 TPS, while I’m averaging 90-95 in the sync-plus-delay case. The numbers from the AWR excerpt shown above confirm this and allow you to work out where the problem is.
Here is a screenshot taken from OEM (from a different benchmark execution) showing the same situation. Make sure you are licensed appropriately to use this view.

Summary Part II
So yes, there is a very noticeable effect of the network delay, and the AWR report shows the relevant 12c wait events indicating where the time is spent.
In my opinion it’s a bit unfortunate to have the new waits in the “Other” wait class, but that’s how it is.
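If you want to see how these events are classified on your own system, a quick look at v$event_name confirms the wait class. A minimal sketch, again assuming a local SYSDBA connection:

sqlplus -S / as sysdba <<'EOF'
-- in 12.1 both redo-transport related events belong to the "Other" wait class
select name, wait_class from v$event_name
 where name in ('SYNC Remote Write', 'Redo Transport MISC');
EOF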
References
http://www.oracle.com/technetwork/database/availability/sync-2437177.pdf