Category Archives: 12c Release 2

You may still need gcc when patching Oracle Database 12.2

I have previously written about changes in the Oracle 12.2 preinstall RPM and how gcc is no longer part of the dependencies list. As was pointed out to me, gcc shouldn’t be necessary anymore according to the 12.2 Linux Database Installation Guide. Check the blue note for a statement indicating that gcc and gcc-c++ are needed neither for Grid Infrastructure nor for the RDBMS software.

I have applied patch 27100009 (January 2018 Release Update 12.2.0.1.180116) on my two-node RAC system in the lab, and found out that this is only partially true :) You may or may not encounter this issue in your environment, see below.

Updating the preinstall RPM

Just to be sure I didn’t miss any changes to the preinstall RPM I pulled the latest one from Oracle and checked its requirements for gcc.

[oracle@rac122pri1 ~]$ rpm -qi oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64
Name        : oracle-database-server-12cR2-preinstall
Version     : 1.0
Release     : 3.el7
Architecture: x86_64
Install Date: Fri 19 Jan 2018 03:13:18 PM GMT
Group       : Test Environment/Libraries
Size        : 56561
License     : GPLv2
Signature   : RSA/SHA256, Mon 10 Jul 2017 11:27:07 AM BST, Key ID 72f97b74ec551f03
Source RPM  : oracle-database-server-12cR2-preinstall-1.0-3.el7.src.rpm
Build Date  : Mon 10 Jul 2017 11:26:59 AM BST
Build Host  : x86-ol7-builder-02.us.oracle.com
Relocations : (not relocatable)
Vendor      : Oracle
Summary     : Sets the system for Oracle Database single instance and Real Application
              Cluster install for Oracle Linux 7
Description :
The Oracle Preinstallation RPM package installs software packages and sets system parameters 
required for Oracle Database single instance and Oracle Real Application Clusters installations 
for Oracle Linux Release 7
Files affected: /etc/sysctl.conf, /boot/grub/menu.lst OR /boot/grub2/grub.cfg
Files added: /etc/security/limits.d/oracle-database-server-12cR2-preinstall.conf
[oracle@rac122pri1 ~]$ rpm -q --requires oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64 | grep -i gcc
libgcc
[oracle@rac122pri1 ~]$ 

So gcc is definitely not among the dependencies, and my minimum install doesn’t pull in gcc (or gcc-c++ for that matter):

[oracle@rac122pri1 ~]$ rpm -qa | grep gcc
libgcc-4.8.5-16.el7_4.1.x86_64
[oracle@rac122pri1 ~]$ 

This is a two-node cluster, consisting of nodes rac122pri1 and rac122pri2. Both run Oracle Linux 7.4 with UEK4, patched to February 12th 2018. As the names I picked for my cluster nodes suggest, this is a system running RAC 12.2.

IMPORTANT note: just because I hit the issue doesn’t mean there’s an issue for everyone. My lab environment is that – my lab environment. I just wanted to point out a way to investigate the problem and how I resolved it in my lab environment. Your mileage might vary.

Applying the January 2018 RU

I always like to be current when it comes to Oracle, and in that spirit decided to apply the January 2018 RU to my cluster after having made sure that I have a working backup. My rule is that there is no patching without backups, ever. And by backups I mean backups of the entire stack ;)

Since this was a new installation in my lab I thought I’d give opatchauto another chance and run with it. What could possibly go wrong?
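
In case you want to follow along, the invocation itself is simple. Here is a sketch of what I ran as root (the staging location matches the one you will see in the logs below; adjust the paths to your environment):

# as root, with opatchauto from the Grid home's OPatch directory
export PATH=/u01/app/12.2.0.1/grid/OPatch:$PATH
opatchauto apply /u01/stage/27100009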

Node 1 went without any issues, and opatchauto reported that it had applied the patches it wanted to apply. I don’t have the screen output anymore; however, here’s the summary after the patch completed.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/oracle/product/12.2.0.1/dbhome_1
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)

OPatch succeeded.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/12.2.0.1/grid/
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)
27128906;ACFS Release Update : 12.2.0.1.0 (27128906)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)

OPatch succeeded.
[oracle@rac122pri1 ~]$

Did you notice the “Tomcat Release Update 12.2.0.1.0” in the Grid Home? So much to investigate!

Patching node 2

After the patch completed on node 1 I continued with node 2. At first, everything looked quite normal, and opatch did its thing. I have made a habit of always running anything that depends on the network connection being up inside a screen(1) session. That protects the work I’m doing from intermittent network trouble, and allows me to keep an eye on how things are progressing.
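
If you haven’t come across screen before, the workflow is minimal; a quick sketch (the session name is arbitrary):

# start a named session and run the patching inside it
screen -S patching

# detach with ctrl-a d; reattach later, e.g. after an SSH disconnect
screen -r patching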

After a few minutes of work, opatchauto reported this (formatted for readability):

Start applying binary patch on home /u01/app/12.2.0.1/grid
Failed while applying binary patches on home /u01/app/12.2.0.1/grid

Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : rac122pri2->/u01/app/12.2.0.1/grid Type[crs]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/12.2.0.1/grid, host: rac122pri2.
Command failed:  /u01/app/12.2.0.1/grid/OPatch/opatchauto  apply /u01/stage/27100009 -oh /u01/app/12.2.0.1/grid
    -target_type cluster -binary -invPtrLoc /u01/app/12.2.0.1/grid/oraInst.loc 
    -jre /u01/app/12.2.0.1/grid/OPatch/jre 
    -persistresult /u01/app/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_rac122pri2_crs.ser 
    -analyzedresult /u01/app/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_analyze_rac122pri2_crs.ser
Command failure output:
==Following patches FAILED in apply:

Patch: /u01/stage/27100009/27335416
Log: /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-22_09-37-57AM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Re-link fails on target "install_srvm".

Not exactly what you want to see during patching. But there’s no need to panic (just yet ;) The error message points me to a log file. Opening the log file, it quickly became apparent why the issue occurred:

[22-Jan-2018 09:40:30] [INFO]       [OPSR-TIME] Finished applying patch "27335416" to local system
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading raw inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all components from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all one offs from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Raw inventory loaded successfully
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading cooked inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] : Loading cooked one offs. Heap memory used 314 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-MEMORY] : Loaded cooked oneoffs. Heap memory used : 363 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-TIME] Cooked inventory loaded successfully
[22-Jan-2018 09:40:32] [INFO]       OUI-67050:Running make for target install_srvm
[22-Jan-2018 09:40:32] [INFO]       Start invoking 'make' at Mon Jan 22 09:40:32 GMT 2018Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [INFO]       Finish invoking 'make' at Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [WARNING]    OUI-67200:Make failed to invoke "/usr/bin/make -f ins_srvm.mk install_srvm 
                                    ORACLE_HOME=/u01/app/12.2.0.1/grid"....'/bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/12.2.0.1/grid/rdbms/lib/config.o] Error 127
                                    '
[22-Jan-2018 09:40:32] [INFO]       Stack Description: java.lang.RuntimeException: /bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/12.2.0.1/grid/rdbms/lib/config.o] Error 127
[22-Jan-2018 09:40:32] [INFO]       StackTrace: oracle.opatch.MakeAction.apply(MakeAction.java:534)

This is actually quite simple: for some reason opatch wants to use gcc, and doesn’t find it. After installing gcc and its dependencies, I resumed opatchauto and finished patching successfully. I still haven’t worked out why Oracle wants to use gcc on node 2 after not requiring it on node 1.
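
For reference, recovering from the failure boiled down to these two steps, run as root on node 2 (opatchauto prints the exact resume command when it fails):

# install gcc and its dependencies from the Oracle Linux repositories
yum install -y gcc

# resume the interrupted patching session
/u01/app/12.2.0.1/grid/OPatch/opatchauto resume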

Just thought I’d pass this on in case you hit the same problem. Happy patching!

OSWatcher, Tracefile Analyzer, and Oracle Restart 12.2

You are about to read the second part of this mini-series on TFA and Oracle 12.2. In the previous article I wrote about TFA and Oracle 12.2 single instance. In this short article I am going to have a look at TFA in a 12.2 Oracle Restart environment, before rounding the series off with an investigation into a full-blown RAC installation in part 3.

Summarising the first part I can only say that I am very happy that we now get TFA as part of the standard installation. Running it in daemon mode provides some great insights, and even if you did not upgrade the installation to “MOS-TFA”, you have a very fine tool for Oracle troubleshooting at your disposal.

Summary of the environment

My environment is largely the same as last time, except the machine name changed to server4 and I have additional storage for use with ASM.

  • It’s still Oracle Linux 7.4 with UEK4
  • Oracle 12.2.0.1.0 Grid Infrastructure was installed first
  • Followed by an installation of the Oracle 12.2.0.1.0 RDBMS EE software
  • After having created a database in ASM I applied the January 2018 GI RU
  • Finally I upgraded TFA to the current version (as downloaded from My Oracle Support DOC ID 1513912.1)

Not all of these steps are relevant for this article though.

Now where would you find TFA?

The question I had when creating this environment was essentially this: where would I find TFA? Would it be part of the Grid Home, or rather the RDBMS installation?

After having installed the binaries for Grid Infrastructure, I didn’t find a reference to roottfa.sh in the Grid Home’s root.sh script.

[oracle@server4 ~]$ cat /u01/app/oracle/product/12.2.0.1/grid/root.sh
#!/bin/sh
unset WAS_ROOTMACRO_CALL_MADE
. /u01/app/oracle/product/12.2.0.1/grid/install/utl/rootmacro.sh "$@"
. /u01/app/oracle/product/12.2.0.1/grid/install/utl/rootinstall.sh

#
# Root Actions related to network
#
/u01/app/oracle/product/12.2.0.1/grid/network/install/sqlnet/setowner.sh 

#
# Invoke standalone rootadd_rdbms.sh
#
/u01/app/oracle/product/12.2.0.1/grid/rdbms/install/rootadd_rdbms.sh

/u01/app/oracle/product/12.2.0.1/grid/rdbms/install/rootadd_filemap.sh 
/u01/app/oracle/product/12.2.0.1/grid/crs/config/rootconfig.sh $@ 
EXITCODE=$? 
if [ $EXITCODE -ne 0 ]; then
	exit $EXITCODE
fi

[oracle@server4 ~]$ 

I checked whether TFA had been configured in a different place, but there was no TFA-related process running, nor a systemd unit file with *tfa* in its name. So it looks like you don’t get the option to install TFA automatically as part of the Grid Infrastructure installation.
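
For reference, these are the kinds of checks I used to convince myself; a quick sketch, by no means exhaustive:

# any TFA-related processes running?
ps -ef | grep -i '[t]fa'

# any systemd unit files with tfa in their name?
systemctl list-unit-files | grep -i tfa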

Not finding TFA configured to run out of Grid Infrastructure surprised me, especially since Oracle states in the Autonomous Health Framework documentation chapter 4.2.2 that TFA is automatically configured as part of the GI configuration, upon invoking root.sh or rootupgrade.sh. According to my testing, this is true for RAC, but not Oracle Restart.

Which kind-of makes sense if you consider that most users will install the database software anyway in an Oracle Restart setup.

It doesn’t really matter, read on ;)

Installing the RDBMS software

If TFA isn’t configured automatically with Grid Infrastructure in an Oracle Restart configuration it should come as no surprise that TFA is once more installed and configured from the RDBMS home. Just as with the single instance installation I wrote about previously, you find a reference to roottfa.sh in the RDBMS home’s root.sh:

[oracle@server4 tfa]$ cat /u01/app/oracle/product/12.2.0.1/dbhome_1/root.sh 
#!/bin/sh
unset WAS_ROOTMACRO_CALL_MADE
. /u01/app/oracle/product/12.2.0.1/dbhome_1/install/utl/rootmacro.sh "$@"
. /u01/app/oracle/product/12.2.0.1/dbhome_1/install/utl/rootinstall.sh
/u01/app/oracle/product/12.2.0.1/dbhome_1/suptools/tfa/release/tfa_home/install/roottfa.sh
/u01/app/oracle/product/12.2.0.1/dbhome_1/install/root_schagent.sh

#
# Root Actions related to network
#
/u01/app/oracle/product/12.2.0.1/dbhome_1/network/install/sqlnet/setowner.sh 

#
# Invoke standalone rootadd_rdbms.sh
#
/u01/app/oracle/product/12.2.0.1/dbhome_1/rdbms/install/rootadd_rdbms.sh

/u01/app/oracle/product/12.2.0.1/dbhome_1/rdbms/install/rootadd_filemap.sh 
[oracle@server4 tfa]$ 

The procedure appears to be almost exactly the same as with single instance Oracle. I have again opted to use TFA in daemon mode, just as I did before in my post about Oracle single instance.

TFA thankfully discovers and adds both RDBMS as well as Grid Infrastructure directories (and quite a few Linux related directories, too!). You can see for yourself when running tfactl print directories.
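
If you want to check the discovered directories on your own system, the command is a one-liner (using the TFA location discussed in part 1; adjust the path to your ORACLE_BASE):

/u01/app/oracle/tfa/bin/tfactl print directories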

As the final step you might want to consider upgrading TFA to the MOS version. That’s it, happy troubleshooting!

OSWatcher, Tracefile Analyzer, and Oracle 12.2 single instance

I have previously written about TFA, OSWatcher et al. for Oracle 12.1. Since then, a lot of things have happened and I had an update for 12.2 on my to-do list for far too long. Experience teaches me that references to support notes and official documentation go out of date rather quickly, so as always, if you find anything that changed please let me know via the comments section and I’ll update the post.

This is going to be a 3 part mini-series to save you having to go over 42 pages of text … In this first part I’m going to have a look at single instance Oracle. In part 2 I’ll have a look at Oracle Restart environments, and finally in part 3 I’ll finish the series by looking at a 12.2 RAC system.

The environment

I am using a small VM to install Oracle 12.2.0.1.0 (initially) on Oracle Linux 7.4 with kernel UEK4. As always, my EE database binaries go into /u01/app/oracle/product/12.2.0.1/dbhome_1.

The installation/testing (and reporting) of my findings are following this approach:

  • Install the O/S
  • Install Oracle 12.2.0.1.0 EE
  • Create an EE database (not shown here)
  • Patch binaries and database to 12.2.0.1.180116
  • Upgrade TFA to 12.2.1.3.1 as downloaded from My Oracle Support DOC ID 1513912.1

These were the current versions at the time of writing.

Install Oracle 12.2.0.1.0

The first step after the O/S is provisioned is to install the Oracle software, obviously. I have noticed that TFA is part of the Oracle binaries. Towards the end of the installation process, you are prompted to execute root.sh, as normal. On my system, root.sh had the following contents:

      1 #!/bin/sh
      2 unset WAS_ROOTMACRO_CALL_MADE
      3 . /u01/app/oracle/product/12.2.0.1/dbhome_1/install/utl/rootmacro.sh "$@"
      4 . /u01/app/oracle/product/12.2.0.1/dbhome_1/install/utl/rootinstall.sh
      5 /u01/app/oracle/product/12.2.0.1/dbhome_1/suptools/tfa/release/tfa_home/install/roottfa.sh
      6 /u01/app/oracle/product/12.2.0.1/dbhome_1/install/root_schagent.sh
      7 
      8 #
      9 # Root Actions related to network
     10 #
     11 /u01/app/oracle/product/12.2.0.1/dbhome_1/network/install/sqlnet/setowner.sh
     12 
     13 #
     14 # Invoke standalone rootadd_rdbms.sh
     15 #
     16 /u01/app/oracle/product/12.2.0.1/dbhome_1/rdbms/install/rootadd_rdbms.sh
     17 
     18 /u01/app/oracle/product/12.2.0.1/dbhome_1/rdbms/install/rootadd_filemap.sh

After a few variables are set/defined by sourcing in files created during the installation, roottfa.sh is called (see line 5). It allows you to configure TFA to run as a background (daemon) process. I decided to go with that option after consulting chapter 4 in the 12.2 Autonomous Health Framework documentation and reading about the advantages of running TFA as a daemon. This may or may not be the right way to run TFA for you; the documentation is really good and helps you decide. Here is the transcript of my root.sh execution:

[root@server5 ~]#  /u01/app/oracle/product/12.2.0.1/dbhome_1/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/12.2.0.1/dbhome_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Do you want to setup Oracle Trace File Analyzer (TFA) now ? yes|[no] :
yes
Installing Oracle Trace File Analyzer (TFA).
Log File: /u01/app/oracle/product/12.2.0.1/dbhome_1/install/root_server5_2018-01-22_17-21-41-005116657.log
Finished installing Oracle Trace File Analyzer (TFA)

Once that message is shown, TFA is configured and controlled via a systemd unit file:

[root@server5 ~]# systemctl cat oracle-tfa
# /etc/systemd/system/oracle-tfa.service
# Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
#
# Oracle TFA startup
#
[Unit]
Description=Oracle Trace File Analyzer
After=syslog.target
[Service]
ExecStart=/etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
Type=simple
Restart=always

[Install]
WantedBy=multi-user.target graphical.target

The service is enabled and running.
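
You can verify both aspects with systemctl:

systemctl is-enabled oracle-tfa
systemctl status oracle-tfa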

After the completion of roottfa.sh, TFA resides in $ORACLE_BASE/tfa and its subdirectories. This is documented in the 12.2 Autonomous Health Framework manual, chapter 4.2.3, and has an interesting implication: if you set your environment using oraenv, you might find that you get errors invoking tfactl, such as these on my VM. I used a “minimum install” for my operating system and quite deliberately didn’t add any additional perl modules in my kickstart file. As a consequence, invoking tfactl after setting my environment via oraenv reveals missing perl modules in my system’s perl installation:

[oracle@server5 ~]$ . oraenv
ORACLE_SID = [NCDB] ? NCDB
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@server5 ~]$ tfactl status
Can't locate Digest/MD5.pm in @INC (@INC contains: 
/usr/local/lib64/perl5 /usr/local/share/perl5 
/usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl
/usr/lib64/perl5 /usr/share/perl5 . /u01/app/oracle/tfa/server5/tfa_home/bin 
/u01/app/oracle/tfa/server5/tfa_home/bin/common /u01/app/oracle/tfa/server5/tfa_home/bin/modules 
/u01/app/oracle/tfa/server5/tfa_home/bin/common/exceptions) 
at /u01/app/oracle/tfa/server5/tfa_home/bin/common/tfactlshare.pm line 7628.
BEGIN failed--compilation aborted at /u01/app/oracle/tfa/server5/tfa_home/bin/common/tfactlshare.pm line 7628.
Compilation failed in require at /u01/app/oracle/tfa/server5/tfa_home/bin/tfactl.pl line 223.
BEGIN failed--compilation aborted at /u01/app/oracle/tfa/server5/tfa_home/bin/tfactl.pl line 223.

The output has been changed for readability (originally I was missing Data::Dumper as well). After studying the documentation (still section 4.2.3 in the aforementioned document), it turns out to be a user mistake. As I said before, after TFA is configured using roottfa.sh as part of the root.sh script execution, it runs in daemon mode and, crucially, is available from $ORACLE_BASE/tfa. I found that location being referred to in /etc/init.d/init.tfa as well. When I simply typed “tfactl” into my terminal window, I invoked a different “tfactl”. There is a lot more to be said about this, and I will try and do so in a different post.
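
If you want to see which copy of tfactl you are actually invoking, the usual suspects help; a quick sketch:

# which tfactl does the shell resolve after setting the environment via oraenv?
which tfactl

# compare with the daemon-mode installation under $ORACLE_BASE/tfa
ls -l /u01/app/oracle/tfa/bin/tfactl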

NB: the same section 4.2.3 in the documentation states that even if you do not run TFA in daemon mode, you can still make use of “user mode TFA” in $ORACLE_HOME, although there are certain restrictions. I haven’t pursued that route.

Anyway, after switching to the location where TFA is actually installed ($ORACLE_BASE/tfa), all is well. It seems that running roottfa.sh creates a new “Oracle” perl:

[root@server5 ~]# find /u01/app/oracle/tfa -name perl
/u01/app/oracle/tfa/server5/tfa_home/perl
/u01/app/oracle/tfa/server5/tfa_home/perl/bin/perl

I found Digest::MD5 and Data::Dumper in /u01/app/oracle/tfa/server5/tfa_home/perl/lib/5.22.0/x86_64-linux-thread-multi.
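
If you want to verify this on your own system, a find command along these lines does the trick (the path contains my hostname, server5):

find /u01/app/oracle/tfa -name MD5.pm -o -name Dumper.pm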

So let’s try and get the status of the current installation from $ORACLE_BASE/tfa:

[oracle@server5 ~]$ /u01/app/oracle/tfa/bin/tfactl status

Access Denied: Only TFA Admin can run this command

Nearly there: the perl modules are no longer reported to be missing; the “Oracle” perl installation appears to be used now. But what about this error message? I read in section 4.2.4 “Securing Access to Oracle Trace File Analyzer” (still referring to the Autonomous Health Framework manual) that access to TFA is restricted. However, the RDBMS owner should have been granted access automatically.

Using the commands shown in the manual I checked permissions and it turns out that the oracle user is configured to have access to TFA.

[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl access lsusers
.---------------------------------.
|       TFA Users in server5      |
+-----------+-----------+---------+
| User Name | User Type | Status  |
+-----------+-----------+---------+
| oracle    | USER      | Allowed |
'-----------+-----------+---------'

In fact, I can query TFA’s status using the “print status” command as oracle (/u01/app/oracle/tfa/bin/tfactl print status). I compared the output of “tfactl -help” between oracle and root, and there are more options available when running as root. This might explain the above error.

What is the status now?

TFA is now set up and working, but using the base release:

[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 18786 | 41482 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'

It should probably be patched to something more recent. I’ll try that in two ways: first by applying the January 2018 RU to see if the version changes. Since the standard deployment doesn’t come with OSWatcher, which I’m particularly interested in, I’ll download and apply TFA 12.2.1.3.1 next. As with all patching, I need to make sure that I have working backups which I’m comfortable restoring should anything go badly wrong.

Status after applying the January RU

A combination of opatch/datapatch later, my system is on the latest RU patchlevel:

[oracle@server5 OPatch]$ opatch lspatches
27105253;Database Release Update : 12.2.0.1.180116 (27105253)

OPatch succeeded.

However, this did not have an effect on the version of TFA in $ORACLE_BASE:

[root@server5 ~]# systemctl restart oracle-tfa
[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 24042 | 37226 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'

Not quite what I expected after reading the docs: the installation of the latest RU should have updated TFA as well. But maybe I got something wrong on my end. The RU readme did not have any reference to TFA that I could find.

Yet it doesn’t matter: I wanted to have all the great support tools anyway (and they aren’t shipped with “stock TFA”), so it was time to install the latest version from MOS.

Upgrading TFA using 12.2.1.3.1 (MOS)

The patch is quite simple and well documented. If TFA is up and running in daemon mode as in my example, the patching tool will recognise that fact and patch the installation in-place (a sketch of the steps follows below). After a couple of minutes on my VM, I have a new version:

[root@server5 stage]# /u01/app/oracle/tfa/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 28105 | 39100 | 12.2.1.3.1 | 12213120171215143839 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'
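
For reference, the upgrade itself boiled down to unzipping the download and running the bundled installer as root; a minimal sketch, assuming the patch was staged in /u01/stage (archive and installer names may differ between TFA releases, so check the README that ships with the download):

# as root, in the staging area
cd /u01/stage
unzip TFA-LINUX_v12.2.1.3.1.zip
./installTFA-LINUX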

The MOS version comes with lots of useful tools as well:

[oracle@server5 stage]$ /u01/app/oracle/tfa/bin/tfactl toolstatus

.------------------------------------------------------------------.
|                   TOOLS STATUS - HOST : server5                  |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.1.2 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | changes      |   12.2.1.1.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | events       |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@server5 stage]$ 

Since I care a lot about OSWatcher, I was very pleased to see it running.

[oracle@server5 stage]$ ps -ef | grep -i osw
oracle   28344     1  0 10:58 ?        00:00:00 /bin/sh ./OSWatcher.sh 30 48 NONE /u01/app/oracle/tfa/repository/suptools/server5/oswbb/oracle/archive
oracle   28934 28344  0 10:58 ?        00:00:00 /bin/sh ./OSWatcherFM.sh 48 /u01/app/oracle/tfa/repository/suptools/server5/oswbb/oracle/archive
oracle   30662 27252  0 11:01 pts/4    00:00:00 grep --color=auto -i osw
[oracle@server5 stage]$ 

Kindly refer to the documentation for more information about TFA. It’s quite a collection of tools, and it helps you in so many ways…

Little things worth knowing: redo transport in Data Guard 12.2 part 2

In the first part of this article I looked at a number of views and some netstat output to show how redo is transported from the primary database to its standby systems. The long story short is that TT02 (“async ORL multi”) was found sending redo to CDB3 asynchronously, whilst NSS2 (“sync”) transferred redo to the synchronised target, CDB2. Contrary to what v$dataguard_process wanted me to believe, it really wasn’t LGWR sending redo over the network.

In this little article I would like to show you how the standby databases CDB2 and CDB3 receive redo and how you can map this back to the primary database, closing the loop.

How does CDB2 receive redo?

First I’m looking at CDB2, which receives redo via synchronous mode. I should be able to narrow down the communication between primary and standby by referring to the LGWR and TT02 process IDs in the CLIENT_PID column on the standby. As a quick reminder, 14986 is the PID for LGWR, 15029 belongs to NSS2, and 15252 maps to TT02. Let’s try:

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
CDB2                           PHYSICAL STANDBY

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process where client_pid in (14986,15029,15252);

NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   5517       RFS sync                IDLE              14986 log writer               95     174565          0

So it would appear the process responsible for shipping redo to “SYNC” destinations is the log writer. Actually, the output of v$dataguard_process is quite interesting, which is why I’m adding it here for the sake of completeness:

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process order by action;

NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   5517       RFS sync                IDLE              14986 log writer               95     229446          0
rfs   5350       RFS archive             IDLE              15224 archive gap               0          0          1
rfs   5346       RFS ping                IDLE              15124 gap manager              95          0          0
rfs   5354       RFS archive             IDLE              15233 archive gap               0          0          1
MRP0  5348       managed recovery        IDLE                  0 none                      0          0          0
rfs   5352       RFS archive             IDLE              15240 archive gap               0          0          1
LGWR  5207       log writer              IDLE                  0 none                      0          0          0
TT01  5259       redo transport timer    IDLE                  0 none                      0          0          0
TT00  5255       gap manager             IDLE                  0 none                      0          0          0
ARC1  5263       archive redo            IDLE                  0 none                      0          0          0
ARC2  5265       archive redo            IDLE                  0 none                      0          0          0
ARC3  5267       archive redo            IDLE                  0 none                      0          0          0
TMON  5242       redo transport monitor  IDLE                  0 none                      0          0          0
ARC0  5257       archive local           IDLE                  0 none                      0          0          0

14 rows selected.

This view tells me that LGWR is attached to the RFS sync process. But by now I know better than that: it’s the same picture as on the primary. Looking a little closer, I can see that, strictly speaking, the RFS process is connected to NSS2:

[root@server2 ~]# netstat -tunalp | egrep 'Active|Proto|5517'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 192.168.100.22:1521     192.168.100.21:15515    ESTABLISHED 5517/oracleCDB2   

I am repeating the values for the primary here so you don’t have to go back to the previous article:

[root@server1 ~]# ps -ef | egrep 'lgwr|lg0|nss'
oracle   14986     1  0 09:58 ?        00:01:19 ora_lgwr_CDB1
oracle   14990     1  0 09:58 ?        00:00:00 ora_lg00_CDB1
oracle   14994     1  0 09:58 ?        00:00:00 ora_lg01_CDB1
oracle   15029     1  0 09:58 ?        00:00:43 ora_nss2_CDB1

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '14986|14990|14994|15029'
tcp        0      0 server1.example.com:15515 server2.example.com:1521 ESTABLISHED oracle     16400768   15029/ora_nss2_CDB1

You will notice that port 15515 on server1 belongs to ora_nss2_CDB1.

Going back a little to v$dataguard_process, it seems a bit weird to see MRP0 as “idle” when the database is in managed recovery mode using real time apply. Trying something else, I queried v$managed_standby and voilà: MRP0 is said to be applying logs:

SQL> select process,pid,status,client_process,client_pid,sequence#,block# 
  2  from v$managed_standby order by status;

PROCESS   PID        STATUS       CLIENT_P CLIENT_PID                                SEQUENCE#     BLOCK#
--------- ---------- ------------ -------- ---------------------------------------- ---------- ----------
DGRD      5255       ALLOCATED    N/A      N/A                                               0          0
DGRD      5259       ALLOCATED    N/A      N/A                                               0          0
MRP0      5348       APPLYING_LOG N/A      N/A                                              95     246625
ARCH      5257       CLOSING      ARCH     5257                                             92       4096
ARCH      5263       CLOSING      ARCH     5263                                             93       2048
ARCH      5265       CLOSING      ARCH     5265                                             94     342016
ARCH      5267       CONNECTED    ARCH     5267                                              0          0
RFS       5350       IDLE         UNKNOWN  15224                                             0          0
RFS       5354       IDLE         UNKNOWN  15233                                             0          0
RFS       5352       IDLE         UNKNOWN  15240                                             0          0
RFS       5517       IDLE         LGWR     14986                                            95     246626
RFS       5346       IDLE         Archival 15124                                             0          0

12 rows selected.

I guess that’s true, as the system is in constant recovery using the standby logfiles.

And what about CDB3?

On the other hand, CDB3 – to which redo is shipped asynchronously – lists TT02 as its counterpart:

SQL> select db_unique_name,database_role from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
CDB3                           PHYSICAL STANDBY

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id 
  2   from v$dataguard_process order by action;

NAME  PID                      ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ------------------------ ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
rfs   14803                    RFS ping                IDLE              15124 gap manager              96          0          0
rfs   14809                    RFS archive             IDLE              15233 archive gap               0          0          0
rfs   14811                    RFS async               IDLE              15252 async ORL multi          96      34674          0
MRP0  11825                    managed recovery        IDLE                  0 none                      0          0          0
ARC0  11776                    archive local           IDLE                  0 none                      0          0          0
ARC2  11786                    archive redo            IDLE                  0 none                      0          0          0
TT00  11774                    gap manager             IDLE                  0 none                      0          0          0
ARC3  11788                    archive redo            IDLE                  0 none                      0          0          0
TMON  11706                    redo transport monitor  IDLE                  0 none                      0          0          0
LGWR  11676                    log writer              IDLE                  0 none                      0          0          0
ARC1  11784                    archive redo            IDLE                  0 none                      0          0          0
TT01  11778                    redo transport timer    IDLE                  0 none                      0          0          0

12 rows selected.

Unlike the case with CDB2, the local RFS process is indeed connecting to TT02 on server1:

[root@server2 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '^Active|^Proto|14811'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp6       0      0 server2.example.com:1521 server1.example.com:12936 ESTABLISHED oracle     4658198    14811/oracleCDB3    

… and on server1:

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '^Active|^Proto|12936'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address            State       User       Inode      PID/Program name    
tcp        0      0 server1.example.com:12936 server2.example.com:1521 ESTABLISHED oracle     15820538   15252/ora_tt02_CDB1 

This should be enough evidence, I rest my case :)

Summary

So I guess that answers my question: On my small VMs in the lab, NSSn is responsible for shipping redo to targets in “SYNC” mode. The redo transport server processes TTnn ship redo to destinations that are defined for asynchronous transport.

Little things worth knowing: redo transport in Data Guard 12.2 part 1

I have researched Data Guard redo transport quite a bit, but so far haven’t really blogged about some of the details that I have worked out. This is what I am trying to do in this post.

In order to have a somewhat meaningful setup, I created a Data Guard configuration with 1 primary and 2 standby databases. They all reside on my lab (virtual) machines:

  • CDB1 currently operates in primary mode on server1
  • CDB2 and CDB3 are physical standby databases on server2
  • The VMs use Oracle Linux 7.4 with UEK4

The databases have been patched to 12.2.0.1.170814. I should probably have created a dedicated VM for the second standby database, but I hope what I found out with just two systems applies equally. If you know or find out otherwise, please let me know in the comments section of this article. This post is based on information from the Data Guard Concepts and Administration Guide version 12c as of 171212 as well as my own investigation shown here. There’s a good Oracle white paper out there on the impact of synchronous log shipping and some “best practices” which you might want to read as well.

Data Guard configuration

All my Data Guard instances are always managed via the Data Guard Broker; it’s just too convenient for me not to. Here’s the configuration detail for later reference:

DGMGRL> show configuration

Configuration - allthree

  Protection Mode: MaxAvailability
  Members:
  CDB1 - Primary database
    CDB2 - Physical standby database 
    CDB3 - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 22 seconds ago)

My protection mode is defined as MaxAvailability, which can be a good compromise between availability/data protection and performance. Your mileage will vary; picking the right protection mode for your environment is beyond the scope of this little article.
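
For completeness: should you want to change the protection mode, the Broker makes this a one-liner, provided the standby databases’ LogXptMode supports the target mode; a sketch:

DGMGRL> edit configuration set protection mode as maxavailability;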

I am using sync redo transport to CDB2, but asynchronous redo transport to CDB3:

DGMGRL> show database 'CDB2' logxptmode
  LogXptMode = 'SYNC'
DGMGRL> show database 'CDB3' logxptmode
  LogXptMode = 'ASYNC'
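
These properties are set per database via the Broker; this is essentially how my configuration was defined:

DGMGRL> edit database 'CDB2' set property LogXptMode='SYNC';
DGMGRL> edit database 'CDB3' set property LogXptMode='ASYNC';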

In addition to the Broker, the primary database shows the same information:

SQL> select dest_id, destination, status, error, transmit_mode
  2  from v$archive_dest where destination is not null;

   DEST_ID DESTINATION                    STATUS    ERROR                TRANSMIT_MOD
---------- ------------------------------ --------- -------------------- ------------
         1 USE_DB_RECOVERY_FILE_DEST      VALID                          SYNCHRONOUS
         2 CDB2                           VALID                          PARALLELSYNC
         3 CDB3                           VALID                          ASYNCHRONOUS

DEST_ID 1 is of course the local archiving destination and doesn’t have anything to do with redo transport:

SQL> select dest_name, destination, type, valid_type, valid_role
  2  from v$archive_dest where dest_id = 1
  3  /

DEST_NAME            DESTINATION                    TYPE    VALID_TYPE      VALID_ROLE
-------------------- ------------------------------ ------- --------------- ------------
LOG_ARCHIVE_DEST_1   USE_DB_RECOVERY_FILE_DEST      PUBLIC  ALL_LOGFILES    ALL_ROLES

As @fritshoogland would point out, the log writer trace is a great source of information and I’ll show parts of it throughout the post. This excerpt shows how lgwr says it’s not responsible for destination 1:

Trace file /u01/app/oracle/diag/rdbms/cdb1/CDB1/trace/CDB1_lgwr_14986.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
Build label:    RDBMS_12.2.0.1.0_LINUX.X64_170125
ORACLE_HOME:    /u01/app/oracle/product/12.2.0.1/dbhome_1
System name:    Linux
Node name:      server1
Release:        4.1.12-94.3.9.el7uek.x86_64
Version:        #2 SMP Fri Jul 14 20:09:40 PDT 2017
Machine:        x86_64
Instance name: CDB1
Redo thread mounted by this instance: 0 
Oracle process number: 20
Unix process pid: 14986, image: oracle@server1 (LGWR)


*** 2017-12-12T09:58:48.421846+00:00 (CDB$ROOT(1))
*** SESSION ID:(10.52290) 2017-12-12T09:58:48.421874+00:00
*** CLIENT ID:() 2017-12-12T09:58:48.421878+00:00
*** SERVICE NAME:() 2017-12-12T09:58:48.421882+00:00
*** MODULE NAME:() 2017-12-12T09:58:48.421885+00:00
*** ACTION NAME:() 2017-12-12T09:58:48.421888+00:00
*** CLIENT DRIVER:() 2017-12-12T09:58:48.421892+00:00
*** CONTAINER ID:(1) 2017-12-12T09:58:48.421895+00:00
 
Created 2 redo writer workers (2 groups of 1 each)

*** 2017-12-12T09:59:06.794537+00:00 (CDB$ROOT(1))
Destination LOG_ARCHIVE_DEST_3 is using asynchronous network I/O
Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
...

The information I wanted to show you is found towards the end of this little extract: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR.

Investigation into Data Guard redo transport

Like so many other areas in the database, Data Guard is well instrumented. The main views I’ll use for this post are v$managed_standby and v$dataguard_process. Let’s begin by looking at the processes on the primary with the intention of working out which process is transmitting redo.

SQL> select process,pid,status,client_process,client_pid,sequence#,block#
  2  from v$managed_standby
  3  order by status;

PROCESS   PID        STATUS       CLIENT_P CLIENT_PID  SEQUENCE#     BLOCK#
--------- ---------- ------------ -------- ---------- ---------- ----------
DGRD      15275      ALLOCATED    N/A      N/A                 0          0
DGRD      15128      ALLOCATED    N/A      N/A                 0          0
DGRD      15362      ALLOCATED    N/A      N/A                 0          0
DGRD      15124      ALLOCATED    N/A      N/A                 0          0
ARCH      15240      CLOSING      ARCH     15240              84          1
ARCH      15224      CLOSING      ARCH     15224              83          1
ARCH      15126      CLOSING      ARCH     15126              85     342016
ARCH      15233      CLOSING      ARCH     15233              82          1
LGWR      14986      WRITING      LGWR     14986              86      51577
LNS       15252      WRITING      LNS      15252              86      51577

10 rows selected.

OK, so we have a few archiver processes present, a group of “DGRD” processes, LNS, and LGWR. This is the current state of my primary database.

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     84
Next log sequence to archive   86
Current log sequence           86

I am running a low-impact Swingbench workload to generate some redo, but not too much (Average Apply Rate: 56.00 KByte/s according to the Broker).

These DGRD processes are visible as TTnn processes on the O/S-level. I used the listagg function to group their PIDs into an egrep-able format:

SQL> select process , listagg(pid, '|') within group (order by pid) as pids
  2  from v$managed_standby group by process;

PROCESS   PIDS
--------- ------------------------------
ARCH      15126|15224|15233|15240
DGRD      15124|15128|15275|15362
LGWR      14986
LNS       15252

The first group to map to Linux processes is DGRD (I have an idea what the archiver processes will look like on the O/S, so I won’t go into that detail).

[oracle@server1 ~]$ ps -ef | egrep '15124|15128|15275|15362'
oracle   15124     1  0 09:59 ?        00:00:00 ora_tt00_CDB1
oracle   15128     1  0 09:59 ?        00:00:00 ora_tt01_CDB1
oracle   15275     1  0 09:59 ?        00:00:00 ora_tt03_CDB1
oracle   15362     1  0 09:59 ?        00:00:00 ora_tt04_CDB1

So there are tt00 through tt04, with the exception of tt02. The Oracle 12.2 reference refers to the TTnn processes as redo transport slaves. The LNS process from the query is our missing tt02:

[oracle@server1 ~]$ ps -ef | egrep 15252
oracle   15252     1  0 09:59 ?        00:00:04 ora_tt02_CDB1

Interestingly enough, the current (Oracle 12.2) documentation has nothing to say about the LNS process. Or it has hidden it well from me. I found a reference in the 10.2 Data Guard Concepts and Administration Guide, chapter 5 (Redo Transport), but nothing concrete.

The final process to show is LGWR:

[oracle@server1 ~]$ ps -ef | grep 14986
oracle   14986     1  0 09:58 ?        00:00:12 ora_lgwr_CDB1

The next step is to work out how processes on the primary and standby database map. v$dataguard_status is an interesting view, and it shows relevant information as seen in the alert.log. I found v$dataguard_process to be even more valuable. On the primary, I am getting the following results when querying it:

SQL> select name, pid, role, action, client_pid, client_role, sequence#, block#, dest_id
  2  from v$dataguard_process order by action;
  
NAME  PID        ROLE                    ACTION       CLIENT_PID CLIENT_ROLE       SEQUENCE#     BLOCK#    DEST_ID
----- ---------- ----------------------- ------------ ---------- ---------------- ---------- ---------- ----------
ARC0  15126      archive local           IDLE                  0 none                      0          0          0
ARC3  15240      archive redo            IDLE                  0 none                      0          0          0
TMON  15020      redo transport monitor  IDLE                  0 none                      0          0          0
NSS2  15029      sync                    IDLE                  0 none                      0          0          0
TT00  15124      gap manager             IDLE                  0 none                     86          0          2
TT01  15128      redo transport timer    IDLE                  0 none                      0          0          0
TT04  15362      controlfile update      IDLE                  0 none                      0          0          0
TT03  15275      heartbeat redo informer IDLE                  0 none                      0          0          0
ARC1  15224      archive redo            IDLE                  0 none                      0          0          0
ARC2  15233      archive redo            IDLE                  0 none                      0          0          0
TT02  15252      async ORL multi         WRITING               0 none                     86      92262          0
LGWR  14986      log writer              WRITING               0 none                     86      92271          0

12 rows selected.

So from that output it appears that LGWR and TT02 are the only processes actually writing. It would be interesting to know where they are writing to :) It’s unclear to me why the process with PID 15252 appears as TT02 rather than LNS, as it did in v$managed_standby.

If you run the same query again you notice that the block# is increasing: as far as I remember that’s a sign of work! This coincides with information from the alert.log:

Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 94
LGWR: Standby redo logfile selected for thread 1 sequence 94 for destination LOG_ARCHIVE_DEST_2
2017-12-12T13:23:07.153256+00:00
Thread 1 advanced to log sequence 94 (LGWR switch)
  Current log# 1 seq# 94 mem# 0: /u01/oradata/CDB1/onlinelog/o1_mf_1_dvh035kx_.log
  Current log# 1 seq# 94 mem# 1: /u01/fast_recovery_area/CDB1/onlinelog/o1_mf_1_dvh0363f_.log
2017-12-12T13:23:07.310438+00:00
Archived Log entry 194 added for T-1.S-93 ID 0x3784c932 LAD:1
2017-12-12T13:23:07.531896+00:00
TT02: Standby redo logfile selected for thread 1 sequence 94 for destination LOG_ARCHIVE_DEST_3

There’s also a new entry that wasn’t present in v$managed_standby, named NSS2. This process has not yet been discussed:

[oracle@server1 ~]$ ps -ef  | grep 5348
oracle    5348     1  0 11:16 ?        00:00:01 ora_nss2_CDB1

Its job is described by the database reference as transferring redo for synchronous targets. Nevertheless, I never saw it in a state other than “IDLE”. The aforementioned white paper, however, references it in section “Understanding how synchronous transport insures data integrity”.

In this particular section, the authors state that LGWR writes to the online redo logs and an identical network redo write is performed by NSSn. Looking at the netstat output, LGWR (and its 2 worker processes) does not have a connection to “server2” open, but NSS2 does:

[root@server1 ~]# ps -ef | egrep 'lgwr|lg0|nss'
oracle   14986     1  0 09:58 ?        00:01:19 ora_lgwr_CDB1
oracle   14990     1  0 09:58 ?        00:00:00 ora_lg00_CDB1
oracle   14994     1  0 09:58 ?        00:00:00 ora_lg01_CDB1
oracle   15029     1  0 09:58 ?        00:00:43 ora_nss2_CDB1

[root@server1 ~]# netstat -vW --numeric-ports -ee -p -la --tcp | egrep '14986|14990|14994|15029'
tcp        0      0 server1.example.com:15515 server2.example.com:1521 ESTABLISHED oracle     16400768   15029/ora_nss2_CDB1 
[root@server1 ~]# 

I cross-referenced this with output from lsof and it seems to be correct. What I haven’t been able to work out, though, is the relationship between a user session, LGWR, and the NSSn process. One clue is in the current log writer trace. Continuing from the previous trace:

*** 2017-12-12 09:59:12.450533 
Netserver NSS2 (PID:15029) for mode SYNC has been re-initialized
Performing a channel reset to ignore previous responses
Connecting as publisher to KSR Channel [id = 20]
Successfully reused NSS2 (PID:15029) for dest CDB2 mode SYNC ocis = 0x7fea5c6f6c40

There appears to be some sort of communication, although it’s not clear to me what kind ;)

Summary

In this part of the two-part series I hoped to give you a few insights into Data Guard redo transport for synchronous and asynchronous redo shipping. In the next part the information in this article will be joined up with diagnostic information from server2 and both standby databases.

Truncated CTAS statements seem to be no longer shortened

Many bloggers have already discussed the fact that Oracle can truncate the text of a “create table as select” statement to 20 characters, depending on your version and patch level. This can be a problem if a CTAS statement is a resource hog, yet you don’t see the SQL text needed for troubleshooting. A quick search on My Oracle Support reveals this can happen on 11.2.0.4, 12.1.0.1 and 12.1.0.2 systems, unless patched of course. This has been bugging me for quite some time now, and merits a blog post.

Reproducing the offending behaviour

Oracle has provided a number of patches over time to fix this undesirable shortening of the SQL text. I wanted to reproduce the issue on 12.1 to demonstrate the effect. To my shame I have to admit that since 12.2 came out I have somewhat neglected my 12.1 lab system. It was quite a bit out of date, which, as it turned out, was useful for this blog post.

One of my 12.1 systems (a two-node RAC) was patched to July 2016:

[oracle@rac12node1 ~]$ opatch lspatches
23177536;Database PSU 12.1.0.2.160719, Oracle JavaVM Component (JUL2016)
23144544;DATABASE BUNDLE PATCH: 12.1.0.2.160719 (23144544)
23520664;OCW Interim patch for 23520664 

OPatch succeeded.

This isn’t great; I prefer my lab systems to be more current. Long story short, this RAC system is about a year and a bit behind on patches. It shall be rectified at the next opportunity!

I am going to connect to a PDB to run the following steps, although it should not matter whether I’m running the test against a non-CDB or PDB.

To ensure repeated execution of my test case I have created a small script:

[oracle@rac12node1 ~]$ cat ctas_test.sql 
set echo on

sho user

alter session set events 'sql_trace level 8';

create table ctastest tablespace users as 
select * from dba_objects where object_id < 1000;

alter session set events 'sql_trace off';

select value from v$diag_info where name like 'Def%';
[oracle@rac12node1 ~]$ 

In the next step I am connecting to the database to run the script:

[oracle@rac12node1 ~]$ sqlplus martin@pdb_srv

SQL*Plus: Release 12.1.0.2.0 Production on Tue Oct 17 20:26:56 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Tue Oct 17 2017 18:52:35 +01:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> @ctas_test
SQL> 
SQL> sho user
USER is "MARTIN"
SQL> 
SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL> 
SQL> create table ctastest tablespace users as 
  2  select * from dba_objects where object_id < 1000;

Table created.

SQL> alter session set events 'sql_trace off';

Session altered.

SQL> 
SQL> select value from v$diag_info where name like 'Def%';

VALUE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb/CDB1/trace/CDB1_ora_16279.trc

And indeed, the SQL trace shows that the statement is truncated:

=====================
PARSING IN CURSOR #140207826820160 len=90 dep=0 uid=102 oct=1 lid=102 
tim=28239487087 hv=3859378539 ad='afafdb58' sqlid='d7y8ssmm0kwbb'
create table ctastest tab
END OF STMT
PARSE #140207826820160:c=282000,e=688749,p=90,cr=6300,cu=0,mis=1,r=0,dep=0,og=1,
 plh=12630425,tim=28239487087
WAIT #140207826820160: nam='row cache lock' ela= 62 cache id=8 mode=0 request=3
 obj#=656 tim=28239487351

My Oracle Support also mentioned a couple of other places where the command was found to be truncated. I randomly picked one that matters to me; please consult MOS for more locations where statements were truncated. Your mileage may vary.

SQL> set long 1000000

SQL> select sql_text, sql_fulltext from v$sql where sql_id = 'd7y8ssmm0kwbb';

SQL_TEXT
-------------------------------------------------------------------------------
SQL_FULLTEXT
-------------------------------------------------------------------------------
create table ctastest tablespace users as select * from dba_objects where 
object_id < 1000
create table ctastest tablespace users as select * from dba_objects where 
object_id < 1000

Thankfully both SQL_TEXT and SQL_FULLTEXT show the entire statement.
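If you prefer a check that does not rely on eyeballing wrapped output, you can compare the column lengths against the len= attribute from the trace header. A quick sketch, run in the same session (dbms_lob.getlength is used because SQL_FULLTEXT is a CLOB):

SQL> select length(sql_text) as len_text,
  2         dbms_lob.getlength(sql_fulltext) as len_full
  3    from v$sql
  4   where sql_id = 'd7y8ssmm0kwbb';

Both values should come back as 90, matching len=90 in the trace.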

Now what about 12.2?

Out of curiosity I repeated the test in 12.2 to see if the problem was fixed, and that actually appears to be the case. Running the same script against a 12.2 database I get a different behaviour. This time around I am patched to a more recent release:

[oracle@server2 ~]$ opatch lspatches
26609817;DATABASE RELEASE UPDATE: 12.2.0.1.170814 (26609817)

OPatch succeeded.

Let’s try again:

[oracle@server2 ~]$ sqlplus martin@orcl

SQL*Plus: Release 12.2.0.1.0 Production on Tue Oct 17 20:22:15 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Enter password: 
Last Successful login time: Tue Oct 17 2017 20:19:24 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> @ctas_test
SQL>
SQL> sho user
USER is "MARTIN"
SQL>
SQL> alter session set events 'sql_trace level 8';

Session altered.

SQL>
SQL> create table ctastest tablespace users as 
  2  select * from dba_objects where object_id < 1000;

Table created.

SQL>
SQL> alter session set events 'sql_trace off';

Session altered.

SQL>
SQL> select value from v$diag_info where name like 'Def%';

VALUE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_ora_16370.trc

SQL>

Sure enough, the statement is no longer truncated. Looking at ORCL_ora_16370.trc I can find the complete statement:

=====================
PARSING IN CURSOR #140577803287552 len=90 dep=0 uid=74 oct=1 lid=74 
 tim=39078309869 hv=3859378539 ad='64d25e58' sqlid='d7y8ssmm0kwbb'
create table ctastest tablespace users as select * from dba_objects 
where object_id < 1000
END OF STMT
PARSE #140577803287552:c=26826,e=27778,p=0,cr=140,cu=0,mis=1,r=0,dep=0,og=1,
 plh=3155089718,tim=39078309868
...

Wait, what about later patch levels for 12.1?

This could easily have turned out to be “one of those posts” where the information presented is correct, yet the picture isn’t complete. Things change with the Oracle database, and newer patches address old problems. Just to be sure that what I said is accurate, at least at the time of writing, I created a new VM with the August 2017 patch set:

[oracle@server1 ~]$ opatch lspatches
26027162;Database PSU 12.1.0.2.170718, Oracle JavaVM Component (JUL2017)
26609783;Database Patch Set Update : 12.1.0.2.170814 (26609783)
18705302;

OPatch succeeded.

And interestingly, the issue with the truncated SQL statement in SQL traces disappeared:

=====================
PARSING IN CURSOR #140433649086704 len=90 dep=0 uid=111 oct=1 lid=111 
tim=797240178 hv=3859378539 ad='1b1add420' sqlid='d7y8ssmm0kwbb'
create table ctastest tablespace users as select * from dba_objects
 where object_id < 1000
END OF STMT
PARSE #140433649086704:c=165843,e=184715,p=197,cr=5489,cu=0,mis=1,r=0,dep=0,og=1,
 plh=12630425,tim=797240177
...

Great stuff: the problem is fixed at the current patch levels of both 12.1.0.2 and 12.2.0.1. This is seriously good news for me, especially the part about 12.1.

Changes to the Oracle preinstall RPMs in OL 7.3 and OL 7.4

For quite some time now Oracle has documented the use of the so-called preinstall RPMs to prepare Oracle Linux for the installation of the Oracle database software. I think that’s a great idea, provided the settings applied by the RPM fit your environment. If I find the time, I’ll write a blog post about what it does specifically. It definitely fits my lab environment, and I regularly kickstart my OL 7 VMs specifying the preinstall RPM in the %packages section, as sketched below.
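For illustration, the relevant part of such a kickstart file is tiny. A minimal sketch (I am using the current package name here, more on that in a minute; anything beyond the preinstall RPM is personal preference):

%packages
@core
oracle-database-server-12cR2-preinstall
%end

Anaconda resolves the RPM’s dependencies during the installation, so the freshly built VM has the required packages in place right after the first boot.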

When upgrading the current base image from Oracle Linux 7.2 to Oracle Linux 7.4/Oracle 12.2 I noticed a few changes to the preinstall RPMs.

IMPORTANT NOTE: If you find this post via an Internet search engine, please bear in mind that Oracle could have changed the preinstall RPM since this article was posted in September 2017. Make sure you check the most current version to see whether the problem I ran into still exists.

Different name

The first surprise came when I changed the virt-install command to reference the Oracle Linux 7.4 ISO instead of 7.2. During the installation session a warning appeared stating that there wasn’t a package named oracle-rdbms-server-12cR1-preinstall anywhere to be found. Really? OK … that was a bit strange. I went straight off to check the public-yum repository for Oracle Linux 7.4 and indeed, the RPM I was looking for wasn’t there. All I could find was a package named oracle-database-server-12cR2-preinstall. All right then, I’ll use that instead. A change to the kickstart file was all that was needed, and subsequent installations completed fine.
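By the way, it pays off to check what a repository actually offers before pointing a kickstart file at it. On a system with the repository already configured something as simple as this does the trick (on the OL 7.4 base channel only the 12cR2 package should appear):

[root@server1 ~]# yum list available | grep -i preinstall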

Except for this other problem …

No more dependencies on the C compiler suite

The other night I wanted to apply a patch to the 12.2 lab cluster using opatchauto. The patching process ran into an error on node 2, and then failed completely trying to roll the patch back. Usually that’s a sad story (and the reason why you create backups before patching), but this time I was lucky and didn’t have to restore. Well, lucky in a way.

Going through the opatch log files (the error reported had something to do with a failure to relink a target in ins_rdbms.mk) I noticed that Oracle complained about not finding gcc. That struck me as odd, and I thought I’d actually seen gcc installed on my 7.2 base image with the 12.1 preinstall RPM.
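As an aside, you can check whether relinking works at all independently of opatchauto by driving the makefile directly. A sketch, assuming ORACLE_HOME is set for the patched home (ioracle is the standard target to relink the oracle binary; please only try this on a system you can afford to break):

[oracle@server1 ~]$ cd $ORACLE_HOME/rdbms/lib
[oracle@server1 lib]$ make -f ins_rdbms.mk ioracle

If gcc is missing you would expect this to fail with a complaint similar to the one opatchauto reported.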

A quick check reveals that the 12.1 preinstall RPM (taken from the Oracle Linux 7.2 base repository that I used previously) actually defines a dependency on gcc:

[oracle@server1 ~]$ rpm -qpi --requires /tmp/oracle-rdbms-server-12cR1-preinstall-1.0-4.el7.x86_64.rpm
Name        : oracle-rdbms-server-12cR1-preinstall
Version     : 1.0
Release     : 4.el7
Architecture: x86_64
Install Date: (not installed)
Group       : Test Environment/Libraries
Size        : 44346
License     : GPLv2
Signature   : RSA/SHA256, Wed 14 Oct 2015 02:18:22 AM EDT, Key ID 72f97b74ec551f03
Source RPM  : oracle-rdbms-server-12cR1-preinstall-1.0-4.el7.src.rpm
Build Date  : Wed 14 Oct 2015 02:18:19 AM EDT
Build Host  : x86-ol7-builder-01.us.oracle.com
Relocations : (not relocatable)
Vendor      : Oracle
Summary     : Sets the system for Oracle Database single instance 
              and Real Application Cluster install for Oracle Linux 7
Description :
The Oracle Preinstallation RPM package installs software packages and 
sets system parameters required for Oracle Database single instance and 
Oracle Real Application Clusters installations for Oracle Linux Release 7

Files affected: /etc/sysctl.conf, /boot/grub/menu.lst OR /boot/grub2/grub.cfg
Files added: /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf
/bin/bash
/bin/sh
/bin/sh
/bin/sh
/bin/sh
/etc/redhat-release
bind-utils
binutils
compat-libcap1
compat-libstdc++-33
config(oracle-rdbms-server-12cR1-preinstall) = 1.0-4.el7
ethtool
gcc
gcc-c++
glibc
glibc-devel
initscripts
kernel-uek
ksh
libaio
libaio-devel
libgcc
libstdc++
libstdc++-devel
make
module-init-tools
nfs-utils
openssh-clients
pam
procps
psmisc
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
smartmontools
sysstat
util-linux-ng
xorg-x11-utils
xorg-x11-xauth
rpmlib(PayloadIsXz) <= 5.2-1

As you can see, gcc and even gcc-c++ feature quite prominently in the list of requirements.
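Incidentally, you don’t need the RPM file on disk to run this kind of check. repoquery, part of the yum-utils package, can ask a configured repository directly, provided a channel still carrying the package is enabled:

[root@server1 ~]# repoquery --requires oracle-rdbms-server-12cR1-preinstall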

The change must have come with Oracle Linux 7.3, but I hadn’t noticed it until now. I checked the Oracle Linux 7.4 base repository on public-yum.oracle.com and noticed the following:

  • The 11g Release 2 preinstall RPM is gone (it was still available in the OL 7.3 repository)
  • The 12c Release 1 preinstall RPM is gone, too (it had already disappeared in OL 7.3 but I hadn’t noticed)
  • There is a new RPM for 12.2 named oracle-database-server-12cR2-preinstall

Interestingly enough, all three preinstall RPMs (11.2/12.1/12.2) exist in the “latest” channel for OL7.
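This is easy to verify by pinning yum to that channel. A sketch using the repository id as published in the public-yum .repo file (yours may be named differently):

[root@server1 ~]# yum --disablerepo="*" --enablerepo="ol7_latest" list available | grep -i preinstall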

Looking at the new oracle-database-server-12cR2-preinstall.x86_64 RPM I noticed that gcc wasn’t referenced anymore:

[oracle@server1 ~]$ rpm -qi --requires oracle-database-server-12cR2-preinstall
Name        : oracle-database-server-12cR2-preinstall
Version     : 1.0
Release     : 3.el7
Architecture: x86_64
Install Date: Thu 31 Aug 2017 04:38:58 AM EDT
Group       : Test Environment/Libraries
Size        : 56561
License     : GPLv2
Signature   : RSA/SHA256, Mon 10 Jul 2017 06:27:07 AM EDT, Key ID 72f97b74ec551f03
Source RPM  : oracle-database-server-12cR2-preinstall-1.0-3.el7.src.rpm
Build Date  : Mon 10 Jul 2017 06:26:59 AM EDT
Build Host  : x86-ol7-builder-02.us.oracle.com
Relocations : (not relocatable)
Vendor      : Oracle
Summary     : Sets the system for Oracle Database single instance and
              Real Application Cluster install for Oracle Linux 7
Description :
The Oracle Preinstallation RPM package installs software packages and 
sets system parameters required for Oracle Database single instance and 
Oracle Real Application Clusters installations for Oracle Linux Release 7

Files affected: /etc/sysctl.conf, /boot/grub/menu.lst OR /boot/grub2/grub.cfg
Files added: /etc/security/limits.d/oracle-database-server-12cR2-preinstall.conf
/bin/bash
/bin/sh
/bin/sh
/bin/sh
/bin/sh
/etc/redhat-release
bind-utils
binutils
compat-libcap1
compat-libstdc++-33
config(oracle-database-server-12cR2-preinstall) = 1.0-3.el7
ethtool
glibc
glibc-devel
initscripts
kernel-uek
ksh
libaio
libaio-devel
libgcc
libstdc++
libstdc++-devel
make
module-init-tools
net-tools
nfs-utils
openssh-clients
oraclelinux-release
pam
procps
psmisc
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
smartmontools
sysstat
unzip
util-linux-ng
xorg-x11-utils
xorg-x11-xauth
rpmlib(PayloadIsXz) <= 5.2-1

Interesting! I thought I’d post this just to let you know.

When planning to install Enterprise Manager agents you might also notice the absence of gcc: the agent deployment prerequisite check for OEM 13.2 complained for exactly this reason.
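The remedy for both situations is straightforward: install the compiler packages manually before patching or deploying the agent. Assuming a repository carrying the packages is reachable, this is all it takes:

[root@server1 ~]# yum install gcc gcc-c++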

Full comparison between 12.1 and 12.2

I ran a quick diff to compare the dependencies between oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64 and oracle-rdbms-server-12cR1-preinstall-1.0-4.el7.x86_64.rpm. Here is the result:

[oracle@server1 ~]$ rpm -q --requires oracle-database-server-12cR2-preinstall > /tmp/12cR2-preinstall.txt
[oracle@server1 ~]$ rpm -qp --requires oracle-rdbms-server-12cR1-preinstall-1.0-4.el7.x86_64.rpm > /tmp/12cR1-preinstall.txt

[oracle@server1 ~]$ diff --width 100 -y /tmp/12cR1-preinstall.txt /tmp/12cR2-preinstall.txt 
/bin/bash                                       /bin/bash
/bin/sh                                         /bin/sh
/bin/sh                                         /bin/sh
/bin/sh                                         /bin/sh
/bin/sh                                         /bin/sh
/etc/redhat-release                             /etc/redhat-release
bind-utils                                      bind-utils
binutils                                        binutils
compat-libcap1                                  compat-libcap1
compat-libstdc++-33                             compat-libstdc++-33
config(oracle-rdbms-server-12cR1-preinstall)  | config(oracle-database-server-12cR2-preinstal
ethtool                                         ethtool
gcc                                           <
gcc-c++                                       <
glibc                                           glibc
glibc-devel                                     glibc-devel
initscripts                                     initscripts
kernel-uek                                      kernel-uek
ksh                                             ksh
libaio                                          libaio
libaio-devel                                    libaio-devel
libgcc                                          libgcc
libstdc++                                       libstdc++
libstdc++-devel                                 libstdc++-devel
make                                            make
module-init-tools                               module-init-tools
                                              > net-tools
nfs-utils                                       nfs-utils
openssh-clients                                 openssh-clients
                                              > oraclelinux-release
pam                                             pam
procps                                          procps
psmisc                                          psmisc
rpmlib(CompressedFileNames) <= 3.0.4-1          rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1                  rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1         rpmlib(PayloadFilesHavePrefix) <= 4.0-1
smartmontools                                   smartmontools
sysstat                                         sysstat
                                              > unzip
util-linux-ng                                   util-linux-ng
xorg-x11-utils                                  xorg-x11-utils
xorg-x11-xauth                                  xorg-x11-xauth
rpmlib(PayloadIsXz) <= 5.2-1                    rpmlib(PayloadIsXz) <= 5.2-1

It came as a bit of a surprise to see the (deprecated) net-tools package again. Maybe it has to do with OSWatcher, which likes to use netstat, for example. I personally made the switch from net-tools to iproute some time ago and haven’t missed anything. I also appreciate the addition of unzip to the list of required packages.
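In case you are considering the same switch, the iproute equivalents of the most common net-tools commands are quickly learned. A few examples, by no means an exhaustive mapping:

[oracle@server1 ~]$ ip route      # replaces netstat -rn
[oracle@server1 ~]$ ip addr       # replaces ifconfig -a
[oracle@server1 ~]$ ss -tlnp      # replaces netstat -tlnp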