Category Archives: 12c Release 2

Dead Connection Detection (DCD) and the Oracle database

Dead Connection Detection is a useful feature of the Oracle database: it allows for the cleanup of “dead” sessions so they don’t linger around consuming memory and other system resources. The idea is simple: if the database detects that a client process is no longer connected to its server process, it cleans up. This can happen in many ways, in most cases this kind of problem is triggered by an end user.

A dead connection shouldn’t be confused with idle connections: an idle connection still maintains the network link between client and server process, except that there is no activity. Idle connections aren’t maintained/controlled via DCD, there are other tools in the database handling such cases.

As a by product, DCD can also help with overly eager firewalls forcibly removing seemingly idle network connections. I found the following posts and the references therein very useful:

With Oracle 12c Oracle changed the way DCD works by no longer relying on its network layer but rather pushing the functionality into the TCP stack on platforms that support it. This change in behaviour is also explained in an Oracle white paper from 2014.

For the first part of this post I set sqlnet.expire_time to 1 as per a MOS note I found, your value is probably different. The parameter is documented in the Net*8 reference, please use it to work out what the best value is for you. As others have pointed out, this parameter has to go into the RDBMS home, more specifically $ORACLE_HOME/network/admin/sqlnet.ora. I am using Oracle 19.4 on Oracle Linux 7.7 for this blog post. I have seen the same behaviour in as well in my lab.


In addition to the proven methods of checking whether TCP_KEEPALIVE is enabled for a given session I wanted to show another one. Using the ss(8) utility it is possible to show socket options. I also tried lsof but on my system I couldn’t get it to print the options:

SQL> select spid from v$process where addr = (select paddr from v$session where username = 'MARTIN');


SQL> exit 


[root@server2 ~]# lsof -nP -p 13656 -T f
lsof: unsupported TCP/TPI info selection: f
lsof 4.87

Although the man-page for lsof reads:

       -T [t]   controls the reporting of some TCP/TPI information,  also  reported  by  net‐
                stat(1),  following  the network addresses.  In normal output the information
                appears in parentheses, each item except TCP or TPI state name identified  by
                a keyword, followed by `=', separated from others by a single space:

                     <TCP or TPI state name>
                     QR=<read queue length>
                     QS=<send queue length>
                     SO=<socket options and values>
                     SS=<socket states>
                     TF=<TCP flags and values>
                     WR=<window read length>
                     WW=<window write length>

                Not all values are reported for all UNIX dialects.  Items values (when avail‐
                able) are reported after the item name and '='.

                When the field output mode is in effect  (See  OUTPUT  FOR  OTHER  PROGRAMS.)
                each item appears as a field with a `T' leading character.

                -T with no following key characters disables TCP/TPI information reporting.

                -T with following characters selects the reporting of specific TCP/TPI infor‐

                     f    selects reporting of socket options,
                          states and values, and TCP flags and

So let’s try something else: ss(8) – another utility to investigate sockets

Revealing socket options

I have used ss(8) in the past when I didn’t have netstat available, which is more and more common now that netstat is deprecated and its designated successor is ss :)

As far as I know you can’t limit ss to show information just for a PID, I use grep to limit the output. The output is in fact very wide, which is why this might not look pretty on the blog depending on whether the renderer decides to wrap output or not.

[root@server2 ~]# ss -nop | egrep 'NetidState|13656'
NetidState Recv-Q Send-Q                                    Local Address:Port                                      Peer Address:Port                                                                                                           
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,4.412ms,0)        
[root@server2 ~]# 

I used the following options:

  • -n for “Do not try to resolve service names”
  • -o for “Show timer information” and finally
  • -p to “Show process using socket”.

The main option here is -o. As per the man page:

       -o, --options
              Show timer information. For tcp protocol, the output format is:


                     the name of the timer, there are five kind of timer names:

                     on: means one of these timers: tcp retrans timer, tcp early retrans timer and tail loss probe timer
                     keepalive: tcp keep alive timer
                     timewait: timewait stage timer
                     persist: zero window probe timer
                     unknown: none of the above timers

                     how long time the timer will expire

                     how many times the retran occurs

With a little shell loop I can show how that timer is decrementing:

[root@server2 ~]# for i in $(seq 1 5); do ss -nop | grep 13656; sleep 1 ; done
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,20sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,19sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,18sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,17sec,0)          
tcp  ESTAB 0      0                               [::ffff:]:1521                           [::ffff:]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,16sec,0)          
[root@server2 ~]# 


Using the ss utility it is possible to check if a keepalive timer is implemented as a means to support DCD with 12.1 and later releases. Invoking ss(8) hasn’t caused any problems on my system, but as with every such tool you need to ensure it’s safe to use before attempting to look at an important system.


sqlldr, direct path loads and concurrency in 12.2 and later

In my previous post I showed you that Oracle’s SQL loader (sqlldr) utility has a built-in timeout of 30 seconds waiting for locked resources before returning SQL*Loader-951/ORA-604/ORA-54 errors and failing to load data. This can cause quite some trouble! Before showing you the enhancement in 12.2 and later, here is the gist of the previous post.

Concurrency in Oracle sqlldr 12.1 and earlier

To show you how sqlldr times out I need to simulate an exclusive lock on the table in sqlplus for example. That’s quite simple:

SQL> set time on
10:17:00 SQL> lock table t2 in exclusive mode;

Table(s) Locked.

10:17:07 SQL>

Next I started a sqlldr process in another session. Please refer to the previous post for details, or take my word that I’m using a direct path insert strategy. The only difference is the size of the input file – I had to inflate it considerably to buy some time running standard diagnostic tools:

$ date; sqlldr /@loader control=t2_2.ctl ; date
Tue 23 Jul 10:22:35 BST 2019

SQL*Loader: Release - Production on Tue Jul 23 10:22:35 2019

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Tue 23 Jul 10:23:06 BST 2019

The “date” commands reveal a timeout of 30 seconds. Setting ddl_lock_timeout has no bearing on the timeout: the database is waiting on kpodplck wait before retrying ORA-54:

10:22:36 SQL> r
  1* select event, state from v$session where program like 'sqlldr%'

EVENT                                                            STATE
---------------------------------------------------------------- -------------------
kpodplck wait before retrying ORA-54                             WAITING

1 row selected.

This was where I left off with the previous post until I noticed there is another option!

Oracle 12.2 and later

In 12.2 and later you can instruct sqlldr to wait until the lock is released. There is a new parameter named direct_path_lock_timeout:

$ sqlldr | egrep 'Version|direct_path'
direct_path_lock_wait -- wait for access to table when currently locked  (Default FALSE)

Interestingly there are no hits for direct_path_lock_wait in My Oracle Support’s knowledgebase. There are merely a few documentation references. So what does this parameter do? While the table is still locked in exclusive mode, let’s start the sqlldr process with the new option:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release - Production on Fri Jul 26 10:13:54 2019

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
[ ... waiting ... ]

The process now sits there and waits … and it does so for more than 30 seconds. And instead of kpodplck wait before retrying ORA-54 it waits on … drums please: enq: TM contention!

10:20:11 SQL> select seq#, event, state, round(seconds_in_wait/60, 2) mins_waiting
10:20:21   2  from v$session where program like 'sqlldr%';

      SEQ# EVENT                    STATE               MINS_WAITING
---------- ------------------------ ------------------- ------------
       119 enq: TM - contention     WAITING                     6.53

10:20:26 SQL> 

This is great news if your sqlldr processes compete for TM enqueues and your load process takes a little longer than the previously hard coded timeout of 30 seconds. The process eventually completed successfully after the enqueue was released:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release - Production on Fri Jul 26 10:13:54 2019

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 950936.

Table T2:
  950924 Rows successfully loaded.

Check the log file:
for more information about the load.
Fri 26 Jul 10:21:32 BST 2019


As with every pro, there are usually cons associated. The downside to waiting (for a potentially very long time) is that you might not notice load processes beginning to stack up unless proper instrumentation and monitoring are in place. Waiting too long for data to be loaded is equally bad as not loading at all because the end result is identical. As with many features in the database Oracle gives you plenty of options, and it’s up to the developers and architects to make the correct decisions on how to use them.


Beginning with sqlldr 12.2 Oracle introduced the option to wait for enqueues on segments to be released instead of aborting after 30 seconds.

In the next post I’ll write about another possibility to prevent exclusive table locks in the first place when running multiple concurrent sqlldr sessions.

RAC One node databases are relocated by opatchauto in 12.2 part 2

In a previous post I shared how I found out that RAC One Node databases are relocated on-line during patching and I promised a few more tests and sharing of implications. If you aren’t familiar with RAC One Node I recommend having a look at the official documentation: The Real Application Clusters Administration and Deployment Guide features it prominently in the introduction. One of the things I like to keep in mind when working with RAC One Node is this, quoting from section 1.3 in said admin and deployment guide:

Using the Single Client Access Name (SCAN) to connect to the database, clients can locate the service independently of the node on which it is running. Relocating an Oracle RAC One Node instance is therefore mostly transparent to the client, depending on the client connection. Oracle recommends to use either Application Continuity and Oracle Fast Application Notification or Transparent Application Failover to minimize the impact of a relocation on the client.

A little later, you can read this important additional piece of information in the “notes” section:

To prepare for all failure possibilities, you must add at least one Dynamic Database Service (Oracle Clusterware-managed database service) to an Oracle RAC One Node database.

If you created your RAC One database with the database creation assistant (dbca), you are already complying with that rule. In my case, my lab database is named RON with the mandatory service RON_SVC. I opted for an instance prefix of “DCB”.

What I’d like to try out for this blog post is what happens to an active workload on a RAC One database during patching.

I have successfully patched multi-node RAC systems, but that required the application to support this procedure. One of my favourite talks is named “Advanced RAC programming techniques” where I demonstrate the resilience of an application based on RAC to node failures by using Universal Connection Pool (UCP), JDBC and Fast Connection Failover (FCF). UCP is a Fast Application Notification (FAN) aware connection pool allowing my application to react to cluster events such as node up/node down. The idea is to mask instance failure from users.

If all of this sounds super-cryptic, I would like to suggest chapters 5 and 6 of the RAC administration and deployment guide for you. There you can read more about Workload Management with Dynamic Database Services and Application Continuity in RAC.

RAC One Node is different from multi-node RAC as it is only ever active on one node in normal operations. Online relocation, as described in the previous post, is supported by temporarily adding a second (destination) instance to the cluster, and moving all transactions across before terminating them after a time-out (default 30m). Once the time-out is reached (or all sessions made it across to the destination) the source instance is shut down and you are back to 1 active instance.

The online relocation does not care too much about the compatibility of the application with the process. If your application is written for RAC, most likely it will migrate quickly from source to destination instance. If it isn’t, well, the hard timeout might kick in and evict a few of your users. In a worse case your users need to re-connect to the database. Even worse still, you might have to restart the middle-tier system(s).

Sadly I haven’t seen too many applications capable of handling RAC events properly. One application that does is Swingbench, so I’ll stick with it. I configured it according to Dominic Giles’s blog post.

This post assumes that you are properly licensed to use all these features.

The environment

My setup hasn’t changed from previous post so I spare you the boredom of repeating it here. Jump over to the other post for details.


For this blogpost I need to ensure that my RAC One node database resides on the node I am about to patch. I have again followed the patch readme very carefully, I have made sure that I have (proven, working, tested) backups of the entire stack and all the rest of it…

My database is indeed actively running on the node I am about to patch:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: INACTIVE

Before I can benefit from Application Continuity, I need to make a few changes to my application service, RON_SVC. There are quite a few sources to choose from, I went with the JDBC Developer’s Guide. Here’s the modification:

[oracle@rac122sec2 ~]$ srvctl modify service -db DCB -service RON_SVC -commit_outcome true \
> -failoverretry 30 -failoverdelay 10 -failovertype transaction \
> -replay_init_time 1800 -retention 86400 -notification true
[oracle@rac122sec2 ~]

Following the instructions on Dominic Giles’s blog, I also need to grant SOE the right to execute DBMS_APP_CONT.

Finally, I need to make changes to my Swingbench configuration file. The relevant part is shown here:

<SwingBenchConfiguration xmlns="">
    <Name>"Order Entry (PLSQL) V2"</Name>
    <Comment>Version 2 of the SOE Benchmark running in the database using PL/SQL</Comment>
        <DriverType>Oracle jdbc Driver</DriverType>
            <Property Key="StatementCaching">50</Property>
            <Property Key="FastFailover">true</Property>
            <Property Key="FetchSize">20</Property>
            <Property Key="OnsConfiguration">nodes=rac122sec1:6200,rac122sec2:6200</Property>
            <Property Key="FetchSize">20</Property>
            <Property Key="AppContinuityDriver">true</Property>

The connection string is actually on a single line, I have formatted it for readability in the above example. The main change from the standard configuration file is the use of connection pooling and setting the required properties for Application Continuity.

Let’s patch!

Once all the preparations are completed, it’s time to see how RAC One Node deals with an active workload undergoing an online relocation during patching. First of all I need to start the workload. I’d normally use charbench for this, but this time around opted for the GUI. It shows performance graphs over a 3 minute rolling window.

A few minutes after starting the benchmark I commenced patching. Soon thereafter, opatchauto told me that it was relocating the database.

OPatchauto session is initiated at Fri Jul 27 14:52:23 2018


Relocating RACOne home before patching on home /u01/app/oracle/product/

According to the session log, this happened at 14:54. And by the way, always keep the session log, it’s invaluable!

2018-07-27 14:54:07,914 INFO  [163] - 
Executing command as oracle: 
 /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/ 
 /u01/app/oracle/product/ relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:54:07,914 INFO  [163] - 
  COMMAND Looks like this: 
    /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/ 
    /u01/app/oracle/product/ relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:55:31,736 INFO  [163] - 
  status: 0
2018-07-27 14:55:31,737 INFO  [163] - 
  Successfully executed the above command.

2018-07-27 14:55:31,737 INFO  [163] - 
  Output from the command:
2018-07-27 14:55:31,737 INFO  [163] -
Configuration updated to two instances
Instance DCB_2 started
Services relocated
Waiting for up to 30 minutes for instance DCB_1 to stop ...
Instance DCB_1 stopped
Configuration updated to one instance

2018-07-27 14:55:31,737 INFO  [163] - 
  Command executed successfully.

You can see the same information by querying Clusterware, although there aren’t any timestamps associated with it:

[root@rac122sec1 ~]# srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: ACTIVE
Source instance: DCB_1 on rac122sec2
Destination instance: DCB_2 on rac122sec1
[root@rac122sec1 ~]# 

Although the online relocation timeout was set to 30 minutes, use of modern coding techniques and connection pooling allowed for a much faster online relocation. As you can see in the log excerpt the entire relocation was completed 2018-07-27 14:55:31,737. Clusterware now tells me that my database runs on node 1:

[root@rac122sec1 ~]# date;  srvctl status database -db DCB -detail -verbose
Fri 27 Jul 14:55:38 2018
Instance DCB_2 is running on node rac122sec1 with online services RON_SVC. Instance status: Open.
Instance DCB_2 is connected to ASM instance +ASM1
Online relocation: INACTIVE

While this is wicked, what are the implications for the application? Have a look at this print screen, taken a minute after the online relocation completed.

As you can see there was nothing unusual recorded (tab events to the left), and I couldn’t see a drop in the number of sessions connected. I noticed a slight blip in performance though but it recovered very soon thereafter.


During automatic patching of the Oracle stack opatchauto will perform an online relocation of a RAC One Node database if it is found running on the node currently undergoing patching. If your application is developed with RAC in mind – such as using a FAN-aware connection pool like UCP, and either supports Fast Connection Failover or Application Continuity, there is a good chance that patching the system does not affect your service.

While my test was successful, it is by no means representative of a production workload – my RAC One database has a SGA of 4 GB and there were 5 concurrent users – hardly what I see in the real world. However that shouldn’t deter you: if you are using RAC One Node I think it’s definitely worth a try implementing modern technology into the application.

RAC One node databases are relocated by opatchauto in 12.2 part 1

This is an interesting observation I wanted to share. I have a feeling as if there didn’t seem to be too much information out there for RAC One Node (RON) users, and I hope this helps someone thinking about patching his system.

RAC-rolling patching is well documented in patch readme files, blog posts and official white papers. Most RAC DBAs have a solid handle on the procedure. Patching RAC One Node is a different affair.

What happens when patching a RAC One Node system? As the name suggests a RAC One Node database is a cluster database restricted to one active instance in normal operations. It is possible to relocate the database from one node to another online. Oracle does this by temporarily adding a second instance to the cluster database with the intention of letting it take over from the source instance. At the end of the online relocation, the source instance is shut down, and only the destination instance remains up and running.

An online relocation quite often is a manual operation. However I noticed that such an online relocation can happen during patching with opatchauto as well, at least in 12.2.

This post is intended to show you the process as it is, in the next part I’d like to show some implications of that approach.

The environment

In this example my lab environment consists of a 2 node RAC system currently patched to I wanted to apply the July 2018 RU to the system next to get some experience with the patch.

I have one RDBMS home in addition to the mandatory Grid home, same release level for both, no one-off patches (it’s a lab after all). The virtual machines run Oracle Linux 7.4 with kernel UEK4. To keep things simple there’s a single RAC One database, named RON. I assigned it DCB (“data centre B”) as unique name because I don’t like setting db_unique_name to reflect roles such as “PROD” and “STDBY”. It gets confusing when “STDBY” runs in primary role :)

Here’s the current status of my components:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB
Instance DCB_1 is running on node rac122sec1
Online relocation: INACTIVE

[oracle@rac122sec2 ~]$ srvctl status service -db DCB
Service RON_SVC is running on instance(s) DCB_1
[oracle@rac122sec2 ~]$ 

For the curious, here’s the configuration metadata:

[oracle@rac122sec2 ~]$ srvctl config service -db DCB
Service name: RON_SVC
Server pool: 
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: 
Failover method: 
TAF failover retries: 
TAF failover delay: 
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: NONE
Pluggable database name: 
Maximum lag time: ANY
SQL Translation Profile: 
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout: 
Stop option: 
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: DCB_1
Available instances: 
CSS critical: no

[oracle@rac122sec2 ~]$ srvctl config database -db DCB
Database unique name: DCB
Database name: RON
Oracle home: /u01/app/oracle/product/
Oracle user: oracle
Spfile: +DATA/DCB/spfileRON.ora
Password file: +DATA/DCB/orapwRON
Start options: open
Stop options: immediateb
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: DATA,RECO
Mount point paths: 
Services: RON_SVC
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: DCB
Candidate servers: rac122sec1,rac122sec2
OSDBA group: dba
OSOPER group: oper
Database instances: DCB_1
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

The most important takeaway is that my RON instance DCB_1 is running on node rac122sec1.

Now let’s patch

After having followed the instructions in the patch readme closely, and after double/triple/quadrupel checking that I have (working, tried and tested!) backups of the entire stack I am ready to patch. This time around I’m following the instructions for the automatic application of the Grid Infrastructure RU, eg using opatchauto. Here is some relevant output from the patching session:

OPatchauto session is initiated at Thu Jul 26 14:12:12 2018

System initialization log file is /u01/app/

Session log file is /u01/app/
The id for this session is Q4JA

Executing OPatch prereq operations to verify patch applicability on home /u01/app/

Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/
Patch applicability verified successfully on home /u01/app/oracle/product/

Patch applicability verified successfully on home /u01/app/

Verifying SQL patch applicability on home /u01/app/oracle/product/
SQL patch applicability verified successfully on home /u01/app/oracle/product/

Preparing to bring down database service on home /u01/app/oracle/product/

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
No step execution required.........

Relocating RACOne home before patching on home /u01/app/oracle/product/
/u01/app/oracle/product/ is not a RACOne database. No step execution required........

Bringing down CRS service on home /u01/app/

Wait a minute, what’s that? Have a look at the line beginning with “Relocating RACOne home before patching…”. Relocating the database wasn’t necessary in this case (remember that the database was active on rac122sec1-the other node), but opatchauto can definitely relocate your RAC One database.

When it does, you will see something like this in the output generated by opatchauto:

Preparing to bring down database service on home /u01/app/oracle/product/

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
Successfully prepared home /u01/app/oracle/product/ to bring down database service

Relocating RACOne home before patching on home /u01/app/oracle/product/
Relocated RACOne home before patching on home /u01/app/oracle/product/

The last 2 lines are those of interest. opatchauto detected that a RAC One database was running on the active node, and relocated it. Under the covers it uses a “srvctl relocate database …” command, as shown in the session log file.

Interestingly however, and contrary to what I expected, opatchauto moves the RAC One database back to where it came from as a post-patch step. Towards then end of the patching session I saw this:

Starting CRS service on home /u01/app/
Postpatch operation log file location: /u01/app/oracle/crsdata/rac122sec2/crsconfig/crspatch_rac122sec2_2018-07-26_03-01-06PM.log
CRS service started successfully on home /u01/app/

Relocating back RACOne to home /u01/app/oracle/product/
Relocated back RACOne home successfully to home /u01/app/oracle/product/

Preparing home /u01/app/oracle/product/ after database service restarted
No step execution required.........

The relevant bit is in the middle (“relocating …”). After relocating the database to rac122sec1 opatchauto moved it back to rac122sec2.


Unlike rolling patching on multi-node RAC where all instances on the patched RDBMS home are shut down and applications rely on connection pools and Fast Application Notification to maintain service ability, a RAC One Node database might be relocated to a different node in the cluster. There are implications to that process for application developers, some of which I hope to share in the next post.

Little things worth knowing: Creating a RAC One Node database on the command line

This post is going to be super short, and mostly just a note to myself as I constantly forget how to create a RAC One database on the command line. This post is for but should be similar on 12.1 (although I didn’t test!).

Provided you are licensed appropriately, this is probably the most basic way how you create an admin-managed RAC One database on Linux for use in a lab environment:

dbca -silent -createDatabase -gdbName RON -templateName gold_image01.dbc \
 -createAsContainerDatabase false -databaseConfigType RACONENODE \
 -RACOneNodeServiceName RON_SVC -totalMemory 1024 \
 -nodelist rac122node1,rac122node2 -storageType ASM \
 -datafileDestination '+DATA' -recoveryAreaDestination '+RECO' \
 -recoveryAreaSize 10240

This works for me, but most likely not for you :) And it’s certainly not suitable for a production deployment. Make sure to adapt the command as needed; I tend to create gold images for use with dbca, and this is one example.

The command itself should be fairly self-explanatory. If you are unsure about the meaning of the various options, have a look at the output of “dbca -help -createDatabase” and the official documentation/My Oracle Support. I learned the hard way that forgetting the “-nodelist” results in a single instance creation instead of an error message.

I didn’t find too many examples on the net, hope someone finds this useful.

OSWatcher, Tracefile Analyzer, and Oracle RAC 12.2

When I started the series about Tracefile Analyzer (TFA) I promised three parts. One for single instance, another one for Oracle Restart and this one is going to be about Real Application Clusters. The previous two parts are published already, this is the final piece.

The environment

I am using a virtualised 2-node cluster named rac122pri with nodes rac122pri1{1,2} based on Oracle Linux 7.4. RAC is patched to I installed a Grid Home and an RDBMS home (Enterprise Edition).

Real Application Clusters

Before starting this discussion it’s worth pointing out that TFA integration in RAC 12.2 works really well. TFA is installed as part of the initial setup of the binaries and documented in the Autonomous Health Framework.

As soon as you execute as part of your RAC 12.2 Grid Infrastructure installation, TFA is set up as the first step out of 19. Unfortunately I don’t have the screen output available, but all the action is logged in rootcrs_$(hostname)_timestamp.log anyway. You can find that logfile in $ORACLE_BASE/crsdata/$(hostname)/crsconfig. If memory serves me right this is a 12.2 change, I think the file was in $GRID_HOME/cfgtoollogs in 12.1. but I’m not sure (and don’t have a system to check against).

Here is the relevant output from the log file:

2018-05-11 12:16:02: Command output:
>  CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'. 
>End Command output
2018-05-11 12:16:02: CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2018-05-11 12:16:02: Executing cmd: /u01/app/ -p has -f clsrsc -m 4001
2018-05-11 12:16:02: Command output:
>  CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector. 
>End Command output
2018-05-11 12:16:02: CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2018-05-11 12:16:02: Executing cmd: /u01/app/ -silent -crshome /u01/app/
2018-05-11 12:16:24: Command output:
>  TFA Installation Log will be written to File : /tmp/tfa_install_8879_2018_05_11-12_16_02.log
>  Starting TFA installation
>  Using JAVA_HOME : /u01/app/
>  Running Auto Setup for TFA as user root...
>  Installing TFA now...
>  TFA Will be Installed on rac122pri1...
>  TFA will scan the following Directories
>  ++++++++++++++++++++++++++++++++++++++++++++
>  .---------------------------------------------------------.
>  |                        rac122pri1                       |
>  +----------------------------------------------+----------+
>  | Trace Directory                              | Resource |
>  +----------------------------------------------+----------+
>  | /u01/app/        | CRS      |

... [a lot more directories]

>  | /u01/app/oracle/crsdata/rac122pri1/trace     | CRS      |
>  '----------------------------------------------+----------'
>  Installing TFA on rac122pri1:
>  HOST: rac122pri1     TFA_HOME: /u01/app/
>  .------------------------------------------------------------------------------.
>  | Host       | Status of TFA | PID  | Port | Version    | Build ID             |
>  +------------+---------------+------+------+------------+----------------------+
>  | rac122pri1 | RUNNING       | 9165 | 5000 | | 12210020161122170355 |
>  '------------+---------------+------+------+------------+----------------------'
>  Running Inventory in All Nodes...
>  Enabling Access for Non-root Users on rac122pri1...
>  Adding default users to TFA Access list...
>  Summary of TFA Installation:
>  .----------------------------------------------------------------------.
>  |                              rac122pri1                              |
>  +---------------------+------------------------------------------------+
>  | Parameter           | Value                                          |
>  +---------------------+------------------------------------------------+
>  | Install location    | /u01/app/ |
>  | Repository location | /u01/app/oracle/tfa/repository                 |
>  | Repository usage    | 0 MB out of 10240 MB                           |
>  '---------------------+------------------------------------------------'
>  Installing oratop extension..
>  TFA is successfully installed...


>End Command output
2018-05-11 12:16:24: Executing cmd: /u01/app/ -p has -f clsrsc -m 4002
2018-05-11 12:16:24: Command output:
>  CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector. 
>End Command output

As with the other database deployment types, TFA integrates into systemd on Oracle Linux 7:

[root@rac122pri1 ~]# systemctl status oracle-tfa
● oracle-tfa.service - Oracle Trace File Analyzer
   Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-06-20 23:03:11 BST; 15h ago
 Main PID: 3092 (init.tfa)
   CGroup: /system.slice/oracle-tfa.service
           ├─ 3092 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
           ├─ 3698 /u01/app/ -
  server -Xms32m -Xmx64m -Djava.awt.headless=true -Ddisable.checkForUpdate=true
  oracle.rat.tfa.TFAMain /u01/app/
           └─15827 /bin/sleep 30

The service is enabled and starts together with the operating system.

RAC-TFA lives in the Grid Home

If you read the previous parts of this series you might have noticed in the previous output that unlike single instance and Oracle Restart environments, TFA is resident in the Grid Home as opposed to ORACLE_BASE. This could have implications if you are using a system implementing separation of duties. For everyone else it’s a question of making sure the environment is set correctly.

[oracle@rac122pri1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@rac122pri1 ~]$ which tfactl

Another difference between single node and RAC installations is the status message. TFA reports its status on all nodes in the cluster, as shown here:

[oracle@rac122pri1 ~]$ tfactl print status

| Host       | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
| rac122pri1 | RUNNING       | 3698 | 5000 | | 18210020180529111033 | COMPLETE         |
| rac122pri2 | RUNNING       | 3697 | 5000 | | 18210020180529111033 | COMPLETE         |
[oracle@rac122pri1 ~]$

This isn’t the stock version by the way, I have recently upgraded my deployment to More about that later. I recently installed RAC 12.2 and when invoking TFA am greeted with the familiar “WARNING – your software is older than 180 days. Please consider upgrading TFA to the latest version”.

You may want to consider upgrading the installation

As with the other deployments, the version you get with the base installation is most likely outdated. My base release showed this:

| Host       | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
| rac122pri1 | RUNNING       | 3698 | 5000 | | 12210020161122170355 | COMPLETE         |
| rac122pri2 | RUNNING       | 3697 | 5000 | | 12210020161122170355 | COMPLETE         |

But not only that: if you check for toolstatus you will notice that most of the support tools are missing just like with the other deployments. I love to have them and as such upgraded to the current version at the time of writing ( My Oracle Support DOC ID 1513912.1 is the reference, as always.

After the upgrade completed, I could see the following tools:

[oracle@rac122pri1 ~]$ tfactl toolstatus

|                 TOOLS STATUS - HOST : rac122pri1                 |
| Tool Type            | Tool         | Version      | Status      |
| Development Tools    | orachk       | | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.0.1 | RUNNING     |
|                      | prw          | | NOT RUNNING |
| TFA Utilities        | alertsummary | | DEPLOYED    |
|                      | calog        | | DEPLOYED    |
|                      | changes      | | DEPLOYED    |
|                      | dbglevel     | | DEPLOYED    |
|                      | events       | | DEPLOYED    |
|                      | grep         | | DEPLOYED    |
|                      | history      | | DEPLOYED    |
|                      | ls           | | DEPLOYED    |
|                      | managelogs   | | DEPLOYED    |
|                      | menu         | | DEPLOYED    |
|                      | param        | | DEPLOYED    |
|                      | ps           | | DEPLOYED    |
|                      | pstack       | | DEPLOYED    |
|                      | search       | | DEPLOYED    |
|                      | summary      | | DEPLOYED    |
|                      | tail         | | DEPLOYED    |
|                      | triage       | | DEPLOYED    |
|                      | vi           | | DEPLOYED    |

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.


Whilst TFA is a great tool for single instance and Oracle Restart environments, it is indispensible for RAC. The ability to gather a lot of diagnostic data that would otherwise be difficult to get is the prime use case. Tools such as OSWatcher are very useful when it comes to diagnosing system status at the time of an instance eviction for example.

Hybrid Columnar Compression in 12.2 – nice new feature

Oracle 12.2 introduced an interesting optimisation for Hybrid Columnar Compression (HCC). Until 12.2 you had to use direct path inserts into HCC compressed segments for data to be actually compressed. If you didn’t use a direct path insert you would still succeed in entering data into the segment, however your newly inserted data was not HCC compressed. There is no error message or other warning telling you about that, which can lead to surprises for the unaware.

My friend and colleague Nick has pointed out that the official HCC white paper states – somewhat hidden – that this requirement is no longer as strict in 12.2. I haven’t managed to find the document Nick pointed out, but a quick search using my favourite retrieval engine unearthed the updated version for 18c.

The new feature is easy to miss: you read on page 3 that data must be loaded using data warehouse bulk loading (direct path) techniques. Examples of said technologies follow before the author continues with a discussion about querying HCC data. For me that was the end of the subject… Turns out it wasn’t: the nugget Nick unearthed was on page 4, in the last paragraph before the next section on “HCC Warehouse (Query) Compression”. Quoting literally from said white paper:

Starting with Oracle Database 12.2, HCC automatically compresses new data from SQL INSERT … SELECT statements, without the APPEND hint and array inserts from programmatic interfaces such as PL/SQL and the Oracle Call Interface (OCI).

Aha! Once you know what to look for you find the same information in the 12.2 new features guide, too. Sometimes it’s hard to see the wood for all those trees.

So there is no more need for the append hint for insert-select in 12.2? This would be great news, and should alleviate some issues people have from unknowingly using HCC without direct path inserts. And it’s good for concurrency! Although I do like white papers, there are times when the white paper has to be checked against the lab to see if it is actually true.

I did exactly this for this blog post.

12.2 Test Case

I am using the SOE.ORDERS table for that purpose, as it holds a fair bit of data. To see whether the new algorithm works I decided to create a new empty table ORDERS_HCC with the same table structure as SOE.ORDERS. In the next step I issue an insert-select statement. If the white paper is correct it’ll compress the data using Query High.

SQL (12.2)> show user    
SQL (12.2)> select banner from v$version where rownum  create table orders_hcc 
  2  column store compress for query high 
  3  as select * from orders where 1 = 0; 
Table created.

SQL (12.2)> insert into orders_hcc 
  2  select * from orders where rownum  commit;

Commit complete.

SQL (12.2)>

Note that I specifically omitted the /*+ append */ hint in the insert statement. Also note that the preceding CTAS statement didn’t select any rows from the source. In Oracle releases up to and including 12.1, data in ORDERS_HCC would not be compressed at the end of this little code snippet.

But how can you prove the white paper is right? Thankfully Oracle is well instrumented, and even more thankfully that instrumentation not limited to performance. The Oracle-provided dbms_compression package features a procedure called “get_compression_type()”, which allows you to pass it a ROWID and some other information and it’ll tell you the block’s compression algorithm.

Remember that you can change the compression algorithm for a given segment many times over. A partition in a range partitioned table can start uncompressed while being actively used, and as data gets colder, compression levels can be increased. Technically speaking the “alter table … column store” command on its own does not change the way data is stored in the segment. Data currently stored will remain in whatever state it was before changing the compression attribute. Only newly inserted data will be compressed according to the changed segment attribute.

Back to the example: using DBMS_COMPRESSION I would like to find out if my table data is indeed compressed for Query High after my earlier insert command. Let’s see if it is, using the first 10 rows as a rough indication.

SQL (12.2)> select dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  2  from ORDERS_HCC where rownum < 11;


10 rows selected.

Well it would appear as if these are all compressed for Query High (QH). Looking at the package definition I can see that a compression type of 4 indicates Query High.

So far so good, but I merely checked 10 out of 1 million rows. I’m fairly sure the rest of the table is also HCC compressed for QH, but I want proof. To remove any doubts, I can query the whole table. I’m doing this so you don’t have to. The next query will take forever (eg certainly more than 1 minute) to execute, and it is CPU bound so please don’t do this at work. If you really feel like having to run a query like this, don’t do it outside the lab. You have been warned :)

SQL (12.2)> with comptypes as (
  2   select rownum rn, dbms_compression.get_compression_type(user, 'ORDERS_HCC', rowid) ctype
  3     from ORDERS_HCC
  4  )
  5      select count(*), ctype from comptypes group by ctype;

  COUNT(*)      CTYPE
---------- ----------
   1000000          4

I always start these types of queries in a screen (1) session to prevent network connection issues from interrupting my long running task. After some time, the query returns with the results as you can see. The entire table is compressed with Query High.


Array-inserts into HCC segments can compress data in Oracle 12.2 even if you don’t specify the append hint. The behaviour for conventional inserts did not change. I am going to post another piece of research containing some more data later this week or next.

You may still need gcc when patching Oracle Database 12.2

I have previously written about changes in the Oracle 12.2 preinstall RPM and how gcc is no longer part of the dependencies list. As was pointed out to me, this shouldn’t be necessary anymore, according to the 12.2 Linux Database Installation Guide. Check the blue note for a statement indicating that gcc and gcc-c++ aren’t needed for Grid Infrastructure, nor for the RDBMS software.

I have applied patch 27100009 (January 2018 Release Update on my 2 node RAC system in the lab, and found out that this is partially true :) You may or may not encounter this issue in your environment, see below.

Updating the preinstall RPM

Just to be sure I didn’t miss any changes to the preinstall RPM I pulled the latest one from Oracle and checked it’s requirements for gcc.

[oracle@rac122pri1 ~]$ rpm -qi oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64
Name        : oracle-database-server-12cR2-preinstall
Version     : 1.0
Release     : 3.el7
Architecture: x86_64
Install Date: Fri 19 Jan 2018 03:13:18 PM GMT
Group       : Test Environment/Libraries
Size        : 56561
License     : GPLv2
Signature   : RSA/SHA256, Mon 10 Jul 2017 11:27:07 AM BST, Key ID 72f97b74ec551f03
Source RPM  : oracle-database-server-12cR2-preinstall-1.0-3.el7.src.rpm
Build Date  : Mon 10 Jul 2017 11:26:59 AM BST
Build Host  :
Relocations : (not relocatable)
Vendor      : Oracle
Summary     : Sets the system for Oracle Database single instance and Real Application
              Cluster install for Oracle Linux 7
Description :
The Oracle Preinstallation RPM package installs software packages and sets system parameters 
required for Oracle Database single instance and Oracle Real Application Clusters installations 
for Oracle Linux Release 7
Files affected: /etc/sysctl.conf, /boot/grub/menu.lst OR /boot/grub2/grub.cfg
Files added: /etc/security/limits.d/oracle-database-server-12cR2-preinstall.conf
[oracle@rac122pri1 ~]$ rpm -q --requires oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64 | grep -i gcc
[oracle@rac122pri1 ~]$ 

So gcc is definitely not part of the dependent packages, and my minimum install doesn’t pull gcc (or gcc-c++ for that matter):

[oracle@rac122pri1 ~]$ rpm -qa | grep gcc
[oracle@rac122pri1 ~]$ 

This is a 2 node cluster, consisting of nodes rac122pri1 and rac122pri2. Both use Oracle Linux 7.4 with UEK4 patched to February 12th 2018. As the names I picked for my cluster nodes suggest, this is a system running RAC 12.2.

IMPORTANT note: just because I hit the issue doesn’t mean there’s an issue for everyone. My lab environment is that – my lab environment. I just wanted to point out a way to investigate the problem and how I resolved it in my lab environment. Your mileage might vary.

Applying the January 2018 RU

I always like to be current when it comes to Oracle, and in that spirit decided to apply the January 2018 RU to my cluster after having made sure that I have a working backup. My rule is that there is no patching without backups, ever. And by backups I mean backups of the entire stack ;)

Since this was a new installation in my lab I thought I’d give opatchauto another chance and run with it. What could possibly go wrong?

Node 1 went without any issues, and opatchauto reported that it had applied the patches it wanted to apply. I don’t have the screen output anymore, however here’s the summary after the patch completed.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/oracle/product/
27335416;OCW JAN 2018 RELEASE UPDATE (27335416)
27105253;Database Release Update : (27105253)

OPatch succeeded.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/
27335416;OCW JAN 2018 RELEASE UPDATE (27335416)
27144050;Tomcat Release Update (27144050)
27128906;ACFS Release Update : (27128906)
27105253;Database Release Update : (27105253)
26839277;DBWLM RELEASE UPDATE (26839277)

OPatch succeeded.
[oracle@rac122pri1 ~]$

Did you notice the “Tomcat Release Update” in the Grid Home? So much to investigate!

Patching node 2

After the patch completed on node 1 I continued with node 2. At first, everything looked quite normal, and opatch did it’s thing. I have taken a habit to always apply anything that depends on the network connection to be up in a screen (1) session. That protects the work I’m doing from intermittent network trouble, and allows me to keep a view on how things are progressing.

After a few minutes of work, opatchauto reported this (formatted for readability):

Start applying binary patch on home /u01/app/
Failed while applying binary patches on home /u01/app/

Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : rac122pri2->/u01/app/ Type[crs]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/, host: rac122pri2.
Command failed:  /u01/app/  apply /u01/stage/27100009 -oh /u01/app/
    -target_type cluster -binary -invPtrLoc /u01/app/ 
    -jre /u01/app/ 
    -persistresult /u01/app/ 
    -analyzedresult /u01/app/
Command failure output:
==Following patches FAILED in apply:

Patch: /u01/stage/27100009/27335416
Log: /u01/app/
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Re-link fails on target "install_srvm".

Not exactly what you want to see during patching. But there’s no need to panic (just yet ;) The error message points me to a log file. Opening the log file it quickly become apparent why the issue occurred:

[22-Jan-2018 09:40:30] [INFO]       [OPSR-TIME] Finished applying patch "27335416" to local system
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading raw inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all components from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all one offs from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Raw inventory loaded successfully
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading cooked inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] : Loading cooked one offs. Heap memory used 314 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-MEMORY] : Loaded cooked oneoffs. Heap memory used : 363 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-TIME] Cooked inventory loaded successfully
[22-Jan-2018 09:40:32] [INFO]       OUI-67050:Running make for target install_srvm
[22-Jan-2018 09:40:32] [INFO]       Start invoking 'make' at Mon Jan 22 09:40:32 GMT 2018Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [INFO]       Finish invoking 'make' at Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [WARNING]    OUI-67200:Make failed to invoke "/usr/bin/make -f install_srvm 
                                    ORACLE_HOME=/u01/app/"....'/bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/] Error 127
[22-Jan-2018 09:40:32] [INFO]       Stack Description: java.lang.RuntimeException: /bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/] Error 127
[22-Jan-2018 09:40:32] [INFO]       StackTrace: oracle.opatch.MakeAction.apply(

This is actually quite simple: for some reason opatch wants to use gcc, and doesn’t find it. After installing gcc and dependencies, I resumed opatchauto and finished patching successfully. I haven’t understood why Oracle wants to use gcc on node 2 after it didn’t require it on node 1.

Just thought I’d pass this on in case you hit the same problem. Happy patching!

OSWatcher, Tracefile Analyzer, and Oracle Restart 12.2

You are about to read the second part of this mini-series on TFA and Oracle 12.2. In the previous article I wrote about TFA and Oracle 12.2 single instance. In this short article I am going to have a look at TFA in a 12.2 Oracle Restart environment before rounding it up with an investigation into a full-blown RAC installation in part 3.

Summarising the first part I can only say that I am very happy that we now get TFA as part of the standard installation. Running it in daemon mode provides some great insights, and even if you did not upgrade the installation to “MOS-TFA”, you have a very fine tool for Oracle troubleshooting at your disposal.

Summary of the environment

My environment is largely the same as last time, except the machine name changed to server4 and I have additional storage for use with ASM.

  • It’s still Oracle Linux 7.4 with UEK4
  • Oracle Grid Infrastructure was installed first
  • Followed by an installation of the Oracle RDBMS EE software
  • After having created a database in ASM I applied the January 2018 GI RU
  • Finally I upgraded TFA to the current version (as downloaded from My Oracle Support DOC ID 1513912.1)

Not all of these steps are relevant for this article though.

Now where would you find TFA?

The question I had when creating this environment was essentially this: where would I find TFA? Would it be part of the Grid Home, or rather the RDBMS installation?

After having installed the binaries for Grid Infrastructure, I didn’t find a reference to in the Grid Home’s script.

[oracle@server4 ~]$ cat /u01/app/oracle/product/
. /u01/app/oracle/product/ "$@"
. /u01/app/oracle/product/

# Root Actions related to network

# Invoke standalone

/u01/app/oracle/product/ $@ 
if [ $EXITCODE -ne 0 ]; then

[oracle@server4 ~]$ 

I checked whether TFA has been configured in a different place, but there wasn’t any TFA-related process running nor was there a systemd unit file with *tfa* in its name. So it looks like you don’t get the option to install TFA automatically as part of the Grid Infrastructure installation.

Not finding TFA configured to run out of Grid Infrastructure surprised me, especially since Oracle states in the Autonomous Health Framework documentation chapter 4.2.2 that TFA is automatically configured as part of the GI configuration, upon invoking or According to my testing, this is true for RAC, but not Oracle Restart.

Which kind-of makes sense if you consider that most users will install the database software anyway in an Oracle Restart setup.

It doesn’t really matter, read on ;)

Installing the RDBMS software

If TFA isn’t configured automatically with Grid Infrastructure in an Oracle Restart configuration it should come as no surprise that TFA is once more installed and configured from the RDBMS home. Just as with the single instance installation I wrote about previously, you find a reference to in the RDBMS home’s

[oracle@server4 tfa]$ cat /u01/app/oracle/product/ 
. /u01/app/oracle/product/ "$@"
. /u01/app/oracle/product/

# Root Actions related to network

# Invoke standalone

[oracle@server4 tfa]$ 

The procedure appears to be almost exactly the same as with single instance Oracle. I have again opted to use TFA in daemon mode, just as I did before in my post about Oracle single instance.

TFA thankfully discovers and adds both RDBMS as well as Grid Infrastructure directories (and quite a few Linux related directories, too!). You can see for yourself when running tfactl print directories.

As the final step you might want to consider upgrading TFA to the MOS version. That’s it-happy troubleshooting!

OSWatcher, Tracefile Analyzer, and Oracle 12.2 single instance

I have previously written about TFA, OSWatcher et all for Oracle 12.1. Since then, a lot of things have happened and I had an update for 12.2 on my to-do list for far too long. Experience teaches me that references to support notes and official documentation get out of date rather quickly, so as always, if you find anything that changed please let me know via the comments section and I’ll update the post.

This is going to be a 3 part mini-series to save you having to go over 42 pages of text … In this first part I’m going to have a look at single instance Oracle. In part 2 I’ll have a look at Oracle Restart environments, and finally in part 3 I’ll finish the series by looking at a 12.2 RAC system.

The environment

I am using a small VM to install Oracle (initially) on Oracle Linux 7.4 with kernel UEK4. As always, my EE database binaries go into /u01/app/oracle/product/

The installation/testing (and reporting) of my findings are following this approach:

  • Install the O/S
  • Install Oracle EE
  • Create an EE database (not shown here)
  • Patch binaries and database to
  • Upgrade TFA to as downloaded from My Oracle Support DOC ID 1513912.1

These were the current versions at the time of writing.

Install Oracle

The first step after the O/S is provisioned is to install the Oracle software, obviously. I have noticed that TFA is part of the Oracle binaries. Towards the end of the installation process, you are prompted to execute, as normal. On my system, had the following contents:

      1 #!/bin/sh
      3 . /u01/app/oracle/product/ "$@"
      4 . /u01/app/oracle/product/
      5 /u01/app/oracle/product/
      6 /u01/app/oracle/product/
      8 #
      9 # Root Actions related to network
     10 #
     11 /u01/app/oracle/product/
     13 #
     14 # Invoke standalone
     15 #
     16 /u01/app/oracle/product/
     18 /u01/app/oracle/product/

After a few variables are set/defined by sourcing in files created during the installation, is called (see line 5). It allows you to configure TFA to run as a background (daemon) process. I decided to go with that option after consulting chapter 4 in the 12.2 Autonomous Health Framework documentation and reading about the advantages of using TFA as a daemon. This may or may not be the right way to run TFA for you, the documentation is really good and helps you decide. Here is the transcript of my execution:

[root@server5 ~]#  /u01/app/oracle/product/
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Do you want to setup Oracle Trace File Analyzer (TFA) now ? yes|[no] :
Installing Oracle Trace File Analyzer (TFA).
Log File: /u01/app/oracle/product/
Finished installing Oracle Trace File Analyzer (TFA)

Once that message is shown, TFA is configured and controlled via a systemd unit file:

[root@server5 ~]# systemctl cat oracle-tfa
# /etc/systemd/system/oracle-tfa.service
# Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
# Oracle TFA startup
Description=Oracle Trace File Analyzer
ExecStart=/etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null


The service is enabled and running.

After the completion of, TFA resides in $ORACLE_BASE/tfa and its subdirectories. This is documented in the 12.2 Autonomous Health Framework chapter 4.2.3 and has an interesting implication: if you set your environment using oraenv, you might find that you get errors invoking tfactl, such as these on my VM. I have used a “minimum install” for my operating system and quite specifically didn’t add any additional perl modules in my kickstart file. Now, when invoking tfactl after having set my environment using oraenv, I find that there are missing perl modules in my system’s perl installation:

[oracle@server5 ~]$ . oraenv
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@server5 ~]$ tfactl status
Can't locate Digest/ in @INC (@INC contains: 
/usr/local/lib64/perl5 /usr/local/share/perl5 
/usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl
/usr/lib64/perl5 /usr/share/perl5 . /u01/app/oracle/tfa/server5/tfa_home/bin 
/u01/app/oracle/tfa/server5/tfa_home/bin/common /u01/app/oracle/tfa/server5/tfa_home/bin/modules 
at /u01/app/oracle/tfa/server5/tfa_home/bin/common/ line 7628.
BEGIN failed--compilation aborted at /u01/app/oracle/tfa/server5/tfa_home/bin/common/ line 7628.
Compilation failed in require at /u01/app/oracle/tfa/server5/tfa_home/bin/ line 223.
BEGIN failed--compilation aborted at /u01/app/oracle/tfa/server5/tfa_home/bin/ line 223.

The output has been changed for readability (originally I was missing Data::Dumper as well). After studying the documentation (still section 4.2.3 in the aforementioned document), it turns out to be a user mistake. As I said before, after TFA is configured using as part of the script execution, it runs in daemon mode and crucially, is available from $ORACLE_BASE/tfa. I found that location being referred to in /etc/init.d/init.tfa as well. When I simply typed “tfactl” into my terminal window, I invoked a different “tfactl”. There is a lot more to be said about this, and I will try and do so in a different post.

NB: the same section 4.2.3 in the documentation states that even should you not run TFA in daemon mode, you can still make use of “user mode TFA” in $ORACLE_HOME, although there are certain restrictions. I haven’t pursued that route.

Anyway, after switching to the location where TFA is actually installed ($ORACLE_BASE/tfa), all is well. It seems that running creates a new “Oracle” perl:

[root@server5 ~]# find /u01/app/oracle/tfa -name perl

I found Digest::MD5 and Data::Dumper in /u01/app/oracle/tfa/server5/tfa_home/perl/lib/5.22.0/x86_64-linux-thread-multi.

So let’s try and get the status of the current installation from $ORACLE_BASE/tfa:

[oracle@server5 ~]$ /u01/app/oracle/tfa/bin/tfactl status

Access Denied: Only TFA Admin can run this command

Nearly there: the perl modules are no longer reported to be missing, the “Oracle” perl installation appears to be used now. But what about this error message? I read in section 4.2.4 “Securing Access to Oracle Trace File Analyzer” (still referring to the Autonomous Health Framework manual) that access to TFA is restricted. However, the RDBMS owner should have been granted access automatically.

Using the commands shown in the manual I checked permissions and it turns out that the oracle user is configured to have access to TFA.

[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl access lsusers
|       TFA Users in server5      |
| User Name | User Type | Status  |
| oracle    | USER      | Allowed |

In fact, I can query TFA’s status using the “print status” command as oracle (/u01/app/oracle/tfa/bin/tfactl print status). I compared the output of “tfactl -help” between oracle and root, and there are more options available when running as root. This might explain the above error.

What is the status now?

TFA is now set up and working, but using the base release:

[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl status

| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
| server5 | RUNNING       | 18786 | 41482 | | 12210020161122170355 | COMPLETE         |

It should probably be patched to something more recent. I’ll try that in 2 ways: first by applying the January 2018 RU to see if the version changes. Since the standard deployment doesn’t come with OSWatcher which I’m particularly interested in, I’ll download and apply TFA next. As with all patching, I need to make sure that I have working backups which I’m comfortable restoring should anything go badly wrong.

Status after applying the January RU

A combination of opatch/datapatch later, my system is on the latest RU patchlevel:

[oracle@server5 OPatch]$ opatch lspatches
27105253;Database Release Update : (27105253)

OPatch succeeded.

However, this did not have an effect on the version of TFA in $ORACLE_BASE:

[root@server5 ~]# systemctl restart oracle-tfa
[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl status

| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
| server5 | RUNNING       | 24042 | 37226 | | 12210020161122170355 | COMPLETE         |

Not quite what I expected after reading the docs: the installation of the latest RU should have updated TFA as well. But maybe I got something wrong on my end. The RU readme did not have any reference to TFA that I could find.

Yet it doesn’t matter: I wanted to have all the great support tools anyway (and they aren’t shipped with “stock TFA”), so it was time to install the latest version from MOS.

Upgrading TFA using (MOS)

The patch is quite simple and well documented. If TFA is up and running in daemon mode as in my example, the patching tool will recognise that fact and patch the installation in-place. After a couple of minutes on my VM, I have a new version:

[root@server5 stage]# /u01/app/oracle/tfa/bin/tfactl status

| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
| server5 | RUNNING       | 28105 | 39100 | | 12213120171215143839 | COMPLETE         |

The MOS version comes with lots of useful tools as well:

[oracle@server5 stage]$ /u01/app/oracle/tfa/bin/tfactl toolstatus

|                   TOOLS STATUS - HOST : server5                  |
| Tool Type            | Tool         | Version      | Status      |
| Development Tools    | orachk       | | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.1.2 | RUNNING     |
|                      | prw          | | NOT RUNNING |
| TFA Utilities        | alertsummary | | DEPLOYED    |
|                      | calog        | | DEPLOYED    |
|                      | changes      | | DEPLOYED    |
|                      | dbglevel     | | DEPLOYED    |
|                      | events       | | DEPLOYED    |
|                      | grep         | | DEPLOYED    |
|                      | history      | | DEPLOYED    |
|                      | ls           | | DEPLOYED    |
|                      | managelogs   | | DEPLOYED    |
|                      | menu         | | DEPLOYED    |
|                      | param        | | DEPLOYED    |
|                      | ps           | | DEPLOYED    |
|                      | pstack       | | DEPLOYED    |
|                      | summary      | | DEPLOYED    |
|                      | tail         | | DEPLOYED    |
|                      | triage       | | DEPLOYED    |
|                      | vi           | | DEPLOYED    |

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@server5 stage]$ 

Since I care a lot about OSWatcher, I was very pleased to see it running.

[oracle@server5 stage]$ ps -ef | grep -i osw
oracle   28344     1  0 10:58 ?        00:00:00 /bin/sh ./ 30 48 NONE /u01/app/oracle/tfa/repository/suptools/server5/oswbb/oracle/archive
oracle   28934 28344  0 10:58 ?        00:00:00 /bin/sh ./ 48 /u01/app/oracle/tfa/repository/suptools/server5/oswbb/oracle/archive
oracle   30662 27252  0 11:01 pts/4    00:00:00 grep --color=auto -i osw
[oracle@server5 stage]$ 

Kindly refer to the documentation for more information about TFA. It’s quite a collection of tools, and it helps you in so many ways…