I am about to update my posts detailing the use of Tracefile Analyzer (TFA) now that Oracle has merged it into its Autonomous Health Framework (AHF) and came across an interesting observation worth blogging about upfront.
After completing a fresh installation of AHF 20.1.2, the current version at the time of writing, I noticed OSWatcher didn’t start on my VM. I am operating a single instance Oracle 19.7.0 database, running on Oracle Linux 7.8/UEK 5. The system does not contain any traces of Grid Infrastructure.
I fully expect this problem to be transient, but until Oracle provides a fix I wanted to share my workaround. I didn’t find this problem covered in the usual sources, including My Oracle Support.
For the record, this is the version I can confirm to be affected:
[root@server4 ~]# /opt/oracle.ahf/bin/tfactl status .-----------------------------------------------------------------------------------------------. | Host | Status of TFA | PID | Port | Version | Build ID | Inventory Status | +---------+---------------+------+-------+------------+----------------------+------------------+ | server4 | RUNNING | 9808 | 10905 | 20.1.2.0.0 | 20120020200403113404 | COMPLETE | '---------+---------------+------+-------+------------+----------------------+------------------' [root@server4 ~]#
The Problem
Right after ahf_setup completes, I can see that OSWatcher (oswbb) isn’t running:
[oracle@server4 ~]$ tfactl toolstatus .------------------------------------------------------------------. | TOOLS STATUS - HOST : server4 | +----------------------+--------------+--------------+-------------+ | Tool Type | Tool | Version | Status | +----------------------+--------------+--------------+-------------+ | Development Tools | orachk | 19.3.0.0.0 | DEPLOYED | | | oratop | 14.1.2 | DEPLOYED | +----------------------+--------------+--------------+-------------+ | Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED | | | oswbb | 8.3.2 | NOT RUNNING | | | prw | 12.1.13.11.4 | NOT RUNNING | +----------------------+--------------+--------------+-------------+ | TFA Utilities | alertsummary | 19.3.0.0.0 | DEPLOYED | | | calog | 19.3.0.0.0 | DEPLOYED | | | dbcheck | 18.3.0.0.0 | DEPLOYED | | | dbglevel | 19.3.0.0.0 | DEPLOYED | | | grep | 19.3.0.0.0 | DEPLOYED | | | history | 19.3.0.0.0 | DEPLOYED | | | ls | 19.3.0.0.0 | DEPLOYED | | | managelogs | 19.3.0.0.0 | DEPLOYED | | | menu | 19.3.0.0.0 | DEPLOYED | | | param | 19.3.0.0.0 | DEPLOYED | | | ps | 19.3.0.0.0 | DEPLOYED | | | pstack | 19.3.0.0.0 | DEPLOYED | | | summary | 19.3.0.0.0 | DEPLOYED | | | tail | 19.3.0.0.0 | DEPLOYED | | | triage | 19.3.0.0.0 | DEPLOYED | | | vi | 19.3.0.0.0 | DEPLOYED | '----------------------+--------------+--------------+-------------' Note :- DEPLOYED : Installed and Available - To be configured or run interactively. NOT RUNNING : Configured and Available - Currently turned off interactively. RUNNING : Configured and Available. [oracle@server4 ~]$ ps -ef | grep -i osw oracle 14567 13566 0 19:34 pts/3 00:00:00 grep --color=auto -i osw
Please refrain from simply starting OSWatcher as root or in any other way in fact. This can have undesired consequences, as outlined in the Autonomous Health Framework documentation. It has the ability to mess up file system permissions in the OSWatcher archive directory which is really, really hard to recover from. Plus it can also change your configuration.
Troubleshooting
The reason for OSWatcher not starting can be found in the system’s journal:
[root@server4 ~]# systemctl status oracle-tfa ● oracle-tfa.service - Oracle Trace File Analyzer Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled) Active: active (running) since Thu 2020-04-23 19:30:29 BST; 8min ago Main PID: 9697 (init.tfa) ... Apr 23 19:36:14 server4 init.tfa[9697]: . . . Apr 23 19:36:14 server4 init.tfa[9697]: Successfully stopped TFA.. Apr 23 19:36:14 server4 init.tfa[9697]: Starting TFA.. Apr 23 19:36:15 server4 init.tfa[9697]: Starting TFA out of init, Should be running in 10 seconds Apr 23 19:36:15 server4 init.tfa[9697]: Successfully updated jvmXmx to 128 in TFA... Apr 23 19:36:22 server4 init.tfa[9697]: OSWatcher is already deployed at /opt/oracle.ahf/tfa/ext/oswbb Apr 23 19:36:22 server4 init.tfa[9697]: Cannot find valid Non root user to run OSWatcher
The last line wasn’t expected: “cannot find valid Non root user to run OSWatcher”.
Digging around a bit I noticed that OSWatcher doesn’t recognise the database home properly before firing the error message you just read. A long story short, TFA uses a file named tfa_setup.txt containing information about the system it’s running on when starting. In my case the file could be found in $TFA_DATA_DIR/$(hostname)/tfa/tfa_setup.txt.
This is what it looks like right now, the exact location where to find it depends on the AHF data dir you specified during the installation.
[root@server4 ~]# cat /opt/oracle.ahf/data/server4/tfa/tfa_setup.txt INSTALL_TYPE=TYPICAL PERL=/bin/perl TFA_BASE=/tmp JAVA_HOME=/opt/oracle.ahf/jre DAEMON_OWNER=root ENV_TYPE=TYPICAL SIM_ENV_TYPE=none SIMATP=FALSE NODE_TYPE=TYPICAL CRS_ACTIVE_VERSION= [root@server4 ~]#
I compared the file with a RAC system where OSWatcher just worked, and noticed the absence of an RDBMS home. I added the RDBMS home entry to the file:
[root@server4 ~]# cat /opt/oracle.ahf/data/server4/tfa/tfa_setup.txt INSTALL_TYPE=TYPICAL PERL=/bin/perl TFA_BASE=/tmp JAVA_HOME=/opt/oracle.ahf/jre DAEMON_OWNER=root ENV_TYPE=TYPICAL SIM_ENV_TYPE=none SIMATP=FALSE NODE_TYPE=TYPICAL CRS_ACTIVE_VERSION= RDBMS_ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1|| [root@server4 ~]#
After which I restarted the TFA service, and voila! I now have a working OSWatcher configuration!
[oracle@server4 ~]$ tfactl toolstatus .------------------------------------------------------------------. | TOOLS STATUS - HOST : server4 | +----------------------+--------------+--------------+-------------+ | Tool Type | Tool | Version | Status | +----------------------+--------------+--------------+-------------+ | Development Tools | orachk | 19.3.0.0.0 | DEPLOYED | | | oratop | 14.1.2 | DEPLOYED | +----------------------+--------------+--------------+-------------+ | Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED | | | oswbb | 8.3.2 | RUNNING | | | prw | 12.1.13.11.4 | NOT RUNNING | +----------------------+--------------+--------------+-------------+ | TFA Utilities | alertsummary | 19.3.0.0.0 | DEPLOYED | | | calog | 19.3.0.0.0 | DEPLOYED | | | dbcheck | 18.3.0.0.0 | DEPLOYED | | | dbglevel | 19.3.0.0.0 | DEPLOYED | | | grep | 19.3.0.0.0 | DEPLOYED | | | history | 19.3.0.0.0 | DEPLOYED | | | ls | 19.3.0.0.0 | DEPLOYED | | | managelogs | 19.3.0.0.0 | DEPLOYED | | | menu | 19.3.0.0.0 | DEPLOYED | | | param | 19.3.0.0.0 | DEPLOYED | | | ps | 19.3.0.0.0 | DEPLOYED | | | pstack | 19.3.0.0.0 | DEPLOYED | | | summary | 19.3.0.0.0 | DEPLOYED | | | tail | 19.3.0.0.0 | DEPLOYED | | | triage | 19.3.0.0.0 | DEPLOYED | | | vi | 19.3.0.0.0 | DEPLOYED | '----------------------+--------------+--------------+-------------' Note :- DEPLOYED : Installed and Available - To be configured or run interactively. NOT RUNNING : Configured and Available - Currently turned off interactively. RUNNING : Configured and Available. [oracle@server4 ~]$ ps -ef | grep -i osw oracle 19770 1 0 19:43 ? 00:00:00 /bin/sh ./OSWatcher.sh 30 48 NONE /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive oracle 19944 19770 0 19:44 ? 00:00:00 /bin/sh ./OSWatcherFM.sh 48 /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive oracle 22672 19770 0 19:47 ? 00:00:00 /bin/sh ./vmsub.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswvmstat/server4_vmstat_20.04.23.1900.dat vmstat 1 3 1 oracle 22674 19770 0 19:47 ? 00:00:00 /bin/sh ./mpsub.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswmpstat/server4_mpstat_20.04.23.1900.dat mpstat -P ALL 1 2 1 oracle 22682 19770 0 19:47 ? 00:00:00 /bin/sh ./iosub.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswiostat/server4_iostat_20.04.23.1900.dat iostat -xk 1 3 1 oracle 22686 19770 0 19:47 ? 00:00:00 /bin/sh ./piddsub.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswpidstatd/server4_pidstatd_20.04.23.1900.dat 1 oracle 22692 19770 0 19:47 ? 00:00:00 /bin/sh ./pidsub.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswpidstat/server4_pidstat_20.04.23.1900.dat pidstat 1 oracle 22695 19770 0 19:47 ? 00:00:00 /bin/sh ./nfssub.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswnfsiostat/server4_nfsiostat_20.04.23.1900.dat nfsiostat 1 3 1 oracle 22705 19770 0 19:47 ? 00:00:00 /bin/sh ./xtop.sh /opt/oracle.ahf/data/repository/suptools/server4/oswbb/oracle/archive/oswtop/server4_top_20.04.23.1900.dat 1 oracle 22730 13566 0 19:47 pts/3 00:00:00 grep --color=auto -i osw [oracle@server4 ~]$
I don’t know what went wrong with my environment, but I do know this happens consistently with my single instance deployments. I’ll raise a service request with support to see if they can get the problem fixed.
Happy Troubleshooting!