
OSWatcher, Tracefile Analyzer, and Oracle 12.2 single instance

I have previously written about TFA, OSWatcher et al. for Oracle 12.1. A lot has happened since then, and an update for 12.2 has been on my to-do list for far too long. Experience teaches me that references to support notes and official documentation go out of date rather quickly, so as always, if you find anything that has changed, please let me know via the comments section and I’ll update the post.

This is going to be a three-part mini-series to save you from having to go over 42 pages of text … In this first part I’m going to have a look at single instance Oracle. In part 2 I’ll have a look at Oracle Restart environments, and finally in part 3 I’ll finish the series by looking at a 12.2 RAC system.

The environment

I am using a small VM to install Oracle 12.2.0.1.0 (initially) on Oracle Linux 7.4 with kernel UEK4. As always, my EE database binaries go into /u01/app/oracle/product/12.2.0.1/dbhome_1.

The installation, testing, and reporting of my findings follow this approach:

  • Install the O/S
  • Install Oracle 12.2.0.1.0 EE
  • Create an EE database (not shown here)
  • Patch binaries and database to 12.2.0.1.180116
  • Upgrade TFA to 12.2.1.3.1 as downloaded from My Oracle Support DOC ID 1513912.1

These were the current versions at the time of writing.

Install Oracle 12.2.0.1.0

The first step after the O/S is provisioned is to install the Oracle software, obviously. I have noticed that TFA is part of the Oracle binaries. Towards the end of the installation process, you are prompted to execute root.sh, as normal. On my system, root.sh had the following contents:

      1 #!/bin/sh
      2 unset WAS_ROOTMACRO_CALL_MADE
      3 . /u01/app/oracle/product/12.2.0.1/dbhome_1/install/utl/rootmacro.sh "$@"
      4 . /u01/app/oracle/product/12.2.0.1/dbhome_1/install/utl/rootinstall.sh
      5 /u01/app/oracle/product/12.2.0.1/dbhome_1/suptools/tfa/release/tfa_home/install/roottfa.sh
      6 /u01/app/oracle/product/12.2.0.1/dbhome_1/install/root_schagent.sh
      7 
      8 #
      9 # Root Actions related to network
     10 #
     11 /u01/app/oracle/product/12.2.0.1/dbhome_1/network/install/sqlnet/setowner.sh
     12 
     13 #
     14 # Invoke standalone rootadd_rdbms.sh
     15 #
     16 /u01/app/oracle/product/12.2.0.1/dbhome_1/rdbms/install/rootadd_rdbms.sh
     17 
     18 /u01/app/oracle/product/12.2.0.1/dbhome_1/rdbms/install/rootadd_filemap.sh

After a few variables are set by sourcing files created during the installation, roottfa.sh is called (see line 5). It allows you to configure TFA to run as a background (daemon) process. I decided to go with that option after consulting chapter 4 in the 12.2 Autonomous Health Framework documentation and reading about the advantages of using TFA as a daemon. This may or may not be the right way to run TFA for you; the documentation is really good and helps you decide. Here is the transcript of my root.sh execution:

[root@server5 ~]#  /u01/app/oracle/product/12.2.0.1/dbhome_1/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/oracle/product/12.2.0.1/dbhome_1

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Do you want to setup Oracle Trace File Analyzer (TFA) now ? yes|[no] :
yes
Installing Oracle Trace File Analyzer (TFA).
Log File: /u01/app/oracle/product/12.2.0.1/dbhome_1/install/root_server5_2018-01-22_17-21-41-005116657.log
Finished installing Oracle Trace File Analyzer (TFA)

Once that message is shown, TFA is configured and controlled via a systemd unit file:

[root@server5 ~]# systemctl cat oracle-tfa
# /etc/systemd/system/oracle-tfa.service
# Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
#
# Oracle TFA startup
#
[Unit]
Description=Oracle Trace File Analyzer
After=syslog.target
[Service]
ExecStart=/etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
Type=simple
Restart=always

[Install]
WantedBy=multi-user.target graphical.target

The service is enabled and running.

After the completion of roottfa.sh, TFA resides in $ORACLE_BASE/tfa and its subdirectories. This is documented in the 12.2 Autonomous Health Framework chapter 4.2.3 and has an interesting implication: if you set your environment using oraenv, you might get errors invoking tfactl, as I did on my VM. I used a “minimum install” for my operating system and deliberately didn’t add any additional perl modules in my kickstart file. When invoking tfactl after having set my environment using oraenv, I found that perl modules were missing from my system’s perl installation:

[oracle@server5 ~]$ . oraenv
ORACLE_SID = [NCDB] ? NCDB
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@server5 ~]$ tfactl status
Can't locate Digest/MD5.pm in @INC (@INC contains: 
/usr/local/lib64/perl5 /usr/local/share/perl5 
/usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl
/usr/lib64/perl5 /usr/share/perl5 . /u01/app/oracle/tfa/server5/tfa_home/bin 
/u01/app/oracle/tfa/server5/tfa_home/bin/common /u01/app/oracle/tfa/server5/tfa_home/bin/modules 
/u01/app/oracle/tfa/server5/tfa_home/bin/common/exceptions) 
at /u01/app/oracle/tfa/server5/tfa_home/bin/common/tfactlshare.pm line 7628.
BEGIN failed--compilation aborted at /u01/app/oracle/tfa/server5/tfa_home/bin/common/tfactlshare.pm line 7628.
Compilation failed in require at /u01/app/oracle/tfa/server5/tfa_home/bin/tfactl.pl line 223.
BEGIN failed--compilation aborted at /u01/app/oracle/tfa/server5/tfa_home/bin/tfactl.pl line 223.

The output has been changed for readability (originally I was missing Data::Dumper as well). After studying the documentation (still section 4.2.3 in the aforementioned document), it turns out to be a user mistake. As I said before, after TFA is configured using roottfa.sh as part of the root.sh script execution, it runs in daemon mode and crucially, is available from $ORACLE_BASE/tfa. I found that location being referred to in /etc/init.d/init.tfa as well. When I simply typed “tfactl” into my terminal window, I invoked a different “tfactl”. There is a lot more to be said about this, and I will try and do so in a different post.

NB: the same section 4.2.3 in the documentation states that even should you not run TFA in daemon mode, you can still make use of “user mode TFA” in $ORACLE_HOME, although there are certain restrictions. I haven’t pursued that route.
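To avoid picking up the wrong tfactl by accident, a small helper can prefer the daemon-mode copy under $ORACLE_BASE/tfa and only fall back to whatever is first on the PATH. This is a minimal sketch, not something from the documentation; the function name pick_tfactl is made up for illustration, and it assumes the layout seen on my VM ($ORACLE_BASE/tfa/bin/tfactl).

```shell
# Print the path of the tfactl to use.
# Assumption: daemon-mode TFA lives in <oracle_base>/tfa/bin, as on my VM.
# pick_tfactl is a hypothetical helper name, not part of TFA.
pick_tfactl() {
    base=${1:-${ORACLE_BASE:-}}
    if [ -n "$base" ] && [ -x "$base/tfa/bin/tfactl" ]; then
        # Preferred: the copy configured by roottfa.sh
        printf '%s\n' "$base/tfa/bin/tfactl"
    elif command -v tfactl >/dev/null 2>&1; then
        # Fallback: whatever oraenv (or anything else) put on the PATH
        command -v tfactl
    else
        echo "tfactl not found" >&2
        return 1
    fi
}
```

After sourcing oraenv, something like `"$(pick_tfactl)" print status` would then run the copy in $ORACLE_BASE/tfa if it exists.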

Anyway, after switching to the location where TFA is actually installed ($ORACLE_BASE/tfa), all is well. It seems that running roottfa.sh creates a new “Oracle” perl:

[root@server5 ~]# find /u01/app/oracle/tfa -name perl
/u01/app/oracle/tfa/server5/tfa_home/perl
/u01/app/oracle/tfa/server5/tfa_home/perl/bin/perl

I found Digest::MD5 and Data::Dumper in /u01/app/oracle/tfa/server5/tfa_home/perl/lib/5.22.0/x86_64-linux-thread-multi.

So let’s try and get the status of the current installation from $ORACLE_BASE/tfa:

[oracle@server5 ~]$ /u01/app/oracle/tfa/bin/tfactl status

Access Denied: Only TFA Admin can run this command

Nearly there: the perl modules are no longer reported to be missing, so the “Oracle” perl installation appears to be used now. But what about this error message? I read in section 4.2.4 “Securing Access to Oracle Trace File Analyzer” (still referring to the Autonomous Health Framework manual) that access to TFA is restricted. However, the RDBMS owner should have been granted access automatically.

Using the commands shown in the manual I checked permissions and it turns out that the oracle user is configured to have access to TFA.

[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl access lsusers
.---------------------------------.
|       TFA Users in server5      |
+-----------+-----------+---------+
| User Name | User Type | Status  |
+-----------+-----------+---------+
| oracle    | USER      | Allowed |
'-----------+-----------+---------'

In fact, I can query TFA’s status using the “print status” command as oracle (/u01/app/oracle/tfa/bin/tfactl print status). I compared the output of “tfactl -help” between oracle and root, and there are more options available when running as root. This might explain the above error.

What is the status now?

TFA is now set up and working, but using the base release:

[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 18786 | 41482 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'

It should probably be patched to something more recent. I’ll try that in two ways: first by applying the January 2018 RU to see if the version changes. Since the standard deployment doesn’t come with OSWatcher, which I’m particularly interested in, I’ll download and apply TFA 12.2.1.3.1 next. As with all patching, I need to make sure that I have working backups which I’m comfortable restoring should anything go badly wrong.

Status after applying the January RU

A combination of opatch/datapatch later, my system is on the latest RU patchlevel:

[oracle@server5 OPatch]$ opatch lspatches
27105253;Database Release Update : 12.2.0.1.180116 (27105253)

OPatch succeeded.

However, this did not have an effect on the version of TFA in $ORACLE_BASE:

[root@server5 ~]# systemctl restart oracle-tfa
[root@server5 ~]# /u01/app/oracle/tfa/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 24042 | 37226 | 12.2.1.0.0 | 12210020161122170355 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'

Not quite what I expected after reading the docs: the installation of the latest RU should have updated TFA as well. But maybe I got something wrong on my end. The RU readme did not have any reference to TFA that I could find.
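Comparing versions before and after patching is easier if the version is pulled out of the status table rather than read by eye. The following is a minimal sketch that assumes the table layout shown above (Version in the fifth data column); the function name tfa_version is made up for illustration.

```shell
# Read "tfactl status" output on stdin and print the TFA version for one host.
# Assumption: the column layout matches the table shown above.
# tfa_version is a hypothetical helper name.
tfa_version() {
    host=$1
    awk -F'|' -v host="$host" '
        # Trim blanks around every cell so comparisons are exact
        { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }
        # $1 is the empty text before the first "|", so Host is $2 and Version is $6
        $2 == host { print $6 }
    '
}
```

Run as root, `/u01/app/oracle/tfa/bin/tfactl status | tfa_version server5` should then print just the version string.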

Yet it doesn’t matter: I wanted to have all the great support tools anyway (and they aren’t shipped with “stock TFA”), so it was time to install the latest version from MOS.

Upgrading TFA using 12.2.1.3.1 (MOS)

The patch is quite simple and well documented. If TFA is up and running in daemon mode as in my example, the patching tool will recognise that fact and patch the installation in-place. After a couple of minutes on my VM, I have a new version:

[root@server5 stage]# /u01/app/oracle/tfa/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 28105 | 39100 | 12.2.1.3.1 | 12213120171215143839 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'

The MOS version comes with lots of useful tools as well:

[oracle@server5 stage]$ /u01/app/oracle/tfa/bin/tfactl toolstatus

.------------------------------------------------------------------.
|                   TOOLS STATUS - HOST : server5                  |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.1.2 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | changes      |   12.2.1.1.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | events       |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@server5 stage]$ 

Since I care a lot about OSWatcher, I was very pleased to see it running.

[oracle@server5 stage]$ ps -ef | grep -i osw
oracle   28344     1  0 10:58 ?        00:00:00 /bin/sh ./OSWatcher.sh 30 48 NONE /u01/app/oracle/tfa/repository/suptools/server5/oswbb/oracle/archive
oracle   28934 28344  0 10:58 ?        00:00:00 /bin/sh ./OSWatcherFM.sh 48 /u01/app/oracle/tfa/repository/suptools/server5/oswbb/oracle/archive
oracle   30662 27252  0 11:01 pts/4    00:00:00 grep --color=auto -i osw
[oracle@server5 stage]$ 
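For a scripted check, the toolstatus table can be parsed instead of grepping the process list. This is a minimal sketch assuming the table layout shown above; the function name tfa_tool_status is made up for illustration.

```shell
# Read "tfactl toolstatus" output on stdin and print the status of one tool.
# Assumption: four data columns (Tool Type, Tool, Version, Status) as shown above.
# tfa_tool_status is a hypothetical helper name.
tfa_tool_status() {
    tool=$1
    awk -F'|' -v tool="$tool" '
        NF >= 5 {
            name = $3; status = $5
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            # Status may contain a blank ("NOT RUNNING"), so print the whole cell
            if (name == tool) print status
        }
    '
}
```

With that in place, `/u01/app/oracle/tfa/bin/tfactl toolstatus | tfa_tool_status oswbb` can be compared against the string RUNNING in a monitoring script.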

Kindly refer to the documentation for more information about TFA. It’s quite a collection of tools, and it helps you in so many ways…


OSWatcher integration in Trace File Analyzer (TFA)

Some time ago I wrote a post about using OSWatcher for system analysis. Neil Chandler (@ChandlerDBA) rightfully pointed out that although OSWatcher on its own was all right, TFA was the way to go. TFA can include OSWatcher, but more importantly it adds a lot of value over and above what OSWatcher does.

Update

Since I published this post in 2016 a lot has happened, requiring a rewrite of the post in April 2020. The most important change for users is the merge of TFA into the new Autonomous Health Framework (AHF). You download AHF from My Oracle Support Doc ID 2550798.1.

Should I use OSWatcher standalone or as part of AHF?

I guess it depends on what you want to do. I still think that OSWatcher is a good starting point and enough for most problems on single instance systems, provided you don’t forget to add OSWatcher to systemd so that it starts when the system boots.

When it comes to clustered environments or Oracle Restart, AHF looks a lot more appealing.

This post covers AHF 20.1.2 for Linux, the current version at the time of writing. The screen output has been generated on Oracle Linux 7.8 running in a KVM VM. A single-instance lab database is creating some noise.

What is TFA?

TFA (now part of AHF) is a tool which, among other things, helps you gather information about incidents. In clustered environments, it can do so across all the nodes. If you ever worked on Exadata half-racks or other clusters with more than 4 nodes, you will quickly start to appreciate having a single tool for this task. The TFA output is suitable for attaching to a Service Request which should, at least in theory, help speed up the problem resolution.

As an added benefit you get a lot of tools that were previously known as “RAC and DB Support Tools Bundle”. This includes OSWatcher as well, the reason for this post.

Running OSWatcher as part of TFA has one key benefit: you don’t have to worry about starting OSWatcher when booting. TFA is started via a systemd unit file in Oracle Linux 7. You can check its status using the standard systemd commands, as shown here:

[root@server5 ahf]# systemctl status oracle-tfa
● oracle-tfa.service - Oracle Trace File Analyzer
   Loaded: loaded (/etc/systemd/system/oracle-tfa.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-04-23 14:34:13 BST; 1min 17s ago
 Main PID: 11095 (init.tfa)
   CGroup: /system.slice/oracle-tfa.service
           ├─11095 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 

Support Tools Bundle

Earlier I said you benefit from the latest Support Tools Bundle when installing AHF. At the time of writing, this included the following tools:

[root@server5 ahf]# /opt/oracle.ahf/bin/tfactl status

.------------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID   | Port  | Version    | Build ID             | Inventory Status |
+---------+---------------+-------+-------+------------+----------------------+------------------+
| server5 | RUNNING       | 16762 | 25805 | 20.1.2.0.0 | 20120020200403113404 | COMPLETE         |
'---------+---------------+-------+-------+------------+----------------------+------------------'

[root@server5 ahf]# /opt/oracle.ahf/bin/tfactl toolstatus

.------------------------------------------------------------------.
|                   TOOLS STATUS - HOST : server5                  |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   19.3.0.0.0 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.3.2 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   19.3.0.0.0 | DEPLOYED    |
|                      | calog        |   19.3.0.0.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   19.3.0.0.0 | DEPLOYED    |
|                      | grep         |   19.3.0.0.0 | DEPLOYED    |
|                      | history      |   19.3.0.0.0 | DEPLOYED    |
|                      | ls           |   19.3.0.0.0 | DEPLOYED    |
|                      | managelogs   |   19.3.0.0.0 | DEPLOYED    |
|                      | menu         |   19.3.0.0.0 | DEPLOYED    |
|                      | param        |   19.3.0.0.0 | DEPLOYED    |
|                      | ps           |   19.3.0.0.0 | DEPLOYED    |
|                      | pstack       |   19.3.0.0.0 | DEPLOYED    |
|                      | summary      |   19.3.0.0.0 | DEPLOYED    |
|                      | tail         |   19.3.0.0.0 | DEPLOYED    |
|                      | triage       |   19.3.0.0.0 | DEPLOYED    |
|                      | vi           |   19.3.0.0.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[root@server5 ahf]# 

Invoking OSWatcher

With OSWatcher running and gathering data, you can invoke it in the usual way:

[oracle@server5 ~]$ tfactl oswbb -h

Usage : /opt/oracle.ahf/tfa/bin/tfactl.pl [run] oswbb [ | -since n[mhd] ]

Options:

-since n[mhd] Run OSWatcher analyzer for last n [m]inutes or [h]ours or [d]ays.

: -P  -L  -6 -7 -8 -B  -E  -A
     -P   User specified name of the html profile generated
                        by oswbba. This overrides the oswbba automatic naming
                        convention for html profiles. All profiles
                        whether user specified named or auto generated
                        named will be located in the /profile directory.

     -A  Same as option A from the menu. Will generate
                        an analysis report in the /analysis directory or
                        user can also specify the name of the analysis file
                        by specifying full qualified path name of file.
                        The "A" option can not be used together with the
                        "S" option.
     -S <>              Will generate an analysis of a subset of the data
                        in the archive directory. This option must be used
                        together with the -b and -e options below. See the
                        section "Specifying the begin/end time of the analysis"
                        above. The "S" option can not be used together with
                        the "A" option.

     -START   Used with the analysis option to specify the first
                        file located in the oswvmstat directory to analyze.

     -STOP    Used with the analysis option to specify the last
                        file located in the oswvmstat directory to analyze.

     -b     Used with the -S option to specify the begin time
                        of the analysis period. Example format:
                        -b Jan 09 13:00:00 2013

     -e       Used with the -S option to specify the end time
                        of the analysis period. Example format:
                        -e Jan 09 13:15:00 2013

     -L  User specified location of an existing directory
                        to place any gif files generated
                        by oswbba. This overrides the oswbba automatic
                        convention for placing all gif files in the
                        /gif directory. This directory must pre-exist!
     -6                 Same as option 6 from the menu. Will generate
                        all cpu gif files.


     -7                 Same as option 7 from the menu. Will generate
                        all memory gif files.

     -8                 Same as option 8 from the menu. Will generate
                        all disk gif files.



     -NO_IOSTAT         Ignores files in the oswiostat directory from
                        analysis

     -NO_TOP            Ignores files in the oswtop directory from
                        analysis

     -NO_NETSTAT        Ignores files in the oswnetstat directory from
                        analysis

     -NO_PS             Ignores files in the oswps directory from
                        analysis

     -MEM_ALL           Analyzes virtual and resident memory allocations
                        for all processes. This is very resource intensive.

     -NO_Linux          Ignores files in the oswmeminfo directory from
                        analysis

e.g:
   /opt/oracle.ahf/tfa/bin/tfactl.pl oswbb
   /opt/oracle.ahf/tfa/bin/tfactl.pl oswbb -since 2h

   /opt/oracle.ahf/tfa/bin/tfactl.pl run oswbb
   /opt/oracle.ahf/tfa/bin/tfactl.pl run oswbb -since 2h 

Summary

TFA really is a very useful tool, and this is not only due to the integration of OSWatcher. A lot of useful information that is beyond the scope of this article is available, and the search function is invaluable when trying to hunt down problems in your cluster.

Using OSWatcher for system diagnostics

OSWatcher is a superb tool that gathers information about your system in the background and stores it in an (optionally compressed) archive directory. As an Oracle DBA I like the analogy with statspack: you make the tool available on the host in a location with (very important) enough available disk space and then start it. Most users add it to the startup mechanism their O/S uses (SysV init, upstart, or systemd on Linux, for example) so that it starts in the background at boot. OSWatcher will then gather a lot of the interesting O/S related statistics that you so desperately need in an “after the fact” situation. There are plenty of situations where you might want that information.

Update

This article covers OSWatcher standalone. There are potentially better ways of running it: you may want to refer to My Oracle Support and grab the installer for the new Autonomous Health Framework (AHF). It contains OSWatcher, as well as Trace File Analyzer (TFA).

Example use cases

Say, for example, the system experienced an outage: a RAC node rebooted. The big question is “why?”. There might be information in the usual Grid Infrastructure logs, but they might be inconclusive and a look at the O/S can be necessary. The granularity SAR offers is often not enough; you are bound to lose detail in the 10 minute average. What if you had 30 second granularity to see what happened more or less right before the crash?

Or maybe you want to see the system’s activity during testing for a new software release? And then compare to yesterday’s test cycle? The use cases are endless, and I’m sure you can think of one, too.

ExaWatcher as the role model

In Expert Oracle Exadata (both editions) the authors covered ExaWatcher quite extensively. ExaWatcher is (not really surprisingly) OSWatcher for Exadata, or at least sort of. And it’s there by default on cells and compute nodes. I have always taken the stance that if something is implemented in Exadata then someone must have spent a few brain cells working out why it’s a good idea to deploy that tool to every Exadata in the world. That’s a big responsibility to have and you’d hope that there was some serious thinking behind decisions to deploy such a tool.

OSWatcher deployment

I deploy OSWatcher by default on every VM I create in my lab. It’s part of the automated build process I use. That ensures that I’ll have a log of recent O/S information whenever I need it. Again the analogy to Statspack: just like OSWatcher, Statspack isn’t installed by default and has to be set up manually. In so many cases, however, it is deployed only after the problem has occurred. It could be that last night’s problems don’t re-appear in the following execution cycle. It might be better to have OSWatcher deployed and recording information proactively.

Deploying OSWatcher is dead simple: get it from MOS and copy the tarball to the host you want to monitor. In this post I am using Oracle Linux 7.1, at first without the compatibility tools for the network stack. I customarily use the minimal installation that doesn’t come with net-tools. What are the net-tools again? They are the tools we have come to love over the years:

[oracle@rac12node2 ~]$ yum info net-tools
Available Packages
Name        : net-tools
Arch        : x86_64
Version     : 2.0
Release     : 0.17.20131004git.el7
Size        : 303 k
Repo        : local
Summary     : Basic networking tools
URL         : http://sourceforge.net/projects/net-tools/
Licence     : GPLv2+
Description : The net-tools package contains basic networking tools,
            : including ifconfig, netstat, route, and others.
            : Most of them are obsolete. For replacement check iproute package.

But they are obsolete. You could argue that it might be just the time to get used to iproute as it’s the future, but OSWatcher on OL 7 needs net-tools, so I installed them. If you don’t, OSWatcher won’t abort, but it can’t collect the information that netstat and ifconfig provide.
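A quick pre-flight check before starting OSWatcher can tell you which of the utilities it probes for are missing from the PATH. This is a minimal sketch; the function name missing_tools is made up, and the tool list in the usage note is simply the set that appears in OSWatcher’s discovery output in this post.

```shell
# Print the name of every given tool that cannot be found on the PATH.
# missing_tools is a hypothetical helper name, not part of OSWatcher.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

On a minimal OL 7 install without net-tools, `missing_tools vmstat iostat mpstat ifconfig netstat top` would be expected to list ifconfig and netstat.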

In case you are using RAC you need to make OSWatcher aware of the private interconnect; this is detailed in the OSWatcher user’s guide.

Starting OSWatcher

Once the tarball is extracted in a location of your choice you need to start OSWatcher. The script startOSWbb.sh is responsible for starting the tool, and it takes a number of arguments that are explained in the shell script itself and in the user guide available from MOS. The first parameter sets the snapshot interval in seconds. The second sets the archive duration in hours (i.e. how long the information should be kept), and the optional third names the tool to use for compressing the raw data. Parameters 2 and 3 have direct implications for space usage. The optional fourth parameter sets a custom location for the archive directory that will contain the collected data.

I want to have 30 second snapshot intervals and retain data for 2 days or 48 hours. Be aware that using a tool such as compress or gzip as the third argument will save space by compressing older files, but when you want to use the analyser (more on that later) you’ll have to decompress the files first. Clever use of the find(1) command can help you decompress only the raw data files you need.
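One way to use find for this is to copy decompressed versions of only the recently modified files into a scratch directory, leaving the archive itself untouched. This is a minimal sketch under a few assumptions: the files were compressed with gzip (hence the .gz suffix), file modification time is a good enough proxy for the period you care about, and the function name osw_extract_recent is made up for illustration.

```shell
# Decompress archive files modified within the last <minutes> minutes into
# <dest>, preserving the per-collector sub-directories.
# Assumptions: gzip compression and mtime as the selection criterion.
# osw_extract_recent is a hypothetical helper name.
osw_extract_recent() {
    archive_dir=${1%/}
    minutes=$2
    dest=$3
    mkdir -p "$dest" || return 1
    find "$archive_dir" -type f -name '*.gz' -mmin "-$minutes" |
    while IFS= read -r f; do
        rel=${f#"$archive_dir"/}              # path relative to the archive
        mkdir -p "$dest/$(dirname "$rel")"
        gzip -dc "$f" > "$dest/${rel%.gz}"    # write a decompressed copy
    done
}
```

For example, `osw_extract_recent /some/mount/point/oswbb/archive 120 /tmp/osw_scratch` would extract whatever was written in the last two hours.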

Warning

Depending on the system you monitor and the amount of activity you might end up generating quite a lot of data. And by a lot I mean it! The location you choose for storing the archive directory must be independent of anything important. In other words, should the location fill up despite all the efforts you put in to prevent that from happening, it must not have an impact on availability. It is imperative to keep a keen eye on space usage. You certainly don’t want to fill up important mount points with performance information.
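To get a feeling for the numbers, it helps to work out how many snapshots a given interval and retention produce, and to multiply that by a measured average snapshot size. This is a back-of-the-envelope sketch only: the function names are made up, the size is uncompressed, and the bytes-per-snapshot figure is a placeholder you would have to measure on your own system.

```shell
# Snapshots kept per collector for an interval (seconds) and retention (hours).
osw_snapshots() {
    echo $(( $2 * 3600 / $1 ))
}

# Rough uncompressed archive size in KiB.
# Arguments: interval (s), retention (h), number of collectors,
# average bytes per snapshot (a placeholder value you need to measure).
osw_estimate_kib() {
    interval=$1 hours=$2 collectors=$3 bytes=$4
    echo $(( hours * 3600 / interval * collectors * bytes / 1024 ))
}
```

With my settings, `osw_snapshots 30 48` gives 5760 snapshots per collector. Plugging in 10 collectors and a purely illustrative 2048 bytes per snapshot, `osw_estimate_kib 30 48 10 2048` gives 115200 KiB; collectors such as top and ps will in practice be considerably larger than others, so measure before trusting any such figure.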

Starting OSWatcher from the command line

With that said it’s time to start the tool:

[oracle@rac12node1 oswbb]$ ./startOSWbb.sh 30 48 gzip
[oracle@rac12node1 oswbb]$ Info...Zip option IS specified.
Info...OSW will use gzip to compress files.
Setting the archive log directory to/some/mount/point/oswbb/archive

Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
IFCONFIG found on your system.
NETSTAT found on your system.
TOP found on your system.
Warning... /proc/slabinfo not found on your system.

Testing for discovery of OS CPU COUNT
oswbb is looking for the CPU COUNT on your system
CPU COUNT will be used by oswbba to automatically look for cpu problems

CPU COUNT found on your system.
CPU COUNT = 2

Discovery completed.

Starting OSWatcher v7.3.3  on Wed Dec 16 10:02:49 GMT 2015
With SnapshotInterval = 30
With ArchiveInterval = 48

OSWatcher - Written by Carl Davis, Center of Expertise,
Oracle Corporation
For questions on install/usage please go to MOS (Note:301137.1)

...

Data is stored in directory: /some/mount/point/oswbb/archive

Starting Data Collection...

oswbb heartbeat:Wed Dec 16 10:02:54 GMT 2015 

Launching OSWatcher from the command line is probably the exception; most users will start OSW during the boot process as part of the multi-user runlevel.
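On a systemd-based system such as Oracle Linux 7, starting OSWatcher at boot could be done with a unit file along the following lines. This is an untested sketch resting on several assumptions that are not from the user guide: the unit name oswatcher.service, the oracle user, Type=forking (chosen because startOSWbb.sh returns to the prompt after launching the collectors, as seen above), and stopOSWbb.sh in the same directory for stopping. The function write_osw_unit is a made-up helper that only writes the file.

```shell
# Write a systemd unit for standalone OSWatcher into <unit_dir>.
# Assumptions: unit name, User=oracle, Type=forking, and stopOSWbb.sh
# living next to startOSWbb.sh. write_osw_unit is a hypothetical helper.
write_osw_unit() {
    osw_home=$1
    unit_dir=$2
    cat > "$unit_dir/oswatcher.service" <<EOF
[Unit]
Description=OSWatcher (standalone)
After=network.target

[Service]
Type=forking
User=oracle
WorkingDirectory=$osw_home
ExecStart=$osw_home/startOSWbb.sh 30 48 gzip
ExecStop=$osw_home/stopOSWbb.sh

[Install]
WantedBy=multi-user.target
EOF
}
```

As root, `write_osw_unit /some/mount/point/oswbb /etc/systemd/system` followed by `systemctl daemon-reload` and `systemctl enable --now oswatcher` would be the usual way to activate it. Adjust interval, retention, and compression to your needs.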

Note that if you haven’t installed net-tools on Oracle Linux 7 you will see errors when starting OSWatcher as it can’t find netstat and ifconfig. If memory serves me right then you get net-tools with previous releases of Oracle Linux by default so this is OL 7 specific.

You will also notice a warning that /proc/slabinfo was not found. The file does exist, but its permissions have changed to 0400 and it is owned by root:root, so the oracle user cannot read it. Not having slab information is not a problem for me; I always struggle to make sense of it anyway.

What it does

The OSWatcher daemon now happily monitors my system and places information into the archive directory:

[oracle@rac12node1 oswbb]$ ls -l archive/
total 40
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswifconfig
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswiostat
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswmeminfo
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswmpstat
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswnetstat
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswprvtnet
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswps
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswslabinfo
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswtop
drwxr-xr-x. 2 oracle oinstall 4096 Dec 16 10:02 oswvmstat

There is one directory per tool used: ifconfig, iostat, meminfo, …, vmstat. Inside each directory you find that tool’s output. The files are plain text; if you specified gzip or compress as the third parameter to the start script, all files except the current one are compressed. Each snapshot is introduced by a line starting with zzz. Here is an example for iostat taken while swingbench was running.

...

zzz ***Wed Dec 16 10:32:30 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.23    0.00    7.45   76.06    0.00    4.26

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    2.00     0.00     4.00     4.00     0.24  128.00    0.00  128.00 121.00  24.20
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdd               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.11   36.67   33.00   44.00  36.67  11.00
vde               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.04   14.67    0.00   44.00  14.67   4.40
vdf               0.00     0.00    2.00    1.00     1.00     0.50     1.00     0.10   31.67   25.50   44.00  31.67   9.50
vdg               0.00     0.00   43.00   23.00   400.00    97.00    15.06     3.83   57.05   62.37   47.09  15.15 100.00
vdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.24    0.00    0.00    0.00   0.00  24.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   39.00   23.00   320.00    78.50    12.85     1.11   17.40    2.85   42.09  12.39  76.80

zzz ***Wed Dec 16 10:33:00 GMT 2015
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.58    0.00    7.94   81.48    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00    17.00    0.00    1.00     0.00    72.00   144.00     0.01    1.00    0.00    1.00  11.00   1.10
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdc               0.00     0.00   12.00    2.00   192.00    16.00    29.71     0.13    9.57    8.25   17.50   9.57  13.40
vdd               0.00     0.00    2.00    2.00     1.00     0.50     0.75     0.03   14.50    8.50   20.50   7.25   2.90
vde               0.00     0.00    2.00    2.00     1.00     0.50     0.75     0.02   13.25    2.00   24.50   6.00   2.40
vdf               0.00     0.00    2.00    2.00     1.00     0.50     0.75     0.11   34.50   23.00   46.00  27.25  10.90
vdg               0.00     0.00   32.00   31.00   272.00   123.00    12.54     4.96   80.46   90.50   70.10  15.68  98.80
vdh               0.00     0.00    0.00    1.00     0.00     0.00     0.00     0.01   37.00    0.00   37.00   8.00   0.80
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00   19.00     0.00    76.00     8.00     0.03    0.95    0.00    0.95   0.63   1.20
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdi               0.00     0.00   22.00   22.00   184.00    78.50    11.93     1.28   30.45    2.00   58.91  17.23  75.80

zzz ***Wed Dec 16 10:33:30 GMT 2015
...

As you can see, there are snapshots every 30 seconds, just as requested. Let’s not focus on the horrendous I/O times: this is a virtualised RAC system on a host that struggles with a CPU bottleneck, and it does not have underlying SSD for the VMs…
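Because every snapshot starts with a zzz line, the files are easy to process with standard tools. Here is a minimal sketch that prints the snapshot time together with await and %util for a single device. It assumes the extended iostat layout shown above, where await is the 10th column and %util the last one; other iostat versions may order the columns differently. The function name is hypothetical.

```shell
# Hypothetical helper: print "timestamp await %util" for one device from
# OSWatcher iostat data read on standard input.
osw_iostat_device() {
    awk -v dev="$1" '
        # a zzz line starts a new snapshot: keep its timestamp
        /^zzz / { ts = $0; sub(/^zzz \*+/, "", ts); next }
        # a line for the requested device: print one record per snapshot
        $1 == dev { print ts, $10, $NF }
    '
}
```

For the busy vdg device in the example this prints lines such as "Wed Dec 16 10:32:30 GMT 2015 57.05 100.00". To cover compressed and uncompressed files in one go you could feed the function with something like gzip -cdf archive/oswiostat/* | osw_iostat_device vdg, assuming GNU gzip, which passes uncompressed input through unchanged when -f and -c are combined.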

You can navigate the archive directory and browse the files you are interested in. They all have timestamps in their names making it easy to identify each file’s contents.
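If you want to script against those file names, they can be taken apart with plain parameter expansion. The sketch below assumes the host_tool_YY.MM.DD.HH00.dat pattern seen in this post (for example rac12node1_iostat_15.12.16.1000.dat), optionally followed by a compression suffix; the function name is hypothetical and other OSWatcher versions may name files differently.

```shell
# Hypothetical helper: split an OSWatcher archive file name into its parts.
osw_parse_name() {
    local base="${1##*/}"        # drop any directory part
    base="${base%.dat*}"         # drop .dat and any compression suffix
    local stamp="${base##*_}"    # text after the last underscore
    local rest="${base%_*}"      # everything before the last underscore
    local tool="${rest##*_}"
    local host="${rest%_*}"
    echo "host=$host tool=$tool stamp=$stamp"
}

osw_parse_name rac12node1_iostat_15.12.16.1000.dat.gz
# host=rac12node1 tool=iostat stamp=15.12.16.1000
```

Splitting from the right keeps the function working for host names that contain underscores themselves.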

The OSWatcher Analyser

Looking at text files is one way of digesting information. There is another option, the OSWatcher analyser. Here is an example of its use (it requires the DISPLAY variable to be set):

[oracle@rac12node1 oswbb]$ java -jar oswbba.jar -i archive -b "Dec 16 10:25:00 2015" \
> -e "Dec 16 10:35:00 2015" -P swingbench -s

Validating times in the archive...

Scanning file headers for version and platform info...

Parsing file rac12node1_iostat_15.12.16.1000.dat ...

Parsing file rac12node1_vmstat_15.12.16.1000.dat ...

Parsing file rac12node1_netstat_15.12.16.1000.dat ...

Parsing file rac12node1_top_15.12.16.1000.dat ...

Parsing file rac12node1_ps_15.12.16.1000.dat ...

A new analysis file analysis/rac12node1_1450262374129.txt has been created.
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Run_Queue.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Block_Queue.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_Idle.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_System.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_User.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_Wa.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Cpu_Interrupts.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Context_Switches.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Memory_Swap.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Memory_Free.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_Memory_Page_In_Rate.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_ST.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_RPS.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_WPS.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PB.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PBTP_1.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PBTP_2.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_PBTP_3.gif
Generating file profile/rac12node1_swingbench/OSW_profile_files/OSWg_OS_IO_TPS.gif
[oracle@rac12node1 oswbb]$

In this example I asked the analyser tool to limit the search to a specific time range and to create a “profile”. Refer to the MOS note about the OSWatcher Analyser for more information about the command line options.
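Typing the begin and end times by hand gets tedious quickly. The following sketch builds the same command line for "the N minutes ending at a given time". It only prints the command instead of running it, relies on GNU date, and the function names are hypothetical. Whether the analyser accepts every variation of the date format is something I have not verified, so treat the output as a starting point.

```shell
# Hypothetical helper: format an epoch timestamp like the -b and -e
# arguments in the example above, e.g. "Dec 16 10:25:00 2015".
# Relies on GNU date (-d @EPOCH); LC_ALL=C keeps the month name in English.
osw_fmt_time() {
    LC_ALL=C date -d "@$1" '+%b %d %H:%M:%S %Y'
}

# Hypothetical helper: print (not run) an analyser command covering the
# $2 minutes that end at epoch $1, using $3 as the profile name.
osw_analyser_cmd() {
    local end="$1"
    local minutes="$2"
    local profile="$3"
    local begin=$(( end - minutes * 60 ))
    printf 'java -jar oswbba.jar -i archive -b "%s" -e "%s" -P %s -s\n' \
        "$(osw_fmt_time "$begin")" "$(osw_fmt_time "$end")" "$profile"
}

# Example: the last 10 minutes up to now, in the local time zone
osw_analyser_cmd "$(date +%s)" 10 swingbench
```

The times are rendered in the local time zone of the shell, so run it on the monitored host or set TZ to match the timestamps in the archive.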

The analyser will create an HTML “Profile” which is then stored in the profile directory, as you can see in the last part of the output. If you transfer this to a system with a web browser you can enjoy the graphical representation of the raw data. Very neat if you want to check on the O/S level if anything unusual might have happened.

Note also that a new analysis file was created. Have a look at it, as it can provide very valuable information. In my example it contained the following sections:

############################################################################
# Contents Of This Report:
#
# Section 1: System Status
# Section 2: System Slowdowns
#   Section 2.1: System Slowdown RCA Process Level Ordered By Impact
# Section 3: System General Findings
# Section 4: CPU Detailed Findings
#   Section 4.1: CPU Run Queue:
#   Section 4.2: CPU Utilization: Percent Busy
#   Section 4.3: CPU Utilization: Percent Sys
# Section 5: Memory Detailed Findings
#   Section 5.1: Memory: Process Swap Queue
#   Section 5.2: Memory: Scan Rate
#   Section 5.3  Memory: Page In:
#   Section 5.4  Memory: Page Tables (Linux only):
#   Section 5.5: Top 5 Memory Consuming Processes Beginning
#   Section 5.6: Top 5 Memory Consuming Processes Ending
# Section 6: Disk Detailed Findings
#   Section 6.1: Disk Percent Utilization Findings
#   Section 6.2: Disk Service Times Findings
#   Section 6.3: Disk Wait Queue Times Findings
#   Section 6.4: Disk Throughput Findings
#   Section 6.5: Disk Reads Per Second
#   Section 6.6: Disk Writes Per Second
#   Section 6.7: Disk Percent CPU waiting on I/O
# Section 7: Network Detailed Findings
#   Section 7.1  Network Data Link Findings
#   Section 7.2: Network IP Findings
#   Section 7.3: Network UDP Findings
#   Section 7.4: Network TCP Findings
# Section 8: Process Detailed Findings
#   Section 8.1: PS Process Summary Ordered By Time
#   Section 8.2: PS for Processes With Status = D or T Ordered By Time
#   Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
#   Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
#   Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
#
############################################################################

Summary

OSWatcher is a great tool that can be used on Oracle Linux 7 and other supported platforms. It provides a wealth of information in both textual and graphical format. It records lots of useful data and is invaluable whenever you need to look back at the state of the O/S during a given period.

As always, read the documentation found on MOS and make sure you understand the implications of using the tool. Also make sure you test it thoroughly first. Please ensure that you have sufficient disk space for your archive directory on a mount point that cannot affect the availability of your system.