Little things worth knowing: OSWatcher Analyser Dashboard

I have written a few articles about Tracefile Analyzer (TFA) in the recent past. As you may recall from these posts, a more comprehensive TFA version than the one provided with the installation media is available from My Oracle Support (MOS for short). As part of this version, you get a lot of useful, additional tools, including OSWatcher. I love OSWatcher, simply because it gives me insights that would be very hard to get with sar, for example. sar tends to be the lowest common denominator on most (proper) operating systems and it’s better than nothing, but please read on and let me explain why I like OSWatcher so much.

Feel free to head back to my earlier posts if you would like some more details about “stock TFA” and “MOS TFA”. These are terms I coined, by the way; you won’t find them in a MOS search.

Say hello to the OSWatcher Analyser dashboard

In this article I’d like to introduce the OSWatcher Analyser dashboard to you. I think it’s probably the best way (certainly a very convenient one) to get an overview of what is going on at the operating system level. And it includes pretty pictures that are so easy to understand! OSWatcher (more precisely, oswbb) should be running on the system as part of “MOS Tracefile Analyzer”. In other words, it has to be installed.

The environment for this post

As always, before beginning the write-up, here are some details about the system I’m using. My virtual RAC system is based on KVM, and I’m running Oracle Linux 7.4 with UEK 4. The database is reasonably current at version 18.3.0.0.180717 (but that shouldn’t matter as this post isn’t about the database!), and I have upgraded TFA to 18.3.3.0.0. This is the latest TFA version at the time of writing.

[oracle@rac18pri1 ~]$ tfactl print version
TFA Version : 18.3.3.0.0

[oracle@rac18pri1 ~]$ tfactl toolstatus

.------------------------------------------------------------------.
|                  TOOLS STATUS - HOST : rac18pri1                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.1.2 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@rac18pri1 ~]$ 

As you can see, oswbb is part of the “Support Tools Bundle” and actively running out of the box.

Apart from OSWatcher.sh, it is important for OSWatcherFM.sh to run as well, as it purges old files from the archive directory where OSWatcher stores its results. Of course that doesn’t relieve you of keeping an eye on your filesystem usage numbers :) Both scripts are started by default.
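Keeping an eye on the space used by the archive is easy to script. The following is a minimal sketch and not something shipped with OSWatcher: the function name is my own, and you have to point it at wherever the archive directory lives on your system.

```python
import os


def directory_size_bytes(top: str) -> int:
    """Return the combined size of all regular files below `top`."""
    total = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so a link is not counted as the file it points to
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file may have been purged between listing and sizing
                continue
    return total
```

Called as `directory_size_bytes("/path/to/archive") / 1024**2` it reports the usage in MiB; the path shown is a placeholder, not the real location.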

Invoking OSWBB

Assuming there is data for analysis (e.g. OSWatcher has been running for a while), you can invoke oswbb via tfactl. Unless you narrow your search down to a specific time, it goes through a somewhat lengthy parse phase, but eventually it presents you with a menu:

[oracle@rac18pri1 ~]$ tfactl oswbb

Starting OSW Analyzer V8.1.2
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c)  2017 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.10.0600.dat ...
Parsing file rac18pri1_iostat_18.10.10.0700.dat ...
Parsing file rac18pri1_iostat_18.10.10.0800.dat ...

[...]

Parsing file rac18pri1_ps_18.10.12.0400.dat ...
Parsing file rac18pri1_ps_18.10.16.1900.dat ...
Parsing file rac18pri1_ps_18.10.16.2000.dat ...


Parsing Completed.


Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset)
Enter R to Remove Currently Displayed Graphs

Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard

Enter Q to Quit Program

Please Select an option:

As you can see, there are plenty of options available. In the interactive mode shown here, menu items 1 to 5 display charts for CPU, memory and disk I/O, while the “G” options write the CPU, memory, disk and network charts to GIF files.

Another option, and one I like a lot, is to generate all the data in a dashboard.

Creating the dashboard

If you choose option “D”, oswbb will go ahead and create a dashboard. You are prompted to enter a name before it goes off and creates it. The location it is saved in is not immediately obvious. In my case, using RAC, the result is stored in $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/analysis

I suppose that’s the same location in Oracle Restart and single instance, although I haven’t been able to verify that claim.

Within the “analysis” directory you find all your information combined in a subdirectory. It has the same name you assigned earlier when you requested the dashboard generation. It’s probably easiest to transfer the entire directory to a workstation for analysis.
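To save some typing when copying a dashboard to a workstation, the full path can be assembled from its parts. This is only a sketch based on the location I observed on my RAC system; the function name is my own and the directory layout may well differ in other setups.

```python
from pathlib import PurePosixPath


def dashboard_dir(oracle_base: str, host: str, name: str) -> PurePosixPath:
    """Build the path of a named dashboard below the oswbb analysis directory.

    The layout mirrors what was observed on one RAC system; treat it as an
    assumption everywhere else.
    """
    return (
        PurePosixPath(oracle_base)
        / "tfa" / "repository" / "suptools"
        / host
        / "oswbb" / "oracle" / "oswbb" / "analysis"
        / name
    )
```

On the database server you would typically pass the value of $ORACLE_BASE, the output of hostname, and the name you entered when generating the dashboard.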

Within the top-level directory you find 2 entries: analysis.txt, and a directory named dashboard containing the HTML representation of the data gathered.

[martin@host dashboard1]$ ls -l
total 14588
-rw-r--r-- 1 martin martin 14931077 Oct 16 22:07 analysis.txt
drwxr-xr-x 7 martin martin     4096 Oct 16 22:07 dashboard
[martin@host dashboard1]$

Let’s start with the latter to get a first impression.

The HTML output

I have included a couple of screenshots that you might find interesting about the system under observation. If this were a “real” system and not an undersized set of VMs, I’d probably be quite nervous just looking at this data … there is no reason for concern though, this is just my lab! And I did my best to hurt the VMs, otherwise there wouldn’t be anything to show here :)

[Screenshot: overview]

As you can see, all the main components are colour-coded. A number of important charts are presented as well. CPU looks particularly bad. You can enlarge the images by clicking on them.

The dashboard provides you with detailed information for CPU, memory, I/O and networking. I’d like to show you a few of these now.

CPU

Clicking on the red CPU button, I get more details about CPU utilisation on the system.

[Screenshot: cpu]

There are actually a few more charts, but they didn’t fit on the screen. I like the fact that I am immediately presented with critical findings at the top of the page, and there is more information in the form of charts. Again, clicking on the chart provides a larger picture.

Heading over to analysis.txt, the text file I mentioned earlier (or clicking on the “details” button, not shown in the figure), I get more details:

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher 
# the cpu utilization the longer it will take processes to run.  Below lists 
# the number of times (NUMBER) and percent of the number of times (PERCENT) 
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention 
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 5568   100.00
High (>95%)                                 48     0.86
Very High (100%)                            42     0.75
High spanning multiple snaps                24     0.43

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime                      
------------------------------
Wed Oct 10 11:08:56 BST 2018
Wed Oct 10 11:09:29 BST 2018
Wed Oct 10 20:14:54 BST 2018
[...]
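The PERCENT column appears to be nothing more than each count divided by the number of snaps captured. I have not checked how the analyser computes it internally, so take the following as a sketch that merely reproduces the figures above; the function name is my own.

```python
def percent_of_snaps(count: int, total_snaps: int) -> float:
    """Express `count` as a percentage of all snaps, rounded to 2 decimals."""
    return round(100 * count / total_snaps, 2)


total = 5568  # "Snaps captured in archive" from the report above
print(percent_of_snaps(48, total))  # High (>95%): 0.86
print(percent_of_snaps(42, total))  # Very High (100%): 0.75
print(percent_of_snaps(24, total))  # High spanning multiple snaps: 0.43
```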

Using these data points I can further drill down into the information gathered by OSWatcher in the archive directory to look at specific output.
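To find the right file in the archive, a snap time from the report can be turned into a file name. The naming pattern below is inferred from the “Parsing file” messages shown earlier (host, tool, then a date and hour stamp), and I am assuming that each hourly file holds the snaps taken during that hour. The function name is my own.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def archive_file_for(snap_time: str, host: str, tool: str) -> str:
    """Map a snap time such as 'Wed Oct 10 11:08:56 BST 2018' to a file name.

    Assumes hourly archive files named <host>_<tool>_YY.MM.DD.HH00.dat.
    """
    # Weekday and time zone abbreviation are not needed for the file name
    _weekday, month, day, clock, _tz, year = snap_time.split()
    hour = clock.split(":")[0]
    month_no = MONTHS.index(month) + 1
    return f"{host}_{tool}_{year[-2:]}.{month_no:02d}.{int(day):02d}.{hour}00.dat"


print(archive_file_for("Wed Oct 10 11:08:56 BST 2018", "rac18pri1", "iostat"))
# rac18pri1_iostat_18.10.10.1100.dat
```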

Memory

The memory button is also in bright red, so something must be wrong in that area.

[Screenshot: memory]

The system doesn’t seem to be in best shape. This is entirely my fault: I have assigned too little memory to my VMs, and here is proof!

Changing the scope

After the first investigation using all the data, you may want to narrow the scope down a bit. Or you might already know that an issue was reported last night between 2 AM and 4 AM. Narrowing the scope down for the purpose of creating a (more detailed) dashboard can be done using option “S” in the initial menu, as shown here:

Please Select an Option:S

Specify Analysis Start Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 10 07:00:03 2018  :Oct 11 21:00:00 2018

Specify Analysis End Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 16 21:31:59 2018  :Oct 12 01:00:00 2018

Dates accepted. Verifying valid begin/end data points...

Validating times in the archive...


Recalibrating data...
Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.11.2000.dat ...

[...]

Parsing file rac18pri1_ps_18.10.12.0000.dat ...

Enter a unique analysis directory name or enter  to accept default name:
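The prompts expect times in the format “Oct 11 21:00:00 2018”, and both values must fall within the range covered by the archive. If you want to sanity-check a window before typing it in, something along these lines would do. It is a sketch with names of my own choosing, and it only mirrors the rules visible in the prompts, not the tool’s actual validation.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_osw_time(text: str) -> datetime:
    """Parse a time written like 'Oct 11 21:00:00 2018'."""
    month, day, clock, year = text.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS.index(month) + 1, int(day),
                    hour, minute, second)


def window_is_valid(start: str, end: str, first: str, last: str) -> bool:
    """True if first <= start < end <= last."""
    begin, finish = parse_osw_time(start), parse_osw_time(end)
    return parse_osw_time(first) <= begin < finish <= parse_osw_time(last)


print(window_is_valid("Oct 11 21:00:00 2018", "Oct 12 01:00:00 2018",
                      "Oct 10 07:00:03 2018", "Oct 16 21:31:59 2018"))
# True
```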

There are other options to narrow the scope down: menu option “Z” allows you to specify a date/time range for interactive use without changing the dataset. After zooming with “Z”, create any chart of interest (menu items 1-5) and it is drawn on the zoomed time scale. You can reset the “zoom” by entering “B”.

Summary

Did I say I really like OSWatcher? It’s really helpful. If you install it together with TFA, you don’t even need to concern yourself with writing a startup script: it is started for you. If you have to, you can go back in time (within reason) and investigate O/S performance statistics. Unlike sar, it gives you a 30-second granularity, which is a lot better than sar’s rather coarse 10-minute interval.

Admittedly, sar keeps a month’s worth of data, so the two tools complement each other quite nicely.
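To put the difference in granularity into numbers, here is a quick back-of-the-envelope calculation using the two intervals mentioned above. The helper name is my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: int) -> int:
    """Number of samples collected per day at a fixed sampling interval."""
    return SECONDS_PER_DAY // interval_seconds


print(samples_per_day(30))       # 30-second snapshots: 2880
print(samples_per_day(10 * 60))  # 10-minute samples: 144
```

That is 20 times as many data points per day, which makes short spikes far easier to spot.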

Happy troubleshooting!
