I have written a few articles about Tracefile Analyzer (TFA) in the recent past. As you may recall from these posts, a more comprehensive TFA version than the one provided with the installation media is available from My Oracle Support (MOS for short). This version ships with a lot of useful additional tools, including OSWatcher. I love OSWatcher, simply because it gives me insights that would be very hard to get with SAR, for example. SAR tends to be the lowest common denominator on most (proper) operating systems, and it’s better than nothing, but please read on and let me explain why I like OSWatcher so much.
Feel free to head back to my earlier posts if you would like more details about “stock TFA” and “MOS TFA”. These are terms I coined, by the way; you won’t find them in a MOS search.
Say hello to the OSWatcher Analyzer dashboard
In this article I’d like to introduce the OSWatcher Analyzer dashboard to you. I think it’s probably the best way (certainly a very convenient one) to get an overview of what is going on at the operating system level. And it includes pretty pictures that are so easy to understand! OSWatcher (more precisely, oswbb) needs to be running on the system as part of “MOS Tracefile Analyzer”. In other words, MOS TFA has to be installed.
The environment for this post
As always, before beginning the write-up, here are some details about the system I’m using. My virtual RAC system is based on KVM, and I’m running Oracle Linux 7.4 with UEK 4. The database is reasonably current at version 22.214.171.124.180717 (but that shouldn’t matter, as this post isn’t about the database!), and I have upgraded TFA to 126.96.36.199.0, the latest TFA version at the time of writing.
[oracle@rac18pri1 ~]$ tfactl print version
TFA Version : 188.8.131.52.0
[oracle@rac18pri1 ~]$ tfactl toolstatus

.------------------------------------------------------------------.
|                 TOOLS STATUS - HOST : rac18pri1                  |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       | 184.108.40.206.3 | DEPLOYED    |
|                      | oratop       | 14.1.2       | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        | 8.1.2        | RUNNING     |
|                      | prw          | 220.127.116.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary | 18.104.22.168.0 | DEPLOYED    |
|                      | calog        | 22.214.171.124.0 | DEPLOYED    |
|                      | dbcheck      | 126.96.36.199.0 | DEPLOYED    |
|                      | dbglevel     | 188.8.131.52.0 | DEPLOYED    |
|                      | grep         | 184.108.40.206.0 | DEPLOYED    |
|                      | history      | 220.127.116.11.0 | DEPLOYED    |
|                      | ls           | 18.104.22.168.0 | DEPLOYED    |
|                      | managelogs   | 22.214.171.124.0 | DEPLOYED    |
|                      | menu         | 126.96.36.199.0 | DEPLOYED    |
|                      | param        | 188.8.131.52.0 | DEPLOYED    |
|                      | ps           | 184.108.40.206.0 | DEPLOYED    |
|                      | pstack       | 220.127.116.11.0 | DEPLOYED    |
|                      | summary      | 18.104.22.168.0 | DEPLOYED    |
|                      | tail         | 22.214.171.124.0 | DEPLOYED    |
|                      | triage       | 126.96.36.199.0 | DEPLOYED    |
|                      | vi           | 188.8.131.52.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@rac18pri1 ~]$
As you can see, oswbb is part of the “Support Tools Bundle” and actively running out of the box.
Apart from OSWatcher.sh, OSWatcherFM.sh needs to run as well: it removes old files from the archive directory where OSWatcher stores its results. Of course that doesn’t relieve you from keeping an eye on your filesystem usage :) Both scripts are started by default.
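OSWatcherFM.sh only purges old files, so it can still be worth checking how much space the archive takes up. Below is a minimal Python sketch of such a check. The archive location is an assumption modelled on the analysis directory described later in this post. The ORACLE_BASE fallback is a placeholder as well, so adjust both to your installation.

```python
import os
import shutil
import socket

# Hypothetical archive location, modelled on the oswbb analysis directory
# under $ORACLE_BASE/tfa/repository/suptools/<host>/...; verify on your system.
ARCHIVE_DIR = os.path.join(
    os.environ.get("ORACLE_BASE", "/u01/app/oracle"),  # fallback is a placeholder
    "tfa", "repository", "suptools", socket.gethostname(),
    "oswbb", "oracle", "oswbb", "archive",
)


def archive_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def filesystem_used_pct(path: str) -> float:
    """Percentage of the filesystem holding path that is currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    if os.path.isdir(ARCHIVE_DIR):
        print(f"archive size   : {archive_size_bytes(ARCHIVE_DIR) / 2**20:.1f} MiB")
        print(f"filesystem used: {filesystem_used_pct(ARCHIVE_DIR):.1f}%")
    else:
        print(f"not found: {ARCHIVE_DIR}")
```

You could run something like this from cron and alert when the percentage crosses a threshold of your choosing.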
Assuming there is data to analyse (i.e. OSWatcher has been running for a while), you can invoke oswbb via tfactl. It goes through a somewhat lengthy parse phase if you don’t narrow the analysis down to a specific time window, but eventually presents you with a menu:
[oracle@rac18pri1 ~]$ tfactl oswbb

Starting OSW Analyzer V8.1.2
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2017 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.10.0600.dat ...
Parsing file rac18pri1_iostat_18.10.10.0700.dat ...
Parsing file rac18pri1_iostat_18.10.10.0800.dat ...
[...]
Parsing file rac18pri1_ps_18.10.12.0400.dat ...
Parsing file rac18pri1_ps_184.108.40.2060.dat ...
Parsing file rac18pri1_ps_220.127.116.110.dat ...

Parsing Completed.

Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset)
Enter R to Remove Currently Displayed Graphs
Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard
Enter Q to Quit Program

Please Select an option:
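The parse log hints at how the archive is organised. Judging by the names, there is one file per tool per hour, named <host>_<tool>_<YY.MM.DD.HHMM>.dat. That layout is inferred from the output above rather than from documentation. The following Python sketch turns such a name into host, tool and hour, so treat it as an assumption-based helper. parse_archive_name() is a hypothetical name. The optional .gz suffix allows for compressed archive files, which is also an assumption.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the parse log (an assumption, not documented behaviour):
#   <host>_<tool>_<YY.MM.DD.HHMM>.dat   e.g. rac18pri1_iostat_18.10.10.0600.dat
_NAME = re.compile(
    r"^(?P<host>.+)_(?P<tool>[a-z]+)_(?P<ts>\d{2}\.\d{2}\.\d{2}\.\d{4})\.dat(?:\.gz)?$"
)


class ArchiveFile(NamedTuple):
    host: str
    tool: str
    hour: datetime  # timestamp encoded in the file name


def parse_archive_name(name: str) -> Optional[ArchiveFile]:
    """Split an archive file name into its parts; None if it doesn't match."""
    m = _NAME.match(name)
    if m is None:
        return None
    hour = datetime.strptime(m.group("ts"), "%y.%m.%d.%H%M")
    return ArchiveFile(m.group("host"), m.group("tool"), hour)
```

For example, parse_archive_name("rac18pri1_iostat_18.10.10.0600.dat") yields host rac18pri1, tool iostat and 10 October 2018, 06:00.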
As you can see, there are plenty of options. In interactive mode you can display all sorts of charts for CPU, memory, disk, and network.
Another option, and one I like a lot, is to generate a dashboard from all the data.
Creating the dashboard
If you choose option “D”, oswbb prompts you for a name and then creates the dashboard. Where it saves the result is not immediately obvious. In my case (using RAC) the result is stored in $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/analysis
I suppose it’s the same location for Oracle Restart and single instance deployments, although I haven’t been able to verify that.
Within the “analysis” directory you find all your information combined in a subdirectory. It carries the name you entered when you requested the dashboard. It’s probably easiest to transfer that entire subdirectory to a workstation for analysis.
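If you’d rather script the transfer, here is a minimal Python sketch that packs one dashboard subdirectory into a compressed tarball, ready to be copied off the host. It assumes the RAC location mentioned above. analysis_dir() and pack_dashboard() are hypothetical helper names. dashboard1 stands in for whatever name you entered at the prompt.

```python
import os
import socket
import tarfile


def analysis_dir(oracle_base: str, host: str) -> str:
    """oswbb analysis directory as observed on a RAC system (may differ elsewhere)."""
    return os.path.join(oracle_base, "tfa", "repository", "suptools", host,
                        "oswbb", "oracle", "oswbb", "analysis")


def pack_dashboard(analysis_root: str, name: str, out_path: str) -> str:
    """Write analysis_root/name into a gzip-compressed tarball at out_path."""
    src = os.path.join(analysis_root, name)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    with tarfile.open(out_path, "w:gz") as tar:
        # Store paths relative to the dashboard name, e.g. dashboard1/analysis.txt
        tar.add(src, arcname=name)
    return out_path


if __name__ == "__main__":
    # The ORACLE_BASE fallback and the dashboard name are placeholders.
    root = analysis_dir(os.environ.get("ORACLE_BASE", "/u01/app/oracle"),
                        socket.gethostname())
    name = "dashboard1"
    if os.path.isdir(os.path.join(root, name)):
        print(pack_dashboard(root, name, f"/tmp/{name}.tar.gz"))
    else:
        print(f"no dashboard named {name} under {root}")
```

Keeping the dashboard name as the top-level entry means the archive unpacks into a single, recognisable directory on the workstation.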
The top-level directory contains two entries: the file analysis.txt and a directory named dashboard, which holds the HTML representation of the data gathered.
[martin@host dashboard1]$ ls -l
total 14588
-rw-r--r-- 1 martin martin 14931077 Oct 16 22:07 analysis.txt
drwxr-xr-x 7 martin martin     4096 Oct 16 22:07 dashboard
[martin@host dashboard1]$
Let’s start with the latter to get a first impression.
The HTML output
I have included a couple of screenshots of the system under observation that you might find interesting. If this were a “real” system and not an undersized set of VMs, I’d be quite nervous just looking at this data. There is no reason for concern though, this is just my lab! And I did my best to hurt the VMs, otherwise there wouldn’t be anything to show here :)
As you can see, all the main components are colour-coded. A number of important charts are presented as well. CPU looks particularly bad. You can enlarge the images by clicking on them.
The dashboard provides you with detailed information for CPU, memory, I/O and networking. I’d like to show you a few of these now.
Clicking on the red CPU button, I get more details about CPU utilisation on the system.
There are actually a few more charts, but they didn’t fit on the screen. I like the fact that I am immediately presented with critical findings at the top of the page, followed by more information in the form of charts. Again, clicking on a chart provides a larger picture.
Heading over to the text file I mentioned earlier (or clicking the “details” button, which isn’t shown in the figure), I get more details:
############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher
# the cpu utilization the longer it will take processes to run. Below lists
# the number of times (NUMBER) and percent of the number of times (PERCENT)
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots

                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               5568   100.00
High (>95%)                               48     0.86
Very High (100%)                          42     0.75
High spanning multiple snaps              24     0.43

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime
------------------------------
Wed Oct 10 11:08:56 BST 2018
Wed Oct 10 11:09:29 BST 2018
Wed Oct 10 20:14:54 BST 2018
[...]
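The PERCENT column is simply NUMBER as a share of all snapshots captured: 48 out of 5568 is about 0.86%. The Python sketch below reproduces those figures. It also shows one way the three categories could be counted from a series of per-snapshot %busy values. The counting rules are an interpretation of the report text, not oswbb’s actual implementation:

- “High” is taken literally as >95%, so it includes 100%.
- “Spanning” counts high snapshots whose predecessor was also high.

```python
from typing import Dict, List


def percent(number: int, snaps: int) -> float:
    """PERCENT column: NUMBER as a share of all snapshots, two decimals."""
    return round(100.0 * number / snaps, 2)


def cpu_busy_summary(busy_pct: List[float]) -> Dict[str, int]:
    """Classify per-snapshot CPU %busy values (assumed rules, see lead-in)."""
    high = [p > 95.0 for p in busy_pct]  # assumption: >95 includes 100
    return {
        "snaps": len(busy_pct),
        "high": sum(high),
        "very_high": sum(1 for p in busy_pct if p >= 100.0),
        # assumption: a high snapshot directly after another high one "spans"
        "spanning": sum(1 for prev, cur in zip(high, high[1:]) if prev and cur),
    }
```

With the report’s figures, percent(48, 5568), percent(42, 5568) and percent(24, 5568) give 0.86, 0.75 and 0.43, matching the table.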
Using these data points, I can drill further into the information OSWatcher gathered in the archive directory and look at the output for specific snapshots.
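To get from a SnapTime in analysis.txt to the file worth opening, you can map the timestamp to its hourly archive file. The Python sketch below assumes the <host>_<tool>_<YY.MM.DD.HH00>.dat naming seen in the parse log. snap_to_archive_name() is a hypothetical helper. In a typical OSWatcher layout these files live in per-tool subdirectories of the archive (oswiostat, oswps and so on), but please verify that on your system.

```python
from datetime import datetime


def snap_to_archive_name(snap_time: str, host: str, tool: str) -> str:
    """Map a SnapTime line such as 'Wed Oct 10 11:08:56 BST 2018' to an archive file name.

    The time-zone token is dropped because strptime's %Z only recognises a few
    zone names (UTC, GMT and the local ones), so 'BST' could fail to parse.
    """
    parts = snap_time.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected SnapTime format: {snap_time!r}")
    del parts[4]  # remove the zone abbreviation, e.g. BST
    ts = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    # Assumed naming: one file per tool per hour, <host>_<tool>_YY.MM.DD.HH00.dat
    return f"{host}_{tool}_{ts:%y.%m.%d.%H}00.dat"
```

For the first 100% busy snapshot in the report, snap_to_archive_name("Wed Oct 10 11:08:56 BST 2018", "rac18pri1", "iostat") points at rac18pri1_iostat_18.10.10.1100.dat.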
The memory button is also in bright red, so something must be wrong in that area.
The system doesn’t seem to be in best shape. This is entirely my fault: I have assigned too little memory to my VMs, and here is proof!
Changing the scope
After a first investigation using all the data, you may want to narrow the scope down a bit. Or you might already know that an issue was reported last night between 2 AM and 4 AM. You can narrow the scope for the purpose of creating a (more detailed) dashboard with option “S” in the initial menu, as shown here:
Please Select an Option:S

Specify Analysis Start Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 10 07:00:03 2018 :Oct 11 21:00:00 2018

Specify Analysis End Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 16 21:31:59 2018 :Oct 12 01:00:00 2018

Dates accepted. Verifying valid begin/end data points...

Validating times in the archive...

Recalibrating data...
Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.104.22.1680.dat ...
[...]
Parsing file rac18pri1_ps_18.10.12.0000.dat ...

Enter a unique analysis directory name or enter to accept default name:
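These prompts expect times in the format shown in the example, such as Oct 10 07:00:03 2018. You may want to prepare a window in advance, for example from an incident report. A quick client-side check like the following Python sketch catches typos and out-of-range values before you type them in. oswbb still performs its own validation. parse_window() and expected_snaps() are hypothetical names. The snapshot estimate assumes the 30 second interval discussed at the end of this post.

```python
from datetime import datetime
from typing import Tuple

FMT = "%b %d %H:%M:%S %Y"  # matches the prompt's example, e.g. "Oct 10 07:00:03 2018"


def parse_window(start: str, end: str, first: str, last: str) -> Tuple[datetime, datetime]:
    """Check a start/end pair against the valid range oswbb reports."""
    s, e = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    lo, hi = datetime.strptime(first, FMT), datetime.strptime(last, FMT)
    if not (lo <= s < e <= hi):
        raise ValueError(f"window {start} .. {end} is empty or outside {first} .. {last}")
    return s, e


def expected_snaps(s: datetime, e: datetime, interval_s: int = 30) -> int:
    """Approximate snapshot count in the window, assuming a fixed sampling interval."""
    return int((e - s).total_seconds() // interval_s)
```

For the window used above, Oct 11 21:00:00 to Oct 12 01:00:00 2018, that works out to 4 hours, or roughly 480 snapshots at a 30 second interval.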
There is another way to narrow the scope: menu option “Z” lets you specify a date/time range for the interactive graphs without changing the analysis dataset. Choose “Z”, then display any chart of interest (menu items 1-5). Option “B” resets the zoom to the baseline time scale.
Did I say I really like OSWatcher? It’s really helpful. If you install it together with TFA, you don’t even need to write a startup script; it is started for you. If you need to, you can go back in time (within reason) and investigate O/S performance statistics. Unlike SAR, it gives you 30-second granularity, which is a lot better than SAR’s rather coarse default interval of 10 minutes.
Admittedly, SAR keeps a month’s worth of data, so the two tools complement each other quite nicely.
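Here is the difference in granularity in numbers. At one snapshot every 30 seconds, OSWatcher records 2880 samples a day. A 10-minute SAR interval yields 144, twenty times fewer, and even a whole month of SAR data amounts to only 4320 samples. The tiny Python sketch below does that arithmetic. samples() is just an illustrative helper, and it assumes a perfectly regular interval.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples(interval_s: int, days: float = 1) -> int:
    """Snapshots taken over `days` by a collector with a fixed interval."""
    return int(days * SECONDS_PER_DAY // interval_s)


if __name__ == "__main__":
    osw, sar = samples(30), samples(600)
    print(f"per day: OSWatcher {osw}, SAR {sar} ({osw // sar}x finer)")
    print(f"SAR over 30 days: {samples(600, 30)} snapshots")
```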