Limit nmon to certain disks

I often work on systems with a large number of LUNs, sometimes going into the hundreds, and keeping track of them all is difficult at best. Sometimes, though, you want to limit yourself to a specific set of disks. I knew about the dskfilt option in collectl, but only today learned about a similar feature in nmon. The following is a greatly simplified example, of course, but it gives you the idea.

Assume you are interested in 4 disks: sd{a,b,c,d}. Let’s further assume that these disks are used for 2 ASM disk groups, DATA and RECO. You could create a file “disks.dat” with the following contents:

DATA sda sdb
RECO sdc sdd

Then pass that file to nmon with the -g flag. When you press “g” in interactive mode, you are shown I/O stats aggregated per disk group.

─────────────────Hostname=test──────────────────────Refresh= 2secs ───15:13.02─┐
│ Disk-Group-I/O ──────────────────────────────────────────────────────────────│
│ Name          Disks AvgBusy Read|Write-KB/s  TotalMB/s   xfers/s BlockSizeKB │
│ DATA               2   0.0%       0.0|0.0          0.0       0.0    0.0      │
│ RECO               2   0.0%       0.0|0.0          0.0       0.0    0.0      │
│ Groups= 2 TOTALS   4   0.0%       0.0|0.0          0.0       0.0             │
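The steps above could be scripted roughly as follows. This is only a sketch: the file name disks.dat and the device names are the ones from the example, and the check against /dev is my own addition, not something nmon requires.

```shell
# Write the group file: one group per line, group name first, then its disks.
cat > disks.dat <<'EOF'
DATA sda sdb
RECO sdc sdd
EOF

# Optional sanity check (my own assumption, not an nmon feature):
# warn about any listed disk that is not a block device on this host.
while read -r group disks; do
  for d in $disks; do
    [ -b "/dev/$d" ] || echo "warning: /dev/$d (group $group) not found" >&2
  done
done < disks.dat
```

With the file in place, start nmon with "nmon -g disks.dat" and press “g” to get the Disk-Group-I/O view shown above.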

That is terrific, and very useful when hunting for bottlenecks. In addition, there is a “minimal mode”, which displays only busy disks and processors. Invoke it by pressing “.” in interactive mode.

Grouping disks also works for the capacity planning mode. From the online help:

For Data-Collect-Mode = spreadsheet format (comma separated values)
 Note: use only one of f,F,z,x or X and make it the first argument
 -g <filename> User Defined Disk Groups (see above) - see BBBG & DG lines
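
A capture with disk groups might be wrapped like this. Treat it as a sketch under assumptions: start_capture and dg_lines are hypothetical helper names of mine, the -s (seconds between snapshots) and -c (number of snapshots) values are arbitrary, and the grep pattern simply follows the help text's remark that the groups appear in BBBG and DG lines.

```shell
# Hypothetical helper: record 120 snapshots, 30 seconds apart, with the disk
# groups defined in the file passed as $1. -f comes first, as the help demands.
start_capture() {
  nmon -f -g "$1" -s 30 -c 120
}

# Hypothetical helper: print only the disk-group records of a capture file.
# BBBG lines describe the groups, lines starting with DG carry their stats.
dg_lines() {
  grep -E '^(BBBG|DG)' "$1"
}
```

On a host with nmon installed, "start_capture disks.dat" would record for one hour, and running dg_lines against the resulting .nmon file would then show just the group records.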

Nothing is stopping you now from benchmarking properly and focusing on the relevant detail! Don’t forget to run the raw file through the nmon analyzer to get some eye candy that even a non-techie understands (it has lots of nice color!)

An introduction to collectl

Some of you may have seen on Twitter that I was working on understanding collectl. So why did I start with this? First of all, I was after a tool that records a lot of information on a Linux box. It can also play that information back, but this is out of scope for this introduction.

In the past I have used nmon to do similar things, and I still love it for what it does. Especially in conjunction with the nmon-analyzer, an Excel plug-in, it can create very impressive reports. How does collectl compare?

Getting collectl

Getting collectl is quite easy. Download it from SourceForge: http://sourceforge.net/projects/collectl/

The project website, including very good documentation, is hosted on SourceForge as well but uses a slightly different URL: http://collectl.sourceforge.net/

I suggest you get the architecture-independent RPM and install it on your system. This is all you need to get started! The impatient can type “collectl” at the command prompt now to get some information. Let’s have a look at the output:

$ collectl
waiting for 1 second sample...
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
1   0  1163  10496    113     14     18      4      8     55      5      19
0   0  1046  10544      0      0      2      3    164    195     30      60
0   0  1279  10603    144      9    746    148     20     67     11      19
3   0  1168  10615    144      9    414     69     14     69      5      20
1   0  1121  10416    362     28    225     19     11     71      8      35

The “ouch” that collectl printed at the end of the run was caused by my pressing CTRL-c to stop the execution.

Collectl is organised around subsystems. By default it prints the CPU, disk and network subsystems in aggregated form.
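
To make the subsystem idea concrete, here is a small sketch that asks for the default set explicitly. The letters follow collectl's -s option (c for CPU, d for disk, n for network, lowercase meaning the aggregated view). The SUBSYS variable and the sample count of five are my own choices for illustration.

```shell
# Subsystems to report: c = CPU, d = disk, n = network (aggregated views).
SUBSYS="cdn"

if command -v collectl >/dev/null 2>&1; then
  # -s selects the subsystems, -i the interval in seconds, -c the number of
  # samples, so collectl exits by itself after five samples without CTRL-c.
  collectl -s"$SUBSYS" -i 1 -c 5
else
  echo "collectl is not installed on this host" >&2
fi
```

Changing SUBSYS to a different combination of letters, for example "cm" to pair CPU with memory, switches the report to other subsystems.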

