Pimp my collectl: advanced system monitoring using collectl-utils, part I

I recently wrote about collectl, a truly superb troubleshooting utility, in a previous post. After comments from Mark Seger (the author) and Kevin Closson (who has used it extensively and really loves it), I decided to elaborate a bit more on what you can do with collectl.

Hard as it is to believe, collectl’s functionality can be extended further with collectl-utils from SourceForge, available here: http://collectl-utils.sourceforge.net/

As with collectl, you can download either a source tgz file or a noarch RPM. Collectl-utils consists of three major tools, of which I’d like to introduce the first one: colplot. When I find the time I’ll write a post about the others, most likely starting with colmux.


I mentioned in the previous post that you can use the “-P” option to generate output in plot format. This in turn can be fed to your favourite spreadsheet application, or alternatively into gnuplot. If you choose a spreadsheet application, it’s your responsibility to decide what to do with the raw data each time you load a plot file. Maybe one day I’ll write a collectl-analyzer which does similar things to nmon-analyzer, but that has to wait for now. So if you are lazy like me, you need an alternative, and it comes easily accessible in the form of gnuplot.

Although I am very impressed by what gnuplot can do, I never had the time or energy to get to grips with all its options. When at University I used Mathematica 2 (yes it’s been a while) and thought the plot2d() function was complex …

Now for the good news: the complexity of gnuplot is nicely hidden by colplot, which takes the burden of generating the plot files away from the user. And to make it more comfortable, all of this happens through a web interface. All you need is a web server such as apache and a little bit of initial configuration for it to work. I should also note that colplot can be used on the command line as well, but that is out of scope of this article.

This time around I downloaded the source tarball rather than the RPM as I wanted more control over the installation process. If you choose the RPM, it is good to know that it is smart enough to tell SLES apart from RHEL and updates the web server configuration accordingly. If you decide to install colplot manually, check the INSTALL script as it can help you get started. And don’t forget to read INSTALL-colplot and consult colplot-apache.conf for a sample Apache configuration. The latter can go into /etc/httpd/conf.d on RHEL and will take effect after the Apache configuration has been reloaded. You also need collectl installed on the host running collectl-utils.
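
A minimal sketch of the manual step for the Apache configuration, assuming a RHEL-style layout. The function name install_colplot_conf is made up for illustration, and the sample file is copied as-is, so any paths inside it may still need adjusting to wherever you unpacked colplot.

```shell
# Copy the sample Apache configuration shipped with collectl-utils into place.
#   $1: path to colplot-apache.conf from the unpacked tarball
#   $2: Apache drop-in directory (assumption: defaults to the RHEL location)
install_colplot_conf() {
    conf=$1
    confdir=${2:-/etc/httpd/conf.d}
    if [ ! -f "$conf" ]; then
        echo "not found: $conf" >&2
        return 1
    fi
    cp "$conf" "$confdir/"
}
```

Run as root, this could be called as “install_colplot_conf ./colplot-apache.conf”, followed by a reload of Apache (on RHEL of this vintage that is typically “service httpd reload”).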

Colplot uses a directory, usually called plotfiles, where the recorded collectl output is stored. By default, it resides in the same directory as colplot but can be changed in the GUI.

I am thinking of using NFS to export the plotfiles directory, so that each monitored host could mount the directory and store its output files there. The more progressive use of SSHFS is probably out of the question for most database servers, but in my lab I’m king and do what I like. I personally found it easiest to use “collectl -P -f /mnt/sshfs/plotfiles/ <other options>” to generate the files, where /mnt/sshfs/plotfiles was mounted from the web server host. If you are planning on generating the colplot output file names manually, i.e. not pointing to a directory, then make sure they are unique! This makes it easy to compare systems, as we’ll see below. One thing I noticed is that detail files each get their own trace file name in the form “host-date.type”, where type is dsk for detailed disk information etc.
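
To check which systems have actually written into a shared plotfiles directory, the “host-date.type” naming can be taken apart with a few lines of shell. This is only a sketch: the function names are hypothetical, and it assumes the date is the last hyphen-separated field before the extension (so host names containing hyphens still work, but raw files with an extra time field would not).

```shell
# Print the host part of a plot file name of the form "host-date.type".
plotfile_host() {
    name=$(basename "$1")
    name=${name%.*}               # drop the trailing extension (e.g. ".dsk")
    printf '%s\n' "${name%-*}"    # drop everything from the last hyphen on
}

# List the distinct hosts that have files in the given plotfiles directory.
plotfile_hosts() {
    for f in "$1"/*-*.*; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        plotfile_host "$f"
    done | sort -u
}
```

For example, “plotfile_hosts /mnt/sshfs/plotfiles” should print one line per monitored host, which is a quick way to spot a node that is not recording.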

After all the webserver setup is complete, point your browser to the host where you installed colplot. As I said, the “plotfiles” directory is scanned for files, which are processed. You see the following screen:

Using it

In the GUI, the first step is to define the time of day for which you would like to visualise the gathered collectl information (open the above screenshot in a separate window to better follow this discussion).

  • You can limit the information to be displayed to a certain time period: if you captured a day’s worth of statistics but only need the hour from 13:00 to 14:00, that’s a simple setting in the user interface
  • Alternatively, select “last 60 minutes” for the most recent period

You can also list the contents of the plotfiles directory, or even change the location, but bear in mind that the webserver still has to be able to read files from there!

If you like, you can instruct colplot to narrow down the files to be plotted by editing “filenames containing”. If the plotfiles directory contains data matching the period and names you are interested in, colplot will plot it after a click on “Generate plot”. I suggest a display of “SysPlot” initially, which plots the systems recorded in the plot files side by side. This is very useful for comparing system health, especially in clusters. You should experiment with the different plot settings, which support all sorts of analysis and allow aggregation by day, system and plot, and various combinations of these. By the way, the system name is derived from the hostname when using the collectl -P -f /path/to/plotfiles/ … command.

Once you have familiarised yourself with the options, you can further narrow down which data you are interested in. I would suggest the “All Plots” option to get you started, unless of course you know what you are after. Colplot, like collectl, differentiates between “summary” and “detail” plots, and each of these has a convenient “All Plots” option to display all the information gathered. Of course, it can only plot what you recorded! Here’s the slightly cropped output from a 2 node cluster I gathered:

A very useful function is to email the results in either PDF or PNG format. For this to work you need uuencode (package sharutils in RHEL) on the web server host, and you need to be able to send email from the command line, since colplot uses the mail(1) utility to send email. Sending a PDF is probably more useful than the PNG option, as the latter will send each graph separately in a tar archive.
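
Before relying on the email function it may be worth checking that both prerequisites are present on the web server host. A small sketch follows; missing_tools is a name invented here, not something shipped with colplot, and it only checks that the commands exist, not that mail delivery actually works.

```shell
# Print the name of every given command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Calling “missing_tools uuencode mail” prints nothing if both are installed; any name it does print still needs to be installed (sharutils for uuencode on RHEL, as noted above).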


I can’t say how impressed I am with colplot; it’s really great for working out what happened when during a benchmark. The great thing is the comparison of systems side by side, which gives clear indications of imbalances and trending. Using colplot is also a lot easier than writing your own spreadsheet macros to visualise the data. I really like it!


2 thoughts on “Pimp my collectl: advanced system monitoring using collectl-utils, part I”

  1. Mark Seger

    Great description, Martin. Do you mind if I put a couple of links to your postings into the formal documentation? I’m assuming you’re not planning on removing/moving them any time soon.

    Something you may have missed, given all the switches/options I admit I get carried away with, is the ability to display plots in real-time, especially since you are using NFS to make the output from multiple systems available to colplot in a shared directory. Then again this is probably in the category of something more advanced and you were trying to keep things simple, which I totally get.

    That said, to do real-time plotting, you need collectl to generate its output in plot format. Since I also like raw output I tend to do both, by using --rawtoo -P. Further, you want to make sure collectl doesn’t compress its output, since colplot needs uncompressed data, so you have to include -oz as well. Finally, since collectl only flushes its output buffers once a minute, you need to include -F0 to tell collectl to flush its buffers during every monitoring cycle. Having done all that, you will now get real-time data in both raw form (for playback as often as you like) and uncompressed plot format that colplot can read.

    Now onto colplot. There are 2 options. One sits directly to the right on the line where you specify a time frame and is called “or last n minutes”. If you set this to, say, 10 minutes, you’ll see the last 10 minutes’ worth of plot data. Of course that means you need to have data for that time period, hence the need for real-time data. Above that is a checkbox called “Enable Refresh Every” with a default of 1 minute. If you tick this box the webpage will refresh at that interval, so it always displays the latest 10 minutes’ worth of data. Personally I like to set it to the same value as the monitoring interval so I really do see the latest data. Other people find the refreshes a little distracting, but I’ve also seen the flicker reduced significantly by better browser technology over the years.

    In any event, keep up the great writing. I’m lovin’ it…
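
The collectl switches Mark describes for real-time plotting can be put together as a small wrapper. This is only a sketch: start_realtime_collectl is a hypothetical name, the flags are exactly the ones from the comment above, and anything after the directory is passed through to collectl untouched.

```shell
# Start collectl so that colplot can show near-real-time data.
#   --rawtoo  also keep raw output alongside the plot files
#   -P        write plot format
#   -oz       do not compress the output
#   -F0       flush the output buffers on every monitoring cycle
#   $1        directory to write to (e.g. the shared plotfiles directory)
start_realtime_collectl() {
    plotdir=$1
    shift
    collectl --rawtoo -P -oz -F0 -f "$plotdir" "$@"
}
```

With the SSHFS mount from earlier, this would be started as “start_realtime_collectl /mnt/sshfs/plotfiles/” plus whatever other options you normally use.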

