Pimp my collectl: advanced system monitoring using collectl-utils, part I
Posted by Martin Bach on August 12, 2011
I recently wrote about collectl, a truly superb troubleshooting utility, in a previous post. After comments from Mark Seger (the author) and Kevin Closson (who has used it extensively and really loves it), I have decided to elaborate a bit more on what you can do with collectl.
Even though it’s hard to believe, collectl’s functionality can be extended by using the collectl-utilities from sourceforge, available here: http://collectl-utils.sourceforge.net/
As with collectl, you can download either a source tgz file or a noarch RPM. Collectl-utils consists of three major tools, of which I’d like to introduce the first one: colplot. When I find the time I’ll write posts about the other tools, most likely starting with colmux.
I mentioned in said previous post that you can use the “-P” option to generate output in a plot format. This in turn can be fed to your favourite spreadsheet application, or alternatively into gnuplot. When choosing a spreadsheet application, it’s your responsibility to decide what to do with the raw data each time you load a plotfile. Maybe one day I’ll write a collectl-analyzer which does similar things to nmon-analyzer, but that has to wait for now. So if you are lazy like me, you need another alternative, and it comes easily accessible in the form of gnuplot.
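If you do go down the spreadsheet route, a small conversion step can save some clicking in the import dialog. The following is only a rough Python sketch and not part of collectl or collectl-utils. It assumes the plot file is space separated and that the last line starting with “#” before the data holds the column names; the function name and the sample data are made up for illustration.

```python
import csv
import io


def plot_to_csv(plot_text):
    """Convert space-separated plot-format text to CSV text.

    Assumption: lines starting with '#' are comments, and the last such
    line before the data holds the column names.
    """
    header = None
    rows = []
    for line in plot_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Keep the most recent comment line seen before the data starts.
            if not rows:
                header = line.lstrip("#").split()
            continue
        rows.append(line.split())

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
# illustrative header block
#Date Time [CPU]User% [CPU]Sys%
20110812 13:00:00 5 2
20110812 13:00:10 7 3
"""

if __name__ == "__main__":
    print(plot_to_csv(SAMPLE), end="")
```

The resulting CSV opens directly in most spreadsheet applications, but you still have to build the charts yourself, which is exactly the work colplot takes off your hands.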
Although I am very impressed by what gnuplot can do, I never had the time or energy to get to grips with all its options. When at University I used Mathematica 2 (yes it’s been a while) and thought the plot2d() function was complex …
Now for the good news: the complexity of gnuplot is nicely hidden by colplot, which takes the burden of generating the plot files away from the user. And to make it more comfortable, all of this happens through a web interface. All you need is a web server such as Apache and a little bit of initial configuration for it to work. I should also note that colplot can be used on the command line as well, but that is out of scope for this article.
This time around I downloaded the source tarball rather than the RPM as I wanted more control over the installation process. If you choose the RPM, it is good to know that it is smart enough to tell SLES apart from RHEL and updates the web server configuration accordingly. If you decide to install colplot manually, check the INSTALL script as it can help you get started. And don’t forget to read INSTALL-colplot and consult colplot-apache.conf for a sample Apache configuration. The latter can go into /etc/httpd/conf.d on RHEL and takes effect after you reload the Apache configuration. You also need collectl installed on the host running collectl-utils.
Colplot uses a directory, usually called plotfiles, where the recorded collectl output is stored. By default, it resides in the same directory as colplot but can be changed in the GUI.
I am thinking of using NFS to export the plotfiles directory, so that each monitored host could mount the directory and store its output files there. The more progressive use of SSHFS is probably out of scope for most database servers, but in my lab I’m king and do what I like. I personally found it easiest to use “collectl -P -f /mnt/sshfs/plotfiles/ <other options>” to generate the files, where /mnt/sshfs/plotfiles was mounted from the web server host. If you are planning on generating the colplot output file names manually, i.e. not pointing to a directory, then make sure they are unique! This makes it easy to compare systems, as we’ll see below. One thing I noticed is that detail files all get their own trace file name in the form “host-date.type”, where type is dsk for detailed disk information etc.
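To check what has actually landed in a shared plotfiles directory, the “host-date.type” naming can be taken apart with a few lines of Python. This is a sketch under assumptions: I assume the date part is eight digits (YYYYMMDD) and that files may carry an optional .gz suffix; the host names in the example are hypothetical, and the helper is my own, not something shipped with collectl-utils.

```python
import re
from collections import defaultdict

# Assumed pattern: <host>-<YYYYMMDD>.<type>, optionally followed by .gz.
NAME_RE = re.compile(
    r"^(?P<host>.+)-(?P<date>\d{8})\.(?P<type>[a-z]+)(?:\.gz)?$"
)


def group_by_host(filenames):
    """Map each host name to a list of (date, type) tuples."""
    grouped = defaultdict(list)
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            # Skip anything that does not follow the assumed naming scheme.
            continue
        grouped[match.group("host")].append(
            (match.group("date"), match.group("type"))
        )
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    names = [
        "node1-20110811.dsk",
        "node1-20110812.dsk",
        "node2-20110812.dsk",
        "README.txt",
    ]
    for host, entries in sorted(group_by_host(names).items()):
        print(host, entries)
```

Because the host name is part of the file name, two hosts writing to the same directory cannot overwrite each other's files as long as their host names differ.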
After all the webserver setup is complete, point your browser to the host where you installed colplot. As I said, the “plotfiles” directory is scanned for files, which are processed. You see the following screen:
As a first step in the GUI, you define the time of day for which you would like to visualise the gathered collectl information (open the above screenshot in a separate window to better follow this discussion).
- You can limit the information to be displayed to a certain time period, i.e. if you captured a day’s worth of statistics but only need the hour from 13:00 to 14:00 that’s a simple setting in the user interface
- Alternatively, select “last 60 minutes” for the most recent period
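The time window selection above is conceptually just a filter on the time column of the recorded data. The Python sketch below illustrates the idea on raw plot-format lines; it is not how colplot implements it. I assume the second column holds the time as HH:MM:SS and that comment lines start with “#”; the sample rows are made up.

```python
from datetime import time


def parse_clock(text):
    """Turn 'HH:MM:SS' into a datetime.time object."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return time(hours, minutes, seconds)


def rows_in_window(lines, start, end, time_column=1):
    """Yield data rows whose time of day lies in [start, end], inclusive."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if start <= parse_clock(fields[time_column]) <= end:
            yield fields


if __name__ == "__main__":
    # Made-up sample rows in the assumed layout.
    sample = [
        "#Date Time [CPU]User%",
        "20110812 12:59:50 4",
        "20110812 13:00:00 5",
        "20110812 13:30:00 9",
        "20110812 14:00:00 6",
        "20110812 14:00:10 3",
    ]
    for row in rows_in_window(sample, time(13, 0), time(14, 0)):
        print(row)
```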
You can also list the contents of the plotfiles directory, or even change the location, but bear in mind that the webserver still has to be able to read files from there!
If you like, you can instruct colplot to narrow down the files to be plotted by editing “filenames containing”. If the plotfiles directory contains information matching the period and names you are interested in, colplot will plot it after a click on “Generate plot”. I suggest a display of “SysPlot” initially, which plots the systems recorded in the colplot files side by side. This is very useful for comparing system health, especially in clusters. You should experiment with the different plot settings, which support all sorts of analysis and allow aggregation on days, systems and plots, and various combinations of these. By the way, the system name is derived from the hostname when using the collectl -P -f /path/to/plotfiles/ … command.
Once you have familiarised yourself with the options, you can further narrow down which data you are interested in. I would suggest the “All Plots” option to get you started, unless of course you know what you are after. Colplot, like collectl, differentiates between “summary” and “detail” plots. Of course, it can only plot what you recorded! Each of these has a convenient “All Plots” option to display all the information gathered. Here’s the slightly cropped output from a 2 node cluster I gathered (click for a larger view):
A very useful function is to email the results in either PDF or PNG format. For this to work you need uuencode (package sharutils in RHEL) on the web server host, and you need to be able to send email from the command line; colplot uses the mail(1) utility to send email. Sending a PDF is probably more useful than the PNG option, as the latter will send each graph separately in a tar archive.
I can’t say how impressed I am with colplot. It’s really great for working out what happened when during a benchmark. The great thing is the side-by-side comparison of systems, which gives clear indications of imbalances and trending. Using colplot is also a lot easier than writing your own spreadsheet macros to visualise the data. I really like it!