Tracking a Partitioned System p Overall Utilization
The purpose of this program is to automate the conversion of nmon csv files into web pages. It provides long-term trend charts and can aggregate the performance of all partitions residing on a physical server.
Update: February 23, 2011
Changes in this version have been completed by Allan Cano at Itrus.com.
- topas_nmon support
- index.html Changes
I've updated the index.html page so that ...
- the server menus are alphabetical.
- there is a quick-click button in the table to get to trending info
- the page doesn't reload after submit actions
- the date range is auto checked and set against the table information
- servers which have not been updated recently go 'red' in the end date field
- nmon2web pl & cgi
- use strict
I've moved globals into the global space and made the necessary changes to allow the script to pass a strict check.
- Config file option
You can create a config file in /path/etc/nmon2web.cfg to define where the data files are found and where the html files should go. You can also create /path/etc/nmon2web.cgi.cfg to configure the data directories for the cgi script.
You MUST MUST MUST update these files or zero them out if you choose to update the nmon2web pl and cgi scripts.
- Added arguments
-G indicates that the nmon data files are already in GMT format. Run 'nmon2web.pl -?' for the rest of the options.
- Locking out
If you create a file called 'stop' in the nmon data directory (-n), then nmon2web.pl will stop after the current server conversion. It will remain locked out until the file is removed. Handy if you're concerned about control-c'ing in the middle of a site update.
- Relative Directories
The nmon2web.pl file has been placed in /path/bin/. The nmon data files are assumed to be in /path/bin/../data. The html output is assumed to be placed in /path/bin/../html.
- NMON Bug fix
Found that on a couple of versions of nmon the first set of data has its timestamp off by one century and a month. Instead of hacking around this I just skip the first set of data if the date doesn't match the 2nd date.
- ENV variable NMON2WEB_CONFIG
Set this to an additional configuration file to make setting up sub-environments easier. For example, when you are documenting multiple sites or separating aix and linux servers.
- use CGI
Switched to using the CGI perl module (included by default in most perl distributions) to parse the web form input
- datadir form variable
For the aggregate form you can pass in a data directory which is APPENDED to the DIRECTORY variable. This also allows for easy site sub-dirs in a website, or splitting aix and linux into sub-dirs, without having to have separate cgi-bin scripts.
- Admin Scripts
Provided some scripts in ../root which I've found useful in managing the data collection and setting up nmon to run on the servers.
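The first-sample check described under "NMON Bug fix" above can be pictured roughly as follows. This is a minimal shell/awk sketch, not the code in nmon2web.pl (which is Perl). It assumes that the nmon file marks each snapshot with a line of the form ZZZZ,T0001,hh:mm:ss,DD-MON-YYYY, and the function name first_sample_action is made up for illustration.

```shell
#!/bin/sh
# first_sample_action FILE
# Hypothetical helper: prints "skip" when the date on the first snapshot
# line differs from the date on the second one, otherwise "keep".
# Assumes snapshot lines look like: ZZZZ,T0001,00:01:07,23-FEB-2011
first_sample_action() {
    awk -F, '
        $1 == "ZZZZ" { n++; date[n] = $4 }   # remember each snapshot date
        n == 2       { exit }                # two snapshots are enough
        END {
            if (n == 2 && date[1] != date[2]) print "skip"
            else print "keep"
        }
    ' "$1"
}
```

Because the rule only compares dates, a recording that genuinely starts just before midnight would also report "skip" in this sketch.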
See below for installation instructions.
You MUST MUST MUST update the /path/etc/nmon2web.cfg file or zero it out if you choose to update the nmon2web pl and cgi scripts.
For current users, replace the nmon2web.pl and nmon2web.cgi programs with the ones in the tar.gz file and update paths as needed. You will want to check the remaining files for differences, except find_max_nmon_val.pl, which has not changed. If you update, be sure the customization settings are changed as well.
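When swapping in the new scripts, the 'stop' file described under "Locking out" offers a way to pause conversions first. A minimal sketch, assuming the nmon data directory is the one passed to nmon2web.pl with -n; the helper names pause_conversion and resume_conversion are hypothetical and not part of the distribution.

```shell
#!/bin/sh
# Hypothetical helpers. $1 is the nmon data directory (the -n directory).
pause_conversion() {
    touch "$1/stop"      # nmon2web.pl stops after the current server
}

resume_conversion() {
    rm -f "$1/stop"      # removing the file lifts the lock-out
}

# Example (the path is a placeholder):
#   pause_conversion /path/data
#   ... wait for a running nmon2web.pl to exit, then replace the scripts ...
#   resume_conversion /path/data
```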
Update: February 11, 2009
Changes in this version:
- nmon 12e support
- Update frequency: Removed previous restriction of one nmon file per day. You can now update at any frequency within reason. (I recommend a maximum of one update per hour.)
- Trend charts: Added top and disks to trend charts. Previously top and disk were only available in the daily charts.
- Number of disks: There is a new customization option for number of disks to track. The default is 25.
- Bug fix: Fixed start date in Trend charts to be the rrdtool wrap date. Previously the start date was the creation date.
See below for installation instructions.
For current users, replace the nmon2web.pl program with the one in the tar.gz file. You may want to check the remaining files for differences. However, no changes have been made to them since June 2008. If you update, be sure the customization settings are changed as well.
This is the third of a series of tips that illustrate how to automate the collection and display of nmon performance data from multiple servers. This tip extends the capability of the previous tips by adding CPU, memory, and virtual I/O aggregation across partitions residing on the same physical server.
This tip is targeted primarily for micropartitioned systems. However, it may be of broader interest for the following new features:
- More flexibility in choice of charts (replaced the nmon2rrd utility with Perl)
- Displays non-default AIX settings for ease of management.
- Displays Change control logs for AIX settings and hardware configuration.
- Workload Manager - displays absolute CPU utilization by class (useful for micropartitions, where %utilization is meaningless)
- Centralized rrdtool database for easier data extraction so you can write your own programs. (The nmon2rrd tool created a separate rrdtool database for each nmon file.)
- Uses less disk space - removed duplicate databases (daily and long term)
I originally had two objectives in creating this tool:
- Automate the creation of daily nmon charts
- Aggregate CPU and memory usage across multiple partitions on a micropartitioned server.
The scope grew to include the aggregation of virtual I/O. However, there are some limitations in this approach (like how to handle multiple VIO servers). So be sure to understand the limitations listed at the bottom of this page. Otherwise, I've found this to be a very useful tool for tracking performance.
This tool organizes and creates charts on a centralized server using nmon data from multiple servers. Each server (standalone, LPAR, micropartition) uses "nmon -f" to collect daily performance. At the end of the collection period, the nmon file is transferred to a staging directory on a centralized web server. (I leave the details of the data transfer up to you.) On the web server, the "nmon2web.pl" script organizes the data by server and stores it in an "rrdtool" database. It also creates the daily web pages.
I've tried to automate the process. For example, to add a new server, simply put the new nmon file in the web server's staging directory. The "nmon2web.pl" will figure out that this is a new server, and will create the necessary directories, rrdtool databases, etc. It will also add the new server to the web page.
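Since the transfer mechanism is left open, here is one possible shape for it. This is a minimal sketch, assuming the nmon output files end in .nmon and that scp access to the web server is available. The function name push_nmon_files, the .sent suffix, and the example host and paths are all made up for illustration.

```shell
#!/bin/sh
# push_nmon_files SRC_DIR DEST [COPY_CMD]
# Hypothetical helper: copies every *.nmon file in SRC_DIR to DEST and
# renames each successfully copied file so it is not sent twice.
push_nmon_files() {
    src_dir=$1               # where "nmon -f" writes its output
    dest=$2                  # staging target on the web server
    copy_cmd=${3:-scp}       # transfer command; scp is an assumption
    for f in "$src_dir"/*.nmon; do
        [ -e "$f" ] || continue          # no files matched the pattern
        if "$copy_cmd" "$f" "$dest/"; then
            mv "$f" "$f.sent"
        fi
    done
}

# Example (host and paths are placeholders):
#   push_nmon_files /system_dir/nmon/HOSTNAME user@webserver:/path/data
```

Run it only after the collection period has ended, so that a file still being written by nmon is not uploaded half-finished.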
- All Servers: The "nmon" program collects performance data on AIX LPARs, micropartitions and standalone servers.
- Run "nmon -f" as a "cron" job, which outputs data to a file
- The recommended nmon sample interval is 10-20 minutes.
- There are no restrictions on the sample length, but I recommend 1-24 hours. The one hour size allows you to view performance over the day, but it creates a lot more files.
- The "nmon" output files are forwarded to a staging directory on a central web server.
- If you are using a partitioned server, turn on the "Allow shared processor pool utilization authority" option on all partitions. This is done on the HMC by right-clicking the partition name (not the partition's profile!). Choose "Properties". Choose the "Hardware" tab, then the "Processor and Memory" tab.
- Web Server: The "nmon2web.pl" script processes the nmon files
- Organizes web pages by server's serial number, partition name and type (dedicated|micropartition)
- Automatically adds new servers
- Stores data in a "rrdtool" database.
- Creates daily and long term performance charts for individual servers
- Logs configuration and tuning changes
- Aggregated charts are created dynamically using the "nmon2web.cgi" script
- PC Browser: Point browser to the "index.html" page on the web server
- Daily and long term performance charts
- Aggregated utilization across all partitions on a physical server
- Lists configuration, change logs, and non-default AIX settings
Installation Steps for Servers
- Install nmon performance monitor tool (V11 is preferred, but V10 will work, V9 will work, sorta)
- Use cron to automate nmon data collection.
# The following cron entry runs nmon with its -x preset (equivalent to
# -ft -s 900 -c 96: a 15 minute sample interval for 24 hours), starting at 00:01:
01 00 * * * (cd /system_dir/nmon/HOSTNAME; /usr/local/bin/nmon -x)
- Automate upload of the nmon files to web server. (I run ftp as a cron job)
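To pick a sample interval other than a preset, nmon's -s (seconds between samples) and -c (number of samples) flags can be combined with -f. The count is simply the collection period divided by the interval. A small sketch of that arithmetic follows; the function name nmon_sample_count is hypothetical.

```shell
#!/bin/sh
# nmon_sample_count INTERVAL_SECONDS HOURS
# Prints the -c value needed to cover HOURS at the given interval.
nmon_sample_count() {
    interval_s=$1
    hours=$2
    echo $(( hours * 3600 / interval_s ))
}

# Example: a 10 minute interval over 24 hours needs 144 samples, so the
# cron command would become: /usr/local/bin/nmon -f -s 600 -c 144
#   nmon_sample_count 600 24    # prints 144
```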
Installation Steps for Web Server
- Comment: My test web server is a Linux on Power micropartition. It should work on AIX web servers as well.
- Install the "rrdtool"
- Unpack the nmon2web.tar.gz (gzip -dc nmon2web.tar.gz |tar -xvf-)
- Install nmon2web.cgi
- Move nmon2web.cgi to the web server's cgi directory
- Make executable (chmod a+x nmon2web.cgi)
- Change the $DIRECTORY and $WEB_DIR variables to reflect your installation
- Install nmon2web.pl
- Move to /usr/local/bin (or equivalent)
- Make executable (chmod a+x nmon2web.pl)
- Customize directory and database retention variables
$NMON_DIR="/home/baspence/nmon"; # source of nmon files
$HTTP_DIR="/home/baspence/public_html/nmon"; # Absolute path to index.html
$DB_MONTHS=36; # rrdtool: number of months to retain data before rrd wraps
- Add cron job to execute this script. This entry runs the script every day at 1 AM
00 01 * * * (/usr/local/bin/nmon2web.pl)
- Create $HTTP_DIR and $NMON_DIR directories
- Grant write permission on $HTTP_DIR to nmon2web.pl and nmon2web.cgi programs
- Move index.html to $HTTP_DIR (chmod a+r index.html)
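The last three steps can be scripted. The sketch below is one way to do it under stated assumptions: the function name setup_web_dirs is hypothetical, and the arguments stand in for the $NMON_DIR and $HTTP_DIR values chosen in nmon2web.pl plus the location of the unpacked index.html.

```shell
#!/bin/sh
# setup_web_dirs NMON_DIR HTTP_DIR INDEX_SRC
# Hypothetical helper: creates both directories and installs index.html.
setup_web_dirs() {
    nmon_dir=$1
    http_dir=$2
    index_src=$3
    mkdir -p "$nmon_dir" "$http_dir" || return 1
    cp "$index_src" "$http_dir/index.html" || return 1
    chmod a+r "$http_dir/index.html"
    # Write access to $http_dir for the cron user (nmon2web.pl) and the web
    # server user (nmon2web.cgi) is site-specific and is not set here.
}

# Example, using the paths from the customization step above:
#   setup_web_dirs /home/baspence/nmon /home/baspence/public_html/nmon ./index.html
```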
Aggregating CPU and memory is relatively straightforward. However, aggregating virtual I/O and ethernet is more challenging. By default, the nmon2web.cgi program aggregates virtual I/O utilization by summing all vscsi adapters across all partitions (LPAR and micropartition). For ethernet, the program sums en0 traffic only on micropartitions. The problem is that the program could double count vscsi workload, or assume the wrong ethernet interface.
You can specify your virtual scsi and ethernet configuration by creating the file $SYSTEM_DIR/Shared/sharedpool/virtual.cfg. There's a template file in the same directory that explains how to configure.
This program is not backward compatible with Parts 1 & 2. The file systems have been reorganized, the rrdtool databases centralized. The nmon data should be reloaded from scratch.
Adding or deleting servers can cause blank aggregated charts. The underlying "rrdtool" doesn't handle missing data when aggregating data. So if you add or remove a micropartition on a server, you may get blank charts when you try to display aggregated performance over a time period where the server is missing.
Adding new devices (scsi, fcsi, ethernet) may cause blank graphs. Same reason as above. New devices will have to be added manually to the appropriate rrd file.
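One common rrdtool idiom for missing data is to map unknown values to zero before summing, using the RPN sequence value,UN,0,value,IF inside a CDEF. The sketch below only builds such a CDEF string. It is not taken from nmon2web.cgi, and the function name build_sum_cdef and the variable names in the example are hypothetical. Whether counting a missing partition as zero is acceptable depends on the chart.

```shell
#!/bin/sh
# build_sum_cdef OUT_NAME VAR1 [VAR2 ...]
# Hypothetical helper: prints an rrdtool CDEF that sums the given variables,
# treating any unknown (missing) value as 0 instead of blanking the sum.
build_sum_cdef() {
    out=$1
    shift
    expr=""
    n=0
    for v in "$@"; do
        term="$v,UN,0,$v,IF"             # 0 if $v is unknown, else $v
        if [ "$n" -eq 0 ]; then
            expr=$term
        else
            expr="$expr,$term,+"         # add this term to the running sum
        fi
        n=$((n + 1))
    done
    echo "CDEF:$out=$expr"
}

# Example:
#   build_sum_cdef total lpar1 lpar2
#   prints CDEF:total=lpar1,UN,0,lpar1,IF,lpar2,UN,0,lpar2,IF,+
```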
Short nmon sampling intervals increase disk requirements on the web server. For example, a 1 minute interval and 3 year data retention (default) used about 500 MB of disk space (reserved at setup time by the rrdtool). I recommend using an nmon sampling interval of 10-20 minutes.
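The 500 MB figure gives a rough way to size other setups. This is a back-of-the-envelope sketch, assuming disk use scales linearly with the retention period and inversely with the sampling interval (plausible if the rrd row count follows the interval, but not verified against nmon2web.pl). The function name estimate_rrd_mb is hypothetical.

```shell
#!/bin/sh
# estimate_rrd_mb INTERVAL_MINUTES RETENTION_MONTHS
# Scales the one data point quoted above (1 minute interval, 36 months,
# about 500 MB). Integer arithmetic, so the result is rounded down.
estimate_rrd_mb() {
    interval_min=$1
    months=$2
    echo $(( 500 * months / (36 * interval_min) ))
}

# Example: a 10 minute interval with the default 36 month retention
#   estimate_rrd_mb 10 36    # prints 50
```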
Linux: Empty charts for "system calls". (nmon produces negative numbers)
Linux on Power: If running on a micropartitioned system with AIX, the CPU free pool may look 100% busy. The partition displays as a standalone server (nmon for pLinux doesn't report the serial number, so the partition doesn't get assigned to a physical server).
Other: Please report other issues. I have a limited "sandbox" and have not tested every combination of hardware/operating system.