A critical component of disk I/O tuning involves implementing best practices prior to building your system. Because it is much more difficult to move things around when you are already up and running, it is extremely important that you do things right the first time when planning your disk and I/O subsystem environment. This includes the physical architecture, logical disk geometry, and logical volume and file system configuration.
When a system administrator hears that there might be a disk contention issue, the first thing he or she turns to is iostat. iostat, the I/O counterpart of vmstat for memory reporting, is a quick-and-dirty way of getting an overview of what is currently happening on your I/O subsystem. While running iostat is a perfectly reasonable first reaction, the time to start thinking about disk I/O is long before tuning becomes necessary. All the tuning in the world will not help if your disks are not configured appropriately for your environment from the start. Further, it is extremely important to understand the specifics of disk I/O and how they relate to AIX® and your System p™ hardware.
When it comes to disk I/O tuning, generic UNIX® commands and tools help you much less than specific AIX tools and utilities that have been developed to help you optimize your native AIX disk I/O subsystem. This article defines and discusses the AIX I/O stack and correlates it to both the physical and logical aspects of disk performance. It discusses direct, concurrent, and asynchronous I/O: what they are, how to turn them on, and how to monitor and tune them. It also introduces some of the long-term monitoring tools that you should use to help tune your system. You might be surprised to hear that iostat is not one of the tools recommended to help you with long-term gathering of statistical data.
Finally, this article continues to emphasize the point that regardless of which subsystem you are looking to tune, systems tuning should always be thought of as an ongoing process. The best time to start monitoring your systems is when you first put a system into production and it is running well, rather than waiting until your users are screaming about slow performance. You need a baseline of what the system looked like when it was behaving normally in order to analyze data when it is presumably not performing adequately. When making changes to your I/O subsystem, make them one at a time so that you are in a position to really assess the impact of each change. To assess that impact, you'll capture data using one of the long-term monitoring tools recommended in this article.
This section provides an overview of disk I/O as it relates to AIX. It discusses the physical aspects of I/O (device drives and adapters), the AIX I/O stack, and concepts such as direct, concurrent, and asynchronous I/O. The concept of I/O pacing is introduced, along with recent improvements to iostat, to help you monitor your AIX servers.
It shouldn't surprise you that the slowest operation for running any program is the time actually spent retrieving the data from disk. This all comes back to the physical component of I/O. The actual disk arms must find the correct cylinder, the controller needs to access the correct blocks, and the disk heads have to wait while the blocks rotate to them. You should understand the physical architecture of your I/O subsystem before starting any tuning work, as all the tuning in the world won't help a poorly architected I/O subsystem that consists of slow disks or inefficient use of adapters.
Figure 1 illustrates how tightly the physical I/O components are integrated with the logical disk and its application I/O. This is what is commonly referred to as the AIX I/O stack.
Figure 1. The AIX I/O stack
You need to be cognizant of all the layers when tuning, as each impacts performance in a different way. When first setting up your systems, start from the bottom (the physical layer) as you configure your disk, the device layer, its logical volumes, file systems, and the files and application. I can't emphasize enough the importance of planning your physical storage environment. This involves determining the amount of disk, type (speed), size, and throughput. One important challenge with storage technology to note is that while the storage capacity of disks is increasing dramatically, rotational speed increases at a much slower pace. You must never lose sight of the fact that while RAM access takes about 540 CPU cycles, disk access can take 20 million CPU cycles. Clearly, the weakest link on a system is the disk I/O storage system, and it's your job as the system administrator to make sure it doesn't become even more of a bottleneck. As alluded to earlier, poor layout of data affects I/O performance much more than any tunable I/O parameter. Looking at the I/O stack helps you to understand this, as Logical Volume Manager (LVM) and disk placement are closer to the bottom than the tuning parameters (ioo and vmo).
Now let's discuss some best practices of data layout. One important concept is making sure that your data is evenly spread across your entire physical disk. If your data resides on only a few spindles, what is the point exactly of having multiple logical unit numbers (LUNs) or physical disks? If you have a SAN or another type of storage array, you should try to create your arrays of equal size and type. You should also create them with one LUN for each array and then spread all your logical volumes across all the physical volumes in your volume group. As stated previously, the time to do this is when you first configure your system, as it is much more cumbersome to fix I/O problems than memory or CPU problems, particularly if it involves moving data around in a production environment. You also want to make certain that your mirrors are on separate disks and adapters. Databases pose separate, unique challenges so, if possible, your indexes and redo logs should also reside on separate physical disks. The same is true for temporary tablespaces often used for performing sort operations.

Back to the physical. Using high-speed adapters to connect the disk drives is extremely important, but you must make certain that the bus itself does not become a bottleneck. To prevent this from happening, spread the adapters across multiple buses. At the same time, do not attach too many physical disks or LUNs to any one adapter, as this also significantly impacts performance. The more adapters that you configure, the better, particularly if there are large amounts of heavily utilized disk. You should also make sure that the device drivers support multi-path I/O (MPIO), which allows for load balancing and availability of your I/O subsystem.
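One quick way to sanity-check how evenly a logical volume is spread is to count how many logical partitions land on each physical volume. The sketch below runs that count against a hypothetical sample standing in for lslv -m output (which maps each logical partition to its physical volume in the third column); on a real AIX system you would feed it the actual lslv -m output for your logical volume.

```shell
# Hypothetical sample of "lslv -m <lv>" output: LP number, PP number, PV.
# On AIX, replace the sample with:  lslv -m mylv | tail -n +3
sample='0001 0110 hdisk0
0002 0110 hdisk1
0003 0111 hdisk2
0004 0112 hdisk0'

# Count logical partitions per physical volume to spot an uneven spread.
printf '%s\n' "$sample" | awk '{ n[$3]++ } END { for (d in n) print d, n[d] }' | sort
```

A heavily skewed count (most partitions on one hdisk) is the kind of layout problem that no ioo or vmo tunable will fix.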
Let's return to some of the concepts mentioned earlier, such as direct I/O. What is direct I/O? First introduced in AIX Version 4.3, this method of I/O bypasses the Virtual Memory Manager (VMM) and transfers data directly to disk from the user's buffer. Depending on your type of application, it is possible to see improved performance when implementing this technique. For example, files that have poor cache utilization are great candidates for direct I/O. Direct I/O also benefits applications that use synchronous writes, as these writes have to go to disk. CPU usage is reduced because the double copy of data is eliminated; normally, data is copied once from disk into the buffer cache and then again from the cache into the application's buffer. One of the major performance costs of direct I/O is that while it can reduce CPU usage, it can also result in processes taking longer to complete for smaller requests. Note that this applies to persistent segments (files that have a permanent location on disk). When a file is not accessed through direct I/O with the IBM Enhanced Journaled File System for AIX 5L™ (JFS2), the file is cached as local pages and the data copied into RAM. Direct I/O, in many ways, gives you performance similar to that of raw logical volumes, while still keeping the benefits of a JFS file system (for example, ease of administration). When mounting a file system using direct I/O, you should avoid large-file-enabled JFS file systems, as they impose stricter I/O alignment requirements.
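As a point of reference, direct I/O can be enabled for an entire file system at mount time, in the same spirit as the cio mount shown later for concurrent I/O. The mount point /u below simply follows this article's examples; substitute your own file system:

```
# mount -o dio /u
```

Applications can also request direct I/O per file by passing the O_DIRECT flag on the open() system call, which is useful when only some files in a file system benefit from bypassing the cache.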
What about concurrent I/O? First introduced in AIX Version 5.2, this feature invokes direct I/O, so it has all the other performance considerations associated with direct I/O. With standard direct I/O, inodes (data structures associated with a file) are locked to prevent a condition where multiple threads might try to change the contents of a file simultaneously. Concurrent I/O bypasses the inode lock, which allows multiple threads to read and write data concurrently to the same file. This is due to the way in which JFS2 is implemented with a write-exclusive inode lock, allowing multiple users to read the same file simultaneously. As you can imagine, the inode lock can cause major problems with databases that continuously read from and write to the same file. Concurrent I/O solves this problem, which is why it's known as a feature that is used primarily for relational databases. Similar to direct I/O, you can implement this either through an open system call or by mounting the file system, as follows:
# mount -o cio /u
When you mount the file system with this command, all its files use concurrent I/O. Even more so than using direct I/O, concurrent I/O provides almost all the advantages of using raw logical volumes, while still keeping the ease of administration available with file systems. Note that you cannot use concurrent I/O with JFS (only JFS2). Further, applications that might benefit from having a file system read ahead or high buffer cache hit rates might actually see performance degradation.
What about asynchronous I/O? Synchronous and asynchronous I/O refer to whether an application waits for its I/O to complete before it continues processing. Appropriate usage of asynchronous I/O can significantly improve the performance of writes on the I/O subsystem. It essentially allows an application to continue processing while its I/O completes in the background, which improves performance because I/O and application processing run at the same time. Turning on asynchronous I/O really helps in database environments. How can you monitor asynchronous I/O server utilization? Both iostat (AIX Version 5.3 only) and nmon can monitor asynchronous I/O server utilization. Prior to AIX Version 5.3, the only way to determine this was using the nmon command. The standard command for determining the number of asynchronous I/O (legacy) servers configured on your system is:
pstat -a | egrep ' aioserver' | wc -l
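To make the pipeline above concrete, here is the same filter run against a few hypothetical process-table lines standing in for real pstat -a output (the fields are illustrative; only the trailing "aioserver" name matters to the filter):

```shell
# Hypothetical sample of "pstat -a" output lines for aioserver kprocs.
# On AIX you would run the filter against pstat -a itself.
sample='36 a 2406 1 2406 0 0 1 aioserver
37 a 2508 1 2508 0 0 1 aioserver
38 a 260a 1 260a 0 0 1 aioserver'

# grep -c counts matching lines, equivalent to egrep ... | wc -l
printf '%s\n' "$sample" | grep -c ' aioserver'
```

With the sample above, the count printed is 3, meaning three legacy asynchronous I/O servers are configured.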
The iostat -A command reports back asynchronous I/O statistics (see Listing 1).

Listing 1. iostat -A command

# iostat -A

System configuration: lcpu=2 drives=3 ent=0.60 paths=4 vdisks=4

aio: avgc avfc maxgc maxfc maxreqs   avg-cpu: % user % sys % idle % iowait physc % entc
        0    0    32     0    4096             6.4   8.0   85.4     0.2     0.1  16.0

Disks:     % tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk0        0.5      2.0   0.5      0         4
hdisk1        1.0      5.9   1.5      8         4
hdisk2        0.0      0.0   0.0      0         0
What does this all mean?
- avgc: This reports the average global asynchronous I/O request count per second over the interval you specified.
- avfc: This reports the average fastpath request count per second over your interval.
- maxgc: This reports the maximum global asynchronous I/O request count since the last time this value was fetched.
- maxfc: This reports the maximum fastpath request count since the last time this value was fetched.
- maxreqs: This is the maximum asynchronous I/O requests allowed.
How many should you configure? The rule of thumb is to set the maximum number of servers to ten times the number of disks or ten times the number of processors, with MinServers set at one half of that amount. Other than having some more kernel processes hanging around that really don't get used (consuming a small amount of kernel memory), there really is little risk in oversizing MaxServers, so don't be afraid to bump it up. How is this done? You can use either the chdev command or the smit fastpath:
# smit aio (or smit posixaio)
This is also how you would enable asynchronous I/O on your system.
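The rule of thumb described above reduces to simple arithmetic, sketched here with a made-up disk count (substitute your own disk or processor count):

```shell
# Rule-of-thumb sizing sketch; the disk count is a hypothetical example.
# maxservers = 10 x number of disks (or processors); minservers = half that.
disks=12
maxservers=$((disks * 10))
minservers=$((maxservers / 2))
echo "maxservers=$maxservers minservers=$minservers"
```

For 12 disks, this yields maxservers=120 and minservers=60, values you would then apply with chdev as shown next.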
To increase your maxservers to 100 from the command line, use this command:
# chdev -l aio0 -a maxservers=100
Note that you must reboot before this change takes effect. On occasion, I'm asked what the difference is between aio and posixaio. The major difference between the two involves parameter passing, so you really need to configure both. One last concept is I/O pacing. This is an AIX feature that prevents disk I/O-intensive applications from flooding the CPU and disks. Appropriate usage of disk I/O pacing helps prevent programs that generate very large amounts of output from saturating the system's I/O and causing system degradation. Tuning maxpout and minpout helps prevent threads performing sequential writes to files from dominating system resources.
You can also exempt individual file systems from the effect of the global parameters by mounting them with an explicit 0 for minpout and maxpout:
# mount -o minpout=0,maxpout=0 /u
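Conversely, the global high and low water marks are set as attributes of sys0. The values below are purely illustrative (appropriate numbers depend on your workload), so treat them as a starting point rather than a prescription:

```
# chdev -l sys0 -a maxpout=33 -a minpout=24
```

A thread writing to a file is suspended once it has maxpout pending write requests against that file and is resumed when the count drops to minpout, which is how pacing keeps a single sequential writer from dominating the I/O subsystem.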
This section provides an overview of the AIX-specific tools (sar, topas, and nmon) available to monitor disk I/O activity. These tools allow you to quickly troubleshoot a performance problem and capture data for historical trending and analysis.
Don't expect to see iostat in this section, as iostat is a UNIX utility that allows you to quickly determine if there is an imbalanced I/O load between your physical disks and adapters. Unless you decide to write your own scripting tools using iostat, it will not help you with long-term trending and capturing data.
sar is one of those older generic UNIX tools that have been improved over the years. While I generally prefer the use of more specific AIX tools, such as topas or nmon, sar provides strong information with respect to disk I/O. Let's run a typical sar command to examine I/O activity (see Listing 2).

Listing 2. Using sar
# sar -d 1 2

AIX newdev 3 5    06/04/07

System Configuration: lcpu=4 disk=5

07:11:16   device   %busy   avque   r+w/s   blks/s   avwait   avserv
07:11:17   hdisk1     0      0.0      0       0       0.0      0.0
           hdisk0    29      0.0    129      85       0.0      0.0
           hdisk3     0      0.0      0       0       0.0      0.0
           hdisk2     0      0.0      0       0       0.0      0.0
           cd0        0      0.0      0       0       0.0      0.0
07:11:18   hdisk1     0      0.0      0       0       0.0      0.0
           hdisk0    35      0.0    216     130       0.0      0.0
           hdisk3     0      0.0      0       0       0.0      0.0
           hdisk2     0      0.0      0       0       0.0      0.0
           cd0        0      0.0      0       0       0.0      0.0
Average    hdisk1     0      0.0      0       0       0.0      0.0
           hdisk0    32      0.0    177      94       0.0      0.0
           hdisk3     0      0.0      0       0       0.0      0.0
           hdisk2     0      0.0      0       0       0.0      0.0
           cd0        0      0.0      0       0       0.0      0.0
Let's break down the column headings from Listing 2.
- %busy: This column reports the portion of time that the device was busy servicing transfer requests.
- avque: In AIX Version 5.3, this column reports the average number of requests waiting to be sent to the disk.
- r+w/s: This column reports the number of read and write transfers per second to or from the device.
- blks/s: This column reports the number of blocks transferred per second (512-byte units).
- avwait: This column reports the average wait time per request (milliseconds).
- avserv: This column reports the average service time per request (milliseconds).
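The columns above lend themselves to quick filtering. The sketch below flags any disk whose %busy (column 2) exceeds an assumed 80 percent threshold; the sample lines are hypothetical, mimicking the per-device portion of sar -d output:

```shell
# Hypothetical sar -d device lines:
# device  %busy  avque  r+w/s  blks/s  avwait  avserv
sample='hdisk0 92 0.0 129 85 0.0 0.0
hdisk1 10 0.0 3 2 0.0 0.0'

# Print any device busier than the assumed 80 percent threshold.
printf '%s\n' "$sample" | awk '$2 > 80 { print $1 " is " $2 "% busy" }'
```

With the sample data, only hdisk0 is flagged; on a live system you would pipe real sar output (minus its headers) through the same awk filter.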
You want to be wary of any disk that approaches 100 percent utilization or shows a large number of queued requests waiting for disk. While there is some activity in the sar output, there really are no I/O problems because there is no waiting for I/O. You need to continue to monitor the system to make sure that other disks are also being used besides hdisk0. Where sar differs from iostat is in its ability to capture data for long-term analysis and trending through its system activity data collector (sadc) utility. Here's how this works. As delivered on AIX systems by default, there are two shell scripts (/usr/lib/sa/sa1 and /usr/lib/sa/sa2), normally commented out in cron, that provide daily reports on the activity of the system. The sar command actually calls the sadc routine to access system data (see Listing 3).

Listing 3. Example cronjob
# crontab -l | grep sa1
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
0 * * * 0,6 /usr/lib/sa/sa1 &
0 18-7 * * 1-5 /usr/lib/sa/sa1 &
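Once sa1 is collecting, sar can replay the binary data files it writes under /var/adm/sa (saDD, where DD is the day of the month) with the -f flag, restricted to a time window with -s (start) and -e (end). The file name below assumes data collected on the fourth of the month:

```
# sar -d -f /var/adm/sa/sa04 -s 08:00 -e 17:00
```

This is what makes sar suitable for trending: you can compare today's business-hours disk activity against your baseline without having been logged in when the data was gathered.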
What about something a little more user-friendly? Did you say topas? topas is a nice performance monitoring tool that you can use for a number of purposes, including, but not limited to, your disk I/O subsystem.
Figure 2. topas
Take a look at the topas output from a disk perspective. There is no I/O activity going on here at all. Besides the physical disk, pay close attention to "Wait" (in the CPU section up top), which also helps determine whether the system is I/O bound. If you see high numbers here, you can then use other tools, such as filemon, fileplace, lsof, or lslv, to help you figure out which processes, adapters, or file systems are causing your bottlenecks. topas is good for quickly troubleshooting an issue when you want a little more than iostat. In a sense, topas is a graphical mix of iostat and vmstat, though with recent improvements it now allows you to capture data for historical analysis. These improvements were made in AIX Version 5.3, no doubt because of the popularity of a similar tool created by someone from IBM, though IBM does not officially support that tool.
This is nmon (my favorite AIX performance tool). While nmon provides a front end similar to topas, it is much more useful in terms of long-term trending and analysis. Further, it gives the system administrator the ability to output data to an Excel spreadsheet that comes back in pretty-looking charts (tailor-made for senior management and functional teams) that clearly illustrate your bottlenecks. This is done through a tool called nmon analyzer, which provides the hooks into nmon. With respect to disk I/O, nmon reports back the following data: disk I/O rates, data transfers, read/write ratios, and disk adapter statistics.
Here is one small example of where nmon really shines. Say you want to know which processes are hogging most of the disk I/O, and you want to be able to correlate them with the actual disks to clearly illustrate I/O per process. nmon helps you here more than any other tool. To do this with nmon, use the -t option, set your timing, and then sort by I/O channel. How do you use nmon to capture data and import it into the analyzer? Use the sudo command and run nmon for three hours, taking a snapshot every 30 seconds:
# sudo nmon -f -t -r test1 -s 30 -c 180
Then sort the output file that gets created:
# sort -A testsystem_yymmdd.nmon > testsystem_yymmdd.csv
When this is completed, ftp the .csv file to your PC, start the nmon analyzer spreadsheet (enable macros), and click on analyze nmon data. You can download the nmon analyzer from here.
Figure 3 is a screenshot taken from an AIX 5.3 system, which provides a disk summary for each disk in kilobytes per second for reads and writes.
Figure 3. Disk summary for each disk in kilobytes per second for reads and writes
nmon also helps track the configuration of asynchronous I/O servers, as you can see from the output in Listing 4.
Listing 4. Tracking the configuration of asynchronous I/O servers with nmon
# lsattr -El aio0
autoconfig  available  STATE to be configured at system restart  True
fastpath    enable     State of fast path                        True
kprocprio   39         Server PRIORITY                           True
maxreqs     16384      Maximum number of REQUESTS                True
maxservers  100        MAXIMUM number of servers per cpu         True
minservers  50         MINIMUM number of servers                 True
Before AIX Version 5.3, nmon was the only tool that showed you the amount of asynchronous I/O servers configured and the actual amount being used. As illustrated in the previous section, iostat has recently been enhanced to provide this function.
This article addressed the relative importance of the disk I/O subsystem. It defined and discussed the AIX I/O stack and how it relates to both physical and logical disk I/O. It also covered some best practices for disk configuration in a database environment, looked at the differences between direct and concurrent I/O, and discussed asynchronous I/O and I/O pacing. You tuned your asynchronous I/O servers and configured I/O pacing. You started up file systems in concurrent I/O mode and studied when to best implement concurrent I/O. Further, you learned all about iostat and captured data using sar, topas, and nmon. You also examined different types of output and defined many of the flags used in sar and iostat. Part 2 of this series drills down to the logical volume manager layer of the AIX I/O stack and looks at some of the snapshot-type tools, which help you quickly assess the state of your disk I/O subsystem. Part 3 focuses primarily on tracing I/O usage using tools, such as filemon and fileplace, and how to improve file system performance overall.