<PRE>
<BR>The CPU processes instructions and programs. Each time you submit a job to the system, it makes demands on the CPU. Usually, the CPU can service all demands in a timely manner. However, there is only so much available processing power, which must be
shared by all users and the internal programs of the operating system, too.
<BR></PRE>
<TR>
<TD>
<PRE>
<BR>Memory
<BR></PRE>
<TD>
<PRE>
<BR>Every program that runs on the system makes some demand on the physical memory on the machine. Like the CPU, it is a finite resource. When the active processes and programs that are running on the system request more memory than the machine actually
has, paging is used to move parts of the processes to disk and reclaim their memory pages for use by other processes. If further shortages occur, the system may also have to resort to swapping, which moves entire processes to disk to make room.
<BR></PRE>
<TR>
<TD>
<PRE>
<BR>I/O
<BR></PRE>
<TD>
<PRE>
<BR>The I/O subsystem(s) transfers data into and out of the machine. I/O subsystems comprise devices such as disks, printers, terminals/keyboards, and other relatively slow devices, and are a common source of resource contention problems. In addition,
there is a rapidly increasing use of network I/O devices. When programs are doing a lot of I/O, they can get bogged down waiting for data from these devices. Each subsystem has its own limitations with respect to the bandwidth that it can effectively use
for I/O operations, as well as its own peculiar problems.</PRE></TABLE>
<P>Performance monitoring and tuning is not an exact science. In the displays that follow, there is a great deal of variety in the system and subsystem loadings, even for the small sample of systems used here. In addition, different user groups have
widely differing requirements. Some users will put a strain on the I/O resources, some on the CPU, and some will stress the network. Performance tuning is always a series of trade-offs. As you will see, increasing the kernel size to alleviate one problem
may worsen memory utilization. Increasing NFS performance to satisfy one set of users may reduce performance in another area and thereby frustrate another set of users. The goal is usually to find a compromise that satisfies the
majority of user and system resource needs.
<BR></P>
<H3 ALIGN="CENTER">
<CENTER><A ID="I6" NAME="I6">
<FONT SIZE=4><B>Monitoring the Overall System Status</B>
<BR></FONT></A></CENTER></H3>
<P>The examination of specific UNIX performance monitoring techniques begins with a look at three basic tools that give you a snapshot of the overall performance of the system. After getting this high-level view, you will normally proceed to examine each
of the subsystems in detail.
<BR></P>
<H4 ALIGN="CENTER">
<CENTER><A ID="I7" NAME="I7">
<FONT SIZE=3><B>Monitoring System Status Using </B><B><I>uptime</I></B>
<BR></FONT></A></CENTER></H4>
<P>One of the simplest reports for monitoring UNIX system performance comes from the uptime command, which reports the average number of processes in the UNIX run queue over several intervals. It is both a high-level view of the system's workload and a
handy starting place when the system seems to be performing slowly. In general, processes in the run queue are active programs (that is, not sleeping or waiting) that require system resources. Here is an example:
<BR></P>
<PRE>% uptime
2:07pm up 11 day(s), 4:54, 15 users, load average: 1.90, 1.98, 2.01</PRE>
<P>The useful parts of the display are the three load-average figures. The 1.90 load average was measured over the last minute. The 1.98 average was measured over the last 5 minutes. The 2.01 load average was measured over the last 15 minutes.
<BR></P>
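<P>As a rough illustration of reading these three figures programmatically, the following Python sketch pulls the load averages out of a line of uptime output and labels the trend. The function names and the 0.25 tolerance are assumptions made for this example, not part of uptime itself, and the exact output format of uptime varies between UNIX versions.
<BR></P>

```python
import re

def parse_load_averages(uptime_line):
    """Return the 1-, 5-, and 15-minute load averages from one line of uptime output."""
    pattern = r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)"
    match = re.search(pattern, uptime_line)
    if match is None:
        raise ValueError("no load averages found in: " + uptime_line)
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen

def describe_trend(one, five, fifteen, tolerance=0.25):
    """Compare the 1-minute figure with the 15-minute figure.

    The tolerance is an arbitrary value chosen for this sketch.
    The 5-minute figure is accepted but not used in this simple comparison.
    """
    if one - fifteen > tolerance:
        return "rising"
    if fifteen - one > tolerance:
        return "falling"
    return "steady"

# The sample line is the uptime output shown above.
sample = "2:07pm up 11 day(s), 4:54, 15 users, load average: 1.90, 1.98, 2.01"
loads = parse_load_averages(sample)
print(loads, describe_trend(*loads))
```

<P>For the sample line, the three figures differ by less than the tolerance, so the sketch reports a steady load.
<BR></P>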
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP: </B>What you are usually looking for is the trend of the averages. This particular example shows a system that is under a fairly consistent load. However, if a system is having problems but the load
averages are declining steadily, you may want to wait a while before you take any action that might affect the system and possibly inconvenience users. While you are running ps commands to determine what caused the problem, the imbalance may
correct itself.
<BR></NOTE>
<HR ALIGN=CENTER>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> uptime has certain limitations. For example, high-priority jobs are not distinguished from low-priority jobs although their impact on the system can be much greater.
<BR></NOTE>
<HR ALIGN=CENTER>
<P>Run uptime periodically and observe both the numbers and the trend. When there is a problem, it will often show up here and tip you off to begin a serious investigation. As system loads increase, more demands will be made on your memory and I/O
subsystems, so keep an eye out for paging, swapping, and disk inefficiencies. System loads of 2 or 3 usually indicate light loads. System loads of 5 or 6 are usually medium-grade loads. Loads above 10 are often heavy loads on large UNIX machines. However,
there is wide variation among types of machines as to what constitutes a heavy load. Therefore, the best approach is to sample your system regularly until you have your own reference points for light, medium, and heavy loads.
<BR></P>
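<P>One way to build such a reference is to record the load average at regular intervals and summarize the readings. The sketch below assumes Python is available on the machine and uses os.getloadavg from the standard library. The helper names collect_samples and summarize, and the choice of minimum, median, and maximum as the summary, are illustrative assumptions.
<BR></P>

```python
import os
import statistics
import time

def collect_samples(count, interval_seconds, read_load=os.getloadavg, sleep=time.sleep):
    """Record the 1-minute load average `count` times, pausing between readings."""
    samples = []
    for i in range(count):
        samples.append(read_load()[0])   # element 0 is the 1-minute average
        if i + 1 != count:
            sleep(interval_seconds)
    return samples

def summarize(samples):
    """Reduce a list of readings to figures a site baseline can be built from."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }

# Deterministic demonstration with made-up readings.
# On a live system, something like collect_samples(12, 300) would take
# one reading every five minutes for an hour.
print(summarize([1.90, 2.50, 0.50]))
```

<P>Collecting such summaries at different times of day gives you site-specific figures to compare against, rather than relying on the general thresholds quoted above.
<BR></P>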
<H4 ALIGN="CENTER">
<CENTER><A ID="I8" NAME="I8">
<FONT SIZE=3><B>Monitoring System Status Using </B><B><I>perfmeter</I></B>
<BR></FONT></A></CENTER></H4>
<P>Because the goal of this first section is to give you the tools to view your overall system performance, a brief discussion of graphical performance meters is appropriate. Sun Solaris users are provided with an OpenWindows XView tool called perfmeter,
which summarizes overall system performance values in multiple dials or strip charts. Strip charts are the default. Not all UNIX systems come with such a handy tool. That's too bad, because in this case a picture is worth, if not a thousand words, at least
30 or 40 man pages. In this concise format, you get information about the system resources shown in Table 39.1:
<BR></P>
<UL>
<LH><B>Table 39.1. System resources and their descriptions.</B>
<BR></LH></UL>
<TABLE BORDER>
<TR>
<TD>
<PRE><I>Resources</I>
<BR></PRE>
<TD>
<PRE><I>Description</I>
<BR></PRE>
<TR>
<TD>
<P>cpu</P>
<TD>
<P>Percent of CPU being utilized</P>
<TR>
<TD>
<P>pkts</P>
<TD>
<P>Ethernet activity, in packets per second</P>
<TR>
<TD>
<P>page</P>
<TD>
<P>Paging, in pages per second</P>
<TR>
<TD>
<P>swap</P>
<TD>
<P>Jobs swapped per second</P>
<TR>
<TD>
<P>intr</P>
<TD>
<P>Number of device interrupts per second</P>
<TR>
<TD>
<P>disk</P>
<TD>
<P>Disk traffic, in transfers per second</P>
<TR>
<TD>
<P>cntxt</P>
<TD>
<P>Number of context switches per second</P>
<TR>
<TD>
<P>load</P>
<TD>
<P>Average number of runnable processes over the last minute</P>
<TR>
<TD>
<P>colls</P>
<TD>
<P>Collisions per second detected on the Ethernet</P>
<TR>
<TD>
<P>errs</P>
<TD>
<P>Errors per second on receiving packets</P></TABLE>
<P>The perfmeter charts are graphic representations of subsystem performance, not a source of precise measurements. Even so, they can be very useful for monitoring several aspects of the system at the same time. When you
start a particular job, the graphics can demonstrate the impact of that job on the CPU, on disk transfers, and on paging. Many developers like to use the tool to assess the efficiency of their work for this very reason. Likewise, system administrators use
the tool to get valuable clues about where to start their investigations. For example, when faced with intermittent and transitory problems, glancing at a perfmeter and then going directly to the proper display may increase the odds that you can catch
the process that is degrading the system in the act.
<BR></P>
<P>The scale value for the strip chart changes automatically when the chart refreshes to accommodate increasing or decreasing values on the system. You add values to be monitored by clicking the right mouse button and selecting from the menu. From the same
menu you can select properties, which will let you modify what the perfmeter is monitoring, the format (dials/graphs, direction of the displays, and solid/lined display), remote/local machine choice, and the frequency of the display.
<BR></P>
<P>You can also set a ceiling value for a particular strip chart. If the value goes beyond the ceiling value, this portion of the chart will be displayed in red. Thus, a system administrator who knows that someone is periodically running a job that eats up
all the CPU or memory can set a signal that the job may be running again. The system administrator can also use this to monitor the condition of critical values from several feet away from the monitor. If the administrator sees red, other users may be seeing red, too.
<BR></P>
<P>The perfmeter is a utility provided with SunOS. You should check your own particular UNIX operating system to determine if similar performance tools are provided.
<BR></P>
<H4 ALIGN="CENTER">
<CENTER><A ID="I9" NAME="I9">
<FONT SIZE=3><B>Monitoring System Status Using </B><B><I>sar -q</I></B>
<BR></FONT></A></CENTER></H4>
<P>If your machine does not support uptime, there is an option for sar that can provide the same type of quick, high-level snapshot of the system. The -q option reports the average queue length and the percentage of time that the queue is occupied.
<BR></P>
<PRE>% sar -q 5 5
07:28:37 runq-sz %runocc swpq-sz %swpocc
07:28:42 5.0 100 _
07:28:47 5.0 100 _
07:28:52 4.8 100 _
07:28:57 4.8 100 _
07:29:02 4.6 100 _
Average 4.8 100 _</PRE>
<P>The fields in this report are the following:
<BR></P>
<TABLE BORDER>
<TR>
<TD>
<P>runq-sz</P>
<TD>
<P>This is the length of the run queue during the interval. The run queue list doesn't include jobs that are sleeping or waiting for I/O, but does include jobs that are in memory and ready to run.</P>
<TR>
<TD>
<P>%runocc</P>
<TD>
<P>This is the percentage of time that the run queue is occupied.</P>
<TR>
<TD>
<P>swpq-sz</P>
<TD>
<P>This is the average length of the swap queue during the interval. Jobs or threads that have been swapped out and are therefore unavailable to run are shown here.</P>
<TR>
<TD>
<P>%swpocc</P>
<TD>
<P>This is the percentage of time that there are swapped jobs or threads.</P></TABLE>
<P>The run queue length is used in a similar way to the load averages of uptime. Typically the number is less than 2 if the system is operating properly. Consistently higher values indicate that the system is under heavier loads, and is quite possibly CPU
bound. When the run queue length is high and the run queue is occupied 100% of the time, as it is in this example, the system has very little idle time, and it is good to be on the lookout for performance-related problems in the memory and disk
subsystems. However, there is still no activity indicated in the swapping columns in the example. You will learn about swapping in the next section, and see that although this system is obviously busy, the lack of swapping is a partial vote of confidence
that it may still be functioning properly.
<BR></P>
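<P>The same reasoning can be applied to captured sar -q output. The following Python sketch parses a report like the one above and checks whether every sample shows a run queue longer than 2 that is occupied 100% of the time. The function names are hypothetical, the threshold of 2 is taken from the rule of thumb above, and the sketch assumes the column layout shown here, which may differ on other UNIX versions. Swap columns that are empty or non-numeric are treated as missing.
<BR></P>

```python
SAMPLE_REPORT = """
07:28:37 runq-sz %runocc swpq-sz %swpocc
07:28:42 5.0 100 _
07:28:47 5.0 100 _
07:28:52 4.8 100 _
07:28:57 4.8 100 _
07:29:02 4.6 100 _
Average 4.8 100 _
"""

def to_number(text):
    """Convert a field to float, or None when the field is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

def parse_sar_q(report):
    """Return one dict per timed sample line of sar -q style output."""
    lines = [line.split() for line in report.strip().splitlines() if line.strip()]
    header = lines[0][1:]              # column names after the timestamp
    samples = []
    for fields in lines[1:]:
        if fields[0] == "Average":
            continue                   # skip the summary line
        row = {"time": fields[0]}
        for name, value in zip(header, fields[1:]):
            row[name] = to_number(value)
        for name in header:
            row.setdefault(name, None) # columns absent from the line
        samples.append(row)
    return samples

def mean_run_queue(samples):
    """Average the run queue length over the samples that report one."""
    values = [s["runq-sz"] for s in samples if s["runq-sz"] is not None]
    return sum(values) / len(values)

def looks_cpu_bound(samples, queue_limit=2.0):
    """True when every sample shows a long, always-occupied run queue."""
    return all(
        s["runq-sz"] is not None
        and s["runq-sz"] > queue_limit
        and s["%runocc"] == 100
        for s in samples
    )

samples = parse_sar_q(SAMPLE_REPORT)
print(round(mean_run_queue(samples), 1), looks_cpu_bound(samples))
```

<P>A result of True here only says that the system is busy. As the text notes, the absence of swap activity is a reason to keep investigating rather than to conclude that something is wrong.
<BR></P>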
<H4 ALIGN="CENTER">
<CENTER><A ID="I10" NAME="I10">
<FONT SIZE=3><B>Monitoring System Status Using </B><B><I>sar -u</I></B>
<BR></FONT></A></CENTER></H4>
<P>Another quick and easy tool for determining overall system utilization is sar with the -u option, which reports CPU utilization. On most versions of UNIX, sar without any options defaults to this report. The CPU is either busy or idle. When it
is busy, it is working on either user work or system work. When it is not busy, it is either waiting on I/O or it is idle.
<BR></P>
<PRE>% sar -u 5 5
13:16:58 %usr %sys %wio %idle
13:17:03 40 10 13 38
13:17:08 31 6 48 14
13:17:13 42 15 9 34
13:17:18 41 15 10 35
13:17:23 41 15 11 33
Average 39 12 18 31</PRE>
<P>The fields in the report are the following:
<BR></P>
<TABLE BORDER>
<TR>
<TD>
<P>%usr</P>
<TD>
<P>This is the percentage of time that the processor is in user mode (that is, executing code requested by a user).</P>
<TR>
<TD>
<P>%sys</P>
<TD>
<P>This is the percentage of time that the processor is in system mode, servicing system calls. Users can cause this percentage to increase above normal levels by using system calls inefficiently.</P>
<TR>
<TD>
<P>%wio</P>
<TD>
<P>This is the percentage of time that the processor is waiting on completion of I/O, from disk, NFS, or RFS. If the percentage is regularly high, check the I/O systems for inefficiencies.</P>
<TR>
<TD>
<P>%idle</P>
<TD>
<P>This is the percentage of time the processor is idle. If the percentage is high and the system is heavily loaded, there is probably a memory or an I/O problem.</P></TABLE>
<P>In this example, you see a system with ample CPU capacity left (the average idle percentage is 31%). The system is spending most of its busy time on user tasks rather than system tasks, so user programs are probably not too inefficient in their use of system calls. The
I/O wait percentage, which averages 18% and peaks at 48%, indicates an application that is making a fair number of demands on the I/O subsystem.
<BR></P>
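<P>To make the reading of this report concrete, here is a small Python sketch that parses the sample output, recomputes the per-column averages, and finds the interval with the highest I/O wait. The function names are illustrative assumptions, and the parser expects the four-column layout shown above, which is not the same on every UNIX version.
<BR></P>

```python
SAMPLE_REPORT = """
13:16:58 %usr %sys %wio %idle
13:17:03 40 10 13 38
13:17:08 31 6 48 14
13:17:13 42 15 9 34
13:17:18 41 15 10 35
13:17:23 41 15 11 33
Average 39 12 18 31
"""

def parse_sar_u(report):
    """Return the column names and one dict per timed sample line."""
    rows = [line.split() for line in report.strip().splitlines() if line.strip()]
    header = rows[0][1:]               # column names after the timestamp
    samples = []
    for fields in rows[1:]:
        if fields[0] == "Average":
            continue                   # the summary is recomputed below
        sample = {"time": fields[0]}
        sample.update({name: int(value) for name, value in zip(header, fields[1:])})
        samples.append(sample)
    return header, samples

def column_averages(header, samples):
    """Average each percentage column over all samples."""
    return {name: sum(s[name] for s in samples) / len(samples) for name in header}

header, samples = parse_sar_u(SAMPLE_REPORT)
averages = column_averages(header, samples)
worst_io = max(samples, key=lambda s: s["%wio"])

print({name: round(value) for name, value in averages.items()})
print("highest %wio at", worst_io["time"], "=", worst_io["%wio"])
```

<P>Rounded to whole percentages, the recomputed averages agree with the Average line of the report, and the interval ending at 13:17:08 stands out as the one where the processor spent nearly half its time waiting on I/O.
<BR></P>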
<P>Most administrators would argue that %idle should be in the low teens rather than 0, at least when the system is under load. If it is 0, that doesn't necessarily mean that the machine is operating poorly. However, it is usually a good bet that the machine
is out of spare computational capacity and should be upgraded to the next level of CPU speed. The reason to upgrade the CPU is to anticipate future growth in user processing requirements. If the system workload is increasing, even if the users
haven't yet encountered the problem, why not anticipate the requirement? On the other hand, if the CPU idle time is high under heavy load, a CPU upgrade will probably not improve performance much.
<BR></P>
<P>Idle time will generally be higher when the load average is low.
<BR></P>
<P>A high load average combined with a high idle time is a symptom of potential problems. Either the memory or the I/O subsystems, or both, are hindering the swift dispatch and completion of the jobs. You should review the following sections that show how to look for
paging, swapping, disk, or network-related problems.
<BR></P>
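<P>The rules of thumb above can be combined into a simple decision function. The Python sketch below is only an illustration: the function name and the idle thresholds are assumptions chosen for the example, and the heavy-load figure of 10 comes from the earlier uptime guidance for large machines. You should substitute thresholds drawn from your own baseline.
<BR></P>

```python
def diagnose(load_average, idle_percent,
             heavy_load=10.0, low_idle=5.0, high_idle=30.0):
    """Suggest where to look first, given a load average and %idle.

    All three thresholds are illustrative defaults, not fixed rules.
    """
    if load_average >= heavy_load and idle_percent >= high_idle:
        # Busy system with an idle CPU: jobs are waiting on something else.
        return "memory-or-io"
    if idle_percent <= low_idle:
        # Little or no spare computational capacity.
        return "cpu"
    return "ok"

print(diagnose(12.0, 45.0))   # heavy load, CPU mostly idle
print(diagnose(12.0, 0.0))    # heavy load, CPU saturated
print(diagnose(2.0, 31.0))    # light load, spare capacity
```

<P>A result of memory-or-io points to the paging, swapping, disk, and network checks in the sections that follow, while cpu suggests that a faster processor is worth considering.
<BR></P>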
<H3 ALIGN="CENTER">
<CENTER><A ID="I11" NAME="I11">
<FONT SIZE=4><B>Monitoring Processes with </B><B><I>ps</I></B>