CHAPTER 12
System-Monitoring Tools
To keep your system in optimum shape, you need to keep a close eye on it. Such monitoring is imperative in a corporate environment where uptime is vital and any system failures can cost real money. Whether it is checking processes for any errant daemons or keeping a close eye on CPU and memory use, Fedora provides a wealth of utilities designed to give you as little or as much feedback as you want. This chapter looks at some of the basic monitoring tools, along with some tactics designed to keep your system up longer. Some of the monitoring tools cover network connectivity, memory, and hard drive use, but all should find a place in your sysadmin toolkit. Finally, you will learn how to manipulate active system processes, using a mixture of graphical and command-line tools.
Those familiar with UNIX system administration already know about the ps or process display command commonly found on most flavors of UNIX. Because Linux is closely related to UNIX, it also benefits from this command and allows you to quickly see the current running processes on the system, as well as who owns them and how resource hungry they are.
Although the Linux kernel has its own distinct architecture and memory management, it also benefits from enhanced use of the /proc file system, the virtual file system found on many UNIX flavors. Through the /proc file system, you can communicate directly with the kernel to get a deep view of what is currently happening. Developers tend to use the /proc file system as a way of getting information out of the kernel so that their programs can present it in more human-readable formats. The /proc file system is beyond the scope of this book, but if you want a better idea of what it contains, see http://en.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html for an excellent and in-depth guide.
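As a small illustration of getting information out of the kernel through /proc, the following sketch prints a few fields from the status file of the current shell. It assumes a Linux system with /proc mounted; the fields shown (Name, State, VmRSS) appear on typical kernels, and the variable names are made up for this example.

```shell
#!/bin/bash
# Every process has a directory /proc/<PID>; $$ expands to the PID of this shell.
status_file="/proc/$$/status"

# Print the process name, state, and resident memory size as reported by the kernel.
grep -E '^(Name|State|VmRSS):' "$status_file"

# Keep just the process name for later use (hypothetical variable name).
proc_name=$(awk '/^Name:/ {print $2}' "$status_file")
echo "This shell is running as: $proc_name"
```

The same files can be read with any text tool, which is why so many monitoring utilities are built on top of /proc.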
Processes can also be controlled at the command line, which is important because you might sometimes have only a command-line interface. Whenever an application or command is launched, either from the command line or by clicking an icon, the kernel assigns the resulting process an identification number called a process ID, or PID for short. This number is shown in the shell if the program is launched via the command line:
# system-config-display &
[1] 4286
In this example, the system-config-display client has been launched in the background, and the (bash) shell reported a shell job number ([1] in this case). Job control is a shell-specific feature that allows a different form of process control, such as suspending programs, sending them to the background, and retrieving background jobs to the foreground (see your shell's man pages for more information if you are not using bash).
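As an aside, a script can capture the same two pieces of information that the shell printed above. The following is a minimal sketch, assuming bash; sleep stands in for a real program, and bg_pid is a name chosen for this example.

```shell
#!/bin/bash
# Start a long-running command in the background.
sleep 60 &

# $! holds the PID of the most recently started background job.
bg_pid=$!
echo "Background job started with PID $bg_pid"

# jobs -l lists the shell's jobs together with their job numbers and PIDs.
jobs -l

# Stop the job again; wait collects it (its nonzero status is ignored here).
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true
```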
The second number displayed (4286 in this example) represents the process ID. You can get a quick list of your processes by using the ps command like this:
# ps
PID TTY TIME CMD
4242 pts/0 00:00:00 su
4245 pts/0 00:00:00 bash
4286 pts/0 00:00:00 consolehelper-g
4287 pts/0 00:00:00 userhelper
4290 pts/0 00:00:00 system-config-d
4291 pts/0 00:00:00 python2
4293 pts/0 00:00:00 ps
Note that not all output from the display is shown here. But as you can see, the output includes the process ID, abbreviated as PID, along with other information, such as the name of the running program. As with any UNIX command, many options are available; the ps man page has a full list. One of the most useful options is aux, which provides a much more detailed and helpful list of all the processes. You should also know that ps works not by polling memory, but through the interrogation of the Linux /proc or process file system. (ps is one of the interfaces mentioned at the beginning of this section.)
The /proc directory contains quite a few files, some of which include constantly updated hardware information (such as battery power levels and so on). Linux administrators often pipe the output of ps through a member of the grep family of commands to display information about a specific program, perhaps like this:
$ ps aux | grep system-config-display
root 4286 0.0 0.3 13056 3172 pts/0 S 11:57 0:00 system-config-display
This example returns the owner (the user who launched the program) and the PID, along with other information, such as the percentage of CPU and memory use, size of the command (code, data, and stack), time (or date) the command was launched, and name of the command. Processes can also be queried by PID, as follows:
$ ps 4286
4286 pts/0 S 0:00 system-config-display
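When only certain columns are needed, ps can be asked for them directly instead of filtering its output. The following is a minimal sketch, assuming the procps version of ps found on Linux; sleep stands in for a real program, and the variable names are made up for this example.

```shell
#!/bin/bash
# Start a placeholder process to look up.
sleep 60 &
target_pid=$!

# Query a single process by PID and choose the columns to display.
# The "=" after each column name suppresses the header line.
ps -o pid= -o user= -o comm= -p "$target_pid"

# Extract just the command name for use in a script.
cmd_name=$(ps -o comm= -p "$target_pid")
echo "PID $target_pid is running: $cmd_name"

# Clean up the placeholder process.
kill "$target_pid"
wait "$target_pid" 2>/dev/null || true
```

Selecting by PID also avoids a common annoyance of the ps aux | grep approach, in which the grep command itself can appear in the results.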
You can use the PID to stop a running process by using the shell's built-in kill command. This command asks the kernel to stop a running process and reclaim system memory. For example, to stop the system-config-display client in the example, use the kill command like this:
$ kill 4286
After you press Enter (or perhaps press Enter again), the shell might report the following:
[1]+ Terminated system-config-display
Note that users can kill only their own processes, but root can kill them all. Controlling any other running process requires root permission, which should be used judiciously (especially when forcing a kill by using the -9 option); by inadvertently killing the wrong process through a typo in the command, you could bring down an active system.
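The effect of a plain kill can be observed from a script. The following is a minimal sketch, assuming bash; sleep stands in for a real program, and victim_pid and status are names chosen for this example.

```shell
#!/bin/bash
# Start a placeholder process to stop.
sleep 60 &
victim_pid=$!

# With no option, kill sends SIGTERM (signal 15).
kill "$victim_pid"

# wait returns the exit status of the job; for a process ended by a signal,
# bash reports 128 plus the signal number.
status=0
wait "$victim_pid" || status=$?
echo "Exit status: $status"

# kill -0 sends no signal; it only tests whether the PID can still be signaled.
if kill -0 "$victim_pid" 2>/dev/null; then
    echo "Process $victim_pid is still running"
else
    echo "Process $victim_pid is gone"
fi
```

Checking with kill -0 before and after is a cheap way to confirm that the PID you typed is the one you meant.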
Using the kill Command to Control Processes
The kill command is a basic UNIX system command. You can communicate with a running process by entering a command into its interface, such as when you type into a text editor. But some processes (usually system processes rather than application processes) run without such an interface, and you need a way to communicate with them, too, so we use a system of signals. The kill command accomplishes that by sending a signal to a process, and you can use it to communicate with any process. The general format of the kill command is as follows:
# kill option PID
A number of signal options can be sent as words or numbers, but most are of interest only to programmers. One of the most common is this:
# kill PID
With no option, kill sends the default signal, SIGTERM (number 15), which tells the process with that PID to stop; you supply the actual PID.
# kill -9 PID
sends the kill signal (9 is the number of the SIGKILL signal), which a process cannot catch or ignore; use this combination when the plain kill shown previously does not work.
# kill -SIGHUP PID
sends the signal to "hang up" (stop) and then clean up all associated processes as well. (Its number is 1, so the option can also be written as -1.)
As you become proficient at process control and job control, you will learn the utility of a number of kill options. You can find a full list of signal options in the signal man page.
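The usual practice of trying a plain kill first and falling back to -9 can be sketched as a short script. This example assumes bash; the stubborn process is simulated with a subshell that ignores SIGTERM, and the grace period of two seconds is an arbitrary choice for illustration.

```shell
#!/bin/bash
# Translate signal numbers to names with the shell's built-in kill -l.
kill -l 1     # prints HUP
kill -l 9     # prints KILL
kill -l 15    # prints TERM

# Simulate a stubborn process: ignore SIGTERM, then become sleep.
# An ignored signal stays ignored across exec.
(trap "" TERM; exec sleep 60) &
pid=$!
sleep 1       # give the subshell time to install its trap

# Ask politely first, then allow a short grace period.
kill -TERM "$pid"
sleep 2

# If the process is still alive, fall back to SIGKILL,
# which cannot be caught or ignored.
if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid"
fi

status=0
wait "$pid" || status=$?
echo "Exit status: $status"   # 128 + 9 when SIGKILL ended the process
```

Reserving SIGKILL for the fallback gives well-behaved programs a chance to save data and remove temporary files before they exit.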
Using Priority Scheduling and Control
No process can make use of the system's resources (CPU, memory, disk access, and so on) as it pleases. After all, the kernel's primary function is to manage the system resources equitably. It does this by assigning a priority to each process so that some processes get better access to system resources and some processes might have to wait longer until their turn arrives. Priority scheduling can be an important tool in managing a system supporting critical applications or in a situation in which CPU and RAM use must be reserved or allocated for a specific task. Two legacy applications included with Fedora are the nice and renice commands. (nice is part of the GNU sh-utils package, since merged into GNU coreutils, whereas renice is inherited from BSD UNIX.)
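The two commands can be tried safely on a harmless process. The following is a minimal sketch, assuming GNU nice and the util-linux renice found on Linux; sleep stands in for a real workload, and the niceness values 10 and 15 are arbitrary choices for illustration.

```shell
#!/bin/bash
# nice with no arguments prints the niceness the current shell runs at (usually 0).
base=$(nice)
echo "Shell niceness: $base"

# Start a command with its niceness raised by 10 (a lower scheduling priority).
nice -n 10 sleep 60 &
pid=$!
sleep 1   # allow the command to start

# Show the niceness (NI column) recorded for the process.
ni_before=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "Niceness after nice: $ni_before"

# renice changes the niceness of a process that is already running.
# With util-linux, -n is treated as an absolute value by default.
# Ordinary users can generally only raise the value; lowering it needs root.
renice -n 15 -p "$pid"
ni_after=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "Niceness after renice: $ni_after"

# Clean up the placeholder process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Niceness runs from -20 (most favored) to 19 (least favored), so a higher number means the process yields more readily to others.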