How to control program performance
As mentioned earlier, as a rule of thumb about 90% of the
computational load in a shell script is imposed by about 10% of the
script. The bottlenecks to look out for are as follows:
- Loops, especially the main program loop. A process that is called
  repeatedly imposes a heavy load on the computer. Most shell script
  loops are extremely heavy users of computer resources because they
  exec programs several times in rapid succession.
- File access (and reads and writes directed through named
  pipes). Because the computer's hard disk is several orders of
  magnitude slower than its memory, any procedure that involves heavy
  disk I/O will invariably impose a heavy load on the system.
- Processes. Many commands are built into the shell, but those that
  are not require the system to load and execute a program. This has
  two consequences: a disk access is required, and an additional
  process is run (diverting resources from any other processes that
  are being executed concurrently).
- Size of data. It should be obvious that as the files being
  processed by a filter grow longer, all processes involving those
  files take longer. However, the relationship between file size and
  time is not fixed; one big file may take much longer to process than
  several small files containing the same total amount of information.
To improve the performance of a shell script, you need to be
constantly aware of these considerations. Reducing the amount of
disk I/O or the number of processes required by any activity that
takes place in a main loop is likely to yield a big performance
improvement. Activities that require a large data file may be sped
up by switching to several smaller files, if possible. (A small file
is one that is less than eight or ten kilobytes long; for technical
reasons such files can be opened and scanned more rapidly than larger
files.)
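As a minimal sketch of reducing both the process count and the disk
I/O inside a loop, the two functions below write the file-name part
of each path argument to an output file. The function names and the
sample paths are hypothetical, and the ${path##*/} expansion assumes
the Korn shell or another POSIX shell; the original Bourne shell does
not support it.

```shell
#!/bin/sh
# Sketch only: function names and paths are hypothetical.

# Slower: one external process (basename) is started per iteration,
# and the output file is reopened for appending on every pass.
list_names_slow() {
    out=$1; shift
    : > "$out"                      # start with an empty file
    for path in "$@"; do
        basename "$path" >> "$out"
    done
}

# Faster: the shell strips the directory part itself, so no process
# is created inside the loop, and the output file is opened only
# once for the whole loop.
list_names_fast() {
    out=$1; shift
    for path in "$@"; do
        echo "${path##*/}"
    done > "$out"
}
```

For example, list_names_fast /tmp/names /usr/bin/ls /etc/passwd
writes the lines ls and passwd to /tmp/names. The two versions agree
for ordinary paths such as these, but not for a path that ends in a
slash, which basename handles and the expansion does not.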
The standard development cycle, which should be applied to shell
procedures as to other programs, is to write code, get it working,
thoroughly test it, measure it, and optimize the important parts
(outlined above), looping back to earlier stages wherever necessary.
The time(C) command is a useful tool for optimizing shell scripts.
It reports how long a command took to execute:
$ time ls
real 0m0.06s
user 0m0.03s
sys 0m0.03s
The values reported by time are the elapsed time during
the command (the real time); the time spent processing the command
itself (the user time); and the time the system took to execute the
system calls within the command (the sys time). In practice,
only the first value, the real time, is relevant at this
level. Note that this is the output from the Korn shell's built-in
time command; the Bourne shell output may vary. (If you
have the Development System, the timex(ADM) command offers
additional facilities.)
Because the SCO OpenServer system is multi-tasking, it is impossible
to judge accurately how long a program takes to run by any other
means; a seemingly slow process may be the result of an unusually
heavy load placed on the computer by some other user or process.
Each timing test should be run several times, because the results are
easily disturbed by variations in system load.
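One way to repeat a timing test is sketched below. The function name
repeat_time and the run count of 3 are assumptions made for
illustration, and the $((...)) arithmetic assumes the Korn shell or
another POSIX shell. The command's own output is discarded so that
only the timing reports remain.

```shell
#!/bin/sh
# Sketch only: repeat_time is a hypothetical helper.
# Usage: repeat_time COUNT COMMAND [ARGS...]
repeat_time() {
    runs=$1; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # The timing report goes to standard error; the command's
        # standard output is thrown away.
        time "$@" > /dev/null
        i=$((i + 1))
    done
}

# Time ls three times; a run disturbed by other system activity
# should stand out from the others.
repeat_time 3 ls
```

Comparing the real times from the individual runs shows how much of
any single measurement is due to system load rather than to the
command itself.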
A useful technique is to encapsulate the body of a loop within a
function, so that the sole activity within the loop is to call
that function; you can then time the function, and time
the loop as a whole. Alternatively, you can time individual steps
in the process to see which of them are taking longest.
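The function technique might look like the sketch below. The names
process_line and main_loop and the sample input are hypothetical, and
the loop body (printing the length of each line) merely stands in for
real work. Timing a shell function in this way assumes a shell in
which time is built in, such as the Korn shell; an external time
program cannot run a shell function.

```shell
#!/bin/sh
# Sketch only: process_line and main_loop are hypothetical names.

# The whole body of the loop lives in this function.
# Here it prints the number of characters in the line it is given.
process_line() {
    echo "${#1}"
}

# The loop does nothing except call the function for each input line.
main_loop() {
    while read -r line; do
        process_line "$line"
    done
}

# Time one call of the loop body on its own ...
time process_line "one two three"

# ... and then the loop as a whole, fed three sample lines.
time main_loop <<EOF
alpha
beta gamma
delta
EOF
```

If one call of the function accounts for most of the loop's time per
iteration, the body is the part worth optimizing; if not, the cost
lies in the loop itself, for example in reading the input.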
© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 03 June 2005