This section presents an example of using flow profiling and block profiling to improve the performance of a program without even having its source code. Since no programmer intervention is needed, there are just a few simple steps. Remember, locality tuning through flow profiling doesn't help much unless the program has a problem with paging.
For the example, we will use the free editor vile, which has about 200K of text. To build this program, one links it as follows:
$ cc -o vile tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o display.o
eval.o exec.o externs.o fences.o file.o filec.o fileio.o finderr.o
glob.o globals.o history.o input.o insert.o isearch.o line.o map.o
modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o region.o
search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o version.o
vmalloc.o window.o word.o wordmov.o input_stream.o -ltermcap
First, we want to get all of this into one big object file, so we do the following
(and save a copy, since we will need one later):
$ ld -r -o vile.all.o tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o
display.o eval.o exec.o externs.o fences.o file.o filec.o fileio.o
finderr.o glob.o globals.o history.o input.o insert.o isearch.o line.o
map.o modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o
region.o search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o
version.o vmalloc.o window.o word.o wordmov.o input_stream.o -ltermcap
$ cp vile.all.o hold.all.o
Next, set up the code for flow profiling and create an experimental vile. Since we know we will use fur on this same object repeatedly, it is useful to use the -k option.
$ fur -k keep -p all -e all vile.all.o
$ cc -o vile vile.all.o
Then, run vile and give it a lot of work.
$ fprof -CLogging=on,LogPrefix=vileflow -s vile
Then, scan the logs (notice that the information on stderr describes an improvement of Page Use Efficiency from 25.8% to 65.6%):
$ lrt_scan vile vileflow.12345 > vile.funcs
Processing log vileflow.23156
328 out of 1066 symbols were referenced
Seeding with Early
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 10.4
Percentage: 64.8
Best
Seeding with Reverse Late
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 11.0
Percentage: 61.7
Seeding with Late
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 10.3
Percentage: 65.6
Best
Seeding with Sum
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 10.6
Percentage: 63.7
Seeding with Reverse Sum
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:10 1995
Average Working Set: 10.8
Percentage: 62.5
Seeding with Standard
Trying Algorithm Sum
Mon Aug 7 14:29:10 1995
Average Working Set: 13.3
Percentage: 51.0
Seeding with Standard
Trying Algorithm Median
Mon Aug 7 14:29:10 1995
Average Working Set: 13.5
Percentage: 50.0
Seeding with Standard
Trying Algorithm Late
Mon Aug 7 14:29:10 1995
Average Working Set: 13.5
Percentage: 50.0
Seeding with Standard
Trying Algorithm Early
Mon Aug 7 14:29:10 1995
Average Working Set: 12.2
Percentage: 55.4
Seeding with Standard
Trying Algorithm Original - Zeroes
Mon Aug 7 14:29:10 1995
Average Working Set: 16.4
Percentage: 41.2
Seeding with Standard
Trying Algorithm Original
Mon Aug 7 14:29:10 1995
Average Working Set: 26.3
Percentage: 25.8
Mon Aug 7 14:29:10 1995
Using order from Pairwise Pattern - 200 lookahead
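The stderr report above is long, and it can be easier to compare the orderings side by side. The following is a minimal sketch, not part of the tool set, that assumes the report has exactly the line layout shown above ("Seeding with", "Trying Algorithm", "Average Working Set:", "Percentage:"). The function name summarize_lrt_log is hypothetical. It prints one tab-separated line per attempt (percentage, average working set, seed, algorithm), best percentage first.

```shell
# summarize_lrt_log [FILE...]
# Hypothetical helper: condense an lrt_scan stderr report into one line
# per attempt, sorted by Percentage in descending order.
# Assumes the line layout shown in the report above; reads stdin if no
# FILE is given.
summarize_lrt_log() {
    awk '
        # "Seeding with " is 13 characters long; keep the rest as the seed.
        /^Seeding with /         { seed = substr($0, 14) }
        # "Trying Algorithm " is 17 characters long; keep the rest.
        /^Trying Algorithm /     { algo = substr($0, 18) }
        /^Average Working Set: / { ws = $4 }
        # The Percentage line closes one attempt, so emit the record here.
        /^Percentage: /          { printf "%s\t%s\t%s\t%s\n", $2, ws, seed, algo }
    ' "$@" | LC_ALL=C sort -rn
}
```

Since the report is written to stderr, it can be captured by adding 2> scan.log to the lrt_scan command line and then running summarize_lrt_log scan.log. For the report above, the first line of the summary would be the Late seeding with Pairwise Pattern - 200 lookahead (65.6) and the last would be the Original ordering (25.8).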
Now, let's do an experiment using block profiling:
$ cp hold.all.o vile.all.o
$ fur -k keep -b all -c mklog vile.all.o
$ cc -o vile vile.all.o
$ vile
Let's read the logs and combine the information we got out of the flow profiling (observing the metrics while we are at it):
$ cp hold.all.o vile.all.o
$ fur -k keep -m -r -o vile.order -l vile.funcs -f block.vile.all.00 vile.all.o
Maximum executed function: line_height: 4947
Jump Percentage: 81.6
Line Usage Efficiency before tuning: 42.6
Line Usage Efficiency after tuning: 72.9
$ fur -k keep -o vile.order vile.all.o
$ cc -o vile vile.all.o
We now have a tuned program.
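Because the block-profiling steps are repeated each time the experiment is rerun, it may be convenient to wrap them in small shell functions. The following is only a sketch under stated assumptions: the function names run, instrument_blocks and apply_tuning and the DRY_RUN variable are hypothetical, and every fur and cc option is copied unchanged from the commands above. Running the instrumented program to produce the block log remains a manual step between the two functions.

```shell
# run CMD [ARGS...]
# Echo the command, then execute it unless DRY_RUN=1 (hypothetical switch).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# instrument_blocks PRISTINE OBJ PROG
# Restore the saved object, insert block-profiling code, and relink.
instrument_blocks() {
    run cp "$1" "$2" &&
    run fur -k keep -b all -c mklog "$2" &&
    run cc -o "$3" "$2"
}

# apply_tuning PRISTINE OBJ PROG FUNCS BLOCKLOG ORDER
# Restore the saved object, compute the order from the flow-profiling
# function list and the block log, apply that order, and relink.
apply_tuning() {
    run cp "$1" "$2" &&
    run fur -k keep -m -r -o "$6" -l "$4" -f "$5" "$2" &&
    run fur -k keep -o "$6" "$2" &&
    run cc -o "$3" "$2"
}
```

With DRY_RUN set to 1, calling apply_tuning hold.all.o vile.all.o vile vile.funcs block.vile.all.00 vile.order only prints the four commands of the final step, each prefixed with "+ ", which is a cheap way to check the argument order before touching the object file.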