This section presents an example of using flow profiling and block profiling to improve the performance of a program without even having its source code. Because no programmer intervention is needed, there are only a few simple steps. Remember that locality tuning through flow profiling does not help much unless the program has a paging problem.
For the example, we will use the free editor vile, which has about 200K of text. To build this program, one links it thus:
$ cc -o vile tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o display.o eval.o exec.o externs.o fences.o file.o filec.o fileio.o finderr.o glob.o globals.o history.o input.o insert.o isearch.o line.o map.o modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o region.o search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o version.o vmalloc.o window.o word.o wordmov.o input_stream.o -ltermcap
First, we want to get all of this into one big object file, so we do this (and save a copy, since we will need one later):
$ ld -r -o vile.all.o tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o display.o eval.o exec.o externs.o fences.o file.o filec.o fileio.o finderr.o glob.o globals.o history.o input.o insert.o isearch.o line.o map.o modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o region.o search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o version.o vmalloc.o window.o word.o wordmov.o input_stream.o -ltermcap
$ cp vile.all.o hold.all.o
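The same object list appears in both the cc and the ld -r command lines, so it can be convenient to keep it in one place. The following is a minimal sketch, not part of the original build: the variable names MODULES and OBJS are chosen here for illustration, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Module names copied from the link line above, in the same order.
# MODULES and OBJS are illustrative names, not part of the vile build.
MODULES="tcap main basic bind buffer crypt csrch display eval exec externs
fences file filec fileio finderr glob globals history input insert isearch
line map modes npopen oneliner opers path random regexp region search select
spawn tags tbuff termio tmp undo version vmalloc window word wordmov
input_stream"

# Build "tcap.o main.o ..." from the module names.
OBJS=""
for m in $MODULES; do
    OBJS="$OBJS $m.o"
done
OBJS=${OBJS# }   # drop the leading space

# Print the commands instead of running them, so the sketch can be
# tried on a machine that does not have the object files.
echo "cc -o vile $OBJS -ltermcap"
echo "ld -r -o vile.all.o $OBJS -ltermcap"
echo "cp vile.all.o hold.all.o"
```

Removing the echo wrappers would run the same three commands shown above.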
Next, set up the code for flow profiling and create an experimental vile. Since we know we will use fur on this same object repeatedly, it is useful to use the -k option.
$ fur -k keep -p all -e all vile.all.o
$ cc -o vile vile.all.o
Then, run vile and give it a lot of work.
$ fprof -CLogging=on,LogPrefix=vileflow -s vile
Then, scan the logs (notice that the information on stderr shows Page Use Efficiency improving from 25.8% to 65.6%):
$ lrt_scan vile vileflow.12345 > vile.funcs
Processing log vileflow.23156
328 out of 1066 symbols were referenced
Seeding with Early
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 10.4
Percentage: 64.8
Best
Seeding with Reverse Late
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 11.0
Percentage: 61.7
Seeding with Late
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 10.3
Percentage: 65.6
Best
Seeding with Sum
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:09 1995
Average Working Set: 10.6
Percentage: 63.7
Seeding with Reverse Sum
Trying Algorithm Pairwise Pattern - 200 lookahead
Mon Aug 7 14:29:10 1995
Average Working Set: 10.8
Percentage: 62.5
Seeding with Standard
Trying Algorithm Sum
Mon Aug 7 14:29:10 1995
Average Working Set: 13.3
Percentage: 51.0
Seeding with Standard
Trying Algorithm Median
Mon Aug 7 14:29:10 1995
Average Working Set: 13.5
Percentage: 50.0
Seeding with Standard
Trying Algorithm Late
Mon Aug 7 14:29:10 1995
Average Working Set: 13.5
Percentage: 50.0
Seeding with Standard
Trying Algorithm Early
Mon Aug 7 14:29:10 1995
Average Working Set: 12.2
Percentage: 55.4
Seeding with Standard
Trying Algorithm Original - Zeroes
Mon Aug 7 14:29:10 1995
Average Working Set: 16.4
Percentage: 41.2
Seeding with Standard
Trying Algorithm Original
Mon Aug 7 14:29:10 1995
Average Working Set: 26.3
Percentage: 25.8
Mon Aug 7 14:29:10 1995
Using order from Pairwise Pattern - 200 lookahead
Now, let's do an experiment using block profiling:
$ cp hold.all.o vile.all.o
$ fur -k keep -b all -c mklog vile.all.o
$ cc -o vile vile.all.o
$ vile
Let's read the logs and combine the information we got out of the flow profiling (observing the metrics while we are at it):
$ cp hold.all.o vile.all.o
$ fur -k keep -m -r -o vile.order -l vile.funcs -f block.vile.all.00 vile.all.o
Maximum executed function: line_height: 4947
Jump Percentage: 81.6
Line Usage Efficiency before tuning: 42.6
Line Usage Efficiency after tuning: 72.9
$ fur -k keep -o vile.order vile.all.o
$ cc -o vile vile.all.o
We now have a tuned program.
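When lrt_scan tries many orderings, it can be handy to pull the worst and best Percentage figures out of its stderr report automatically. The following is a minimal sketch under stated assumptions: the function name summarize_percentages is hypothetical, the report is assumed to have been saved to a file with ordinary shell redirection (for example by adding 2> scan.log to the lrt_scan command, where scan.log is a name chosen here), and the only thing assumed about the report format is that each figure directly follows the token "Percentage:", as in the output shown earlier.

```shell
#!/bin/sh
# summarize_percentages FILE
# Walk FILE token by token, so the result does not depend on how the
# report is broken into lines, and collect every number that directly
# follows the token "Percentage:". Prints the smallest and largest
# values and their ratio; prints nothing if no figure is found.
summarize_percentages() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if (prev == "Percentage:") {
                    v = $i + 0
                    if (n == 0 || v < min) min = v
                    if (n == 0 || v > max) max = v
                    n++
                }
                prev = $i    # remembered across lines as well
            }
        }
        END {
            if (n > 0 && min > 0)
                printf "worst %.1f best %.1f ratio %.2f\n", min, max, max / min
        }
    ' "$1"
}
```

For the flow-profiling run in this section, the smallest figure is 25.8 (the original order) and the largest is 65.6, so the ratio reported would be 2.54. The function is meant for lrt_scan reports only; the fur output also contains a "Jump Percentage:" line, which this simple token match would pick up as well.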