Analyzing run-time behavior

Improving program performance by automated analysis

This section presents an example of using flow profiling and block profiling to improve the performance of a program without access to its source code. Because no programmer intervention is needed, only a few simple steps are involved. Remember that locality tuning through flow profiling helps little unless the program has a paging problem.

For the example, we will use the free editor vile, which has about 200K of text. To build this program, one links it as follows:

   $ cc -o vile tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o display.o \
    eval.o exec.o externs.o fences.o file.o filec.o fileio.o finderr.o \
    glob.o globals.o history.o input.o insert.o isearch.o line.o map.o \
    modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o region.o \
    search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o version.o \
    vmalloc.o window.o word.o wordmov.o input_stream.o -ltermcap

First, we want to combine all of this into one big object file, so we do the following (and save a copy, since we will need it later):

   $ ld -r -o vile.all.o tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o \
    display.o eval.o exec.o externs.o fences.o file.o filec.o fileio.o \
    finderr.o glob.o globals.o history.o input.o insert.o isearch.o line.o \
    map.o modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o \
    region.o search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o \
    version.o vmalloc.o window.o word.o wordmov.o input_stream.o -ltermcap
   $ cp vile.all.o hold.all.o
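
Because the same long object list appears in both the cc and the ld command, it can be convenient to keep it in a shell variable. The following is only a sketch: the names OBJS and RUN are hypothetical helpers introduced here, not part of the original procedure. By default the commands are printed rather than executed, so the sketch can be tried safely in any directory.

```shell
# Hypothetical helper variable holding the object files listed above.
OBJS="tcap.o main.o basic.o bind.o buffer.o crypt.o csrch.o display.o
eval.o exec.o externs.o fences.o file.o filec.o fileio.o finderr.o
glob.o globals.o history.o input.o insert.o isearch.o line.o map.o
modes.o npopen.o oneliner.o opers.o path.o random.o regexp.o region.o
search.o select.o spawn.o tags.o tbuff.o termio.o tmp.o undo.o version.o
vmalloc.o window.o word.o wordmov.o input_stream.o"

# Dry run by default: each command is printed, not executed.
# Set RUN to the empty string in the environment to run the commands for real.
RUN=${RUN-echo}

# $OBJS is left unquoted on purpose so the shell splits it into one word per file.
$RUN cc -o vile $OBJS -ltermcap
$RUN ld -r -o vile.all.o $OBJS -ltermcap
$RUN cp vile.all.o hold.all.o
```

Run as shown, the sketch only echoes the three commands; invoking it with RUN set to the empty string performs the ordinary link, the relocatable link, and the copy.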

Next, set up the code for flow profiling and create an experimental vile. Since we know we will use fur on this same object repeatedly, it is useful to use the -k option.

   $ fur -k keep -p all -e all vile.all.o
   $ cc -o vile vile.all.o

Then, run vile and give it a lot of work.

   $ fprof -CLogging=on,LogPrefix=vileflow -s vile

Then, scan the logs (notice that the information on stderr shows Page Use Efficiency improving from 25.8% to 65.6%):

   $ lrt_scan vile vileflow.23156 > vile.funcs
    Processing log vileflow.23156
    328 out of 1066 symbols were referenced
    Seeding with Early
    Trying Algorithm Pairwise Pattern - 200 lookahead
    Mon Aug  7 14:29:09 1995
    Average Working Set: 10.4
    Percentage: 64.8
    Best
    Seeding with Reverse Late
    Trying Algorithm Pairwise Pattern - 200 lookahead
    Mon Aug  7 14:29:09 1995
    Average Working Set: 11.0
    Percentage: 61.7
    Seeding with Late
    Trying Algorithm Pairwise Pattern - 200 lookahead
    Mon Aug  7 14:29:09 1995
    Average Working Set: 10.3
    Percentage: 65.6
    Best
    Seeding with Sum
    Trying Algorithm Pairwise Pattern - 200 lookahead
    Mon Aug  7 14:29:09 1995
    Average Working Set: 10.6
    Percentage: 63.7
    Seeding with Reverse Sum
    Trying Algorithm Pairwise Pattern - 200 lookahead
    Mon Aug  7 14:29:10 1995
    Average Working Set: 10.8
    Percentage: 62.5
    Seeding with Standard
    Trying Algorithm Sum
    Mon Aug  7 14:29:10 1995
    Average Working Set: 13.3
    Percentage: 51.0
    Seeding with Standard
    Trying Algorithm Median
    Mon Aug  7 14:29:10 1995
    Average Working Set: 13.5
    Percentage: 50.0
    Seeding with Standard
    Trying Algorithm Late
    Mon Aug  7 14:29:10 1995
    Average Working Set: 13.5
    Percentage: 50.0
    Seeding with Standard
    Trying Algorithm Early
    Mon Aug  7 14:29:10 1995
    Average Working Set: 12.2
    Percentage: 55.4
    Seeding with Standard
    Trying Algorithm Original - Zeroes
    Mon Aug  7 14:29:10 1995
    Average Working Set: 16.4
    Percentage: 41.2
    Seeding with Standard
    Trying Algorithm Original
    Mon Aug  7 14:29:10 1995
    Average Working Set: 26.3
    Percentage: 25.8
    Mon Aug  7 14:29:10 1995
    Using order from Pairwise Pattern - 200 lookahead
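
The report above is long, so it can help to pull out the best-scoring line mechanically. The sketch below assumes the stderr text has been captured with ordinary shell redirection (for example, by appending 2> scan.err to the lrt_scan command); the function name best_layout and the file name scan.err are hypothetical. It relies only on the "Seeding with", "Trying Algorithm", and "Percentage:" lines shown in the output above.

```shell
# best_layout: read lrt_scan's report on stdin and print the seed/algorithm
# pair with the highest "Percentage:" value.
best_layout() {
    awk '
    /^ *Seeding with /     { sub(/^ *Seeding with /, "");     seed = $0 }
    /^ *Trying Algorithm / { sub(/^ *Trying Algorithm /, ""); algo = $0 }
    /^ *Percentage: /      {
        if ($2 + 0 > best) { best = $2 + 0; bseed = seed; balgo = algo }
    }
    END { printf "%.1f%% %s (seed: %s)\n", best, balgo, bseed }
    '
}

# Excerpt of the report shown above (timestamps omitted).
best_layout <<'EOF'
Seeding with Early
Trying Algorithm Pairwise Pattern - 200 lookahead
Average Working Set: 10.4
Percentage: 64.8
Seeding with Late
Trying Algorithm Pairwise Pattern - 200 lookahead
Average Working Set: 10.3
Percentage: 65.6
Seeding with Standard
Trying Algorithm Original
Average Working Set: 26.3
Percentage: 25.8
EOF
```

On this excerpt the function reports the Late-seeded Pairwise Pattern run at 65.6%, which agrees with the second "Best" marker in the report.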

Now, let's do an experiment using block profiling:

   $ cp hold.all.o vile.all.o
   $ fur -k keep -b all -c mklog vile.all.o
   $ cc -o vile vile.all.o
   $ vile

Let's read the logs and combine them with the information we got from flow profiling (observing the metrics while we are at it):

   $ cp hold.all.o vile.all.o
   $ fur -k keep -m -r -o vile.order -l vile.funcs -f block.vile.all.00 vile.all.o
    Maximum executed function: line_height: 4947
    Jump Percentage: 81.6
    Line Usage Efficiency before tuning: 42.6
    Line Usage Efficiency after tuning: 72.9
   $ fur -k keep -o vile.order vile.all.o
   $ cc -o vile vile.all.o

We now have a tuned program.
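
To put the reported metrics in perspective, the before and after figures can be turned into ratios. This is a small illustrative sketch; the function name gain is hypothetical, and the numbers are simply those printed by lrt_scan and fur in this example.

```shell
# gain BEFORE AFTER: print AFTER/BEFORE to two decimal places.
gain() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

gain 25.8 65.6    # Page Use Efficiency, from the lrt_scan report: prints 2.54
gain 42.6 72.9    # Line Usage Efficiency, from the fur report: prints 1.71
```

In other words, in this run page use efficiency improved by roughly a factor of 2.5 and line usage efficiency by roughly a factor of 1.7.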