There are three different tools used to work with flow profile logs. The first two are used for locality of reference tuning and the third is used for the other kinds of analysis. Independent of the command, the environment variable FPROF_PATH tells the tools where to find the objects referenced in the log. Of course, this is not necessary if the data is being analyzed in the same environment in which the experiments were run.
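For example, if the logs were copied to another machine for analysis, a session might begin by pointing the tools at a local copy of the objects before reading a log. The sketch below is only illustrative: the directory and log names are hypothetical, and it assumes FPROF_PATH simply names a directory containing copies of the objects.
$ FPROF_PATH=/usr/local/travel/objs    # hypothetical directory holding copies of the objects
$ export FPROF_PATH
$ fprof exper.123                      # hypothetical log name; the tools now look under FPROF_PATH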
Before discussing the tools themselves, there are a couple of concepts to consider. The first is "compiling". Logs are written to disk in a form that is, by necessity, very fast to write. However, this leaves work for the reading program. The first time a log is read, it is automatically compiled into a form which is faster to read on subsequent occasions. Therefore, the first time you view the information in a log, it will be much slower than subsequent reads.
The second notion is marks. A mark, as the name suggests, is a "note" placed in the log that can be referenced when reading the log. This is quite useful for noting the end of an activity in an experiment. For example, let's say you have a program that first reads the list of files in a directory and then does some operations on each of those files. If you wanted to examine each of these activities separately, you could place a mark after the first part. Later, you will see how to use these marks when reading the log.
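As a purely illustrative sketch of the kind of program just described (all of the code below is hypothetical), the outline reads the directory listing in one phase and operates on each file in a second phase; the comment shows where a mark separating the two activities would be placed. The mechanism for placing marks is not covered in this section.
/* Hypothetical two-phase program: phase one reads the list of files in a
 * directory, phase two does an operation on each file.  A mark placed
 * between the phases would let each activity be examined separately. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
	static char names[1024][256];
	int n = 0, i;
	DIR *d = opendir(".");
	struct dirent *e;
	struct stat sb;

	/* phase one: read the list of files in the directory */
	while (d != NULL && (e = readdir(d)) != NULL && n < 1024)
		strcpy(names[n++], e->d_name);
	if (d != NULL)
		closedir(d);

	/* a mark noting the end of phase one would be placed here */

	/* phase two: do some operation on each of the files */
	for (i = 0; i < n; i++)
		if (stat(names[i], &sb) == 0)
			printf("%s: %ld bytes\n", names[i], (long)sb.st_size);
	return 0;
}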
To analyze the flow profiling logs for locality of reference tuning, use the command lrt_scan(1). lrt_scan takes an object and a series of logs as arguments and writes a list of functions to its standard output. This list of functions is used by the tool fur.
fur(1) reorders the functions in an object file based on an input ordering. Note that fur only works on relocatable objects.
The tricky part about using flow profiling and fur is that the code needs to be compiled in different ways for each. For example,
$ cc -qf -o travel.test travel.c misc.c
$ cc -Wa,-r -c travel.c
$ cc -Wa,-r -c misc.c
$ ld -r -o travel.r travel.o misc.o
$ fprof -s -C Accuracy=normal,LogPrefix=myexp travel.test &
990
$ Reading in data . . .
Processing data done.
lrt_scan travel myexp.990 > list
$ fur -l list travel.r
$ cc -o travel travel.r
The first statement builds a flow profiling version of travel to be used for running the experiment. The next three statements build a relocatable version of travel (this is not an executable). Next, we run the experiment. Using lrt_scan, the log is analyzed and an ordering is placed in list. The relocatable file is then reordered using fur. Lastly, the "production" version of travel is built.
As mentioned earlier, the tool fprof is available for this purpose. fprof is an interactive tool based on the ksh: typing fprof starts a ksh that has extra commands built in. The manual page lists all of the commands, so here we will just highlight those that are useful for statistical analysis.
Every fprof session begins with a call to open, to open the logs you would like to examine (giving logs on the command line is an implicit call to open). After opening the logs, it is often convenient to run the info command. This gives information about the log: when the experiment was run, what objects were accessed, the accuracy of the timestamps, what node the experiment was run on, and so on. Then, if you would like to see cumulative statistics, you can use count. This often takes a long time.
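A typical start of a session might therefore look like the following sketch, which simply strings these commands together; the log name is hypothetical and the # comments are ordinary ksh comments.
$ fprof
fprof: open exper.123      # open the log from the experiment (hypothetical name)
fprof: info                # when and where the experiment ran, which objects were accessed, etc.
fprof: count               # compute cumulative statistics; this can take a while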
After this is completed you may choose to see three different types of data: function usage (using the command stats), caller-of-function usage (using the command callers), or callee-of-function usage (using the command callees). stats lists the functions that were used, the number of times each function was called, and how much time each function consumed. The time is reported as two different pieces of information: how much time the function spent on top of the stack and how much time the function spent anywhere on the stack (that is, how much time the function was running or any function that it called was running). This latter piece of information is not available using prof, but can be quite useful for determining which function is the "cause" of a performance problem rather than the "effect" of the problem.
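To make the distinction concrete, consider the hypothetical program below (the function names are invented and do not correspond to the example that follows). In a flow profile, do_work would account for nearly all of the top-of-stack time, while wrapper would show almost no top-of-stack time but an on-stack time close to the program's total, which is what identifies it as the cause of the cost rather than the place where the cycles are actually spent.
#include <stdio.h>

/* do_work burns the cycles itself, so it accumulates "top of stack" time */
static long do_work(long n)
{
	long i, sum = 0;

	for (i = 0; i < n; i++)
		sum += i % 7;
	return sum;
}

/* wrapper spends almost no time on top of the stack, but every call to
 * do_work is charged to its "on stack" total */
static long wrapper(long n)
{
	return do_work(n);
}

int main(void)
{
	printf("%ld\n", wrapper(10000000L));
	return 0;
}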
The command callers is used to learn about the callers of a given function. For example, let's say that you have a function, func, that is consuming a lot of the time in your experiment. Further, let's assume that func is spending all of its time at the top of the stack. If you use the callers command, you will see how many times func was called by each other function and what percentage of func's time was spent when called from each other function. This should immediately tell you why func was taking so long.
The command callees is used to learn about the callees of a given function. For example, let's say that you have a function, func, that is consuming a lot of the time in your experiment, but it is not spending a lot of time on top of the stack. In other words, it is calling code that is taking all the time. If you use the callees command, you will see how many times func called each other function (either directly or indirectly) and what percentage of func's time on the stack was spent in each of these functions. Again, this should lead you right to your answer.
Here is a short example:
$ cc -qf -o myprog.test myprog.c
$ myprog.test &
$ wait
$ fprof /tmp/out.$!
fprof: count
fprof: stats
Function                         No. of Calls     Top of Stack     On Stack
-------------------------------  ---------------  ---------------  ---------------
main                             1                0.058099         2.308941
func1                            10000            0.217145         2.250403
func2                            30000            0.644329         2.033258
func3                            90000            0.976436         1.388929
func4                            90001            0.412932         0.412932
fprof: callers func4
***************** func4 - 90001, 0.412932 *****************
Function                         No. of Calls     Top of Stack     On Stack
-------------------------------  ---------------  ---------------  ---------------
main                             90001            100.0000         100.0000
func1                            90000            099.8937         099.8937
func2                            90000            099.8937         099.8937
func3                            90000            099.8937         099.8937
fprof: callees func1
***************** func1 - 10000, 2.250403 *****************
Function                         No. of Calls     Top of Stack     On Stack
-------------------------------  ---------------  ---------------  ---------------
func2                            30000            028.6317         090.3508
func3                            90000            043.3894         061.7191
func4                            90000            018.3297         018.3297
So, what does it mean to analyze "run-time flow"? This is actually the simplest type of performance analysis possible. In this form of analysis, the programmer simply reads through the log to see which functions were called, in what order, and how long they took. This type of analysis is used to see exactly what is happening while a program is running. It is especially useful for programs where the programmer may not be aware of all the ramifications of each step in the program.
Of course, even small programs and short experiments will often produce thousands of function calls, so reading a log from beginning to end is impractical. For this reason, fprof has features for navigating through the logs.
Just as for statistical analysis above, a session begins with the command open (or an implicit open by supplying logs on the command line). The most basic command is output, which will output the next 10 (or as many as you would like) records in the log. The key to this type of analysis is finding the portion of the log that you would like to see.
For this purpose, fprof provides many commands. The search command searches for the next log entry that references a function matching a given regular expression. So, if you want to see what happens after a certain function gets called, search for the function and then use output.
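Within a session like the ones shown above, that pattern might look like the following sketch (the function name is taken from the earlier example; no output is shown because it would depend on the log).
fprof: search func3        # move to the next record that references func3
fprof: output              # then show the next batch of records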
Another way of navigating through a log is by using marks. As mentioned earlier, marks are used to separate different parts of an experiment. The seekmark command moves to the next or previous mark in the log. Another useful command that takes advantage of marks is bracket, which outputs all of the records between the previous and next marks.
$ cc -qf -o myprog.test myprog.c
$ myprog.test &
$ wait
$ fprof /tmp/out.$!
fprof: output
Calling main at 0
Calling func4 at .000:080
Returning func4 at .000:519
Calling func1 at .000:532
Calling func2 at .000:537
Calling func3 at .000:541
Calling func4 at .000:544
Returning func4 at .000:546
Returning func3 at .000:551
Calling func3 at .000:555
fprof: seekmark
fprof: output
Returning func4 at .100:025
Returning func3 at .100:027
Returning func2 at .100:031
Calling func2 at .100:036
Calling func3 at .100:038
Calling func4 at .100:040
Returning func4 at .100:045
Returning func3 at .100:050
Calling func3 at .100:053
Calling func4 at .100:055
fprof: search main
Returning main at 1.135638
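The example above uses seekmark but not bracket. Assuming the log contained marks separating two activities, a sketch of examining the records around a mark might look like this (again, no output is shown):
fprof: seekmark            # move to the next mark in the log
fprof: bracket             # output all of the records between the previous and next marks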