There are three different tools used to work with flow profile logs. The first two are used for locality of reference tuning and the third is used for the other kinds of analysis. Independent of the command, the environment variable FPROF_PATH tells the tools where to find the objects referenced in the log. Of course, this is not necessary if the data is being analyzed in the same environment in which the experiments were run.
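For example, if the logs were copied to another machine for analysis, a session might begin by pointing the tools at a local copy of the objects before reading a log. The sketch below is only illustrative: the directory and log names are hypothetical, and it assumes FPROF_PATH simply names a directory containing copies of the objects.
$ FPROF_PATH=/usr/local/travel/objs    # hypothetical directory holding copies of the objects
$ export FPROF_PATH
$ fprof exper.123                      # hypothetical log name; the tools now look under FPROF_PATH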
Before discussing the tools themselves, there are a couple of concepts to consider. The first is "compiling". Logs are written to disk in a form that is, by necessity, very fast to write. However, this leaves work for the reading program. The first time a log is read, it is automatically compiled into a form which is faster to read on subsequent occasions. Therefore, the first time you view the information in a log, it will be much slower than subsequent reads.
The second notion is marks. A mark, as the name suggests, is a "note" placed in the log that can be referenced when reading the log. This is quite useful for noting the end of an activity in an experiment. For example, let's say you have a program that first reads the list of files in a directory and then does some operations on each of those files. If you wanted to examine each of these activities separately, you could place a mark after the first part. Later, you will see how to use these marks when reading the log.
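As a purely illustrative sketch of the kind of program just described (all of the code below is hypothetical), the outline reads the directory listing in one phase and operates on each file in a second phase; the comment shows where a mark separating the two activities would be placed. The mechanism for placing marks is not covered in this section.
/* Hypothetical two-phase program: phase one reads the list of files in a
 * directory, phase two does an operation on each file.  A mark placed
 * between the phases would let each activity be examined separately. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
	static char names[1024][256];
	int n = 0, i;
	DIR *d = opendir(".");
	struct dirent *e;
	struct stat sb;

	/* phase one: read the list of files in the directory */
	while (d != NULL && (e = readdir(d)) != NULL && n < 1024)
		strcpy(names[n++], e->d_name);
	if (d != NULL)
		closedir(d);

	/* a mark noting the end of phase one would be placed here */

	/* phase two: do some operation on each of the files */
	for (i = 0; i < n; i++)
		if (stat(names[i], &sb) == 0)
			printf("%s: %ld bytes\n", names[i], (long)sb.st_size);
	return 0;
}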
To analyze the flow profiling logs for locality of reference tuning, use the command lrt_scan(1). lrt_scan takes an object and a series of logs as arguments and writes a list of functions to its standard output. This list of functions is used by the tool fur.
fur(1) reorders the functions in an object file based on an input ordering. Note that fur only works on relocatable objects.
The tricky part about using flow profiling and fur is that the code needs to be compiled in different ways for each. For example,
$ cc -qf -o travel.test travel.c misc.c
$ cc -Wa,-r -c travel.c
$ cc -Wa,-r -c misc.c
$ ld -r -o travel.r travel.o misc.o
$ fprof -s -C Accuracy=normal,LogPrefix=myexp travel.test &
990
$ Reading in data . . .
Processing data done.
lrt_scan travel myexp.990 > list
$ fur -l list travel.r
$ cc -o travel travel.r
The first statement builds a flow profiling version of travel to be used for running the experiment. The next three statements build a relocatable version of travel (this is not an executable). Next, we run the experiment. Using lrt_scan, the log is analyzed and an ordering is placed in list. The relocatable file is then reordered using fur. Lastly, the "production" version of travel is built.
As mentioned earlier, the tool fprof is available for this purpose. fprof is an interactive tool based on the ksh: typing fprof starts a ksh that has extra commands built in. The manual page lists all of the commands, so here we will just highlight those that are useful for statistical analysis.
Every fprof session begins with a call to open, to open the logs you would like to examine (giving logs on the command line is an implicit call to open). After opening the logs, it is often convenient to run the info command. This gives information about the log: when the experiment was run, what objects were accessed, the accuracy of the timestamps, what node the experiment was run on, and so on. Then, if you would like to see cumulative statistics, you can use count. This often takes a long time.
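A typical start of a session might therefore look like the following sketch, which simply strings these commands together; the log name is hypothetical and the # comments are ordinary ksh comments.
$ fprof
fprof: open exper.123      # open the log from the experiment (hypothetical name)
fprof: info                # when and where the experiment ran, which objects were accessed, etc.
fprof: count               # compute cumulative statistics; this can take a while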
After this is completed you may choose to see three different types of data: function usage (using the command stats), caller-of-function usage (using the command callers), or callee-of-function usage (using the command callees). stats lists the functions that were used, the number of times each function was called, and how much time each function consumed. The time is reported as two different pieces of information: how much time the function spent on top of the stack and how much time the function spent anywhere on the stack (that is, how much time the function was running or any function that it called was running). This latter piece of information is not available using prof, but can be quite useful for determining which function is the "cause" of a performance problem rather than the "effect" of the problem.
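To make the distinction concrete, consider the hypothetical program below (the function names are invented and do not correspond to the example that follows). In a flow profile, do_work would account for nearly all of the top-of-stack time, while wrapper would show almost no top-of-stack time but an on-stack time close to the program's total, which is what identifies it as the cause of the cost rather than the place where the cycles are actually spent.
#include <stdio.h>

/* do_work burns the cycles itself, so it accumulates "top of stack" time */
static long do_work(long n)
{
	long i, sum = 0;

	for (i = 0; i < n; i++)
		sum += i % 7;
	return sum;
}

/* wrapper spends almost no time on top of the stack, but every call to
 * do_work is charged to its "on stack" total */
static long wrapper(long n)
{
	return do_work(n);
}

int main(void)
{
	printf("%ld\n", wrapper(10000000L));
	return 0;
}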
The command callers is used to learn about the callers of a given function. For example, let's say that you have a function, func, that is consuming a lot of the time in your experiment. Further, let's assume that func is spending all of its time at the top of the stack. If you use the callers command, you will see how many times func was called by each other function and what percentage of func's time was spent when called from each other function. This should immediately tell you why func was taking so long.
The command callees is used to learn about the callees of a given function. For example, let's say that you have a function, func, that is consuming a lot of the time in your experiment, but it is not spending a lot of time on top of the stack. In other words, it is calling code that is taking all the time. If you use the callees command, you will see how many times func called each other function (either directly or indirectly) and what percentage of func's time on the stack was spent in each of these functions. Again, this should lead you right to your answer.
Here is a short example:
$ cc -qf -o myprog.test myprog.c
$ myprog.test &
$ wait
$ fprof /tmp/out.$!
fprof: count
fprof: stats
Function                         No. of Calls     Top of Stack     On Stack
-------------------------------  ---------------  ---------------  ---------------
main                             1                0.058099         2.308941
func1                            10000            0.217145         2.250403
func2                            30000            0.644329         2.033258
func3                            90000            0.976436         1.388929
func4                            90001            0.412932         0.412932
fprof: callers func4
***************** func4 - 90001, 0.412932 *****************
Function                         No. of Calls     Top of Stack     On Stack
-------------------------------  ---------------  ---------------  ---------------
main                             90001            100.0000         100.0000
func1                            90000            099.8937         099.8937
func2                            90000            099.8937         099.8937
func3                            90000            099.8937         099.8937
fprof: callees func1
***************** func1 - 10000, 2.250403 *****************
Function                         No. of Calls     Top of Stack     On Stack
-------------------------------  ---------------  ---------------  ---------------
func2                            30000            028.6317         090.3508
func3                            90000            043.3894         061.7191
func4                            90000            018.3297         018.3297
So, what does it mean to analyze "run-time flow"? This is actually the simplest type of performance analysis possible. In this form of analysis, the programmer simply reads through the log to see which functions were called, in what order, and how long they took. This type of analysis is used to see exactly what is happening while a program is running. It is especially useful for programs where the programmer may not be aware of all the ramifications of each step in the program.
Of course, even small programs and short experiments will often produce thousands of function calls, so reading a log from beginning to end is impractical. For this reason, fprof has features for navigating through the logs.
Just as for statistical analysis above, a session begins with the command open (or an implicit open by supplying logs on the command line). The most basic command is output, which will output the next 10 (or as many as you would like) records in the log. The key to this type of analysis is finding the portion of the log that you would like to see.
For this purpose, fprof provides many commands. The search command searches for the next log entry that references a function matching a given regular expression. So, if you want to see what happens after a certain function gets called, search for the function and then use output.
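Within a session like the ones shown above, that pattern might look like the following sketch (the function name is taken from the earlier example; no output is shown because it would depend on the log).
fprof: search func3        # move to the next record that references func3
fprof: output              # then show the next batch of records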
Another way of navigating through a log is by using marks. As mentioned earlier, marks are used to separate different parts of an experiment. The seekmark command moves to the next or previous mark in the log. Another useful command that takes advantage of marks is bracket, which outputs all of the records between the previous and next marks.
$ cc -qf -o myprog.test myprog.c
$ myprog.test &
$ wait
$ fprof /tmp/out.$!
fprof: output
Calling main at 0
Calling func4 at .000:080
Returning func4 at .000:519
Calling func1 at .000:532
Calling func2 at .000:537
Calling func3 at .000:541
Calling func4 at .000:544
Returning func4 at .000:546
Returning func3 at .000:551
Calling func3 at .000:555
fprof: seekmark
fprof: output
Returning func4 at .100:025
Returning func3 at .100:027
Returning func2 at .100:031
Calling func2 at .100:036
Calling func3 at .100:038
Calling func4 at .100:040
Returning func4 at .100:045
Returning func3 at .100:050
Calling func3 at .100:053
Calling func4 at .100:055
fprof: search main
Returning main at 1.135638
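The example above uses seekmark but not bracket. Assuming the log contained marks separating two activities, a sketch of examining the records around a mark might look like this (again, no output is shown):
fprof: seekmark            # move to the next mark in the log
fprof: bracket             # output all of the records between the previous and next marks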