DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 
Analyzing run-time behavior

prof and lprof on lprof

During the development of lprof it was observed that the process of merging profiling data was slow. The profiling data being merged came from two runs of the C compiler, a medium-sized program with, at the time, 284 functions. It took
forty cpu seconds (two minutes of real time) to merge the two coverage files. In order to increase efficiency, the following steps were taken:

  1. A time profile of lprof was produced to see which functions were taking the most time. Here is part of the output from prof:
     %Time Seconds Cumsecs  #Calls   msec/call  Name
      34.8   13.52   13.52  226638      0.0597  fread
      12.1    4.72   18.24  228254      0.0207  memcpy
       9.5    3.69   21.93   40286      0.0918  CAjump
       9.2    3.60    5.52                      _mcount
       7.7    2.99   28.51     284     10.53    CAfind
       7.6    2.94   31.45   42472      0.0692  malloc
       6.3    2.45   33.90    1154      2.123   read
       3.0    1.17   35.07   40475      0.0289  strcmp
       2.8    1.09   36.16   42471      0.0257  free
       2.5    0.96   37.13       2    482.      creat
       1.4    0.55   37.68       1    550.      fputc
       0.8    0.33   38.01    1431      0.231   lseek
       0.4    0.16   38.17    1518      0.105   fwrite
       0.3    0.11   38.28     569      0.19    CAread
    

    prof Output

    The two most time-consuming user functions were CAjump() and CAfind().

  2. A check was made to see why CAjump() was called 40,286 times and why the average time per call for CAfind() was so high, 10.53 milliseconds. lprof was invoked only for the source files containing the functions CAjump() and CAfind(). Here are the results for CAfind():
                 short
                 CAfind(filedata, searchfunc)
                 struct caFILEDATA *filedata;
                 char *searchfunc;
      284 [61]   {
                    short ret_code,findflag;
                    unsigned char fname_size;
                    char *name;
    

    284 [66] CArewind(filedata); /* rewind file pointers */ 284 [67] findflag = 1;

    40754 [69] while (findflag) 40470 [70] if ((fread((char *) &fname_size, sizeof(unsigned char), 1, filedata->cov_data_ptr)) > 0) { 40470 [72] name = (char *) malloc(fname_size+1); 40470 [73] fread(name, (int)fname_size, 1, filedata->cov_data_ptr); /* make null-terminated */ 40470 [75] name[fname_size] = '\\0'; 40470 [76] if (strcmp(name,searchfunc) == 0) { /* this is the function, move ptr back to beginning of function name */ 284 [80] fseek(filedata->cov_data_ptr, -(long)(fname_size+sizeof(unsigned char)),1); 284 [82] ret_code = OK; 284 [83] findflag = 0; } else /* this is not it, move to next function */ { 40186 [87] if (fname_size != EOD) 40186 [88] if (CAjump(filedata->cov_data_ptr) == EOF_FAIL) { /* error - end of file found */ 0 [90] ret_code = FUNC_FAIL; 0 [91] findflag = 0; } } 40470 [94] free(name); } else { /* end of file before function found */ 0 [98] ret_code = FUNC_FAIL; 0 [99] findflag = 0; } 284 [101] return(ret_code); 0 [102] }

    lprof Output for the Function CAfind()

CAfind() searches the data file for data pertaining to a particular function. Recall that a data file consists of a header section, a section for each function, and an end of data marker. The coverage datum (execution count) for each function is recorded beside the function's name.

Notice that the while loop (shown between lines 70 and 94) was executed 40,470 times; for 284 successful searches, there were 40,186 unsuccessful searches. This represented a low rate of return for computing resources spent. A look at the while loop also shows why fread() was executed so many times: the loop contains two calls to fread() (lines 70 and 73). Finally, the prof output reports that CAjump() was called 40,186 times, once for each unsuccessful search.

Our goals were to minimize the number of unsuccessful searches and, if possible, decrease the number of calls to fread(), because these are relatively expensive.

The lprof algorithm for merging files consists of two steps: traversing the functions in one of the files sequentially, and calling CAfind() to locate the data for a given function in the other coverage file.

The first thing that happens in CAfind() is the resetting of the file pointers so they point to the first function in the file (line 66). Then, because the given function (which was passed to CAfind() as an argument) has not been found, the next function in the file is examined to see if it is the correct function. If it is, the program finishes. If not, it skips over the data and trys the next function. If it reaches the end of the file, there will be no data for that function in the coverage file and it will return with a failure. By itself, CAfind() looked fine and there didn't seem to be much that could improve its performance.

Looking over the entire program, however, you will notice that in almost all situations, the order of the coverage data in the two files to be merged was identical. Therefore, on subsequent calls to CAfind(), the next function being sought was immediately after the one found on the last call to CAfind(). The original implementation did not take advantage of the fact that the search was usually sequential. The file pointers were always reset to the beginning of the file before the search began. Because the functions were in sequential order, this meant that each successive search took progressively longer.

This changed the search strategy so that instead of starting at the beginning of the file on each call to CAfind(), it started at the place in the file where the previous search had ended. This could have been anywhere in the file. Because files being merged are usually identical, the function being sought is almost always the function following the last one found.

The new search strategy required a slightly more complicated algorithm. Whereas the original strategy demanded only an end-of-file check, the new strategy required both an end-of-file check and that you keep track of the current location.
The need to do both arose from the sequence of events involved in this type of searching.

The new strategy dictated that each iteration of searching begin where the last search ended. CAfind() was to search until the function being sought was found. If CAfind() reached the end of the file before finding that function, it had to continue the search between the first line of the file and the place where it had started the search. Thus CAfind() had to keep track of when the end of the file was reached. Because the goal of the new strategy was to start each search iteration at the place where the last search had ended, it was obviously necessary to keep track of the current location in the file.

The following screen shows the lprof output for CAfind() after it was changed to accommodate the new strategy:

                short
                CAfind(filedata, searchfunc)
                struct caFILEDATA *filedata;
                char *searchfunc;
     284 [61]   {
                    short ret_code;
                    unsigned char fname_size;
                    char *name;
                    long init_loc;
   

284 [67] init_loc = -1; 284 [68] while (1) { 284 [69] if (init_loc == -1) { /* first time through */ 284 [71] init_loc = ftell(filedata->cov_data_ptr); } else { /* has it wrapped completely around? */ 0 [75] if (ftell(filedata->cov_data_ptr) == init_loc) { /* searched all functions */ 0 [77] ret_code = FUNC_FAIL; 0 [78] break; } } 284 [81] if ((fread((char *) &fname_size, sizeof(unsigned char), 1, filedata->cov_data_ptr)) > 0) { 284 [83] if (fname_size == EOD) { /* wrap around to beginning */ 0 [85] CArewind(filedata); /* go back to top of loop */ continue; } 284 [89] name = (char *) malloc(fname_size+1); 284 [90] fread(name, (int)fname_size, 1, filedata->cov_data_ptr); /* make null-terminated */ 284 [92] name[fname_size] = '\\0'; 284 [93] if (strcmp(name,searchfunc) == 0) { /* this is the function, move ptr back to beginning of function name */ 284 [97] fseek(filedata->cov_data_ptr, -(long)(fname_size+sizeof(unsigned char)),1); 284 [99] ret_code = OK; break; } else /* this is not it, move to next function */ { 0 [104] if (CAjump(filedata->cov_data_ptr) == EOF_FAIL) { /* error - end of file found */ 0 [106] ret_code = FUNC_FAIL; break; } } 0 [110] free(name); } else { /* end of file before function found */ 0 [114] ret_code = FUNC_FAIL; break; } }

284 [119] return(ret_code); 0 [120] }

lprof Output for New Version of CAfind()

Note that not only did it greatly reduce the number of calls to fread(), but in typical situations it eliminated calls to CAjump() entirely. Remember, CAjump() originally took 3.69 seconds (9.5% of the total execution time), which was more than any other user function.

The prof output for the new version is shown in the following screen:

    %Time Seconds Cumsecs  #Calls  msec/call  Name
     25.4    0.54    0.54     298     1.81    read
     11.7    0.25    0.79    2002     0.125   malloc
     10.6    0.22    1.01    2848     0.079   fread
      8.9    0.19    1.20     579     0.33    lseek
      7.0    0.15    1.35    1518     0.099   fwrite
      6.1    0.13    1.48                     _mcount
      4.2    0.09    1.57     569     0.16    CAread
      3.8    0.08    1.65    4369     0.018   memcpy
      2.8    0.06    1.71     284     0.21    CAor
      2.8    0.06    1.77       2    30.      creat
      2.8    0.06    1.83       1    60.      CAcov_join
      1.9    0.04    1.87     284     0.14    CAfind
      1.9    0.04    1.91     284     0.14    CAdata_entry
      1.9    0.04    1.95    1717     0.023   free
      1.4    0.03    1.98       7     4.      open

prof Output for New Version of lprof

The execution time for CAfind() decreased from 2.99 seconds to 0.04 seconds, and for CAjump() from 3.69 seconds to 0 seconds. The overall performance of the entire program decreased from forty cpu seconds (two minutes of real time) to two cpu seconds (six seconds of real time).


Next topic: Improving program performance by automated analysis
Previous topic: A sample session using flow profiling

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004