|
|
During the development of lprof
it was observed that
the process of merging profiling data was slow.
The profiling data being merged
came from two runs of the C compiler,
a medium-sized program with, at the time, 284 functions.
It took
forty cpu seconds (two minutes of real time)
to merge the two coverage files.
In order to increase efficiency, the following steps were
taken:
%Time Seconds Cumsecs #Calls msec/call Name 34.8 13.52 13.52 226638 0.0597 fread 12.1 4.72 18.24 228254 0.0207 memcpy 9.5 3.69 21.93 40286 0.0918 CAjump 9.2 3.60 5.52 _mcount 7.7 2.99 28.51 284 10.53 CAfind 7.6 2.94 31.45 42472 0.0692 malloc 6.3 2.45 33.90 1154 2.123 read 3.0 1.17 35.07 40475 0.0289 strcmp 2.8 1.09 36.16 42471 0.0257 free 2.5 0.96 37.13 2 482. creat 1.4 0.55 37.68 1 550. fputc 0.8 0.33 38.01 1431 0.231 lseek 0.4 0.16 38.17 1518 0.105 fwrite 0.3 0.11 38.28 569 0.19 CAread
prof Output
The two most time-consuming user functions were CAjump() and CAfind().
short CAfind(filedata, searchfunc) struct caFILEDATA filedata; char searchfunc; 284 [61] { short ret_code,findflag; unsigned char fname_size; char name;284 [66] CArewind(filedata); / rewind file pointers / 284 [67] findflag = 1;
40754 [69] while (findflag) 40470 [70] if ((fread((char ) &fname_size, sizeof(unsigned char), 1, filedata->cov_data_ptr)) > 0) { 40470 [72] name = (char ) malloc(fname_size+1); 40470 [73] fread(name, (int)fname_size, 1, filedata->cov_data_ptr); / make null-terminated / 40470 [75] name[fname_size] = '\\0'; 40470 [76] if (strcmp(name,searchfunc) == 0) { / this is the function, move ptr back to beginning of function name / 284 [80] fseek(filedata->cov_data_ptr, -(long)(fname_size+sizeof(unsigned char)),1); 284 [82] ret_code = OK; 284 [83] findflag = 0; } else / this is not it, move to next function / { 40186 [87] if (fname_size != EOD) 40186 [88] if (CAjump(filedata->cov_data_ptr) == EOF_FAIL) { / error - end of file found / 0 [90] ret_code = FUNC_FAIL; 0 [91] findflag = 0; } } 40470 [94] free(name); } else { / end of file before function found / 0 [98] ret_code = FUNC_FAIL; 0 [99] findflag = 0; } 284 [101] return(ret_code); 0 [102] }
lprof Output for the Function CAfind()
CAfind() searches the data file for data pertaining to a particular function. Recall that a data file consists of a header section, a section for each function, and an end of data marker. The coverage datum (execution count) for each function is recorded beside the function's name.
Notice that the while loop (shown between lines 70 and 94) was executed 40,470 times; for 284 successful searches, there were 40,186 unsuccessful searches. This represented a low rate of return for computing resources spent. A look at the while loop also shows why fread() was executed so many times: the loop contains two calls to fread() (lines 70 and 73). Finally, the prof output reports that CAjump() was called 40,186 times, once for each unsuccessful search.
Our goals were to minimize the number of unsuccessful searches and, if possible, decrease the number of calls to fread(), because these are relatively expensive.
The lprof algorithm for merging files consists of two steps: traversing the functions in one of the files sequentially, and calling CAfind() to locate the data for a given function in the other coverage file.
The first thing that happens in CAfind() is the resetting of the file pointers so they point to the first function in the file (line 66). Then, because the given function (which was passed to CAfind() as an argument) has not been found, the next function in the file is examined to see if it is the correct function. If it is, the program finishes. If not, it skips over the data and trys the next function. If it reaches the end of the file, there will be no data for that function in the coverage file and it will return with a failure. By itself, CAfind() looked fine and there didn't seem to be much that could improve its performance.
Looking over the entire program, however, you will notice that in almost all situations, the order of the coverage data in the two files to be merged was identical. Therefore, on subsequent calls to CAfind(), the next function being sought was immediately after the one found on the last call to CAfind(). The original implementation did not take advantage of the fact that the search was usually sequential. The file pointers were always reset to the beginning of the file before the search began. Because the functions were in sequential order, this meant that each successive search took progressively longer.
This changed the search strategy so that instead of starting at the beginning of the file on each call to CAfind(), it started at the place in the file where the previous search had ended. This could have been anywhere in the file. Because files being merged are usually identical, the function being sought is almost always the function following the last one found.
The new search strategy required a slightly more complicated algorithm.
Whereas the original strategy demanded only an end-of-file check,
the new strategy required
both an end-of-file check and that you keep track of the current location.
The need to do both arose from the sequence of events
involved in this type of searching.
The new strategy dictated that each iteration of searching begin where the last search ended. CAfind() was to search until the function being sought was found. If CAfind() reached the end of the file before finding that function, it had to continue the search between the first line of the file and the place where it had started the search. Thus CAfind() had to keep track of when the end of the file was reached. Because the goal of the new strategy was to start each search iteration at the place where the last search had ended, it was obviously necessary to keep track of the current location in the file.
The following screen shows the lprof output for CAfind() after it was changed to accommodate the new strategy:
short CAfind(filedata, searchfunc) struct caFILEDATA filedata; char searchfunc; 284 [61] { short ret_code; unsigned char fname_size; char name; long init_loc;284 [67] init_loc = -1; 284 [68] while (1) { 284 [69] if (init_loc == -1) { / first time through / 284 [71] init_loc = ftell(filedata->cov_data_ptr); } else { / has it wrapped completely around? / 0 [75] if (ftell(filedata->cov_data_ptr) == init_loc) { / searched all functions / 0 [77] ret_code = FUNC_FAIL; 0 [78] break; } } 284 [81] if ((fread((char ) &fname_size, sizeof(unsigned char), 1, filedata->cov_data_ptr)) > 0) { 284 [83] if (fname_size == EOD) { / wrap around to beginning / 0 [85] CArewind(filedata); / go back to top of loop / continue; } 284 [89] name = (char ) malloc(fname_size+1); 284 [90] fread(name, (int)fname_size, 1, filedata->cov_data_ptr); / make null-terminated / 284 [92] name[fname_size] = '\\0'; 284 [93] if (strcmp(name,searchfunc) == 0) { / this is the function, move ptr back to beginning of function name / 284 [97] fseek(filedata->cov_data_ptr, -(long)(fname_size+sizeof(unsigned char)),1); 284 [99] ret_code = OK; break; } else / this is not it, move to next function / { 0 [104] if (CAjump(filedata->cov_data_ptr) == EOF_FAIL) { / error - end of file found / 0 [106] ret_code = FUNC_FAIL; break; } } 0 [110] free(name); } else { / end of file before function found / 0 [114] ret_code = FUNC_FAIL; break; } }
284 [119] return(ret_code); 0 [120] }
lprof Output for New Version of CAfind()
Note that not only did it greatly reduce the number of calls to fread(), but in typical situations it eliminated calls to CAjump() entirely. Remember, CAjump() originally took 3.69 seconds (9.5% of the total execution time), which was more than any other user function.
The prof output for the new version is shown in the following screen:
%Time Seconds Cumsecs #Calls msec/call Name 25.4 0.54 0.54 298 1.81 read 11.7 0.25 0.79 2002 0.125 malloc 10.6 0.22 1.01 2848 0.079 fread 8.9 0.19 1.20 579 0.33 lseek 7.0 0.15 1.35 1518 0.099 fwrite 6.1 0.13 1.48 _mcount 4.2 0.09 1.57 569 0.16 CAread 3.8 0.08 1.65 4369 0.018 memcpy 2.8 0.06 1.71 284 0.21 CAor 2.8 0.06 1.77 2 30. creat 2.8 0.06 1.83 1 60. CAcov_join 1.9 0.04 1.87 284 0.14 CAfind 1.9 0.04 1.91 284 0.14 CAdata_entry 1.9 0.04 1.95 1717 0.023 free 1.4 0.03 1.98 7 4. open
prof Output for New Version of lprof
The execution time for CAfind() decreased from 2.99 seconds to 0.04 seconds, and for CAjump() from 3.69 seconds to 0 seconds. The overall performance of the entire program decreased from forty cpu seconds (two minutes of real time) to two cpu seconds (six seconds of real time).