Difference between revisions of "Tip 11: gprof"
Line 86: | Line 86: | ||
We can see that do_measure is only called once, but is called another 725 times by some other function. | We can see that do_measure is only called once, but is called another 725 times by some other function. | ||
− | Using this information, you can diagnose where the calls to cieq are coming from. So you can either 1) optimize a function or 2) call it less. | + | If we go back to the initial problem with "cieq", we simply search for this function and find this: |
+ | |||
+ | 0.00 0.00 1/1326020941 plot_alloc [211] | ||
+ | 0.00 0.00 1/1326020941 plot_setcur [249] | ||
+ | 0.00 0.00 26/1326020941 vec_basename [163] | ||
+ | 0.00 0.00 84/1326020941 vec_eq [161] | ||
+ | 0.00 0.00 921/1326020941 plotInit [145] | ||
+ | 0.00 0.00 1658/1326020941 beginPlot [114] | ||
+ | 0.00 0.00 100188/1326020941 vec_get [10] | ||
+ | 0.00 0.00 200376/1326020941 measure_parse_when [18] | ||
+ | 0.00 0.00 400752/1326020941 measure_parse_trigtarg [15] | ||
+ | 0.99 0.00 110529682/1326020941 com_measure_when [16] | ||
+ | 10.90 0.00 1214787252/1326020941 findvec [12] | ||
+ | [13] 52.1 11.90 0.00 1326020941 cieq [13] | ||
+ | |||
+ | Most of these calls are from findvec (which is also the second CPU hog) but it is also called in many other functions too. Using this information, you can diagnose where the calls to cieq are coming from. So you can either 1) optimize a function or 2) call it less. |
Latest revision as of 19:25, 2 June 2010
gprof is the GNU Profiler. It can be used to find out where your program is spending most of its time when running. This won't help if your code is broken, has an infinite loop, etc, but it can be used to improve performance on working code.
First, you should probably be using optimized code. There are many things that the compiler can improve. Specify -O3 in g++ or gcc. Anything more in unnecessary.
Second, you must recompile your code and specify the "-pg" flag. You must use this for compilation AND linking. This will add extra instructions for performance counters in your code.
Third, run your code on a typical (but fairly small) test case. You will see the normal output and behavior. This will generate the function call data in a file called gmon.out.
Last, after the program has finished run gprof with your executable:
gprof myprogram > gprof.out
This will use the binary and gmon.out to create the output file gprof.out which will contain a bunch of useful information such as this:
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 52.10 11.90 11.90 1326020941 0.00 0.00 cieq 26.71 18.00 6.10 3104531 0.00 0.00 findvec 6.74 19.54 1.54 225037 0.00 0.00 com_measure_when 3.50 20.34 0.80 scannum 1.84 20.76 0.42 19926865 0.00 0.00 spice_dstring_append 1.75 21.16 0.40 3209201 0.00 0.00 gettok 1.27 21.45 0.29 1468 0.00 0.00 BSIM4v2load 0.57 21.58 0.13 826248 0.00 0.00 vec_fromplot 0.53 21.70 0.12 726 0.00 0.03 do_measure 0.48 21.81 0.11 1468 0.00 0.00 CAPload 0.35 21.89 0.08 400828 0.00 0.00 ft_numparse 0.31 21.96 0.07 4050975 0.00 0.00 copy 0.26 22.02 0.06 19857822 0.00 0.00 spice_dstring_append_char 0.26 22.08 0.06 6886171 0.00 0.00 tmalloc
This tells me that 52% of the time is spent in the "cieq" function. This is 11.9 seconds of wall time and a huge number of function calls. If you keep scrolling down, you will get to the "Call Graph" section:
Call graph (explanation follows) granularity: each sample hit covers 2 byte(s) for 0.04% of 22.84 seconds index % time self children called name <spontaneous> [1] 96.0 0.00 21.93 main [1] 0.00 21.92 1/1 ft_dorun [3] 0.00 0.01 1/1 inp_spsource [63] 0.00 0.00 1/1 ft_savedotargs [74] 0.00 0.00 1/1 ft_cpinit [76] 0.00 0.00 1/1 ft_cktcoms [103] 0.00 0.00 1/8 cp_vset [154] 0.00 0.00 1/1 sp_shutdown [252] 0.00 0.00 1/1 SIMinit [257] 0.00 0.00 4/4 read_initialisation_file [364] 0.00 0.00 1/1 ivars [462] 0.00 0.00 1/1 init_time [456] 0.00 0.00 1/1 init_rlimits [455] 0.00 0.00 1/1 append_to_stream [388] ----------------------------------------------- 0.00 21.92 1/1 ft_dorun [3] [2] 96.0 0.00 21.92 1 dosim [2] 0.00 21.89 1/1 if_run [5] 0.00 0.03 1/726 do_measure [7] 0.00 0.00 1/1083 cp_getvar [52] 0.00 0.00 1/4050975 copy [29] 0.00 0.00 1/6886171 tmalloc [31] 0.00 0.00 2/5662407 txfree [56] ----------------------------------------------- 0.00 21.92 1/1 main [1] [3] 96.0 0.00 21.92 1 ft_dorun [3] 0.00 21.92 1/1 dosim [2]
etc.
This information is a bit difficult to read, but it is all based around the primary line such as this:
[2] 96.0 0.00 21.92 1 dosim [2]
The lines ABOVE this line tell us what functions call this function. The ones BELOW this line are the functions called by this function. So, dosim is called by:
0.00 21.92 1/1 ft_dorun [3]
one time and it calls these functions:
0.00 21.89 1/1 if_run [5] 0.00 0.03 1/726 do_measure [7] 0.00 0.00 1/1083 cp_getvar [52] 0.00 0.00 1/4050975 copy [29] 0.00 0.00 1/6886171 tmalloc [31] 0.00 0.00 2/5662407 txfree [56]
We can see that do_measure is only called once, but is called another 725 times by some other function.
If we go back to the initial problem with "cieq", we simply search for this function and find this:
0.00 0.00 1/1326020941 plot_alloc [211] 0.00 0.00 1/1326020941 plot_setcur [249] 0.00 0.00 26/1326020941 vec_basename [163] 0.00 0.00 84/1326020941 vec_eq [161] 0.00 0.00 921/1326020941 plotInit [145] 0.00 0.00 1658/1326020941 beginPlot [114] 0.00 0.00 100188/1326020941 vec_get [10] 0.00 0.00 200376/1326020941 measure_parse_when [18] 0.00 0.00 400752/1326020941 measure_parse_trigtarg [15] 0.99 0.00 110529682/1326020941 com_measure_when [16] 10.90 0.00 1214787252/1326020941 findvec [12] [13] 52.1 11.90 0.00 1326020941 cieq [13]
Most of these calls are from findvec (which is also the second CPU hog) but it is also called in many other functions too. Using this information, you can diagnose where the calls to cieq are coming from. So you can either 1) optimize a function or 2) call it less.