Tip 11: gprof

From Vlsiwiki
Revision as of 19:09, 2 June 2010 by Mrg (Talk | contribs)

Jump to: navigation, search

gprof is the GNU Profiler. It can be used to find out where your program is spending most of its time when running. This won't help if your code is broken, has an infinite loop, etc, but it can be used to improve performance on working code.

First, you should probably be using optimized code. There are many things that the compiler can improve. Specify -O3 in g++ or gcc. Anything more in unnecessary.

Second, you must recompile your code and specify the "-pg" flag. You must use this for compilation AND linking. This will add extra instructions for performance counters in your code.

Third, run your code on a typical (but fairly small) test case. You will see the normal output and behavior. This will generate the function call data in a file called gmon.out.

Last, after the program has finished run gprof with your executable:

gprof myprogram > gprof.out

This will use the binary and gmon.out to create the output file gprof.out which will contain a bunch of useful information such as this:


Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 52.10     11.90    11.90 1326020941     0.00     0.00  cieq
 26.71     18.00     6.10  3104531     0.00     0.00  findvec
  6.74     19.54     1.54   225037     0.00     0.00  com_measure_when
  3.50     20.34     0.80                             scannum
  1.84     20.76     0.42 19926865     0.00     0.00  spice_dstring_append
  1.75     21.16     0.40  3209201     0.00     0.00  gettok
  1.27     21.45     0.29     1468     0.00     0.00  BSIM4v2load
  0.57     21.58     0.13   826248     0.00     0.00  vec_fromplot
  0.53     21.70     0.12      726     0.00     0.03  do_measure
  0.48     21.81     0.11     1468     0.00     0.00  CAPload
  0.35     21.89     0.08   400828     0.00     0.00  ft_numparse
  0.31     21.96     0.07  4050975     0.00     0.00  copy
  0.26     22.02     0.06 19857822     0.00     0.00  spice_dstring_append_char
  0.26     22.08     0.06  6886171     0.00     0.00  tmalloc

This tells me that 52% of the time is spent in the "cieq" function. This is 11.9 seconds of wall time and a huge number of function calls. If you keep scrolling down, you will get to the "Call Graph" section:

                    Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.04% of 22.84 seconds
index % time    self  children    called     name
                                                 <spontaneous>
[1]     96.0    0.00   21.93                 main [1]
               0.00   21.92       1/1           ft_dorun [3]
               0.00    0.01       1/1           inp_spsource [63]
               0.00    0.00       1/1           ft_savedotargs [74]
               0.00    0.00       1/1           ft_cpinit [76]
               0.00    0.00       1/1           ft_cktcoms [103]
               0.00    0.00       1/8           cp_vset [154]
               0.00    0.00       1/1           sp_shutdown [252]
               0.00    0.00       1/1           SIMinit [257]
               0.00    0.00       4/4           read_initialisation_file [364]
               0.00    0.00       1/1           ivars [462]
               0.00    0.00       1/1           init_time [456]
               0.00    0.00       1/1           init_rlimits [455]
               0.00    0.00       1/1           append_to_stream [388]
-----------------------------------------------
               0.00   21.92       1/1           ft_dorun [3]
[2]     96.0    0.00   21.92       1         dosim [2]
               0.00   21.89       1/1           if_run [5]
               0.00    0.03       1/726         do_measure [7]
               0.00    0.00       1/1083        cp_getvar [52]
               0.00    0.00       1/4050975     copy [29]
               0.00    0.00       1/6886171     tmalloc [31]
               0.00    0.00       2/5662407     txfree [56]
-----------------------------------------------
               0.00   21.92       1/1           main [1]
[3]     96.0    0.00   21.92       1         ft_dorun [3]
               0.00   21.92       1/1           dosim [2]

etc.

This shows for each function how many times it calls the sub-functions. In main, for example, this calls ft_dorun one time and this is 21.92 out of 21.93 seconds of total run-time. I typically don't use this call graph information much, but it can help you diagnose where the calls to cieq are coming from. So you can either 1) optimize cieq or 2) call it less.