Difference between revisions of "Tip 11: gprof"

From Vlsiwiki
Jump to: navigation, search
Line 86: Line 86:
 
We can see that do_measure is only called once, but is called another 725 times by some other function.
 
We can see that do_measure is only called once, but is called another 725 times by some other function.
  
Using this information, you can diagnose where the calls to cieq are coming from. So you can either 1) optimize cieq or 2) call it less.
+
Using this information, you can diagnose where the calls to cieq are coming from. So you can either 1) optimize a function or 2) call it less.

Revision as of 19:23, 2 June 2010

gprof is the GNU Profiler. It can be used to find out where your program is spending most of its time when running. This won't help if your code is broken, has an infinite loop, etc, but it can be used to improve performance on working code.

First, you should probably be using optimized code. There are many things that the compiler can improve. Specify -O3 in g++ or gcc. Anything more in unnecessary.

Second, you must recompile your code and specify the "-pg" flag. You must use this for compilation AND linking. This will add extra instructions for performance counters in your code.

Third, run your code on a typical (but fairly small) test case. You will see the normal output and behavior. This will generate the function call data in a file called gmon.out.

Last, after the program has finished run gprof with your executable:

gprof myprogram > gprof.out

This will use the binary and gmon.out to create the output file gprof.out which will contain a bunch of useful information such as this:


Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 52.10     11.90    11.90 1326020941     0.00     0.00  cieq
 26.71     18.00     6.10  3104531     0.00     0.00  findvec
  6.74     19.54     1.54   225037     0.00     0.00  com_measure_when
  3.50     20.34     0.80                             scannum
  1.84     20.76     0.42 19926865     0.00     0.00  spice_dstring_append
  1.75     21.16     0.40  3209201     0.00     0.00  gettok
  1.27     21.45     0.29     1468     0.00     0.00  BSIM4v2load
  0.57     21.58     0.13   826248     0.00     0.00  vec_fromplot
  0.53     21.70     0.12      726     0.00     0.03  do_measure
  0.48     21.81     0.11     1468     0.00     0.00  CAPload
  0.35     21.89     0.08   400828     0.00     0.00  ft_numparse
  0.31     21.96     0.07  4050975     0.00     0.00  copy
  0.26     22.02     0.06 19857822     0.00     0.00  spice_dstring_append_char
  0.26     22.08     0.06  6886171     0.00     0.00  tmalloc

This tells me that 52% of the time is spent in the "cieq" function. This is 11.9 seconds of wall time and a huge number of function calls. If you keep scrolling down, you will get to the "Call Graph" section:

                    Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.04% of 22.84 seconds
index % time    self  children    called     name
                                                 <spontaneous>
[1]     96.0    0.00   21.93                 main [1]
               0.00   21.92       1/1           ft_dorun [3]
               0.00    0.01       1/1           inp_spsource [63]
               0.00    0.00       1/1           ft_savedotargs [74]
               0.00    0.00       1/1           ft_cpinit [76]
               0.00    0.00       1/1           ft_cktcoms [103]
               0.00    0.00       1/8           cp_vset [154]
               0.00    0.00       1/1           sp_shutdown [252]
               0.00    0.00       1/1           SIMinit [257]
               0.00    0.00       4/4           read_initialisation_file [364]
               0.00    0.00       1/1           ivars [462]
               0.00    0.00       1/1           init_time [456]
               0.00    0.00       1/1           init_rlimits [455]
               0.00    0.00       1/1           append_to_stream [388]
-----------------------------------------------
               0.00   21.92       1/1           ft_dorun [3]
[2]     96.0    0.00   21.92       1         dosim [2]
               0.00   21.89       1/1           if_run [5]
               0.00    0.03       1/726         do_measure [7]
               0.00    0.00       1/1083        cp_getvar [52]
               0.00    0.00       1/4050975     copy [29]
               0.00    0.00       1/6886171     tmalloc [31]
               0.00    0.00       2/5662407     txfree [56]
-----------------------------------------------
               0.00   21.92       1/1           main [1]
[3]     96.0    0.00   21.92       1         ft_dorun [3]
               0.00   21.92       1/1           dosim [2]

etc.

This information is a bit difficult to read, but it is all based around the primary line such as this:

 [2]     96.0    0.00   21.92       1         dosim [2]

The lines ABOVE this line tell us what functions call this function. The ones BELOW this line are the functions called by this function. So, dosim is called by:

               0.00   21.92       1/1           ft_dorun [3]

one time and it calls these functions:

               0.00   21.89       1/1           if_run [5]
               0.00    0.03       1/726         do_measure [7]
               0.00    0.00       1/1083        cp_getvar [52]
               0.00    0.00       1/4050975     copy [29]
               0.00    0.00       1/6886171     tmalloc [31]
               0.00    0.00       2/5662407     txfree [56]

We can see that do_measure is only called once, but is called another 725 times by some other function.

Using this information, you can diagnose where the calls to cieq are coming from. So you can either 1) optimize a function or 2) call it less.