Performance Measurement

Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr

Program States

Also

Measuring Workload

Workload and Tools

Profile r >>u+s s >u u ≈ r
Characterization I/O-bound kernel-bound CPU-bound
Tools Disk, net, VM stats, packet dumps System call tracing Function profiling, basic block counting

System Monitoring Tools

I/O
iostat (under *BSD), vmstat (Linux)
Memory
vmstat, free (Linux)
Network
netstat
Example: virtual memory statistics while executing the find command
$ vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0  52296   3476  50092   0   0   340     0  189   190   0   6  94
 0  1  0      0  51736   3588  50428   0   0   448     0  214   243   0   5  95
 1  0  0      0  51072   3716  50724   0   0   424     0  210   218   0   5  95
 0  1  0      0  50596   3848  50968   0   0   376     0  196   204   0   4  96
 1  0  0      0  49952   3976  51304   0   0   464     0  220   247   0   6  94
 0  1  0      0  49556   4068  51512   0   0   300     0  177   172   2   2  96
 1  0  0      0  49024   4132  51816   0   0   368     0  196   195   0   7  93
 1  0  0      0  48228   4200  52416   0   0   668     0  270   307   0   9  91
 1  0  0      0  47668   4352  52712   0   0   448     0  215   227   0   5  95
 1  0  0      0  47048   4412  53092   0   0   440     0  213   244   1   5  94
 0  1  0      0  44912   4668  54080   0   0   432     0  211   225   0   4  96

Profiling System Calls

Example:
$ strace -c find . -name test
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 66.06    0.367503         584       629           getdents64
 26.31    0.146389          80      1826           lstat64
  2.27    0.012621          40       313           fcntl64
  1.83    0.010168          31       326        10 open
  1.78    0.009894          16       625           chdir
  1.23    0.006846          22       316           fstat64
[...]

Function Profiling

Profiling C/C++ Code with Gprof

A Flat Profile

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  6.35      0.33     0.33                             _Unwind_SjLj_Register
  5.77      0.63     0.30 10192020     0.00     0.00  __gnu_norm::__deque_buf_size(unsigned int)
  1.54      1.53     0.08       10     8.00   118.33  parse_parse()
  1.35      1.60     0.07  5096343     0.00     0.00  operator==(Tokid, Tokid)

A Call Graph Profile

index % time    self  children    called     name
                                  10             Pdtoken::process_pragma() <cycle 1> [360]
[4]     22.8    0.08    1.10      10         parse_parse() <cycle 1> [4]
                0.00    0.42    8348/17593       Token::unify(Token const&, Token const&) [5]
                0.00    0.26    4328/4328        Type::declare() [28]
                0.00    0.22    3511/3511        completed_typedef(Type) [33]
                0.00    0.03    3529/3540        Block::enter() [220]
                0.00    0.02    7402/27543       obj_lookup(std::string const&) [94]
                0.00    0.02  812361/834736      YYSTYPE::operator=(YYSTYPE const&) [324]
                0.01    0.01   85359/295237      Type::~Type() <cycle 3> [429]
                0.00    0.02    1690/1690        Call::register_call(Token const&, Id const*) [331]
                0.00    0.02   11362/11362       Type::set_abstract(Type) <cycle 4> [581]

Profiling Java Code with EJP

A Java Program Profile

An EJP profile of a Java program

Basic Block Counting in C/C++ Code

To investigate algorithms count the number of times each line is executed. Example:
           main(int argc, char *argv[])
           {
      1            int i, j, k;
      1            int a = 0;

     11            for (i = 0; i < 10; i++)
    110                    for (j = 0; j < 10; j++)
   1100                            for (k = 0; k < 10; k++)
   1000                                    a += i + j + k;
      1    }

Architectural Inefficiencies

Architectural Profiling Tools

Processor event monitoring
Oprofile (http://oprofile.sourceforge.net/) (Linux)
Locality optimizations
SLO: Suggestions for Locality Optimizations (http://slo.sourceforge.net/)
Events that Oprofile can analyze on a Core 2 Intel CPU.

Memory Performance

Memory as important as CPU cycles Again, we need to profile

Memory Profiling C/C++ Code with Valgrind

Example: sed under valgrind

echo ':abcdefgh: : :' |
valgrind  --tool=massif ./sed -f TEST/hanoi.sed
Valgrind's output