Performance Measurement and Management

Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr

Program States

Directly executing code (user time)
The kernel executes code on behalf of the program (system time)
Waiting for an external operation to complete (idle time)

Also

Real or clock time
CPU time

Measuring Workload

For terminating processes use the time command

$ time make
[...]
real    0m9.380s
user    0m2.580s
sys     0m6.640s

For servers use top

 22:14:03 up  5:22,  2 users,  load average: 0.00, 0.01, 0.00
39 processes: 38 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   0.1% user,   0.4% system,   0.0% nice,  99.5% idle

Workload and Tools

Profile	r >>u+s	s >u	u ≈ r
Characterization	I/O-bound	kernel-bound	CPU-bound
Tools	Disk, net, VM stats, packet dumps	System call tracing	Function profiling, basic block counting

System Monitoring Tools

I/O: iostat (under *BSD), vmstat (Linux)
Memory: vmstat, free (Linux)
Network: netstat

Example: virtual memory statistics while executing the find command

$ vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0  52296   3476  50092   0   0   340     0  189   190   0   6  94
 0  1  0      0  51736   3588  50428   0   0   448     0  214   243   0   5  95
 1  0  0      0  51072   3716  50724   0   0   424     0  210   218   0   5  95
 0  1  0      0  50596   3848  50968   0   0   376     0  196   204   0   4  96
 1  0  0      0  49952   3976  51304   0   0   464     0  220   247   0   6  94
 0  1  0      0  49556   4068  51512   0   0   300     0  177   172   2   2  96
 1  0  0      0  49024   4132  51816   0   0   368     0  196   195   0   7  93
 1  0  0      0  48228   4200  52416   0   0   668     0  270   307   0   9  91
 1  0  0      0  47668   4352  52712   0   0   448     0  215   227   0   5  95
 1  0  0      0  47048   4412  53092   0   0   440     0  213   244   1   5  94
 0  1  0      0  44912   4668  54080   0   0   432     0  211   225   0   4  96

Profiling System Calls

The strace -c flag counts calls and reports time
The results typically follow a Pareto distribution

Example:

$ strace -c find . -name test
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 66.06    0.367503         584       629           getdents64
 26.31    0.146389          80      1826           lstat64
  2.27    0.012621          40       313           fcntl64
  1.83    0.010168          31       326        10 open
  1.78    0.009894          16       625           chdir
  1.23    0.006846          22       316           fstat64
[...]

Function Profiling

Statistical (prof)
Graph-based (gprof, Java)
Pinpoint hotspots

Profiling C/C++ Code with Gprof

Compile with gcc -pg
Execute program; a file gmon.out will be produced
The gprof program produces the report

A Flat Profile

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
time   seconds   seconds    calls  ms/call  ms/call  name
  6.35      0.33     0.33                             _Unwind_SjLj_Register
  5.77      0.63     0.30 10192020     0.00     0.00  __gnu_norm::__deque_buf_size(unsigned int)
  1.54      1.53     0.08       10     8.00   118.33  parse_parse()
  1.35      1.60     0.07  5096343     0.00     0.00  operator==(Tokid, Tokid)

A Call Graph Profile

index % time    self  children    called     name
                                  10             Pdtoken::process_pragma() <cycle 1> [360]
[4]     22.8    0.08    1.10      10         parse_parse() <cycle 1> [4]
                0.00    0.42    8348/17593       Token::unify(Token const&, Token const&) [5]
                0.00    0.26    4328/4328        Type::declare() [28]
                0.00    0.22    3511/3511        completed_typedef(Type) [33]
                0.00    0.03    3529/3540        Block::enter() [220]
                0.00    0.02    7402/27543       obj_lookup(std::string const&) [94]
                0.00    0.02  812361/834736      YYSTYPE::operator=(YYSTYPE const&) [324]
                0.01    0.01   85359/295237      Type::~Type() <cycle 3> [429]
                0.00    0.02    1690/1690        Call::register_call(Token const&, Id const*) [331]
                0.00    0.02   11362/11362       Type::set_abstract(Type) <cycle 4> [581]

Profiling Java Code with EJP

Add the tracer library to your library path
Run program under the Java VM with -Xruntracer
View the results with the EJP Presenter

A Java Program Profile

Basic Block Counting in C/C++ Code

To investigate algorithms count the number of times each line is executed.

Compile with gcc -pg -g -fprofile-arcs -ftest-coverage
Execute program; a file name.bb will be created
Run gcov -l name.bb
A .gcov file for each source file will be created

Example:

           main(int argc, char *argv[])
           {
      1            int i, j, k;
      1            int a = 0;

     11            for (i = 0; i < 10; i++)
    110                    for (j = 0; j < 10; j++)
   1100                            for (k = 0; k < 10; k++)
   1000                                    a += i + j + k;
      1    }