Development Infrastructure: Tooling with Open Source Software
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr
Overview
Goals
- An overview best-of-breed tools
- Application examples
- Best practices
- Productivity tips and tricks
- Turn OSS tools into a strategic advantage
Your Instructor
The Case for Tooling
Capital expenditures in different industries
Industry
|
Revenue ($ bn)
|
Capital Expenditure ($ bn)
|
CE / R (%)
|
Semiconductors
|
430,360
|
99,577
|
23.1%
|
Motor vehicles
|
1,094,157
|
90,042
|
8.2%
|
Prepackaged software
|
105,356
|
3,402
|
3.2%
|
Programming services
|
18,216
|
438
|
2.4%
|
The Case for Open Source
- Open architecture: combine, mix and match
- Industrial strength solutions
- State of the art technologies
- Vendor independence
- Ability to support and enhance in-house
- Easy procural
- Cost
Ten Software Tool Sins (10-6)
- 10. Disjoined source and documentation
- 9. No automated testing
- 8. No bug tracking
- 7. Underutilizing compiler, types
- 6. Scripting ignorance
Ten Software Tool Sins (5-1)
- 5. Silencing the compiler
- 4. No version control
- 3. Manual identifier tracking
- 2. Debugging through print statements
- 1. Manual repetitive editing
Overview
- System Software
- Version Control
- Build Management
- Collaboration
- Code Manipulation
- Inspection and Testing
- Performance Measurement
- Documentation and Visualization
- Tool Building
System Software
Operating System
- Linux
- OpenSolaris
- FreeBSD
- OpenBSD
Command Line Tools
1000-5000 commands in a typical operating system installation.
- Archivers
- Audio
- Databases
- Development
- Documentation
- Graphics
- Interpreters
- Math
- Network and web
- Security
The package manager is your friend!
Scripting Language
The Unix Toolchest
Available under Unix, GNU/Linux, *BSD.
For Windows:
Working with Unix Tools
- Fetch or generate data
- Selection
- Processing
- Summarizing
- Program development
Data Fetching and Generation
- nm: object files
- nm, ldd: executables
- tar: archives
- ar, jar: libraries
- find: directory hierarchies
- wget: the web
- cvs or svn log or annotate: version control
- jot or dd: generate artificial data
Selection
- awk: delimited records
- cut: fixed width files
- sed: select row parts with regular expressions
- grep: select rows with regular expressions
- Combine the above in pipelines
Processing
- sort: order by multiple keys
- uniq: remove duplicates, count
- diff: compare sequential files
- comm: compare ordered lists
- tr: map between character sets
- recode: map between text representations
- tac/rev: reverse order of lines, characters
- rs: reshape arrays
- join: relational join
- Scripting languages handle more complex tasks
Summarizing
- wc: count lines, characters
- head: first elements
- tail: last elements
- fmt: format into lines
- awk with an END block
Plumbing
- Pass data through pipeline
- tee: tap output for further processing
- xargs: apply command to numerous arguments
- Pipe into a while loop for complicated processing
Example: Analyze Java Files
Examine all Java files located in the directory src, and print the ten files with the highest number of occurrences of a method call to substring.
find src -name ’*.java’ -print |
xargs fgrep -c .substring |
sort -t: -rn -k2 |
head -10
Example: Determine Commit Times
find . -type f |
grep -v CVS |
xargs cvs -d /home/ncvs log -SN 2>/dev/null |
awk '/^date/{
hour = substr($3, 0, 2)
lines[$5 " " hour] += $9
}
END {
for (i in lines) print i, lines[i]
}' |
sed 's/;//g' |
sort >dev-times-lines.dat
join dev-times-lines.dat devlong >dev-times-lines-long.dat
Example Result:Plotted Commit Times
Program Development
- Editors
- GNU Compiler Collection
- Language development tools (bison, flex)
- Tracers and debuggers
- Profilers
- Code formatting tools (cb, indent)
- Version control systems
(More on the above later.)
Outwit Tools
- winclip - Access the windows clipboard
- winreg - manipulate the windows registry
- docprop - read document properties
- odbc - select data from relational databases
- readlink - resolve shell shortcuts
- readlog - access the windows event log
Outwit Examples
Create an list of fax recipients ordered by the number of faxes they have received.
readlog Application |
awk -F: "/Outbound: Information: Fax Sent/{print $12}" |
sort |
uniq -c |
sort -rn
Extracts the email address from all records from the table users which is part of the database userDB and sends them the file message by email.
fmt message |
mail $(odbc DSN=userDB "select email from users")
An Editor Checklist
- Regular expressions
- Syntax coloring
- Folding
- Error highlighting
- Indentation
- Marking of matching delimiters
- On-line help for API elements
- Code browsing
- Multiple buffers
- Macros
- Refactoring support
Two candidates: Emacs, vim.
Version Control
Version Control
- Tracks project across time
- Collaboration, documentation, undo
- Open source systems in order of increasing capabilities
- RCS
- CVS
- Subversion, Git, Mercurial, Bazaar
The Great Debate
- Centralized (CVS, Subversion)
- Single source of truth
- Pre-commit checks
- Distributed (Bazaar, Git, Mercurial, darcs)
- Can work off line
- Easier participation (no committer bit)
- Cheap local commits
Version Control Operations
- Central repository keeps historical data
- A commit or check-in adds a change into the repository
- An update or check-out retrieves a version from the repository
- Versions are tracked with version numbers
- Numeric: automatically assigned
- Symbolic: assigned by humans to track milestones
- Keywords in the source files are automatically expanded,
identifying version, status, author, etc.
Branching
- Development can split into different branches e.g.
- Trunk or head
- Stable or release
- Feature
- Vendor
- Branches can join again
Revision Tree Example
cat.c revision tree
Life Under a VCS
- Import
- Update or synchronize
- Commit
- Label or tag
The Goodies
- Identify conflicting changes
- Versioning:
- Annotated listings
- Version history
- File differences
- Branching
- Source of truth
- Metrics
Example of a Change Log
RCS file: /cvsroot/basesrc/bin/cat/cat.c,v
Working file: cat.c
head: 1.28
branch:
locks: strict
access list:
symbolic names:
netbsd-1-5-PATCH002: 1.23
[...]
netbsd-1-5: 1.23.0.4
[...]
netbsd-1-2-BETA: 1.11
netbsd-1-2-base: 1.11
netbsd-1-2: 1.11.0.6
[...]
keyword substitution: kv
total revisions: 33; selected revisions: 33
description:
----------------------------
revision 1.28
date: 2001/09/16 12:12:13; author: wiz; state: Exp; lines: +6 -4
Some KNF, via patch by Petri Koistinen in private mail.
----------------------------
revision 1.27
date: 2001/07/29 22:40:57; author: wiz; state: Exp; lines: +6 -6
Some style improvements. [Nearly] #13592 by Petri Koistinen.
[...]
----------------------------
revision 1.15.2.1
date: 1998/02/08 21:45:36; author: mellon; state: Exp; lines: +18 -9
Pull up 1.16 and 1.17 (kleink)
Example of a File Difference List
$ cvs diff -c -r1.12 -r1.13 basesrc/bin/cat/cat.c
Index: basesrc/bin/cat/cat.c
===================================================================
RCS file: /cvsroot/basesrc/bin/cat/cat.c,v
retrieving revision 1.12
retrieving revision 1.13
diff -c -r1.12 -r1.13
[...]
***************
*** 136,141 ****
--- 136,142 ----
fp = stdin;
else if ((fp = fopen(*argv, "r")) == NULL) {
warn("%s", *argv);
+ rval = 1;
++argv;
continue;
}
Example of an Annotated Listing
1.166 (jhb 16-Oct-02): fdp = p->p_fd;
1.114 (alfred 13-Jan-02): FILEDESC_LOCK(fdp);
1.157 (iedowse 02-Sep-02): if ((unsigned)fd >= fdp->fd_nfiles ||
1.157 (iedowse 02-Sep-02): (fp = fdp->fd_ofiles[fd]) == NULL) {
1.114 (alfred 13-Jan-02): FILEDESC_UNLOCK(fdp);
1.106 (dillon 01-Sep-01): error = EBADF;
1.106 (dillon 01-Sep-01): goto done2;
1.106 (dillon 01-Sep-01): }
1.157 (iedowse 02-Sep-02): pop = &fdp->fd_ofileflags[fd];
1.94 (dillon 18-Nov-00):
1.157 (iedowse 02-Sep-02): switch (cmd) {
1.1 (rgrimes 24-May-94): case F_DUPFD:
1.241 (rwatson 07-Aug-04): /* mtx_assert(&Giant, MA_NOTOWNED); */
1.158 (jhb 03-Sep-02): FILEDESC_UNLOCK(fdp);
Extracting Metrics
- Changes per source file
- Changes per author
- Authors per source file
- Lines per day
- Changes corresponding to bug categories
Tracking Example
Maintainability index over time in the FreeBSD kernel.
Best Practices
- Put everything under version control
- Use VCS on your personal projects
- Think carefully about file name and organization
- Perform one separate commit for every change
- Label all releases
- Establish and follow policies and procedures
Build Management
The Build Process
Typical Project Dependencies
Example: Apache Project Dependencies
Makefiles
- Tools like ant and make allow the automatic processing of dependencies
- Dependencies are specified in a special file
- Elements of a Makefile may include
- Variable definitions
- Rules for suffixes
- Dependencies
- Target build instructions
- Pseudo-targets
- Auto-generated dependencies
- Procedural specifications
Part of an Apache Makefile
OBJS= \
modules.o \
$(MODULES) \
main/libmain.a \
$(OSDIR)/libos.a \
ap/libap.a
.c.o:
$(CC) -c $(INCLUDES) $(CFLAGS) $<
.SUFFIXES: .def
.def.a:
emximp -o $@ $<
$(TARGET): $(EXTRA_DEPS) $(SUBTARGET)
lib$(TARGET).$(SHLIB_SUFFIX_NAME): subdirs modules.o
$(CC) -c $(INCLUDES) $(CFLAGS) buildmark.c
[...]
target_static: subdirs modules.o
$(CC) -c $(INCLUDES) $(CFLAGS) buildmark.c
$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_SHLIB_EXPORT) \
-o $(TARGET) buildmark.o $(OBJS) $(REGLIB) $(EXPATLIB) $(LIBS)
clean:
-rm -f $(TARGET) lib$(TARGET).* *.o
@for i in $(SUBDIRS); do \
echo "===> $(SDP)$$i"; \
( cd $$i && $(MAKE) $(MFLAGS_STATIC) SDP='$(SDP)' $@ ) || exit 1; \
echo "<=== $(SDP)$$i"; \
done
#Dependencies
$(OBJS): Makefile subdirs
# DO NOT REMOVE
buildmark.o: buildmark.c include/ap_config.h include/ap_mmn.h \
include/ap_config_auto.h os/unix/os.h include/ap_ctype.h \
include/hsregex.h include/httpd.h include/ap_alloc.h include/buff.h \
include/ap.h include/util_uri.h
modules.o: modules.c include/httpd.h include/ap_config.h \
include/ap_mmn.h include/ap_config_auto.h os/unix/os.h \
include/ap_ctype.h include/hsregex.h include/ap_alloc.h include/buff.h \
include/ap.h include/util_uri.h include/http_config.h
Make Problems and Solutions
- Complexity
-
- Dry run with make -n
- Debug mode: print variables
- Before lunch: make -k
- Portability
-
The Ant Approach
- Build instructions written in XML
- File typically named build . xml
- Variable definitions (named properties)
- Targets and dependencies
- By default, the target specified in the default attribute of the project tag is built
- Tasks are part of ant, typically not external programs
Some ant tasks
Examples of ant built-in tasks:
- mkdir
- copy
- cvs
- delete
- javac
- jar
Example Ant Build File
<project name="Jasper" default="deploy" basedir=".">
<!-- ===================== Initialize Property Values =================== -->
<!-- See "build.properties.sample" in the top level directory for all -->
<!-- property values you must customize for successful building!!! -->
<property file="build.properties"/>
<property file="../build.properties"/>
<property file="${user.home}/build.properties"/>
<!-- Build Defaults -->
<property name="build.compiler" value="classic"/>
<property name="copy.crimson.jar" value="../lib/crimson.jar"/>
<!-- =================== BUILD: Create Directories ====================== -->
<target name="build-prepare">
<available classname="junit.framework.TestCase" property="junit.present" />
<mkdir dir="${jasper.build}"/>
<mkdir dir="${jasper.build}/jasper"/>
<mkdir dir="${jasper.build}/lib"/>
</target>
<!-- =================== BUILD: Copy Static Files ======================= -->
<target name="build-static" depends="build-prepare">
<!-- Executable Commands -->
<copy todir="${jasper.build}/bin">
<fileset dir="src/bin" />
</copy>
<fixcrlf srcdir="${jasper.build}/bin" includes="*.sh" eol="lf"/>
<fixcrlf srcdir="${jasper.build}/bin" includes="*.bat" eol="crlf"/>
<chmod perm="+x" file="${jasper.build}/bin/jasper.sh"/>
<chmod perm="+x" file="${jasper.build}/bin/jspc.sh"/>
<!-- Common Extensions -->
<copy todir="${jasper.build}/jasper" file="${copy.crimson.jar}"/>
<copy todir="${jasper.build}/jasper" file="${copy.jaxp.jar}"/>
</target>
<!-- ================= BUILD: Compile Server Components ================= -->
<target name="build-main" depends="build-static">
<!-- Compile internal server components -->
<javac srcdir="src/share" destdir="${jasper.build}/classes"
debug="${compile.debug}" deprecation="${compile.deprecation}"
optimize="${compile.optimize}"
excludes="**/CVS/**">
<classpath refid="jasper.classpath" />
</javac>
</project>
Best Practices
- Put all the project's elements under the build system
- Minimize work through dependency analysis
- Modularize the build into several targets
- Common include files
- Explore parallel builds to reduce build times
- Provide a way to build from scratch
- Create a tinderbox build
Best Practices (Variables)
Use variables for varying elements:
- Directories
- Tool options
- Web sites
- Version strings
- OS dependencies
Collaboration
Mailing Lists
VCS Notifications
- Arrange for your VCS to notify all committers
- Committers can arrange for a special folder for these notifications
- Concept similar to ATC radio channel
Example Notification
Date: Sat, 19 Aug 2006 08:24:01 +0000 (UTC)
Subject: cvs commit: src/usr.bin/pkill Makefile
X-FreeBSD-CVS-Branch: HEAD
yar 2006-08-19 08:24:01 UTC
FreeBSD src repository
Modified files:
usr.bin/pkill Makefile
Log:
Install pkill(1), aka pgrep(1), to /bin so that rc scripts
can use this small and nifty utility. Create compatibility
symlinks from /usr/bin for the time being to avoid breaking
custom scripts relying on the hardcoded path to the utility.
Idea by: des
Discussed with: brooks (in cvs-src and cvs-all)
Revision Changes Path
1.6 +5 -0 src/usr.bin/pkill/Makefile
Setting up CVS Notifications
Example:
#!/bin/sh
(
id -P |
awk -F: '{
print $1 " (" gensub(",.*", "", "", $8) ")\t" strftime("%F %R:%S %z (%Z)")
}
END {
print "\nRepository: '"$2"'\n"
}'
echo "Directory: $1"
egrep -v '^In directory|^Update of'
) | mail -s "[$2] cvs commit $1" $3
Documentation is Ubiquitous
- System specification
- Software requirements specification
- Design specification
- Test specification
- User documentation
- Functional description
- Installation instructions
- Introductory guide, tutorial
- Reference manual
- Administrator manual
Wikis
Advantages of Wikis for storing documentation.
- Low-overhead editing
- Always current
- Revision history
- Easily accessible
Things to put on a wiki
- Project scorecards
- Personal pages (assgined projects, tasks, planned vacations)
- Time schedule
- Setup, build, release documentation
- Specifications
- Third party documentation pointers
Wiki Best Practices
Issue Tracking
- Together with VCS the two most important process-oriented tools
- Holds bugs, problems, ideas
- Each issue gets a unique id
- Should be coordinated with VCS
Issue Reports
Track bugs according to:
- area of functionality
- state
- resolution
- priority
- severity
Example: Bugzilla
Severity of a Bug
Blocker |
Blocks development or testing work (e.g. a build) |
Critical |
Crashes, loss of data, severe memory leak |
Major |
major functionality problem |
Minor |
Minor functionality problem, or easy to work around |
Trivial |
Cosmetic problem |
Enhancement |
Request for enhancement |
An Issue's Lifecycle
Resolution Options
- Fixed
- Invalid (not a bug)
- Won't fix
- Duplicate (of another bug)
- Works for me
- Moved (to another project)
Example: Releasing
Before a release.
- Triage bugs
- Fix
- Document
- Hide feature
- Next release
- Introduce a code freeze period
- Only showstoppers ...
- ... with no workaround
Bug Tracking Best Practices
- Use an existing issue tracking system
- No more than one bug per issue
- Include bug replication information
- Associate VCS commits with bug ids
- Minimize the number of bugs left open
- Use priorities to manage the developer workload
- Backup your issue database
Code Manipulation
Typical Tasks
- Identify the declarations
- Locate definitions
- Go through the places where an entity is used.
- List deviations from coding standards.
- Discover related code
- Find related comments
- Check for common errors.
- View the code structure.
- Understand how the code interacts with its environment
Regular Expressions
- A regular expression allows the declarative specification of
complex strings.
- Regular expressions consist of:
- letters (that represent themselves) and
- special symbols
- Backslash escapes special symbols
Regular Expression Symbols
^ | Beginning of a line |
$ | End of a line |
. | Any character |
Expression? | The expression zero or one times |
Expression* | The expression zero or more times |
Expression+ | The expression one or more times |
Expression{n} | The expression n times |
Expression{n,} | The expression at least n times |
Expression{n,m} | The expression at least n but no more than m times |
Expression1|Expression2 | The expression1 or the expression2 |
(Expression) | The expression within the brackets |
\1 \2 ... \n | The content of the nth bracket |
Character Classes
[abc] | One of a, b, or c |
[a-z] | A letter from a to z |
[^abc] | Any letter appart from a, b, and c. |
\t | The character tab |
\n | The character newline |
\r | The character carriage return |
\a | The character alert |
\f | The character form feed |
\e | The character escape |
\cx | The character control-x (a-z) |
\d | Digit |
\D | Non-digit |
\s | Space |
\S | Non-space |
The Editor as a Code Browser
- Regular expression searches
- Tag facility
- Outlining and folding
- Bird's eye view with a 10% zoom factor
Code Searching with grep
- Simple wildcard search
- Search for definitions
- Using a word's root
- Locating files: grep on the output of ls
- File list: grep -l
- Use the result of a file list vi `grep -l ...`
- Stream edit the grep result to create scripts
- Postprocess with grep (-v) to eliminate noise (malloc / xmalloc)
- Use sort -u to eliminate duplicates
- Use fgrep to search for a fixed list of strings
- Use find and xargs to obtain filename lists
Locating File Differences
- Uses:
- Differences between versions
- Examining code modifications
- Verifying test results
- The diff program
- Obtaining a context diff (-c)
- Ignoring blanks (-w)
Using the Compiler
- Generate warning messages.
- Tweak the code to generate error messages.
- Generate program listings.
- Obtain the preprocessor output.
- Examine the generated symbolic (assembly) code.
- Work through the final object code
Compiler Warning Messages
- Expressions associated with portability problems
- Implicit type casts and conversions between different but compatible types
- The use of a constant in place of a conditional expression in if or while statements
- Functions that do not return a value although they should
- Variable declarations that mask other variables with the same name
- Missing enumeration members or extraneous values in a switch statement
- Unknown #pragma options
- Unreferenced variables, structure members, or labels
- Failing to initialize a const declared object
(Depending on the language, some of the above may be errors)
The Compiler as a Code-Reading Tool
- Change the name of a definition and read the errors
- Post-process error messages to create scripts
- Use intermediate files (preprocessing, symbolic code)
- Examine the object code using nm (Unix) or dumpbin (Windows)
Assembly Can be Useful
Read symbolic code to
- To understand how an implementation-defined operation is actually performed
- To see how compiler optimizations affect the resulting code
- To convince yourself that the compiler correctly translates a piece of code, or (rarely) to find a compiler bug
- To verify that hardware-specific code written in C actually performs the operations you expect
- To learn how a particular construct can be written in symbolic code
Code Browsers
Code browsers typicall offer the following facilities:
- Definitions: Locate where an entity is defined.
- References: Go through all places where an entity is used.
- A call graph: List the functions called by a given function.
- A caller graph: List the functions calling a given function.
- A file outline: List the functions, variables, types, and macros defined in a file.
Code Browsers in OO
In OO languages given a class you can find:
- Where the class is defined
- The locations where the class is referenced
- The classes derived from this class
- The base classes for this class
- Its public, protected, and private methods and fields
Refactoring
- Rename and move (various)
- Change organization (classes, interfaces, inheritance)
- Local variables into class fields
- Code into method
- Change method signature
- Create getters and setters
(Live demo in Eclipse.)
Beautifiers
- Fix code that is written truly inconsistently without following any formatting standards
- Adopt orphaned code for maintenance
- Create a temporary version of the code to help you decipher it
- Integrate code under the common roof of a larger project
For printing use a pretty-printer, like listings (LaTeX), vgrind (troff).
Cdecl: Decoding C Declarations
- Convert C declarations into English
- Convert English into a C declaration
Example:
void (*signal(int sig, void (*func)(int)))(int);
declare signal as function that expects (sig as int, func as pointer to
function that expects (int) returning void) returning pointer to function
that expects (int) returning void;
Inspection and Testing
Static Verification
Examples:
Characteristics:
- Detection of common errors
- Various levels of inference
- False positives and negatives
Identified Issues
- Possible bugs
- Portability problems (C/C++)
- Function return values (C/C++)
- Dead code
- Overcomplicated expressions
- Suboptimal code
- Duplicate code
Some FindBugs Issues
(FindBugs reports more than 200 issues.)
- An apparent infinite recursive loop.
- A container is added to itself.
- A volatile reference to an array doesn't treat the array elements as volatile
- Usage of GetResource may be unsafe if class is extended
- Creates an empty jar file entry
- Dubious catching of IllegalMonitorStateException
- Method performs math using floating point precision
- Class implements Cloneable but does not define or use clone method
- clone method does not call super.clone()
- Method might drop exception
- Method might ignore exception
- Method invokes dubious new String(String) constructor; just use the argument
- Method invokes dubious new String() constructor; just use ""
A FindBugs Run
GCC Best Practices
- Compile C/C++ with all warnings enabled (gcc -Wall)
- When warnings are enabled, alse enable optimization (gcc -O)
- Treat warnings as errors (gcc -Werror)
- Compile a debug version of C++ code defining -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC
- Annotate printf-like functions. Example:
extern void argp_error (__const struct argp_state *__restrict __state,
__const char *__restrict __fmt, ...)
__attribute__ ((__format__ (__printf__, 2, 3)));
(Other formats include scanf, wprintf, wscanf).
C Compilation Example
main(char *argv[])
{
int a, b, c;
a = b;
printf("%g\n", a);
}
Plain compile:
$ gcc t.c
$
Compile with warnings:
$ gcc -O4 -Wall -Werror t.c
cc1: warnings being treated as errors
t.c:9: warning: return-type defaults to `int'
t.c:9: warning: first argument of `main' should be `int'
t.c:9: warning: `main' takes only zero or two arguments
t.c: In function `main':
t.c:13: warning: implicit declaration of function `printf'
t.c:13: warning: double format, different type arg (arg 2)
t.c:10: warning: unused variable `c'
t.c:14: warning: control reaches end of non-void function
t.c:10: warning: `b' might be used uninitialized in this function
Lint Best Practices
- Include a lint pass in your build
- Comment case statements lacking a break with FALLTHROUGH
- Comment uneachable code (exit, longmp) with NOTREACHED
- Comment functions taking format arguments with PRINTFLIKE, SCANFLIKE
- Sparingly use #ifndef lint to silence lint warnings
Tracing Tools
Interface | Tool |
Operating System | strace |
Library | ltrace |
Network | tcpdump |
Resource Snapshot | lsof |
Using Tracing Tools
- Locate bugs without using the source code
- Explain an application's behavior
- Determine undocumented features
- Discover appropriate APIs
- Locate performance bottlenecks
[sl]trace Tips and Tricks
- Trace running processes using the -p flag
- Send output to a file using the -o flag
- Read complete strings using a large argument (e.g. 1024) to -s
- Trace forked processes with the -f flag
Example: System Call Tracing
Which shared libraries is dot loading?
$ strace dot </dev/null 2>&1 | fgrep .so | fgrep -v ENOENT
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/usr/lib/libfreetype.so.6", O_RDONLY) = 3
open("/usr/lib/libpng.so.2", O_RDONLY) = 3
open("/usr/lib/libjpeg.so.62", O_RDONLY) = 3
open("/usr/lib/libz.so.1", O_RDONLY) = 3
open("/lib/libm.so.6", O_RDONLY) = 3
open("/lib/libc.so.6", O_RDONLY) = 3
Example: Library Call Tracing
How does the whoami command obtain its data?
$ ltrace whoami
[...]
geteuid() = 1000
getpwuid(1000, 0x40013678, 0xbffffcac, 0x08048cff, 0x4012ee48) = 0x401310f0
puts("dds"dds
) = 4
exit(0 <unfinished ...>
Unit Testing
- JUnit (http://www.junit.org/) and its friends is the name of the game
- Based on test classes
- Test class contains method annotated
@Before
- Test methods are annotated with
@Test
- Test methods call
assertTrue
- main calls
junit.textui.TestRunner.run(TestClass.class)
JUnit Example
Test Class and Fields
public class RationalTest {
private Rational r12;
private Rational r13;
private Rational r56;
}
Initialization Code
@Before public void setUp() {
r12 = new Rational(1, 2);
r13 = new Rational(1, 3);
r56 = new Rational(5, 6);
}
A Test Method
@Test public void simpleAdd() {
Rational result = r12.add(r13);
assertTrue(result.equals(r56));
}
Run the Class's Test Methods
public static void main (String... args) {
junit.textui.TestRunner.run(RationalTest.class);
}
Textual Regression Testing
- Used for batch oriented applications
- A loop goes over the input files
- Each input file creates a new output file
- Output file compared with correct file
- A "priming" mode allows the creation of the correct files
Example: Testing the Sort Program
#!/bin/sh
FILES=`cd in; echo *`
# Prime test data
if [ "$1" = "-p" ] ; then
for i in $FILES ; do
sort in/$i >old/$i
done
fi
# Compare old with new result
for i in $FILES ; do
sort in/$i >new/$i
if diff old/$i new/$i ; then
echo "OK $i"
else
echo "FAIL $i"
exit 1
fi
done
Test Coverage in C/C++ Code
To investigate test coverage, you can use basic block profiling.
- Compile with gcc -pg -g -fprofile-arcs -ftest-coverage
- Execute program; a file name.bb will be created
- Execute program; a file name.bb will be created
- Run gcov -n name.bb
- You will get a report of how many lines were executed
Example:
83.33% of 650 source lines executed in file hilbert.c
Performance Measurement
Program States
- Directly executing code (user time)
- The kernel executes code on behalf of the program (system time)
- Waiting for an external operation to complete (idle time)
Also
- Real or clock time
- CPU time
Measuring Workload
- For terminating processes use the time command
$ time make
[...]
real 0m9.380s
user 0m2.580s
sys 0m6.640s
- For servers use top
22:14:03 up 5:22, 2 users, load average: 0.00, 0.01, 0.00
39 processes: 38 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.1% user, 0.4% system, 0.0% nice, 99.5% idle
Workload and Tools
Profile |
r >>u+s |
s >u |
u ≈ r |
Characterization |
I/O-bound |
kernel-bound |
CPU-bound |
Tools |
Disk, net, VM stats, packet dumps |
System call tracing |
Function profiling, basic block counting |
System Monitoring Tools
- I/O
- iostat (under *BSD), vmstat (Linux)
- Memory
- vmstat, free (Linux)
- Network
- netstat
Example: virtual memory statistics while executing the find command
$ vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 52296 3476 50092 0 0 340 0 189 190 0 6 94
0 1 0 0 51736 3588 50428 0 0 448 0 214 243 0 5 95
1 0 0 0 51072 3716 50724 0 0 424 0 210 218 0 5 95
0 1 0 0 50596 3848 50968 0 0 376 0 196 204 0 4 96
1 0 0 0 49952 3976 51304 0 0 464 0 220 247 0 6 94
0 1 0 0 49556 4068 51512 0 0 300 0 177 172 2 2 96
1 0 0 0 49024 4132 51816 0 0 368 0 196 195 0 7 93
1 0 0 0 48228 4200 52416 0 0 668 0 270 307 0 9 91
1 0 0 0 47668 4352 52712 0 0 448 0 215 227 0 5 95
1 0 0 0 47048 4412 53092 0 0 440 0 213 244 1 5 94
0 1 0 0 44912 4668 54080 0 0 432 0 211 225 0 4 96
Profiling System Calls
- The strace -c flag counts calls and reports time
- The results typically follow a Pareto distribution
Example:
$ strace -c find . -name test
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
66.06 0.367503 584 629 getdents64
26.31 0.146389 80 1826 lstat64
2.27 0.012621 40 313 fcntl64
1.83 0.010168 31 326 10 open
1.78 0.009894 16 625 chdir
1.23 0.006846 22 316 fstat64
[...]
Function Profiling
- Statistical (prof)
- Graph-based (gprof, Java)
- Pinpoint hotspots
Profiling C/C++ Code with Gprof
- Compile with gcc -pg
- Execute program; a file gmon.out will be produced
- The gprof program produces the report
A Flat Profile
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
6.35 0.33 0.33 _Unwind_SjLj_Register
5.77 0.63 0.30 10192020 0.00 0.00 __gnu_norm::__deque_buf_size(unsigned int)
1.54 1.53 0.08 10 8.00 118.33 parse_parse()
1.35 1.60 0.07 5096343 0.00 0.00 operator==(Tokid, Tokid)
A Call Graph Profile
index % time self children called name
10 Pdtoken::process_pragma() <cycle 1> [360]
[4] 22.8 0.08 1.10 10 parse_parse() <cycle 1> [4]
0.00 0.42 8348/17593 Token::unify(Token const&, Token const&) [5]
0.00 0.26 4328/4328 Type::declare() [28]
0.00 0.22 3511/3511 completed_typedef(Type) [33]
0.00 0.03 3529/3540 Block::enter() [220]
0.00 0.02 7402/27543 obj_lookup(std::string const&) [94]
0.00 0.02 812361/834736 YYSTYPE::operator=(YYSTYPE const&) [324]
0.01 0.01 85359/295237 Type::~Type() <cycle 3> [429]
0.00 0.02 1690/1690 Call::register_call(Token const&, Id const*) [331]
0.00 0.02 11362/11362 Type::set_abstract(Type) <cycle 4> [581]
Profiling Java Code with EJP
- Add the tracer library to your library path
- Run program under the Java VM with -Xruntracer
- View the results with the EJP Presenter
A Java Program Profile
Basic Block Counting in C/C++ Code
To investigate algorithms count the number of times each line is executed.
- Compile with gcc -pg -g -fprofile-arcs -ftest-coverage
- Execute program; a file name.bb will be created
- Run gcov -l name.bb
- A .gcov file for each source file will be created
Example:
main(int argc, char *argv[])
{
1 int i, j, k;
1 int a = 0;
11 for (i = 0; i < 10; i++)
110 for (j = 0; j < 10; j++)
1100 for (k = 0; k < 10; k++)
1000 a += i + j + k;
1 }
Architectural Inefficiencies
- Cache misses
- Branch mispredicts
- Front end stalls (failure to deliver enough instructions)
- Cache pollution through address aliasing (set associativity issues)
- Long latency instructions
- TLB misses
Architectural Profiling Tools
- Processor event monitoring
- Oprofile (http://oprofile.sourceforge.net/) (Linux)
- Locality optimizations
- SLO: Suggestions for Locality Optimizations (http://slo.sourceforge.net/)
Events that Oprofile can analyze on a Core 2 Intel CPU.
- Clock cycles when not halted
- number of instructions retired
- number of L2 cache requests
- L2 cache demand requests from this core
- Page table walk events
- cycles divider is busy
- L1 cacheable data read operations
- Bus cycles when data is sent on the bus
- number of instruction fetch misses
- Branch predicted taken with bubble 1
- [100 more]
Memory Performance
Memory as important as CPU cycles
- Difference in latency between memory types
- Large data sets
- Embedded applications
- Bandwidth requirements
Again, we need to profile
Memory Profiling C/C++ Code with Valgrind
- Create a debug build (-g)
- Run the program under valgrind
- Examine valgrind's output
- Can also detect memory leaks
Example: sed under valgrind
echo ':abcdefgh: : :' |
valgrind --tool=massif ./sed -f TEST/hanoi.sed
Documentation and Visualization
Declarative Text Processing
DocBook Details
Best Practices
- Separate content from presentation
- Modularize
- Declare new elements
- Put documents under revision control
- Use build tools
- Automatically extract content from source code
Documentation from Source Code
Javadoc Example
/**
* Constructs a newly allocated {@code Integer} object that
* represents the specified {@code int} value.
*
* @param value the value to be represented by the
* {@code Integer} object.
*/
public Integer(int value) {
this.value = value;
}
Automation Example: C++ Code
void
workdb_schema(Sql *db, ostream &of)
{
cout <<
// BEGIN AUTOSCHEMA
"CREATE TABLE STRINGS(" // Strings in the code
"FID INTEGER," // File key (references FILES)
"FOFFSET INTEGER," // Offset within the file
"STRING " << db->varchar() << "," // The string, including its delimiters
"PRIMARY KEY(FID, FOFFSET)"
");\n"
"CREATE TABLE LINEPOS(" // Line number offsets within each file
"FID INTEGER," // File key (references FILES)
"FOFFSET INTEGER," // Offset within the file
"LNUM INTEGER," // Line number (starts at 1)
"PRIMARY KEY(FID, FOFFSET)"
");\n"
"CREATE TABLE FCALLS(" // Function calls
"SOURCEID INTEGER, " // Calling function identifier key (references FUNCTIONS)
"DESTID INTEGER" // Called function identifier key (references FUNCTIONS)
");\n"
// ...
// END AUTOSCHEMA
"";
}
Automation Example: Processing Script
print 'The following sections describe the
schema of the database created through the SQL backend.';
while (<>) {
if (/BEGIN AUTOSCHEMA/../END AUTOSCHEMA/) {
if (/CREATE TABLE (\w+)\(.*\/\/ (.*)/) {
print qq{
<h2>Table $1</h2>
<p>
$2.
</p>
<table border = "1">
<tr> <th>Field name</th> <th>Field type</th> <th>Value description</th> </tr>
};
} elsif (/^\s*\"(\w+).*\/\/ (.*)/) {
$name = $1;
$description = $2;
$type = 'BOOLEAN' if (/booltype/);
$type = 'CHARACTER VARYING' if (/varchar/);
print "
<tr><td>$name</td><td>$type</td><td>$description</td></tr>
";
} elsif (/\"\)\;\\n\"/) {
print "</table>\n";
}
}
}
Automation Example: Documentation Page
The following sections describe the
schema of the database created through the SQL backend.
Table STRINGS
Strings in the code.
Field name |
Field type |
Value description |
FID | INTEGER | File key (references FILES) |
FOFFSET | INTEGER | Offset within the file |
STRING | CHARACTER VARYING | The string, including its delimiters |
Table LINEPOS
Line number offsets within each file.
Field name |
Field type |
Value description |
FID | INTEGER | File key (references FILES) |
FOFFSET | INTEGER | Offset within the file |
LNUM | INTEGER | Line number (starts at 1) |
Table PROJECTS
Project details.
Field name |
Field type |
Value description |
PID | INTEGER | Unique project key |
NAME | CHARACTER VARYING | Project name |
Table FCALLS
Function calls.
Field name |
Field type |
Value description |
SOURCEID | INTEGER | Calling function identifier key (references FUNCTIONS) |
DESTID | INTEGER | Called function identifier key (references FUNCTIONS) |
Graph and Diagram Types
- Metrics
- Call graphs
- Class hierarchies
- Concrete object instances
- Data structures (trees, lists, and so on)
- Bit fields and corresponding masks in bit-mapped registers
- State transition diagrams
- Strings or arrays and corresponding indices or pointers
- Entity-relationship diagrams
Declarative Drawing Tools
UMLGraph Class Diagram
class Person {
String Name;
}
class Employee extends Person {}
class Client extends Person {}
Detailed Class Diagram
// Author: Vadim Nasardinov
// Version: $Id: ceg-mv.xml,v 1.1 2005/11/23 22:21:22 dds Exp $
import java.util.List;
import java.util.Map;
/**
* @assoc "1..1" - "0..n" Adapter
* @assoc "" - "0..n" ObjectType
* @assoc "" - "0..n" ObjectMap
* @assoc "" - "0..n" Table
* @assoc "" - "0..n" DataOperation
**/
class Root {
private Map m_adapters;
private List m_types;
private List m_maps;
private List m_tables;
private List m_ops;
public Adapter getAdapter(Class klass) {}
}
class Adapter {
public Root getRoot();
}
abstract class Element {
Root getRoot() {}
}
class ObjectType extends Element {}
/**
* @has "1..1" - "1..1" ObjectType
**/
class ObjectMap extends Element {
private ObjectType m_type;
}
class Table extends Element {}
class DataOperation extends Element {}
UMLGraph Sequence Diagram
# Define the objects
object(O,"o:Toolkit");
placeholder_object(P);
step();
# Activation and messages
active(O);
message(O,O,"callbackLoop()");
create_message(O,P,"p:Peer");
message(O,P,"handleExpose()");
active(P);
return_message(P,O,"");
inactive(P);
destroy_message(O,P);
inactive(O);
# Complete the lifeline of O
step();
complete(O);
|
|
Directory Hierarchies with dot
(Generated with a 50-line Perl script)
Plotting Test Coverage: GNU Plot Script
#
# $Id: testcoverage.gpl 1.3 2005/10/13 18:57:00 dds Exp $
#
set ylabel 'Branch test coverage %'
set xlabel 'Test case number'
set ytics 10
set y2tics 10
set mytics
set nomxtics
set terminal png medium enhanced
set output 'testbranch.png'
plot [] [0:100] 'testbranch.dat' using 1:3 title "Executed branches" with lines,\
'testbranch.dat' using 1:5 title "Fully tested branches" with lines
Plotting Test Coverage: Result
Memory Fragmentation Example
(Postscript generated by a Perl script and a custom new
operator.)
Stack Usage Example
(GNU Plot data generated by a custom function prologue.)
Map Example
(Generated by mining data from the CVS repository.)
Tool Building
Implementation Options
- The Unix shell and tools (sed, awk, grep)
- Scripting language (Groovy, Perl, Python, Ruby)
- Visual Basic
- C, C++, Java
Choosing an Implementation
- A filtering pipeline (e.g. CVS logs): the Unix shell
- Record processing: awk
- Simple line-oriented string processing: sed
- Combination of tasks: scripting language
- Web front end: PHP, SWILL, GWT, Ruby on Rails
- Tight integration with Microsoft products: Visual Basic
- Extreme performance requirements: C, C++, Java
A Simple Grep Program in Java ...
/*
* Globally match regular expression and print
* Modelled after the Unix command with the same name
* D. Spinellis
*/
import java.util.regex.*;
import java.io.*;
class Grep {
public static void main(String args[]) {
if (args.length != 2) {
System.err.println("Usage: Grep pattern file");
System.exit(1);
}
Pattern cre = null; // Compiled RE
try {
cre = Pattern.compile(args[0]);
} catch (PatternSyntaxException e) {
System.err.println("Invalid RE syntax: " + e.getDescription());
System.exit(1);
}
BufferedReader in = null;
try {
in = new BufferedReader(new InputStreamReader(
new FileInputStream(args[1])));
} catch (FileNotFoundException e) {
System.err.println("Unable to open file " +
args[1] + ": " + e.getMessage());
System.exit(1);
}
try {
String s;
while ((s = in.readLine()) != null) {
Matcher m = cre.matcher(s);
if (m.find())
System.out.println(s);
}
} catch (Exception e) {
System.err.println("Error reading line: " + e.getMessage());
System.exit(1);
}
}
}
... And its Equivalent in Perl
#!/usr/bin/perl -n
BEGIN {$pat = shift;}
print if (/$pat/);
Advice
- Exploit the capabilities of modern rapid-prototyping languages.
- Start with a simple design, gradually improving it as needed.
- Use heuristics that exploit the lexical structure of the code.
- Tolerate some output noise or silence
- Use other tools to preprocess your input or postprocess your output.
Example: Signature Survey