This is an HTML rendering of a working paper draft that led to a publication. The publication should always be cited in preference to this draft using the following reference:
The tools and processes we use to transform our system’s source code into an application we can deploy or ship were always important, but nowadays they can mean the difference between success and failure. The reasons are simple: larger code bodies; teams that are bigger, more fluid, and more widely distributed; richer interactions with other code; and sophisticated tool chains. All these mean that a slapdash software build process will be an endless drain on productivity and an embarrassing source of bugs, while a high-quality one will give us developers more time and traction to build better software.
The golden rule of software building is that you should automate all build tasks. The scope of this automation includes setting up the build environment, compiling the software, performing unit and regression testing, typesetting the documentation, stamping a new release, and updating the project’s web page. You can never automate too much. In a project I manage I’ve arranged for each release distribution to pick up from the issue-management database the bugs that the release fixes, and include them in the release notes. Automation serves three purposes: it documents the processes, it speeds up the corresponding tasks, and it eliminates mistakes and forgotten steps. (Did we correctly update the documentation to indicate the software’s current version?)
At the simplest level you can automate processes by writing small scripts or programs, using your operating system’s shell language or a general-purpose scripting language. In some cases, for instance flashing the memory image of an embedded device, you may even need to develop a purpose-built program to avoid mouse-clicking on that pesky GUI application supplied by your hardware vendor. However, this approach is only suitable for the most specialized purposes. In most cases a build tool will standardize your process and provide you with many useful facilities.
The most popular options for automating your build are your IDE’s facilities, the various implementations of make, and Apache Ant and Maven. I don’t recommend basing your build process on your IDE for anything but the most trivial projects. The build process gets tied to the specific IDE and the platforms it runs on. Even if this is a popular IDE, why needlessly restrict the developers’ choice? Also, the build facilities provided by most IDEs are limited, often restricting the way you can abstract tasks and options.
Maven is an interesting choice if your project is Java-based. It’s a tool with an attitude, sporting a range of built-in patterns for software builds. If you’re willing to adopt its predefined patterns, you end up with a well-defined, complete, and standardized build process, and with less verbiage than the alternatives.
The differences between Make and Ant are noteworthy, but the choice is not difficult. Make has been used for everything: from typesetting books to setting up phone exchanges. The domain of Ant is mostly limited to the Java world. However, nowadays some circles consider building a Java application with anything but Ant or Maven downright eccentric, so if you’re working with Java you should have a very convincing story to explain a contrarian choice. Having said that, keep in mind that Sun’s Java Development Kit ships with 471 makefiles (Make’s default input file) and just 36 Ant build files.
Both tools work on a dependency graph of tasks. For this you describe your build process as a series of tasks that depend on each other; for instance, to link together your project’s modules you must first compile them. The nodes of Make’s graph are typically files. Make decides what to build based on the timestamps of those files. This allows it to short-circuit large parts of the build process when the build is incremental, giving you a performance edge. Ant’s tasks are abstract named blocks. Ant will always traverse the whole graph, but some tasks, like that of the Java compiler, can internally determine that their work is already done.
A major advantage of Ant is the portability of its scripts. In contrast to Make, which invokes external programs to accomplish its work, Ant’s tasks are built into it (or loaded as extensions, written in Java). Therefore, an Ant script should behave identically on any Java platform, whereas with Make you need to spend effort to avoid or abstract away system-specific commands. If you’re using Make on a Windows platform, installing a Unix-compatibility suite, like Cygwin, can help your makefiles run on both Windows and Unix. Alternatively, you can inject cross-platform compatibility into your build system through CMake.
Portability aside, the fact that Make does its work with normal shell commands means that you can easily dry-run any part of your build process on the command prompt. Debugging a build process (yes, unfortunately this is sometimes needed) is also easier with Make, because the output you see from it is the commands that the system runs. In contrast, Ant’s behavior is opaque: to analyze what a task is doing you need to add print statements and (at a deeper level) look at the Java source code that implements it.
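The timestamp rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular Make implementation is written; the function names and the shape of the rules table are assumptions made for this example.

```python
import os


def is_out_of_date(target, prerequisites):
    """Return True if target must be rebuilt under a Make-like timestamp rule."""
    if not os.path.exists(target):
        return True  # a missing target is always rebuilt
    target_mtime = os.path.getmtime(target)
    # Rebuild when any prerequisite is newer than the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def build(target, rules, run):
    """Bring target up to date, depth first; return True if it was rebuilt.

    rules (a hypothetical structure) maps a target to a pair
    (prerequisites, action). Files without a rule are treated as sources.
    run is called with each action that has to be executed.
    """
    if target not in rules:
        return False  # a source file: nothing to build
    prerequisites, action = rules[target]
    for prerequisite in prerequisites:
        build(prerequisite, rules, run)
    if is_out_of_date(target, prerequisites):
        run(action)
        return True
    return False
```

When nothing has changed since the last build, the second call to build finds every target newer than its prerequisites and runs no actions at all, which is the short-circuiting that makes incremental builds fast.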
Many have re-implemented and extended the original 1970s Make program. Versions such as those from the BSD and GNU efforts offer file inclusion, conditionals, more readable ways to specify implicit rules and their variables, string processing, and loops. These features increase the expressiveness of your build scripts, but can also make them less portable.
Having automated your build process, the next step is to optimize it. As much as I nostalgically remember the days when I could cook and eat dinner while compiling an application, a quick build cycle can keep developers focused by robbing them of the excuse to browse Slashdot (and worse) while their code is compiling. The first optimization step involves the correct handling of dependencies, so that a part (for instance an object file) is built if and only if one of its constituents (the corresponding source file) changes. There are two possible problems here. Extraneous processing (for example compiling many C source files together by invoking the compiler with a wildcard) is a waste of time. On the other hand, missing a dependency, like the fact that a C file must be recompiled when a header file changes, can introduce subtle bugs that are difficult to track down. For preprocessor-based languages like C and C++ dependency tracking can become so complicated that there are tools, like ccache, which cache the input and output of each compile cycle, and transparently skip compilations for which they have the correct cached result.
An additional neat optimization possibility is to take advantage of idle workstations in your organization or processor cores on your machine. Many parts of a build process are trivially parallelizable and contain a nice mix of I/O and CPU-intensive processing. Therefore, you can shave off significant fractions of the build time by arranging to run parts in parallel. Most modern Make programs can do that (with a -j option specifying the number of jobs to run simultaneously), while Ant has the equivalent “parallel” container task. There are also tools like distcc and Icecream that can distribute a build across many machines.
Once you’ve got that build process in place, take the extra steps needed to make it shine. A makefile or an Ant build file is also source code, and you should treat it with the same respect. Put it under version control, document it with ample comments, use descriptive variable and target names, put often-used sequences into reusable blocks, and don’t repeat yourself. Appropriate reuse can keep your build specifications short and sweet. For instance, almost half of the 2,400 makefiles that control the build process of the FreeBSD operating system are shorter than a dozen lines, while the whole system (the kernel, 706 commands, and 725 libraries) can be built through them with just two commands.
Finally, invest some effort to squeeze the most out of your build process. Once the process is automated, you can couple it with your version control system, and arrange for nightly or continuous builds. A small script can retrieve the latest version of the source code, build it (with full compiler warnings enabled and treated as errors), and test it. This so-called tinderbox script can then immediately send any error messages to all developers, thus putting in place peer pressure that keeps your system always ready to ship.
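The caching idea can be illustrated with a deliberately simplified Python sketch. It is not ccache’s actual algorithm (the real tool takes considerably more into account); it only shows the principle of keying a compilation result on a hash of its inputs. The names cache_key and cached_compile, and the use of a plain dictionary as the cache, are assumptions made for this example.

```python
import hashlib


def cache_key(source_text, compiler_flags):
    """Derive a key from everything assumed to affect the compiler's output."""
    digest = hashlib.sha256()
    for flag in compiler_flags:
        digest.update(flag.encode())
        digest.update(b"\0")  # separator, so ["-O", "2"] differs from ["-O2"]
    digest.update(b"\0")
    digest.update(source_text.encode())
    return digest.hexdigest()


def cached_compile(source_text, compiler_flags, compile_fn, cache):
    """Return (result, was_cache_hit), calling compile_fn only on a miss.

    source_text stands for the source after preprocessing, so that a change
    in any included header changes the key. cache is any dict-like store.
    """
    key = cache_key(source_text, compiler_flags)
    if key in cache:
        return cache[key], True
    result = compile_fn(source_text, compiler_flags)
    cache[key] = result
    return result, False
```

Because the key is computed from the preprocessed text rather than from file names or timestamps, a header change that really alters the compiler’s input causes a miss, while touching a file without changing its contents does not.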
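What a parallel build does with the dependency graph can be sketched as follows: every target whose prerequisites are complete is handed to a pool of workers, up to a fixed number of simultaneous jobs. This is a minimal illustration in Python, not a description of how Make or Ant schedule their work; the function name, the dictionary-based graph, and the default of four jobs are assumptions made for this example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def parallel_build(dependencies, action, jobs=4):
    """Run action(target) for every target, at most `jobs` at a time.

    dependencies maps each target to the set of targets it depends on;
    every target must appear as a key. Returns targets in completion order.
    """
    remaining = {t: set(deps) for t, deps in dependencies.items()}
    running = {}   # future -> target currently being built
    finished = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while remaining or running:
            # Start every target whose prerequisites are all complete.
            ready = [t for t, deps in remaining.items() if not deps]
            for target in ready:
                del remaining[target]
                running[pool.submit(action, target)] = target
            if not running:
                raise ValueError("dependency cycle or undefined target")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                target = running.pop(future)
                future.result()  # re-raise a failure from the worker
                finished.append(target)
                for deps in remaining.values():
                    deps.discard(target)
    return finished
```

Independent object files are built side by side, while a link step that depends on all of them starts only after the last one completes.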
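The core of such a script might look like the following Python sketch. The step names and the commands in STEPS are hypothetical: they assume a Git working copy and a makefile with a test target, and should be replaced with whatever your project uses. Delivering the report to the developers (by mail or otherwise) is left out.

```python
import subprocess


def run_build_steps(steps):
    """Run (name, command) steps in order, stopping at the first failure.

    Returns None if every step succeeds, otherwise a pair
    (step name, combined standard output and error of the failed step).
    """
    for name, command in steps:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if result.returncode != 0:
            return name, result.stdout
    return None


def format_report(failure):
    """Turn the outcome of run_build_steps into a message for the team."""
    if failure is None:
        return "Build OK"
    name, output = failure
    return f"Build FAILED in step '{name}':\n{output}"


# Hypothetical steps; adjust the commands to your own project.
STEPS = [
    ("update", ["git", "pull", "--ff-only"]),
    ("build", ["make", "CFLAGS=-Wall -Werror"]),  # warnings treated as errors
    ("test", ["make", "test"]),
]

# Typical use, for instance from a nightly scheduled job:
#     print(format_report(run_build_steps(STEPS)))
```

Stopping at the first failing step keeps the report short and points directly at the stage that broke.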
Build automation is one of those remarkable places where product and process, programmers and managers meet with common interests and goals. Invest in it and you won’t regret it.