A Tale of Four Kernels

How does the software development process affect quality attributes of the source code? This page contains supporting material for a conference paper that examines this question:

Diomidis Spinellis. A tale of four kernels. In Wilhelm Schäfer, Matthew B. Dwyer, and Volker Gruhn, editors, ICSE '08: Proceedings of the 30th International Conference on Software Engineering, pages 381–390, New York, May 2008. Association for Computing Machinery.

In this paper I analyze the source code of four operating system kernels (FreeBSD, Linux, Solaris, and Windows) by collecting metrics in the areas of file organization, code structure, code style, the use of the C preprocessor, and data organization.


The data, source code, and CCFinder results are permanently archived at DOI 10.5281/zenodo.2526915. All database files are MySQL SQL dumps.

You can also find the queries used for extracting the metrics from each database here.

You can find the schema of the databases described here, and a diagram of the logical schema below.

Logical database schema

Measuring Duplication

A Perl script, which can be downloaded from here, processes the duplication results found by CCFinder and calculates and prints the percentage of cloned tokens in a project. The script takes as its argument the base name of a .ccfxd file produced by CCFinder. It prints on its standard output the name of the project, the number of files, the number of tokens, the number of clones, and the percentage of cloned tokens. Following the DRY (don't repeat yourself) principle, this last number can serve as one quality indicator for the project: the lower the percentage, the less duplicated code.
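The calculation behind that percentage can be sketched as follows. This is not the Perl script itself, and it does not parse the .ccfxd format: it assumes the CCFinder results have already been read into per-file token counts and a list of clone pairs, each pair being two token ranges. The names `Fragment`, `merge_length`, and `clone_summary` are hypothetical, and counting a token once even when it appears in several overlapping clone fragments is an assumption of this sketch, not necessarily the script's exact rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Hypothetical representation of one side of a CCFinder clone pair.
    file_id: int  # index of the file containing the fragment
    start: int    # first token of the fragment (inclusive)
    end: int      # token just past the fragment (exclusive)


def merge_length(intervals):
    """Total number of tokens covered by half-open intervals, overlaps counted once."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def clone_summary(project, file_tokens, clone_pairs):
    """Return (project, files, tokens, clones, percentage of cloned tokens).

    file_tokens: token count of each file (assumed input).
    clone_pairs: sequence of (Fragment, Fragment) tuples (assumed input).
    """
    by_file = {}
    for pair in clone_pairs:
        for frag in pair:
            by_file.setdefault(frag.file_id, []).append((frag.start, frag.end))
    cloned = sum(merge_length(iv) for iv in by_file.values())
    total = sum(file_tokens)
    percent = 100.0 * cloned / total if total else 0.0
    return project, len(file_tokens), total, len(clone_pairs), percent


if __name__ == "__main__":
    # Two files of 100 and 50 tokens, with two (made-up) clone pairs.
    pairs = [
        (Fragment(0, 0, 20), Fragment(1, 10, 30)),
        (Fragment(0, 10, 30), Fragment(0, 60, 80)),
    ]
    name, files, tokens, clones, pct = clone_summary("demo", [100, 50], pairs)
    # Same fields as the script's output line; prints: demo 2 150 2 46.67 (tab-separated)
    print("\t".join([name, str(files), str(tokens), str(clones), f"{pct:.2f}"]))
```

In the example, file 0 has 50 distinct cloned tokens (0–29 and 60–79) and file 1 has 20, so 70 of the 150 tokens are cloned. Merging overlapping ranges keeps a token that participates in several clone pairs from inflating the percentage beyond 100.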



Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.
Last modified: 2008-05-16