This is an HTML rendering of a working paper draft that led to a publication. The publication should always be cited in preference to this draft using the following reference:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

© 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Component Mining

Component and Object Technology

Editor: Bertrand Meyer, EiffelSoft, ISE Bldg., 2nd Fl., 270 Storke Rd., Goleta, CA 93117; voice (805) 685-6869;

Diomidis Spinellis
University of the Aegean

How do we obtain software components? In a panel at TOOLS USA 1994, Eric Aranow nicely outlined the basic question: "Is it nature or nurture?" In other words, are components born—devised from the start as components—or are components made? That is to say, have they evolved from program elements that may have been originally built for other purposes? Although some panel members argued for the nature view, it was clear to everyone that the nurture process of maturing candidate components until they are fully satisfactory can be invaluable.

In this column, Diomidis Spinellis describes a process of component mining. His source of candidate components—his "mine"—is a set of highly respected components from an earlier generation: the famous Unix utilities. He shows how to identify the most promising of these candidates and re-encapsulate them in components that satisfy today’s views of component-based development, based on the principles of object technology.

Bertrand Meyer

The wide adoption of the OO paradigm as well as of recent technology advances like Enterprise JavaBeans and ActiveX controls has generated renewed interest in component-based software engineering. But an increasingly important issue in the development process is the reliability and availability of the source that supplies these components.

Component mining, a relatively new field, is the process of extracting reusable components from an existing component-rich software base. Mining is most effective when the process is clearly defined. This column outlines a method I use for mining components from applications that are typically executed as Unix processes.

The Unix Mining Field

Unix developers have created a large collection of applications that provide a single service—such as comparing two files, searching for a pattern, or delivering e-mail—without requiring user interaction. Many of those programs have been implemented using state-of-the-art algorithms, have been stress-tested for decades, and have had their interfaces standardized. Furthermore, many of these programs are freely available in source code form through open-source initiatives like GNU and BSD.

You could argue that these programs have always been used as components, which you basically glue together using a Unix shell. Although this might be true in one sense, current trends call for a component model that is much richer than the one provided by the Unix shell.

Systems using ActiveX or JavaBeans components are

In contrast, the Unix shells largely

A process for packaging existing programs as object components can elevate the individual reuse of specific algorithms or implementations to an organized component-mining operation.

The Component Mining Process

The process of mining components and subsequently using them within an application domain can be divided into three phases, illustrated in Figure 1: exploration, excogitation, and exploitation.

These phases roughly correspond to the selection, specialization, and integration dimensions of typical software reuse methodologies.

Figure 1: The component mining and exploitation process.

During the exploration phase, you elicit component requirements and—on the basis of component abstractions—select components. In this phase the selected components and the system architecture determine your corresponding interface requirements.

The excogitation phase deals with the encapsulation of the components that have to be mined and the implementation of suitable interfacing glue for connecting components with the rest of the system. The abstract nature of packaged components and interfaces means that many of them can be stored in a repository for future reuse or retrieved from this repository for direct reuse.

Finally, during the exploitation phase, you use the reused and newly encapsulated components and corresponding interfaces to create a functioning system.

The excogitation and exploitation phases are composed of three basic activities: component encapsulation, component glue implementation, and component composition.

Component Encapsulation

Component encapsulation creates a component object—usable within an object-based framework—out of a stand-alone, noninteractive program. Encapsulating filter-style programs that transform one data stream into another is a fairly straightforward process. Programs requiring limited user interaction can be encapsulated using a suitable wrapper, while programs that have a graphical front-end are—in most cases—poor candidates for encapsulation.

Encapsulated Unix components typically process input data streams and generate output streams based on parameters that modify their behavior. As an example, the diff component (which I implemented as an encapsulated version of the Unix command by the same name) processes two textual input streams and generates a third stream that contains their differences. Among other things, the diff parameters specify what output format to use, what algorithm to use, and how to handle white space.
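As a rough illustration of the diff example, the sketch below wraps the Unix diff command in a Python class. It is not the implementation described in this column (which used Perl); the class name DiffComponent and its process method are hypothetical, and the sketch assumes a diff executable is available on the search path.

```python
import os
import subprocess
import tempfile


class DiffComponent:
    """Hypothetical wrapper: two input streams in, one difference stream out."""

    def __init__(self, *options):
        # Options are handed to diff unchanged, e.g. "-u" (unified output)
        # or "-b" (ignore changes in the amount of white space).
        self.options = list(options)

    def process(self, old: bytes, new: bytes) -> bytes:
        # diff reads named files, so each input stream is spooled to a
        # temporary file for the duration of the call.
        with tempfile.TemporaryDirectory() as workdir:
            paths = []
            for name, data in (("old", old), ("new", new)):
                path = os.path.join(workdir, name)
                with open(path, "wb") as handle:
                    handle.write(data)
                paths.append(path)
            result = subprocess.run(
                ["diff", *self.options, *paths], stdout=subprocess.PIPE
            )
        # diff exits with 0 (no differences) or 1 (differences found);
        # an exit status above 1 signals trouble.
        if result.returncode > 1:
            raise RuntimeError("diff failed")
        return result.stdout
```

A caller would write something like DiffComponent("-u").process(old_text, new_text) and receive the differences as a byte string, without dealing with files or exit codes directly.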

Encapsulating a stand-alone program makes it directly usable in object-based frameworks. And specifying a standard filter-type component class allows repetitive aspects of the encapsulation to be reused. You can even specify a class to automate the encapsulation, but doing so means you are bound to a generic component interface.
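One way to picture such a standard filter-type class is the minimal Python sketch below. The name FilterComponent and its interface are assumptions made for illustration. The sketch also shows the trade-off mentioned above: any filter program can be wrapped with no extra code, but every wrapped program is reached through the same generic process call.

```python
import subprocess


class FilterComponent:
    """Hypothetical generic wrapper for a stand-alone, noninteractive filter."""

    def __init__(self, program, *options):
        # The program name plus the parameters that modify its behavior.
        self.argv = [program, *options]

    def process(self, data: bytes) -> bytes:
        """Feed one input stream to the program and return its output stream."""
        result = subprocess.run(
            self.argv, input=data, stdout=subprocess.PIPE, check=True
        )
        return result.stdout
```

For instance, FilterComponent("sort", "-r") would yield an object whose process method sorts its input lines in reverse order, assuming a sort executable is installed.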

Specifying data sources as connectable data streams allows you to build components using sophisticated topologies that cannot typically be implemented using Unix shells. Furthermore, component encapsulation allows you to experiment with different component implementations that may vary in terms of performance, cost, licensing restrictions, and resource use.

As an example, thread-based implementations conforming to a framework’s structuring conventions offer increased efficiency but at a higher implementation cost. A thread-based implementation can advantageously use wrapper libraries to transform existing OS call primitives into interfaces to the encapsulation code.
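The following Python sketch hints at what a thread-based component could look like. It runs a transformation in a thread rather than a separate process and uses in-memory queues as its input and output streams. The names ThreadedFilter and END are hypothetical, and the sketch does not cover the wrapper libraries for OS call primitives mentioned above.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream


class ThreadedFilter(threading.Thread):
    """Hypothetical thread-based filter connected through queues."""

    def __init__(self, transform, source, sink):
        super().__init__()
        self.transform = transform  # function applied to each stream item
        self.source = source        # queue.Queue carrying the input stream
        self.sink = sink            # queue.Queue receiving the output stream

    def run(self):
        while True:
            item = self.source.get()
            if item is END:
                # Pass the end-of-stream marker on so downstream filters stop.
                self.sink.put(END)
                break
            self.sink.put(self.transform(item))
```

Because the sink of one filter can serve as the source of the next, several such threads can be chained without creating any processes or pipes.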

Component Glue Implementation

Apart from singular options controlling an encapsulated component’s operation, the bulk of the data is transferred to and from the component through streams. Typical streams are formed from the stand-alone program’s standard input and output as well as any other user-specified files. To use a component effectively, these streams need to be connected to existing data sources such as in-memory data structures, files, relational databases, procedures producing dynamic data, GUI widgets, and other components. These connections are all handled by the glue components.

A glue component class needs to be designed whenever a new type of existing data source or sink needs to be linked to an encapsulated component. While glue components can be used to provide extra functionality for linking components together, they also allow the integration of encapsulated components within an object-based framework.

Suitable glue components can be used to provide efficient interfaces to system data, which can obviate the cumbersome file-based approaches typically used to interface stand-alone programs. The existence of glue components allows the designer to experiment with different data sources and sinks without having to modify the rest of the system structure.
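A minimal Python sketch of glue components follows. All class and function names are hypothetical, and a component is assumed to be any callable that maps input bytes to output bytes. The point of the sketch is that the source or sink can be swapped without touching the component or the connecting code.

```python
import io


class ListSource:
    """Glue: exposes an in-memory list of strings as a byte stream."""

    def __init__(self, lines):
        self.lines = lines

    def open(self):
        text = "".join(line + "\n" for line in self.lines)
        return io.BytesIO(text.encode())


class FileSource:
    """Glue: exposes a file on disk as a byte stream."""

    def __init__(self, path):
        self.path = path

    def open(self):
        return open(self.path, "rb")


class ListSink:
    """Glue: collects a component's output stream into a list of strings."""

    def __init__(self):
        self.lines = []

    def accept(self, data: bytes):
        self.lines.extend(data.decode().splitlines())


def connect(source, component, sink):
    """Run one component between a source and a sink.

    `component` is any callable mapping input bytes to output bytes.
    """
    with source.open() as stream:
        sink.accept(component(stream.read()))
```

Replacing ListSource with FileSource in a call to connect changes where the data comes from, while the component and the sink stay exactly as they were.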

Component Composition

Encapsulated components do not operate in a vacuum. They are composed to create more powerful components and are integrated within an object-based system to provide more specialized services.

Composing encapsulated components with component glue can provide efficient access to offline data, GUIs, and a multitude of other component-based services. A spelling checker, for example, can be easily constructed by composing the translate, sort, unique, and other components while gluing together the editbox and listbox components to provide the GUI.

Many of the problems solved under the Unix programming environment using shell programming constructs and pipelines can be transformed to component composition structures. Of particular relevance are sequences of filter-type components, where each one receives a data stream, performs some operations on it, and forwards it to another filter to perform some other operations. Examples include pipelines of tools that process text, images, sound, and program code.
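To make the pipeline idea concrete, here is a hedged Python sketch of the non-GUI part of the spelling checker. It assumes that the translate, sort, and unique components correspond to the Unix tr, sort, and uniq commands and that these are installed. The helper names command, Pipeline, and misspelled are hypothetical, and the dictionary is passed in as a plain set of words.

```python
import subprocess


def command(*argv):
    """Return a filter function that runs argv on a byte stream."""
    def run(data: bytes) -> bytes:
        result = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, check=True
        )
        return result.stdout
    return run


class Pipeline:
    """Composes filters so that each one's output feeds the next."""

    def __init__(self, *stages):
        self.stages = stages

    def __call__(self, data: bytes) -> bytes:
        for stage in self.stages:
            data = stage(data)
        return data


def misspelled(text: bytes, dictionary: set) -> list:
    """List the distinct words in text that are not in the dictionary."""
    words = Pipeline(
        command("tr", "-cs", "A-Za-z", "\\n"),  # one word per line
        command("tr", "A-Z", "a-z"),            # fold to lower case
        command("sort"),
        command("uniq"),                        # drop repeated words
    )
    return [w for w in words(text).decode().split() if w not in dictionary]
```

Since a Pipeline is itself a callable filter, it can be used as a stage of a larger pipeline, which is one simple way of packaging a composition as a new component.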

The components composed are object instances of either active process components—that are connected to existing data sources and sinks—or glue components that provide such sources and sinks. By using component composition it is possible to implement sophisticated component interaction topologies. It is also possible to package together existing components to create new reusable ones.
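A topology that goes beyond a linear pipeline can be sketched in a few lines of Python. The FanOut class below is a hypothetical illustration: it feeds the same input stream to several components and joins their outputs, and the resulting object can itself be used as a component.

```python
class FanOut:
    """Hypothetical composite: one input stream, several parallel branches."""

    def __init__(self, *branches, join):
        self.branches = branches  # components that all receive the same input
        self.join = join          # function combining the branch outputs

    def __call__(self, data):
        return self.join(*(branch(data) for branch in self.branches))


# A small packaged component: line and word counts of a byte stream.
text_stats = FanOut(
    lambda data: len(data.splitlines()),
    lambda data: len(data.split()),
    join=lambda lines, words: (lines, words),
)
```

Here text_stats behaves like any other component, even though two branches run over the same data internally.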

I have used the component mining process to encapsulate a number of Unix tools using Perl’s OO features. I used a simple wrapper approach that required only a modest implementation effort. By using the encapsulated components, my colleagues and I have been able to code applications intuitively and naturally.

The component mining process has proven to be addictive. The ease of encapsulation, the limitless possibilities of object structuring, and the flexibility of using a high-level language to interact with the components have opened new ways to leverage existing tools and applications. I am currently experimenting with more efficient, thread-based encapsulation techniques and with using this approach to construct image processing applications from encapsulated Unix tools.

I would like to see component mining extended to other mining fields, which would be supported by appropriate domain-specific patterns and languages.

Diomidis Spinellis is on the teaching staff at the University of the Aegean. Contact him at