© 2009 IEEE.
Personal use of this material is permitted. However, permission
to reprint/republish this material for advertising or promotional purposes or
for creating new collective works for resale or redistribution to servers or
lists, or to reuse any copyrighted component of this work in other works must
be obtained from the IEEE.
J%: Integrating Domain Specific Languages with Java
Vassilios Karakoidas and Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Email: {bkarak,dds}@aueb.gr
Abstract
J% (J-mod), is a Java language extension that supports integration with Domain-Specific Languages. The integration is realized through an architecture that permits external modules to support DSLs. The DSL statements can be syntactically checked at compile-time. An additional facility allows the static type checking of Java variables that appear within DSL code. To support this process each DSL module comes as a library that is used both at compile time and during program execution.
1 Introduction
The multiparadigm programming [1,2] approach demanded each problem to be dealt with the most suitable programming language. The use of DSLs in the software development process reduces cost and enhances productivity [3,4]. In modern multiparadigm software development, Domain-specific languages ( DSL) are often integrated into General-Purpose Languages ( GPL), and according to the categorization of Mernik et al. [3] follow distinct patterns. Mernik also provides guidelines regarding the implementation process of a DSL, and the use of each pattern, according to its notation, design and user community.
DSLs focus on a specific problem domain. For that purpose they sacrifice syntactic flexibility. In the literature, they are often called micro-languages or little languages [5]. Well known DSLs include regular expressions, SQL, HTML and VHDL. On the other hand, GPLs have a wider scope, providing a set of processing capabilities applicable to various problem domains [5]. Typical examples of GPLs are Java, C, C++, and Python.
Currently the integration of a DSL with a GPL brings forth many practical and research issues. For example, SQL integration in the Java programming language is implemented by the JDBC API application library [6]. This implementation pattern compels the programmer to pass the SQL query as a String. The Java compiler is completely unaware of the SQL query and the programmer finds out SQL syntactical errors at runtime, usually by an exception raised by the JDBC driver. Such errors remain undetected, if during the testing phase of the product the SQL query is never invoked.
This paper introduces J%, a DSL-aware extension of the Java programming language. Its purpose is to enrich the current Java syntax in order to effectively support DSLs. The prototype implementation consists of a pre-processor, that translates the J% source code to Java compatible code. Thus, it provides an extensible way to embed DSLs into Java.
J% introduces the following new characteristics:
- Static Typing The compilation process of the DSL is type-safe. The compiler is able to perform static typing to the hosted DSLs.
- DSL Syntax Compile-Time Check The DSL is syntactically checked at compile time and all errors are reported as compile-time errors. Each hosted DSL retains its syntax. There is only a minimal addition to the grammar of each language to support type mapping with J% (Section II).
- Meta-Programming Facility The hosted DSL is not translated into Java, but generates code that facilitates existing Java DSL Application Programming Interface APIs, like JDBC for SQL. This is the main difference between J% and other popular meta-programming systems that transform the DSL into the host language.
- Common DSL Container J% provides one common way to embed DSLs into Java. This way, the language grammar is not burdened with custom extensions, each time a new DSL module is included.
- Extensible Developers can create their own DSLs module through a well defined application programming interface ( API). This way J% can include as many DSLs as possible.
2 The J% Language
J% language adopts the typical Java syntax (v.1.5) with a minimal set of extensions. Each DSL is also extended to support the integration with Java. Our approach follows the Implementation: Extensible compiler/interpreter pattern [3].
2.1 Extending Java
Figure 1 listing contains a simplified version of J% grammar. The following BNF conventions are adopted:
- [x] denotes zero or one occurrences of x.
- {x} denotes zero or more occurrences of x.
- x | y means one of either x or y.
The main rule is code_unit which contains declarations of package (package), import statements (import) and types (type). Each type can be either a class, an enumeration or an interface.
The declarations are compatible with those of Java [7]. The difference lies in the class rule. Each class can be a class (typical_class) or a DSL class (dsl_class). Its block contains the actual DSL code. The any_character rule must accept any character as input, because at grammar level we cannot predict the syntax of the embedded DSL language. The actual DSL code is contained in a set brackets, like a typical class or method.
code_unit ::= [ package ]
{ import }
{ type }
type ::= class
| interface
| enumeration
class ::= typical_class
| dsl_class
dsl_class ::= { modifier } 'class' identifier
'extends' identifier
'<' identifier '>'
'{' { any_character } '}'
Figure 1: Simplified J% grammar
2.2 DSL Extensions
DSL maintains its own syntax. By doing that, the development phase receives the maximum benefit, since the domain knowledge can be used and expressed through the specific instance of the DSL.
For simple cases, where the DSL does not interact with Java the syntax is completely the same. When the host language needs to interact with the DSL, an external reference must be defined; a contract that bridges the type systems between the two languages is required.
The grammar of this DSL extension is presented in Figure 2. The interpretation of the base types (B, C, etc.) is provided in the Java Virtual Machine Specification [8].
For example, in an SQL query that needs an int as a parameter we would write:
select * from customer
where customerId = #[1]<I>
The expression #[1]<I> defines that the customerId expects an int base type (I). The [1] is the parameter index and affects the code generation phase (Section III-A4).
<external_ref> ::= '#''['<index>']''<'<field_type>'>'
<field_type> ::= <base_type> |
<object_type> |
<array_type>
<base_type> ::= 'B'|'C'|'D'|'F'|'I'|'J'|'S'|'Z'
<object_type> ::= 'L'<fullclassname>;
<array_type> ::= '['<optional_size><field_type>
<optional_size> ::= '0'-'9' { '0'-'9' }
<index> ::= '1'-'9' { '0'-'9' }
Figure 2: Syntax of a DSL external reference
3 The J% Compiler
Figure 3: J% Compilation Process
The compilation process is depicted in Figure 3. The J% compiler accepts the input source files (.jmod). The symbol table is populated and the required DSL modules are identified. In this phase the compiler also verifies each module existence in the classpath and seeks its implementation. The DSL types are located, parsed and associated with the appropriate elements in the symbol table.
Afterwards, type checking is initiated for the DSL code. Type mapping information is gathered for each DSL module and the references to external variables are tested about type compatibility and scope.
The code generator is invoked and Java code is generated. Finally, the Java compiler is invoked and it translates the generated Java code into executable JVM bytecode.
3.1 DSL Modules
The DSL modules have a dual use; they implement the syntax and type checking at compile-time and provide an execution environment at runtime.
A DSL module is imported using an import statement. In the following example, we import the Regex runtime class of the regular expressions DSL module.
import org.jmod.dsl.regex.Regex;
To include third party external modules, the jar files must be included in the classpath and the entries must be added in the appropriate configuration file.
3.1.2 Initialization
The compile-time part of each module is identified in the compiler initialisation phase. The exported DSL types of each module are added in the symbol table and identified as DSL types. The following listing presents a symbol table printout, populated only with the basic types (java.lang) and the DSL types.
$ bin/jmodc -st
[...]
Type:java.lang.IllegalThreadStateException (class)
Type:java.lang.Runnable (interface)
Type:java.lang.ThreadLocal (class)
Type:org.jmod.dsl.regex.Regex (dsl type)
3.1.3 Configuration
The DSL modules are reconfigured each time they are invoked. This is realised through the ModuleConfiguration class (Figure 4). The J% compiler identifies all the classes that are subclasses of the ModuleConfiguration class and use them to get at compile-time each DSL's module configuration. Each subclass has a set of public fields that define the configuration. These fields can be only int, String, float, and boolean types.
In Figure 4, the RegexConfiguration class is a subclass of ModuleConfiguration and has two public fields; optimisation which is set to "true", and engine which defines the underlying regular expression engine and is set to "posix".
If we declare a class NumberRegex,
import org.jmod.dsl.regex.Regex;
import org.jmod.dsl.regex.RegexConfiguration;
public class NumberRegex
extends Regex<RegexConfiguration> {
[0-9]+
}
the DSL module is invoked with optimisation set to "true", and engine set to "posix". When we want to turn the optimisation off, we have to change the RegexConfiguration with its subclass NonOptimisedRegex, which overrides the public field optimisation with the value "false". The class field overriding is supported fully by the Java programming language [7].
Figure 4: DSL Module Configuration Hierarchy
3.1.4 Code Generation
DSL modules are responsible for the code generation. Any third party DSL application library can be used and the only restriction is that the process must produce Java compatible code.
If the DSL block interacts with the main Java program through shared variables (external references, then the generated code uses the following convention. The generated class must have a constructor that initialises the shared variable with the correct order, which is defined in their declaration. For example, consider the SQL query:
select * from customer where
cust_id = #[1]<int> and cust_name
= #[2]<Ljava/lang/String;>
The constructor of the class should be CustomerQuery(int i, String s) to correctly initialise the two declared types, an integer that must be ordered first and a String as second.
4 Case Study: Regular Expressions
Regular expressions are a standard feature in many programming languages. Java, C#, Perl and Python contain regular expression engines as part of their API. The regular expression API in Java follows the Implementation:Embedding pattern and it is realised through an application library.
The Regular Expression Module in J% exports only the type Regex. The RegexConfiguration class declares the basic configuration parameters. The module performs syntax check in the regular expression, and reports the errors in compile-time. Its main dependency is the standard regular expression library java.util.regex, which is distributed as part of the JDK since v.1.4.
Consider that we have the following code, a regular expression that matches an IP Address (IPv4).
public class IpAddress
extends Regex<RegexConfiguration> {
([0-9]{1,3}\.){3}[0-9]{1,3}
}
The above code will be transformed into:
public class IpAddress
extends Regex<RegexConfiguration> {
private String regex;
public IpAddress() {
super(new RegexConfiguration());
this.regex =
"([0-9]{1,3}\\.){3}[0-9]{1,3}"; }
public Pattern getPattern() {
return Pattern.compile(regex); }}
The generated code is pretty straightforward and utilises the standard regular expression library. If the regular epxression has a syntactic error, the compiler will report it, during the compilation phase. Regular expressions do not have shared variables with Java, so this module represents the simplest form of integration between a DSL and J%.
5 Case Study: SQL
The SQL is the standard language for database query and manipulation. This case, is more complex than regular expressions, since it supports also shared variables, in addition with syntax checking. The syntax analysis is based on a custom SQL grammar, and the generated code utilises standard JDBC calls. We follow the type mapping scheme proposed by the JDBC specification [6].
The following listing illustrates an SQL query that perform a select statement from the database table customers. The #[1]<I> defines that the query needs one integer parameter.
public class CustomerSelect
extends SQL<SQLConfiguration> {
select * from customers where cust_id = #[1]<I>
}
The first initialisation parameter for this class will be an int. The generated code will look like:
public class CustomerSelect
extends SQL<SQLConfiguration> {
private int i; private String sql;
public CustomerSelect(int i) {
super(new SQLConfiguration());
this.i = i;
this.sql =
"select * from customers where cust_id = ?";
}
public PreparedStatement
getStatement(Connection c) {
try {
PreparedStatement r =
c.prepareStatement(sql);
r.setInt(1,i)
return r;
} catch (Exception e) {
return null; }
}
}
The CustomerSelect class utilises the JDBC API. The constructor accepts one integer parameter. The external reference (#[1]<I>) is replaced with a "?" in the generated code and a typical PreparedStatement is used.
6 Related Work
We studied approaches that extend functional languages, such as Haskell [9,10] and other implementation efforts that were based on general-purpose languages like Java and C++ [11,12,13]. Table I categorises the case studies that are presented in this section, according to Mernik et al. [3].
There is also a lot of work on the theoretical aspects of the Java programming language, like its type system [14,15]. Theoretical research also included multi-language systems and their type systems [16], which reveals a research direction, how to efficiently intermix programming languages.
The Boo programming language provides a Python inspired syntax and features String Interpolation and support for regular expressions implementing the Implementation:Embedding pattern.
An noteworthy approach for regular expression embedding is used by Perl [17]. Perl introduces operators that efficiently integrate regular expressions within the language syntax.
Haskell/DB [9] is a host language variant that has been extended to encapsulate SQL queries. This approach completely hides the DSL from the developer, hindering productivity and forbidding domain experts to become involved with the development process. Haskell/DB follows the Creational:Language Extension.
Python provides support for regular expressions and SQL. Database tables are encapsulated in classes extending the SQLObject. This mechanism permits the developer to execute simple queries without writing SQL, thus partially solving the problem of an erroneous query, simply by generating it automatically through library code.
Powerscript is the core development language for the database development environment Powerbuilder. Powerscript is a
specialization of the BASIC programming language language that supports integration with SQL. This is a classic example of Creational:Language Specialisation pattern, where a general-purpose language (BASIC), is restricted to a specialised development language for database applications. Powerscript provides syntax and type checking in the integrated SQL queries.
C# and Java share many common characteristics in their support of DSL languages. They both support regular expressions and SQL using the Implementation:Embedding pattern.
SQL DOM [18] acts as a pre-processor that translates an SQL database schema in C#. The generated collection of objects is used as an API to the main application, thus ensuring type safety and syntax checking. Notably, the SQL statements in this approach are generated with a provided getSQL() method. Cω [19] integrated both SQL and XML into its syntax extending the C# programming language.
XJ [11] provides XML static type checking with the extension of additional data types from an XML schema. XACT [12] and JDBC Checker [13] provide a completely different approach; they try to determine possible dynamically generated DSL statements during compile time, and provide error checking. JWIG is an interesting extension of Java for better web service development support [20]. Machete [21] is another Java extension that provides mechanisms towards the unification of pattern matching languages such as regular expressions, structured term patterns, XPATH and bit-level patterns. ILEA (Inter-LanguagE Analysis) [22] is a JVML (Java Virtual Machine Language) extension that permits extensive analysis in C source code, to cover JNI call problems. Jeannie [23] addresses the intermixing of Java and C at the programming language level.
Metaborg [24] utilises similar techniques for code generation, allowing language extensions and utilising existing application libraries. Its main problem is that it does not present a unified method for embedding. On the contrary, it encourages the developers to use their syntactic extensions for each module.
Table 1: Categorisation of multi-language systems
Implementation | Pattern |
|
Boo (regex) | Implementation:Embedding |
Haskell/DB (SQL) | Creational:Language Extension |
JDBC Checker (SQL) | Implementation:Compiler |
Powerscript (SQL) | Creational:Language Specialisation |
XJ (XML) | Creational:Language Extension |
XACT (XML) | Creational:Language Extension |
Cω (SQL, XML) | Creational:Language Extension |
Perl (regex) | Creational:Language Extension |
Java (regex,SQL) | Implementation:Embedding |
SQL DOM (SQL) | Implementation:Preprocessor |
J% (any) | Implementation: Extensible
compiler/interpreter |
JWIG (XML) | Creational:Language Extension |
Ruby (regex,SQL) | Implementation:Embedding |
Python (regex,SQL) | Implementation: Embedding |
Jeannie (C) | Creational: Language Extension |
Metaborg (any) | Implementation: Extensible
compiler/interpreter |
7 Conclusions and Future Work
J% is a language extension of Java that integrates DSL in an modular way. It also provides with syntax and type checking between DSL and Java. We also saw a few concrete examples of J% usage through its regular expression and SQL modules.
The modules presented are exhibiting the basic facilities of the J% compiler, and in the future we plan to add more features, exploiting further the compiler's awareness of the DSL code.
Upon maturation of the current prototype, we plan to add support for XML, named parameters for the DSL code blocks, and further study the following open issues:
- Host Language Support Current J% design and implementation focus only at the extension of the Java programming language. It would be interesting to explore, how the same set of techniques and methods could be applied on other languages, like C++.
- Dynamic DSL generation Our extension deals only for static DSL statements. In the future we plan to utilise existing research [12,13] and provide mechanisms to check dynamically generated statements.
- Compile Time Optimisations In many cases, the domain-specific language module, can act as an optimiser for DSL generated statements, such as code generation for regular expression.
- Unified Debugging process Typical debugging solutions are not fit for J%. The debugger must also support plug-ins for all the integrated programming languages. The debugging facilities, such as breakpoints, must work in an inter-unique way.
8 Availability
A prototype version of the J% compiler is available at http://gaijin.dmst.aueb.gr/~bkarak/programs/jmod/.
Acknowledgment
The research work presented in this publication is funded by AUEB's Funding Programme for Basic Research 2008 (project number 51).
References
- [1]
-
J. Placer, "Multiparadigm research: a new direction of language design,"
SIGPLAN Notices, vol. 26, no. 3, pp. 9-17, 1991.
- [2]
-
P. Zave, "A compositional approach to multiparadigm programming," IEEE
Software, vol. 6, no. 5, pp. 15-25, 1989.
- [3]
-
M. Mernik, J. Heering, and A. M. Sloane, "When and how to develop
domain-specific languages," ACM Computing Surveys, vol. 37,
no. 4, pp. 316-344, 2005.
- [4]
-
A. van Deursen and P. Klint, "Little languages: little maintenance,"
Journal of Software Maintenance, vol. 10, no. 2, pp. 75-92, 1998.
- [5]
-
J. Bentley, "Programming pearls: little languages," Communications of
the ACM, vol. 29, no. 8, pp. 711-721, 1986.
- [6]
-
M. Fisher, J. Ellis, and J. Bruce, JDBC API Tutorial and Reference,
3rd ed. Addison Wesley, 2003.
- [7]
-
J. Gosling, B. Joy, G. Steele, and G. Bracha, The Java Language
Specification, 3rd edition.
Addison-Wesley, 2005.
- [8]
-
T. Lindhorn and F. Yellin, The Java Virtual Machine
Specification, 2nd ed., ser. The Java Series. Addison-Wesley, 2003.
- [9]
-
D. Leijen and E. Meijer, "Domain specific embedded compilers," in PLAN
'99: Proceedings of the 2nd conference on Domain-specific languages. ACM Press, 1999, pp. 109-122.
- [10]
-
P. Thiemann, "Programmable type systems for domain specific languages," 2002.
- [11]
-
"Xj: Facilitating xml processing in java," World Wide Web (WWW), May
2005, (to appear).
- [12]
-
C. Kirkegaard, A. Moller, and M. I. Schwartzbach, "Static analysis of XML
transformations in java," IEEE Transactions on Software Engineering,
vol. 30, no. 3, pp. 181-192, March 2004.
- [13]
-
C. Gould, Z. Su, and P. Devanbu, "Static checking of dynamically generated
queries in database applications," in Proceedings of the 26th
International Conference on Software Engineering (ICSE'04). IEEE, may 2004, pp. 645-654.
- [14]
-
S. Drossopoulou, S. Eisenbach, and S. Khurshid, "Is the java type system
sound?" Theory and Practice of Object Systems, vol. 5, no. 1, pp.
3-24, 1999.
- [15]
-
D. Syme, "Proving java type soundness," in Formal Syntax and Semantics
of Java. London, UK: Springer-Verlag,
1999, pp. 83-118.
- [16]
-
J. Matthews and R. B. Findler, "Operational semantics for multi-language
programs," in POPL '07: Proceedings of the 34th annual ACM
SIGPLAN-SIGACT symposium on Principles of programming languages. ACM Press, 2007.
- [17]
-
L. Wall, T. Christiansen, and J. Orwant, Programming Perl. Sebastopol, CA: O'Reilly, 2000.
- [18]
-
R. A. McClure and I. H. Krüger, "SQL DOM: compile time checking of
dynamic sql statements," in ICSE '05: Proceedings of the 27th
international conference on Software engineering, 2005, pp. 88-96.
- [19]
-
G. Bierman, E. Meijer, and W. Schulte, "The essence of data access in cw," in
ECOOP 2005: Proceedings of the 19th European Conference on
Object-Oriented Programming, 2005, pp. 287-311.
- [20]
-
A. S. Christensen, A. Møller, and M. I. Schwartzbach, "Extending java
for high-level web service construction," ACM Transactions on
Programming Languages and Systems, vol. 25, no. 6, pp. 814-875, 2003.
- [21]
-
M. Hirzel, N. Nystrom, B. Bloom, and J. Vitek, "Matchete: Paths through the
pattern matching jungle," in Practical Aspects of Declarative
Languages (PADL08), 2008.
- [22]
-
G. Tan and G. Morrisett, "Ilea: inter-language analysis across java and c,"
in OOPSLA '07: Proceedings of the 22nd annual ACM SIGPLAN conference on
Object oriented programming systems and applications. ACM, 2007.
- [23]
-
M. Hirzel and R. Grimm, "Jeannie: granting java native interface developers
their wishes," in OOPSLA '07: Proceedings of the 22nd annual ACM
SIGPLAN conference on Object oriented programming systems and
applications. ACM, 2007.
- [24]
-
M. Bravenboer, R. de Groot, and E. Visser, "Metaborg in action: Examples of
domain-specific language embedding and assimilation using stratego/xt," in
GTTSE, 2006, pp. 297-311.
File translated from
TEX
by
TTH,
version 3.79.
On 30 Sep 2009, 08:53.