A basic project covering the bare essentials of how a programming language targeting the JVM could be written.
There is only one statement in this scripting language, the print statement, which can optionally take a double-quote (") delimited string to print. Such a simple grammar limits the general utility of the language, but it shows how a few standard pieces fit together to build one.
Three basic steps are illustrated:
- Lexing and Parsing using ANTLR.
- Byte code generation using ObjectWeb ASM.
- Loading the generated byte code using the classloader so it can be executed.
- Clone this project.
- Download Gradle and install it.
- Build the project and create a consolidated jar that is the language interpreter:
    gradle fatJar
- Run a sample script file:
    java -jar build/libs/TinyScript-all.jar scripts/hello-world.ts
The first step is to translate the free form text into something that can be handled by a computer.
Lexing is the process of translating characters:
p r i n t
into a single token:
print
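To make the idea concrete, here is a minimal hand-written lexer for the same input. It is only a sketch: the project itself uses an ANTLR-generated lexer, and the names `TinyLexer`, `Kind`, and `Token` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyLexer {
    /** Hypothetical token kinds for this sketch; ANTLR derives its own from the grammar. */
    enum Kind { PRINT, STRING }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    /** Groups the characters of the source into PRINT and STRING tokens. */
    static List<Token> lex(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (c == '"') {
                int end = source.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                // The token text is the string content without the quotes.
                tokens.add(new Token(Kind.STRING, source.substring(i + 1, end)));
                i = end + 1;
            } else if (source.startsWith("print", i)) {
                tokens.add(new Token(Kind.PRINT, "print"));
                i += "print".length();
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("print \"hello world\""));
    }
}
```

Calling `lex` on `print "hello world"` yields two tokens: a PRINT token followed by a STRING token whose text is `hello world`.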
Parsing is the process of translating several tokens into a structure that relates the pieces together. For example, this script:
print
print "hello world"
should be translated into a structure such that the first print is a statement on its own, but the second print has a string parameter.
The outcome of parsing is a tree with nodes representing the program structure.
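The following sketch shows one way such a tree could be built by hand from a list of tokens. It is not the ANTLR-generated parser; `TinyParser`, `Script`, and `PrintStatement` are hypothetical names, and tokens are represented as plain strings (string tokens keep their quotes) to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyParser {
    /** Leaf node: one print statement; argument is null for a bare print. */
    static final class PrintStatement {
        final String argument;

        PrintStatement(String argument) {
            this.argument = argument;
        }
    }

    /** Root node: the whole script as an ordered list of statements. */
    static final class Script {
        final List<PrintStatement> statements = new ArrayList<>();
    }

    /** Parses tokens such as ["print", "print", "\"hello world\""] into a tree. */
    static Script parse(List<String> tokens) {
        Script script = new Script();
        int i = 0;
        while (i < tokens.size()) {
            if (!tokens.get(i).equals("print")) {
                throw new IllegalArgumentException("expected 'print' but found " + tokens.get(i));
            }
            i++;
            String argument = null;
            // An optional quoted string directly after 'print' becomes its parameter.
            if (i < tokens.size() && isQuoted(tokens.get(i))) {
                String token = tokens.get(i);
                argument = token.substring(1, token.length() - 1);
                i++;
            }
            script.statements.add(new PrintStatement(argument));
        }
        return script;
    }

    private static boolean isQuoted(String token) {
        return token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"");
    }
}
```

For the two-line script above, `parse` returns a `Script` with two statements: the first has no argument and the second carries `hello world`.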
The tool used here for creating a lexer and parser is called ANTLR. A grammar file (.g4) is used to describe the expected format of the input, and the ANTLR tool will generate the necessary Java code to do the actual lexing and parsing.
A Visitor is a base class generated by ANTLR that makes it easy to work with the tree generated by the parser.
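ANTLR's generated visitor has one visit method per grammar rule. The sketch below imitates that shape in plain Java so it runs without ANTLR. All names here (`VisitorSketch`, `ScriptNode`, `PrintNode`, `OutputVisitor`) are hypothetical, and treating a bare print as an empty line is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {
    /** One visit method per node type, mirroring one method per grammar rule. */
    interface Visitor<T> {
        T visitScript(ScriptNode node);
        T visitPrint(PrintNode node);
    }

    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class PrintNode implements Node {
        final String argument; // null for a bare print

        PrintNode(String argument) {
            this.argument = argument;
        }

        @Override
        public <T> T accept(Visitor<T> visitor) {
            return visitor.visitPrint(this);
        }
    }

    static final class ScriptNode implements Node {
        final List<PrintNode> statements;

        ScriptNode(List<PrintNode> statements) {
            this.statements = statements;
        }

        @Override
        public <T> T accept(Visitor<T> visitor) {
            return visitor.visitScript(this);
        }
    }

    /** Walks the tree and collects the lines the script would print. */
    static final class OutputVisitor implements Visitor<Void> {
        final List<String> lines = new ArrayList<>();

        @Override
        public Void visitScript(ScriptNode node) {
            for (PrintNode statement : node.statements) {
                statement.accept(this);
            }
            return null;
        }

        @Override
        public Void visitPrint(PrintNode node) {
            // Assumption: a print without a string produces an empty line.
            lines.add(node.argument == null ? "" : node.argument);
            return null;
        }
    }
}
```

A different visitor over the same tree could emit byte code instead of collecting lines, which is why the tree-walking logic is kept separate from what is done at each node.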
The ObjectWeb ASM library provides an API for generating byte code, so the class file format required by the JVM does not have to be written by hand.
The ByteCodeGenerationVisitor is an implementation of the ANTLR Visitor that walks the parse tree and calls the ObjectWeb ASM library to generate byte code.
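To show what "generating byte code and loading it" amounts to, the sketch below writes a tiny class file by hand with `DataOutputStream` and loads it through a custom class loader. This is deliberately not how the project does it: ASM handles the constant pool and attribute bookkeeping for you. The class name `Hello`, its `message` method, and `GeneratedClassDemo` are assumptions made for this example, which needs only the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;

public class GeneratedClassDemo {

    /** Builds the bytes of: public class Hello { public static String message() { return "hello world"; } } */
    static byte[] generateHelloClass() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);              // magic number
            out.writeShort(0);                     // minor version
            out.writeShort(52);                    // major version (Java 8)
            out.writeShort(10);                    // constant pool count = entries + 1
            utf8(out, "Hello");                    // #1
            out.writeByte(7); out.writeShort(1);   // #2 Class -> #1
            utf8(out, "java/lang/Object");         // #3
            out.writeByte(7); out.writeShort(3);   // #4 Class -> #3
            utf8(out, "message");                  // #5 method name
            utf8(out, "()Ljava/lang/String;");     // #6 method descriptor
            utf8(out, "Code");                     // #7 attribute name
            utf8(out, "hello world");              // #8
            out.writeByte(8); out.writeShort(8);   // #9 String -> #8
            out.writeShort(0x0021);                // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                     // this_class
            out.writeShort(4);                     // super_class
            out.writeShort(0);                     // interfaces count
            out.writeShort(0);                     // fields count
            out.writeShort(1);                     // methods count
            out.writeShort(0x0009);                // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                     // name index
            out.writeShort(6);                     // descriptor index
            out.writeShort(1);                     // one attribute: Code
            byte[] code = {0x12, 9, (byte) 0xB0};  // ldc #9; areturn
            out.writeShort(7);                     // attribute name index
            out.writeInt(12 + code.length);        // attribute length
            out.writeShort(1);                     // max_stack
            out.writeShort(0);                     // max_locals
            out.writeInt(code.length);
            out.write(code);
            out.writeShort(0);                     // exception table length
            out.writeShort(0);                     // attributes of Code
            out.writeShort(0);                     // class attributes count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** writeUTF emits a 2-byte length plus modified UTF-8, the layout of a CONSTANT_Utf8 body. */
    private static void utf8(DataOutputStream out, String value) throws IOException {
        out.writeByte(1);
        out.writeUTF(value);
    }

    /** Class loader that turns an in-memory byte array into a Class. */
    static final class ByteArrayClassLoader extends ClassLoader {
        Class<?> define(String name, byte[] classBytes) {
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    /** Loads the generated class and invokes its static method reflectively. */
    static String loadAndRun() {
        try {
            Class<?> hello = new ByteArrayClassLoader().define("Hello", generateHelloClass());
            Method message = hello.getMethod("message");
            return (String) message.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndRun()); // prints "hello world"
    }
}
```

The class loader part carries over unchanged when the bytes come from ASM: whatever produces the byte array, `defineClass` is the step that makes it executable in the running JVM.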