A truly simple scripting language to demonstrate how to lex, parse, generate byte code and execute it.

jacobsimpson/tinyscript

TinyScript - a really, really simple programming language

A basic project covering the bare essentials of how a programming language targeting the JVM could be written.

There is only one statement in this scripting language, the print statement, which optionally takes a double-quote (") delimited string to print. Such a simple grammar limits the general utility of the language, but it does show how to put a few general pieces together to build a language.

Three basic steps are illustrated:

  1. Lexing and Parsing using ANTLR.
  2. Byte code generation using ObjectWeb ASM.
  3. Loading the generated byte code using the classloader so it can be executed.

How to Build

  1. Clone this project.

  2. Download Gradle and install it.

  3. Build the project and create a consolidated jar that is the language interpreter:

    gradle fatJar

  4. Run a sample script file:

    java -jar build/libs/TinyScript-all.jar scripts/hello-world.ts

Anatomy

Lexing and Parsing

The first step is to translate the free-form text of a script into a structure that a program can work with.

Lexing is the process of translating characters:

p r i n t

into a single token:

print
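As a rough illustration of the lexing step, here is a minimal hand-written lexer in Java. It is only a sketch: the project generates its lexer with ANTLR rather than writing one by hand, and the class name `LexerSketch` and the choice to represent tokens as plain strings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written lexer sketch; the project itself generates its lexer with ANTLR. */
final class LexerSketch {

    /** Splits source text into tokens: the keyword print and quoted strings. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                      // whitespace only separates tokens
            } else if (c == '"') {
                int end = source.indexOf('"', i + 1);     // find the closing quote
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add(source.substring(i, end + 1)); // keep the quotes in the token
                i = end + 1;
            } else if (source.startsWith("print", i)) {
                tokens.add("print");                      // five characters become one token
                i += "print".length();
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [print, print, "hello world"]
        System.out.println(tokenize("print\nprint \"hello world\""));
    }
}
```

A generated lexer does the same job from declarative rules, and typically also records token types and source positions, which this sketch leaves out.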

Parsing is the process of translating several tokens into a structure that relates the pieces together. For example, this script:

print
print "hello world"

should be translated into a structure such that the first print is a statement on its own, but the second print has a string parameter.

The outcome of parsing is a tree with nodes representing the program structure.
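The parsing step can be sketched by hand as well. The Java example below groups a flat list of tokens into statements, where the returned list plays the role of the children of the root node. The names `ParserSketch` and `PrintStatement` are hypothetical, and the sketch assumes string tokens still carry their surrounding quotes; the real project relies on the parser that ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written parser sketch; the project itself uses an ANTLR-generated parser. */
final class ParserSketch {

    /** One print statement; text is null when no string follows the keyword. */
    static final class PrintStatement {
        final String text;

        PrintStatement(String text) {
            this.text = text;
        }
    }

    /** Groups tokens into statements, the children of the program root. */
    static List<PrintStatement> parse(List<String> tokens) {
        List<PrintStatement> program = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (!tokens.get(i).equals("print")) {
                throw new IllegalArgumentException("expected print, found " + tokens.get(i));
            }
            i++;                                           // consume the keyword
            String text = null;
            if (i < tokens.size() && tokens.get(i).startsWith("\"")) {
                String quoted = tokens.get(i++);           // consume the optional string
                text = quoted.substring(1, quoted.length() - 1); // drop the quotes
            }
            program.add(new PrintStatement(text));
        }
        return program;
    }

    public static void main(String[] args) {
        for (PrintStatement s : parse(List.of("print", "print", "\"hello world\""))) {
            System.out.println(s.text == null ? "print (no argument)" : "print with argument: " + s.text);
        }
    }
}
```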

The tool used here for creating a lexer and parser is called ANTLR. A grammar file (.g4) is used to describe the expected format of the input, and the ANTLR tool will generate the necessary Java code to do the actual lexing and parsing.

A Visitor is a base class generated by ANTLR that makes it easy to work with the tree generated by the parser.
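To make the visitor idea concrete, here is a small self-contained Java sketch of the pattern. Every name in it (`VisitorSketch`, `Program`, `Print`, `Visitor`, `DescribeVisitor`) is hypothetical and hand-written for illustration; ANTLR generates its own tree node and base visitor classes from the grammar, and their exact names depend on the grammar file.

```java
import java.util.List;

/** Visitor pattern sketch; stands in for the classes ANTLR would generate. */
final class VisitorSketch {

    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class Program implements Node {
        final List<Print> statements;

        Program(List<Print> statements) {
            this.statements = statements;
        }

        @Override
        public <T> T accept(Visitor<T> visitor) {
            return visitor.visitProgram(this);
        }
    }

    static final class Print implements Node {
        final String text;   // null when the statement has no string

        Print(String text) {
            this.text = text;
        }

        @Override
        public <T> T accept(Visitor<T> visitor) {
            return visitor.visitPrint(this);
        }
    }

    /** Base visitor with a default traversal; subclasses override what they need. */
    static class Visitor<T> {
        T visitProgram(Program program) {
            T last = null;
            for (Print statement : program.statements) {
                last = statement.accept(this);   // visit children in source order
            }
            return last;
        }

        T visitPrint(Print print) {
            return null;                         // default: do nothing
        }
    }

    /** Overrides only visitPrint, much as a byte code generating visitor might. */
    static final class DescribeVisitor extends Visitor<Void> {
        final StringBuilder out = new StringBuilder();

        @Override
        Void visitPrint(Print print) {
            out.append(print.text == null ? "println()" : "println(\"" + print.text + "\")");
            out.append('\n');
            return null;
        }
    }
}
```

The benefit of the base class is that a subclass only overrides the nodes it cares about, while the traversal order stays in one place.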

Byte Code Generation

The ObjectWeb ASM library provides an API for generating byte code, so the class file format required by the JVM does not have to be assembled by hand.

The ByteCodeGenerationVisitor is an implementation of the ANTLR Visitor that walks the parse tree and calls the ObjectWeb ASM library to generate byte code.
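The last step, loading generated byte code through a classloader and executing it, can be sketched with the JDK alone. To stay dependency-free, the example below does not use ASM: it writes a minimal class file by hand, which also gives a sense of the bookkeeping (constant pool indexes, attribute lengths) that ASM takes care of. The class and method names (`LoaderSketch`, `BytesLoader`, a generated class `Script` with a static `run` method) are assumptions for this sketch and are not taken from the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Dependency-free sketch: writes a tiny class file by hand, then loads and runs it. */
final class LoaderSketch {

    /** Bytes of: public class Script { public static void run() { System.out.println(message); } } */
    static byte[] generate(String message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);              // magic number
            out.writeShort(0);                     // minor version
            out.writeShort(52);                    // major version 52 = Java 8
            out.writeShort(22);                    // constant pool count = 21 entries + 1
            utf8(out, "Script");                   // #1
            entry(out, 7, 1);                      // #2  Class Script
            utf8(out, "java/lang/Object");         // #3
            entry(out, 7, 3);                      // #4  Class Object
            utf8(out, "java/lang/System");         // #5
            entry(out, 7, 5);                      // #6  Class System
            utf8(out, "out");                      // #7
            utf8(out, "Ljava/io/PrintStream;");    // #8
            entry(out, 12, 7, 8);                  // #9  NameAndType out
            entry(out, 9, 6, 9);                   // #10 Fieldref System.out
            utf8(out, "java/io/PrintStream");      // #11
            entry(out, 7, 11);                     // #12 Class PrintStream
            utf8(out, "println");                  // #13
            utf8(out, "(Ljava/lang/String;)V");    // #14
            entry(out, 12, 13, 14);                // #15 NameAndType println
            entry(out, 10, 12, 15);                // #16 Methodref PrintStream.println
            utf8(out, message);                    // #17
            entry(out, 8, 17);                     // #18 String constant
            utf8(out, "run");                      // #19
            utf8(out, "()V");                      // #20
            utf8(out, "Code");                     // #21
            out.writeShort(0x0021);                // class flags: public, super
            out.writeShort(2);                     // this class = #2
            out.writeShort(4);                     // super class = #4
            out.writeShort(0);                     // no interfaces
            out.writeShort(0);                     // no fields
            out.writeShort(1);                     // one method
            out.writeShort(0x0009);                // method flags: public static
            out.writeShort(19);                    // method name = run
            out.writeShort(20);                    // method descriptor = ()V
            out.writeShort(1);                     // one method attribute
            out.writeShort(21);                    // attribute name = Code
            out.writeInt(21);                      // attribute length in bytes
            out.writeShort(2);                     // max stack
            out.writeShort(0);                     // max locals
            out.writeInt(9);                       // code length in bytes
            out.write(new byte[] {
                (byte) 0xB2, 0, 10,                // getstatic #10
                (byte) 0x12, 18,                   // ldc #18
                (byte) 0xB6, 0, 16,                // invokevirtual #16
                (byte) 0xB1 });                    // return
            out.writeShort(0);                     // no exception handlers
            out.writeShort(0);                     // no attributes on the code
            out.writeShort(0);                     // no class attributes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Writes a CONSTANT_Utf8 entry: tag 1, then length-prefixed modified UTF-8. */
    private static void utf8(DataOutputStream out, String s) throws IOException {
        out.writeByte(1);
        out.writeUTF(s);
    }

    /** Writes a constant pool entry made of a tag and one or two pool indexes. */
    private static void entry(DataOutputStream out, int tag, int... indexes) throws IOException {
        out.writeByte(tag);
        for (int index : indexes) {
            out.writeShort(index);
        }
    }

    /** Exposes the protected defineClass so in-memory bytes can become a Class. */
    static final class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Defines the class from bytes and invokes its static run() method. */
    static void run(byte[] classBytes) {
        try {
            Class<?> script = new BytesLoader().define("Script", classBytes);
            script.getMethod("run").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run(generate("hello world"));
    }
}
```

With ASM, the hand-written part of `generate` would be replaced by calls on the library's class and method writers, and only the classloader portion would remain as shown.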
