Regex to NFA Builder

A command-line application that converts regular expressions to NFAs. The application then allows users to test input against the built NFA.

About

Name: Daire Ní Chatháin ID: G00334757

This application was created as part of my Graph Theory Module in GMIT. The code here is adapted from many sources, all referenced within the source code or in this readMe file.

Given Problem Statement

You must write a program in the Go Programming Language that can build a non-deterministic finite automaton (NFA) from a regular expression, and can use the NFA to check if the regular expression matches any given string of text. You must write the program from scratch and cannot use the regexp package from the Go standard library nor any other external library. A regular expression is a string containing a series of characters, some of which may have a special meaning. For example, the three characters “.”, “|”, and “∗ ” have the special meanings “concatenate”, “or”, and “Kleene star” respectively. So, 0.1 means a 0 followed by a 1, 0|1 means a 0 or a 1, and 1∗ means any number of 1’s. These special characters must be used in your submission. Other special characters you might consider allowing as input are brackets “()” which can be used for grouping, “+” which means “at least one of”, and “?” which means “zero or one of”. You might also decide to remove the concatenation character, so that 1.0 becomes 10, with the concatenation implicit. You may initially restrict the non-special characters your program works with to 0 and 1, if you wish. However, you should at least attempt to expand these to all of the digits, and the characters a to z, and A to Z.

Problem Simplified

Go program to build NFA from regex
Use NFA to check if regex matches string
Regex => regular characters + special characters

How it works

1. Reads in user input in postfix/infix notation 
2. If input is given in infix, parses the regular expression from infix to postfix notation. 
3. Builds a series of small NFA’s representing parts of the regular expression. 
4. Uses the smaller NFA’s to create the overall NFA. 
5. User enters input to implememnt the matching algorithm using the newly created NFA.

Running the program

Go must be installed to run the code. Follow official guidelines here.

Clone this repo

git clone https://github.com/DaireNiC/Regex-NFA-Builder

Navigate to the folder

cd Regex-NFA-Builder

Compile the Go code

go build main.go

Run the exe:

./main.exe

Code & Program Design

Converting Input from Infix to Postfix Notation

The NFA is created from postfix notation. This requires converting the user input from infix to postfix notation. A stack-like structure is implemented to facilitate this. The code for this conversion is found in the infixToPostfix.go file. The Shunting Yard Algorithm is used as the basis for this code.

Resources:

Building the NFA

Thompson's Construction Algorithm is used to build NFA's from the user's input in postfix notation. Both NFA's and Regular expressions have the same computational power, therefore any language that can be recognised by a regex can also be represented as a NFA, and visa versa.

Thompson's Construction uses fragments to create an NFA that recognises a given regular expression in postfix notation. A fragment in this case, is a smaller NFA used to build a larger, overall NFA.

A stack structure is used to store the fragments for the creation of the overall NFA. In Thompson's Construction, all fragments have a single initial state(required for an NFA) and a single final state. I used the following NFA struct to represent this:

type  nfa  struct { 
        initial *state 
        accept *state 
}

Normal characters (see below for special characters) are pushed to the stack in the from of a fragment as detailed below. As it is an NFA, the empty regular expression ε is also included.

The struct used to represent the states and arrows is shown below. In Thompson's Construction, every state has two arrows maximum. In the case of only one arrow from a state, the second edge will simply be ignored. The state struct can be considered a traditional NFA state and edges encompassed in one data structure.

type  state  struct { 
        symbol rune // if ε, will be represented as a 0 value 
        edge1 *state //pointers to other states (similar to linked list) 
        edge2 *state  
}

The following are considered special characters. These characters pop from & push elements to the stack.

	Special Characters	Automaton
\|	OR
.	concatenation
*	zero or more
+	one or more
?	zero or one

Images courtesy of Rus Cox's paper

Once the NFA is created, the user can use it to test against an input string.

String Matching

String matching is completed using the PoMatch Function:

        func PoMatch (postfix string, s string) bool { 
        ... 
        }

Given the regular expression in postfix notation and a string to run through the NFA once built, a boolean is returned indicating if it is a match/not.

The NFA is first created using the previously described NFA fragments. The string matching algorithim is then implemented. When matching aginst an NFA, the input can be in a number of states at a given time. To keep track of the states, an array of pointers to the current states is used. The current set of states are updated as each character of the input string is processed. If any of the current states are the accept state (indicated by symbol == 0 and no arrows coming from it), the function returns with a true value.

Testing

In an effort to both understand and get experience with Unit Testing using GoLang I created a test suite. I used the GoLang Docs here for guidance on creating and running tests using Go.

The first test tests the conversion of infix notation to postfix. The test coverage in all tests is 97~% and covers a range of cases.

The second tests the postfix to NFA creation. Test tables were used for efficient creation of tests and their assertions. Guidance on the use of test tables was found on this blog.

Running the Tests

Navigate to the folder

        cd Regex-NFA-Builder/nfaBuilder

Execute the tests

        go test

Extras relevant to Project

The regex engine can both create NFAs & test them against strings over the alaphabet a to z, and A to Z. It can also recognise all digits.
The special character "+", which means 'at least one of' was added as an extra feature
The special character "?", which means 'zero or one of' was added as an extra feature
Options for infix or postfix input
Comprehensive tests were added to the project for robustness
An extensive github history indicating the time spent on research, coding, documentation and referencing where due.
Consistent work ethic, committing week by week to the repo.

Research

Finite Automata & Regular Expressions

While researching both the theory and implementation of finite automata, I used The Theory of Computation by Michael Sipser along with Dr.Ian McLoughlin's notes (lecturer @ GMIT) as my primary resources. Considering the amount of code available for the project, I directed a lot of time towards research and understanding the theory behind the construction.

For the creation of the Regex to NFA builder, the paper Regular Expression Matching Can Be Simple And Fast by Russ Cox was the primary basis for the code written in this application. Cox provides an implementation in C, which I adapted along with videos provided to create this Go program.

Go application Structure, GoDocs, Code Comments

Guidance on structuring code in Go (comments, tests, best practises, documentation) was found on the following blogs & articles :

Name		Name	Last commit message	Last commit date
Latest commit History 126 Commits
InfixToPostfix		InfixToPostfix
infixToPostfix		infixToPostfix
media		media
nfaBuilder		nfaBuilder
.gitignore		.gitignore
README.md		README.md
main.go		main.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Regex to NFA Builder

About

Given Problem Statement

Problem Simplified

How it works

Running the program

Clone this repo

Navigate to the folder

Compile the Go code

Run the exe:

Code & Program Design

Converting Input from Infix to Postfix Notation

Building the NFA

String Matching

Testing

Running the Tests

Extras relevant to Project

Research

Finite Automata & Regular Expressions

Go application Structure, GoDocs, Code Comments

About

Releases

Packages

Languages

DaireNiC/Regex-NFA-Builder

Folders and files

Latest commit

History

Repository files navigation

Regex to NFA Builder

About

Given Problem Statement

Problem Simplified

How it works

Running the program

Clone this repo

Navigate to the folder

Compile the Go code

Run the exe:

Code & Program Design

Converting Input from Infix to Postfix Notation

Building the NFA

String Matching

Testing

Running the Tests

Extras relevant to Project

Research

Finite Automata & Regular Expressions

Go application Structure, GoDocs, Code Comments

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages