Skip to content

A command-line application that converts regular expressions to NFAs. The application then allows users to test input against the built NFA.

Notifications You must be signed in to change notification settings

DaireNiC/Regex-NFA-Builder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Regex to NFA Builder

A command-line application that converts regular expressions to NFAs. The application then allows users to test input against the built NFA.

About

Name: Daire Ní Chatháin ID: G00334757

This application was created as part of my Graph Theory Module in GMIT. The code here is adapted from many sources, all referenced within the source code or in this readMe file.

Given Problem Statement

You must write a program in the Go Programming Language that can build a non-deterministic finite automaton (NFA) from a regular expression, and can use the NFA to check if the regular expression matches any given string of text. You must write the program from scratch and cannot use the regexp package from the Go standard library nor any other external library. A regular expression is a string containing a series of characters, some of which may have a special meaning. For example, the three characters “.”, “|”, and “∗ ” have the special meanings “concatenate”, “or”, and “Kleene star” respectively. So, 0.1 means a 0 followed by a 1, 0|1 means a 0 or a 1, and 1∗ means any number of 1’s. These special characters must be used in your submission. Other special characters you might consider allowing as input are brackets “()” which can be used for grouping, “+” which means “at least one of”, and “?” which means “zero or one of”. You might also decide to remove the concatenation character, so that 1.0 becomes 10, with the concatenation implicit. You may initially restrict the non-special characters your program works with to 0 and 1, if you wish. However, you should at least attempt to expand these to all of the digits, and the characters a to z, and A to Z.

Problem Simplified

  • Go program to build NFA from regex
  • Use NFA to check if regex matches string
  • Regex => regular characters + special characters
    flow_chart

How it works

1. Reads in user input in postfix/infix notation 
2. If input is given in infix, parses the regular expression from infix to postfix notation. 
3. Builds a series of small NFA’s representing parts of the regular expression. 
4. Uses the smaller NFA’s to create the overall NFA. 
5. User enters input to implememnt the matching algorithm using the newly created NFA. 

cmd

Running the program

Go must be installed to run the code. Follow official guidelines here.

Clone this repo

git clone https://github.com/DaireNiC/Regex-NFA-Builder 

Navigate to the folder

cd Regex-NFA-Builder 

Compile the Go code

go build main.go 

Run the exe:

./main.exe 

Code & Program Design

Converting Input from Infix to Postfix Notation

The NFA is created from postfix notation. This requires converting the user input from infix to postfix notation. A stack-like structure is implemented to facilitate this. The code for this conversion is found in the infixToPostfix.go file. The Shunting Yard Algorithm is used as the basis for this code.

Resources:

Building the NFA

Thompson's Construction Algorithm is used to build NFA's from the user's input in postfix notation. Both NFA's and Regular expressions have the same computational power, therefore any language that can be recognised by a regex can also be represented as a NFA, and visa versa.

Thompson's Construction uses fragments to create an NFA that recognises a given regular expression in postfix notation. A fragment in this case, is a smaller NFA used to build a larger, overall NFA.

A stack structure is used to store the fragments for the creation of the overall NFA. In Thompson's Construction, all fragments have a single initial state(required for an NFA) and a single final state. I used the following NFA struct to represent this:

type  nfa  struct { 
        initial *state 
        accept *state 
} 

Normal characters (see below for special characters) are pushed to the stack in the from of a fragment as detailed below. As it is an NFA, the empty regular expression ε is also included. non_special_character

The struct used to represent the states and arrows is shown below. In Thompson's Construction, every state has two arrows maximum. In the case of only one arrow from a state, the second edge will simply be ignored. The state struct can be considered a traditional NFA state and edges encompassed in one data structure.

type  state  struct { 
        symbol rune // if ε, will be represented as a 0 value 
        edge1 *state //pointers to other states (similar to linked list) 
        edge2 *state  
} 

The following are considered special characters. These characters pop from & push elements to the stack.

Special Characters Automaton
| OR or
. concatenation concat
* zero or more zero_or_more
+ one or more enter image description here
? zero or one enter image description here

Images courtesy of Rus Cox's paper

Once the NFA is created, the user can use it to test against an input string.

String Matching

String matching is completed using the PoMatch Function:

        func PoMatch (postfix string, s string) bool { 
        ... 
        } 

Given the regular expression in postfix notation and a string to run through the NFA once built, a boolean is returned indicating if it is a match/not.

The NFA is first created using the previously described NFA fragments. The string matching algorithim is then implemented. When matching aginst an NFA, the input can be in a number of states at a given time. To keep track of the states, an array of pointers to the current states is used. The current set of states are updated as each character of the input string is processed. If any of the current states are the accept state (indicated by symbol == 0 and no arrows coming from it), the function returns with a true value.

Testing

In an effort to both understand and get experience with Unit Testing using GoLang I created a test suite. I used the GoLang Docs here for guidance on creating and running tests using Go.

The first test tests the conversion of infix notation to postfix. The test coverage in all tests is 97~% and covers a range of cases.

test_coverage

The second tests the postfix to NFA creation. Test tables were used for efficient creation of tests and their assertions. Guidance on the use of test tables was found on this blog.

Running the Tests

  • Navigate to the folder
        cd Regex-NFA-Builder/nfaBuilder 
  • Execute the tests
        go test 

Extras relevant to Project

  • The regex engine can both create NFAs & test them against strings over the alaphabet a to z, and A to Z. It can also recognise all digits.
  • The special character "+", which means 'at least one of' was added as an extra feature
  • The special character "?", which means 'zero or one of' was added as an extra feature
  • Options for infix or postfix input
  • Comprehensive tests were added to the project for robustness
  • An extensive github history indicating the time spent on research, coding, documentation and referencing where due.
  • Consistent work ethic, committing week by week to the repo.

Research

Finite Automata & Regular Expressions

While researching both the theory and implementation of finite automata, I used The Theory of Computation by Michael Sipser along with Dr.Ian McLoughlin's notes (lecturer @ GMIT) as my primary resources. Considering the amount of code available for the project, I directed a lot of time towards research and understanding the theory behind the construction.

For the creation of the Regex to NFA builder, the paper Regular Expression Matching Can Be Simple And Fast by Russ Cox was the primary basis for the code written in this application. Cox provides an implementation in C, which I adapted along with videos provided to create this Go program.

Go application Structure, GoDocs, Code Comments

Guidance on structuring code in Go (comments, tests, best practises, documentation) was found on the following blogs & articles :

About

A command-line application that converts regular expressions to NFAs. The application then allows users to test input against the built NFA.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages