One Automaton To Rule Them All: Beyond Multiple Regular Expressions Execution

This repository contains the source code of the paper ``One Automaton To Rule Them All: Beyond Multiple Regular Expressions Execution'' where we deal with the problem of efficient Regular Expressions (REs) matching. The key idea of this work is to exploit REs morphological similarities within the same dataset to enable memory reduction when storing the patterns and improving the dataset-matching throughput. This idea is expressed through:

the Multi-RE Finite State Automata (MFSA) model that extends the Finite State Automata (FSA) model to improve REs parallelization by leveraging similarities within a specific application ruleset
a multi-level compilation framework to manage REs merging and optimization to produce MFSA(s) automatically
an execution engine called iMFAnt, which is an extension of the state-of-the-art iNFAnt algorithm

benchmark_scripts and benchmark_results contain the scripts to evaluate datasets similarity and the numericalresult of such evaluation, respectively (Fig. 1).
datasets contains the RE datasets
framework contains the compilation framework
input_streams contains the data streams to match
matching contains the iMFAnt engine
mfsa contains the automata in ANML format, resulting from the compilation framework
output_matches contains the numeric results of MFSAs analysis (time, #matches)
plots contains the plots resulting from the execution of the scripts in scripts folder
plotting scripts contains the scripts for the plotting of the numerical results
raw_results contains the numerical results corresponding to the plots
scripts contains the scripts to plot the compression, compilation time, execution time and throughput results (Figs. 7, 8, 9, 10)

To customize the experiments go to the customization section

Default Configurations

Run the scripts in iMFAnt/scripts folder to reproduce the experimental results reported in the paper. These include the automatic generation of the plots with the default merging factors (0, 1, 2, 5, 10, 20, 50, 100)

CGO'24 Artifact Evaluation Instructions

The following instructions are to reproduce the CGO'24 results. The fast way is to go to the artifacts setup

Artifacts Setup

An x86 CPU is mandatory for the Docker setup, other CPUs are usable in general but not tested.

Clone the repository:

git clone https://github.com/necst/iMFAnt.git

and navigate into the repo folder

cd iMFAnt

Please follow one of the two setups: 1) native Ubuntu-based machine; 2) any OS with Docker and then to the Running the Experiments.

1) Native Ubuntu-based Machine

To execute CGO'24 AE are required

make cmake
flex
bison
g++
openmp (included in g++ installation)
python3.10 with numpy pandas matplotlib

A possible command to install everything you need:

sudo apt-get install -y locales curl wget tar sudo git apt-utils nano
sudo apt-get install -y g++ python3 python3-pip automake make cmake flex bison
sudo pip3 install numpy pandas matplotlib

navigate into the scripts folder

cd scripts/

2) Any OS with Docker

It is supposed you have installed and configured Docker according to the docker instructions. An x86 CPU is mandatory.

Open a terminal and then run the script

run_docker.sh

that will build and run the docker container.

navigate into the scripts folder

cd iMFAnt/scripts/

Running the Experiments

All the experiments take a shorter time than expected if you open them and reduce the number of repetitions, changing the value of -r flag in the execution of python3 merging.py -r XXX. More info on experiment customization here

Each experiment will execute and collect the corresponding results and concur to the paper figure.

Compression Results (Figure 7)

To reproduce Fig. 7 (ETA: 5 min)

./compression.sh

To visualize these results go here

Compilation Results (Figure 8)

To reproduce Fig. 8 (ETA: 30 min with 30 reps)

./compilation_time.sh

To visualize these results go here

iMFAnt Execution Results single-/multi-threaded(Figures 9 and 10)

To reproduce Figs. 9 and 10 (ETA: 12 h with 15 reps)

./iMFAnt_performance.sh

Figure 9 and 10 are expected to display different optimal merging factor according to testing machine characteristics. Anyhow, the MFSAs are expected to improve the simple multi-thread scaling according to the trends in the paper.

To visualize these results go here

Chart Visualization

The scripts run automatically the desired experiments and you will find the corresponding PDF plots in iMFAnt/plots folder.

If you follow the Docker setup, exit the container and you will find there the results.

Then open the PDFs as you prefer.

Customization Options

To merge a new dataset save the RE file and the input stream file in dataset and input_streams folders, directly. Add the corresponding directory into framework/compiler/merging.py file, in inputRE list and run it to perform the merging.
To pattern-match a new dataset, add the corresponding mfsa and input_streams directories to imfant.py file and run it, ensuring that the merging factors are consistent with the ones employed during the merging.

Automata Generation:

The following options are available for the compilation of REs in framework/re2automata/re2automata.cpp:

Variable	Meaning
STATES	if 1 eval. states compression, else eval. trans. compression
ANML	if 1 output ANML file, else output DOT file
STACK_TIME	if 1 eval. compilation time
V	if 1 enable verbose mode

These parameters are active by default.

Enter the compilation framework and compile the compiler: cd framework/compiler and build the compiler, entering the following:

cd framework/compiler
bison -d compiler.yy -v
flex -o compiler.yy.cc compiler.lex

At compile-time, set the desired values of STATES and STACK_TIME according to the values in the table:

g++ -o compiler compiler.yy.cc compiler.tab.cc ast.cpp ../re2automata/re2automata.cpp -DSTATES=1 -DSTACK_TIME=0 -w -std=c++11 -g -O3

Run

python merging.py -r REPS -b [M1 ... Mn]

to generate the automata for the benchmark datasets. -r indicates the number of repetitions to evaluate the compilation time, and -b indicates the list of merging factors to reproduce the results in the paper. In practice, any value for M is valid. To merge the entire dataset, set M=0. Adjust the merging factors consistently in matching/imfant.py to run the pattern matching engine.

The generated automata are in mfsa
The compression results are in raw_results

iMFAnt execution customization:

Compile the matching algorithm:

cd ../../matching/ && make clean all

To match a single folder of MFSAs run:

export OMP_NUM_THREADS=N; make && ./multithreaded_imfant STREAM_IN + MFSA_DIR NUM OUTPUT_FILE

where N is the number of threads, STREAM_IN is the input stream to be matched, MFSA_DIR is the directory of the folder containing the MFSAs to match, NUM is the number of MFSAs to match in that directory, and OUTPUT_FILE contains the results of the matching (matching time, #matches).

happy RE matching :)

Credits and Contributors

Contributors: Luisa Cicolini, Filippo Carloni, Marco Domenico Santambrogio, Conficconi Davide

If you find this repository useful, please use the following citation(s):

@inproceedings{cicolini2024one,
  title={One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution},
  author={Cicolini, Luisa and Carloni, Filippo and Santambrogio, Marco D and Conficconi, Davide},
  booktitle={2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)},
  pages={193--206},
  year={2024},
  organization={IEEE}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

One Automaton To Rule Them All: Beyond Multiple Regular Expressions Execution

Table of contents

Repo Overview

Default Configurations

CGO'24 Artifact Evaluation Instructions

Artifacts Setup

1) Native Ubuntu-based Machine

2) Any OS with Docker

Running the Experiments

Compression Results (Figure 7)

Compilation Results (Figure 8)

iMFAnt Execution Results single-/multi-threaded(Figures 9 and 10)

Chart Visualization

Customization Options

Automata Generation:

iMFAnt execution customization:

happy RE matching :)

Credits and Contributors

About

Releases 3

Packages

Contributors 4

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
benchmark_results		benchmark_results
benchmark_scripts		benchmark_scripts
datasets		datasets
docker		docker
framework		framework
input_streams		input_streams
matching		matching
paper		paper
plotting_scripts		plotting_scripts
scripts		scripts
LICENSE		LICENSE
README.md		README.md
run_docker.sh		run_docker.sh

License

necst/iMFAnt

Folders and files

Latest commit

History

Repository files navigation

One Automaton To Rule Them All: Beyond Multiple Regular Expressions Execution

Table of contents

Repo Overview

Default Configurations

CGO'24 Artifact Evaluation Instructions

Artifacts Setup

1) Native Ubuntu-based Machine

2) Any OS with Docker

Running the Experiments

Compression Results (Figure 7)

Compilation Results (Figure 8)

iMFAnt Execution Results single-/multi-threaded(Figures 9 and 10)

Chart Visualization

Customization Options

Automata Generation:

iMFAnt execution customization:

happy RE matching :)

Credits and Contributors

About

Resources

License

Stars

Watchers

Forks

Releases 3

Packages 0

Contributors 4

Languages

Packages