This repository contains the prototype implementation and evaluation of the approach described in *LLM-based Process Constraints Generation with Context*.
When analyzing the data generated by complex information systems, identifying undesirable behaviors in event log traces (so-called conformance checking) is a key challenge. With the rise of deep learning and, more specifically, generative AI applications, one promising line of research is the automatic generation of (symbolic) temporal reasoning queries that can then be applied in a semi-automatic manner. Recent work has shown that using fine-tuned open-source large language models (LLMs) is promising and, in some respects, superior to other approaches to automated conformance checking. This thesis expands that research direction by supporting the provision of process-specific natural-language context and by extending the practical expressivity of the generated constraints.
- Clone this project to get the repository:

```
git clone
```

- Install the requirements using the following command:

```
./install.sh
```

Make sure to adapt all file paths to your needs.
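Path settings typically live in `constraints-transformer/config.py`. The sketch below only illustrates the kind of entries to adapt; the variable names here are hypothetical placeholders, so check the actual file for the real names and defaults:

```python
# Hypothetical sketch of path settings as they might appear in config.py;
# the actual variable names and defaults are defined in the repository's config file.
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parent  # repository root (hypothetical)
DATA_DIR = PROJECT_ROOT / "data"                # created by install.sh
RESULTS_DIR = PROJECT_ROOT / "results"          # figures for the paper
MODEL_DIR = DATA_DIR / "models"                 # fine-tuned FLAN-T5 checkpoints (hypothetical)
```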
```
├── install.sh                                  <- Script that installs all requirements and data needed to run the pipeline.
├── constraints-transformer
│   ├── conversion                              <- Contains the BPMN analyzer, json2petrinet, and Petri net analyzer.
│   ├── data                                    <- Data used to run the pipeline and, later, the generated results (created by install.sh).
│   ├── evaluation                              <- Utility functions for evaluation.
│   ├── labelparser                             <- Utility functions for label parsing.
│   ├── results                                 <- Figures for the paper.
│   ├── 01_run_paper_preprocessing_data.py      <- Script to generate the train, test, and validation sets.
│   ├── 02_run_training.py                      <- Script to fine-tune FLAN-T5.
│   ├── 04_run_generate_predictions_testset.py  <- Script to generate xSemAD predictions on the test set.
│   ├── declare_to_textdesc.py                  <- Converts DECLARE constraints into text descriptions (see the example below).
│   ├── vanilla_llm_predictions.py              <- Generates predictions using a non-fine-tuned GPT model.
│   ├── random_predictions.py                   <- Generates random constraint predictions as a baseline.
│   ├── 10_paper_paper_results.py               <- Script to generate the paper results.
│   ├── config.py                               <- Configuration file.
│   └── requirements.txt                        <- Requirements file.
├── README.md                                   <- The top-level README for users of this project.
└── LICENSE                                     <- License that applies to the source code in this repository.
```
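To illustrate what `declare_to_textdesc.py` does conceptually, here is a minimal, hypothetical sketch of mapping DECLARE constraint templates to natural-language descriptions. The phrasings follow standard DECLARE semantics; the actual script's templates and wording may differ:

```python
# Minimal illustrative sketch (not the repository's actual implementation):
# render standard DECLARE templates as natural-language sentences.
TEMPLATES = {
    "Existence":  "'{a}' must occur at least once.",
    "Init":       "'{a}' must be the first activity to occur.",
    "Response":   "Whenever '{a}' occurs, '{b}' must eventually occur afterwards.",
    "Precedence": "'{b}' may only occur if '{a}' has occurred before.",
    "Succession": "Every '{a}' must eventually be followed by '{b}', and '{b}' may only occur after '{a}'.",
}

def declare_to_text(template: str, a: str, b: str | None = None) -> str:
    """Render a single DECLARE constraint as a natural-language sentence."""
    return TEMPLATES[template].format(a=a, b=b)

# Example: Response(Create Order, Ship Goods)
print(declare_to_text("Response", "Create Order", "Ship Goods"))
# -> Whenever 'Create Order' occurs, 'Ship Goods' must eventually occur afterwards.
```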
The files in this repository are generally run in the order they appear in the project structure. Below is a guide for running the files:
- Start by preprocessing the data:

```
python3 constraints-transformer/01_run_paper_preprocessing_data.py
```

- Run each subsequent file in the order it appears in the project structure, executing the files step by step.

- Continue through the pipeline, always checking for optional arguments or configurations using `-h` before running a file.
Some files accept additional arguments or configurations. To learn about the available options for these files, use the `-h` (help) flag; if a file does not take any arguments, it will start executing instead. Example:

```
python3 constraints-transformer/vanilla_llm_predictions.py -h
```

This displays the available arguments and their descriptions for the file. Refer to the help command for guidance on any specific script's functionality or configuration options, or contact the authors.
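As a rough sketch, these command-line interfaces follow the usual `argparse` pattern, which is what makes `-h` work. The flags below are hypothetical examples, not the scripts' actual options; run each script with `-h` to see those:

```python
# Hypothetical sketch of the argparse pattern behind the -h flag;
# the real flags differ per script and are shown by running it with -h.
import argparse

parser = argparse.ArgumentParser(description="Generate predictions on the test set.")
parser.add_argument("--model-path", default="data/models/flan-t5",
                    help="checkpoint to load (hypothetical flag)")
parser.add_argument("--batch-size", type=int, default=8,
                    help="inference batch size (hypothetical flag)")
args = parser.parse_args()  # with -h, argparse prints the help text and exits
```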
If you find an issue or would like to submit an improvement to this project, please contact the authors.