140 add simple config examples #141

Merged
merged 9 commits into from
Aug 17, 2023
README.md: 106 changes, 21 additions & 85 deletions
@@ -4,17 +4,15 @@
 ![version](https://img.shields.io/github/v/tag/AutomatedProcessImprovement/simod)

 Simod combines process mining and machine learning techniques to automate the discovery and tuning of Business Process Simulation models from event logs extracted from enterprise information systems (ERPs, CRM, case management systems, etc.).
 Simod takes as input an event log in CSV format, a configuration file, and (optionally) a BPMN process model, and returns a business process simulation scenario that can be simulated using the [Prosimos](https://github.com/AutomatedProcessImprovement/Prosimos) simulator, which is embedded in Simod.

-## Simod
-
-### Requirements
+## Requirements

 - Python 3.9
 - Java 1.8 is required by Split Miner which is used for process model discovery
 - Use [Poetry](https://python-poetry.org/) for building, installing, and managing Python dependencies

-### Getting Started
+## Getting Started

 ```shell
 docker pull nokal/simod:v3.2.1
@@ -45,100 +43,38 @@ bash run.sh <path-to-config> <optional-path-to-output-dir>

 Different Simod versions are available at https://hub.docker.com/r/nokal/simod/tags.

-### Configuration
-
-Example configuration with description and possible values:
-
-```yaml
-version: 4
-common:
-  train_log_path: resources/event_logs/PurchasingExample.xes # Path to the event log in XES or CSV format.
-  test_log_path: resources/event_logs/PurchasingExampleTest.xes # Optional: Path to the test event log in XES or CSV format.
-  model_log_path: resources/models/PurchasingExampleTest.bpmn # Optional: Path to the process model to use for the control-flow.
-  num_final_evaluations: 1 # Number of times that the evaluation of the discovered model is done during the optimization. The evaluation metric of the candidate is the average of its evaluations.
-  discover_case_attributes: false
-  discover_prioritization_rules: false
-  discover_batching_rules: false
-  evaluation_metrics: # A list of evaluation metrics to use on the final model.
-    - dl
-    - absolute_hourly_emd
-    - cycle_time_emd
-    - circadian_emd
-  log_ids: # Specify the name for each of the columns in the CSV file (XES standard by default)
-    case_id: "case_id"
-    activity: "activity"
-    resource: "resource"
-    start_time: "start_timestamp"
-    end_time: "end_timestamp"
-preprocessing: # Event log preprocessing settings.
-  multitasking: false # If true, remove the multitasking by adjusting the timestamps (start/end) of those activities being executed at the same time by the same resource.
-  enable_time_concurrency_threshold: 0.75 # Threshold to consider two activities as concurrent when computing the enabled time.
-  concurrency_df: 0.9 # Directly-Follows threshold for the heuristics' concurrency oracle (only used to estimate start times if needed).
-  concurrency_l2l: 0.9 # Length 2 loops threshold for the heuristics' concurrency oracle.
-  concurrency_l1l: 0.9 # Length 1 loops threshold for the heuristics' concurrency oracle.
-control_flow: # Control-flow model discovery settings.
-  optimization_metric: n_gram_distance # Optimization metric for the structure. Options: DL or N_GRAM_DISTANCE.
-  num_iterations: 10 # Number of optimization iterations over the search space.
-  num_evaluations_per_iteration: 3 # Number of times to evaluate each iteration (using the mean of all of them).
-  gateway_probabilities: # Methods for discovering gateway probabilities. Options: equiprobable or discovery.
-    - equiprobable
-    - discovery
-  mining_algorithm: sm3 # Process model discovery algorithm. Options: sm2 or sm3 (recommended).
-  concurrency: # Split Miner 2 (sm2) parameter for the number of concurrent relations between events to be captured. Values between 0.0 and 1.0.
-    - 0.0
-    - 1.0
-  epsilon: # Split Miner 1 and 3 (sm1, sm3) parameter specifying the number of concurrent relations between events to be captured. Values between 0.0 and 1.0.
-    - 0.0
-    - 1.0
-  eta: # Split Miner 1 and 3 (sm1, sm3) parameter specifying the filter over the incoming and outgoing edges. Values between 0.0 and 1.0.
-    - 0.0
-    - 1.0
-  replace_or_joins: # Split Miner 3 (sm3) parameter specifying whether to replace non-trivial OR joins or not. Options: true, false.
-    - true
-    - false
-  prioritize_parallelism: # Split Miner 3 (sm3) parameter specifying whether to prioritize parallelism over loops or not. Options: true, false.
-    - true
-    - false
-resource_model: # Resource model discovery settings.
-  optimization_metric: circadian_emd # Optimization metric for the calendars. Options: absolute_hourly_emd, cycle_time_emd, circadian_emd.
-  num_iterations: 10 # Number of optimization iterations over the search space.
-  num_evaluations_per_iteration: 3 # Number of times to evaluate each iteration (using the mean of all of them).
-  resource_profiles: # Resource profiles settings.
-    discovery_type: pool # Resource profile discovery type. Options: differentiated, pool, undifferentiated.
-    granularity: # Time granularity for calendars in minutes. Bigger logs will benefit from smaller granularity.
-      - 15
-      - 60
-    confidence:
-      - 0.5
-      - 0.85
-    support:
-      - 0.01
-      - 0.3
-    participation: 0.4 # Resource participation threshold. Values between 0.0 and 1.0.
-extraneous_activity_delays: # Settings for extraneous activity timers discovery.
-  optimization_metric: relative_emd # Optimization metric for the extraneous activity timers. Options: relative_emd, absolute_emd, circadian_emd, cycle_time.
-  num_iterations: 1 # Number of optimization iterations over the search space.
-  num_evaluations_per_iteration: 3 # Number of times to evaluate each iteration (using the mean of all of them).
-```
+## Configuration file
+
+A set of example configurations can be found in the
+[resources](https://github.com/AutomatedProcessImprovement/Simod/tree/master/resources) folder along with a description
+of each element:
+- Basic configuration to discover the full BPS model ([here](https://github.com/AutomatedProcessImprovement/Simod/blob/master/resources/config/configuration_example.yml)).
+- Basic configuration to discover the full BPS model, and evaluate it with a specified event log ([here](https://github.com/AutomatedProcessImprovement/Simod/blob/master/resources/config/configuration_example_with_evaluation.yml)).
+- Basic configuration to discover a BPS model with a provided BPMN process model as starting point ([here](https://github.com/AutomatedProcessImprovement/Simod/blob/master/resources/config/configuration_example_with_provided_process_model.yml)).
+- Complete configuration example with all the possible parameters ([here](https://github.com/AutomatedProcessImprovement/Simod/blob/master/resources/config/complete_configuration.yml)).

-**NB!** Split Miner 1 is not supported anymore. Split Miner 3 will be executed instead.
+You can run any of these examples by executing the following command:
+
+```shell
+poetry run simod optimize --config_path resources/config/configuration_example.yml
+```

-### Testing
+## Testing

 Use `pytest` to run tests on the package:

 ```shell
-pytest
+poetry run pytest
 ```

 To run unit tests, execute:

 ```shell
-pytest -m "not integration"
+poetry run pytest -m "not integration"
 ```

 Coverage:

 ```shell
-pytest -m "not integration" --cov=simod
+poetry run pytest -m "not integration" --cov=simod
 ```
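The added "Configuration file" section shows how to run one example configuration at a time. To try all four linked examples in one go, a small driver could loop over them using the same `poetry run simod optimize --config_path <file>` command. The sketch below is hypothetical and not part of Simod. The script and its function names (`build_command`, `run_examples`) are illustrative, and it assumes it is run from the repository root, with the four files present under `resources/config/`.

```python
# Hypothetical driver script (not part of Simod): runs the documented
# `poetry run simod optimize --config_path <file>` command once per example config.
import subprocess
from typing import List, Sequence

# The four example configurations linked from the README, as paths relative
# to the repository root (assumption: the script is run from that root).
EXAMPLE_CONFIGS = (
    "resources/config/configuration_example.yml",
    "resources/config/configuration_example_with_evaluation.yml",
    "resources/config/configuration_example_with_provided_process_model.yml",
    "resources/config/complete_configuration.yml",
)


def build_command(config_path: str) -> List[str]:
    """Return the argument list for one Simod optimization run."""
    return ["poetry", "run", "simod", "optimize", "--config_path", config_path]


def run_examples(
    configs: Sequence[str] = EXAMPLE_CONFIGS, dry_run: bool = True
) -> List[List[str]]:
    """Build one command per config file and execute them only if dry_run is False.

    With check=True, subprocess.run raises CalledProcessError on the first
    non-zero exit status, so the remaining configurations are not attempted.
    """
    commands = [build_command(path) for path in configs]
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Calling `run_examples()` only prints the four commands. Calling `run_examples(dry_run=False)` executes them one after another and, because of `check=True`, stops at the first run that exits with a non-zero status. The script defaults to a dry run so you can check the paths before starting any optimization.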