
Explainable Probabilistic Forecasting of Conflict-Related Fatalities

There have not been many peaceful years in our history. Let's at least forecast when the next unrest comes.


Motivation for the project

A new wave of violent conflicts around the world raises concerns about global security. According to the ACLED Conflict Index, 12% more conflicts occurred in 2023 than in 2022, and the trend shows no sign of halting.

The goal of our research is to build a robust model for early military conflict prediction that is available to everyone. Awareness of emerging risks should be a right, not a privilege.

What is already done

This repository presents the first publicly available and explainable early conflict forecasting model capable of forecasting the distribution of conflict-related fatalities at the country-month level. The model seeks to be maximally transparent and produces predictions up to 14 months into the future. It improves over the competition benchmarks in 4 out of 6 benchmark years but so far misses important violence spikes.

Keywords: Interstate conflict modelling · Early Conflict Warning System · Fatalities prediction · Predicting with uncertainty.

Getting started

Before you proceed with running the model and iterating over existing implementations, it's important to understand the inputs and outputs of the model.

The model takes as input a dataset from the ViEWS prediction competition 2024, augmented with additional features and principal components (PCAs). The full data preprocessing pipeline is stored in the data_preprocessing_pipeline folder.
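For illustration only, here is a minimal sketch of how such a PCA augmentation could look, assuming a pandas DataFrame of country-month features; the file name, column names, and number of components are hypothetical and may differ from the actual pipeline:

```python
# Minimal sketch of PCA feature augmentation on a country-month feature table.
# File name, column names, and component count are assumptions for illustration.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cm_features = pd.read_csv("cm_features.csv")  # hypothetical input file
numeric_cols = cm_features.select_dtypes("number").columns.drop("ged_sb", errors="ignore")

# Standardize before PCA so that features on large scales do not dominate.
scaled = StandardScaler().fit_transform(cm_features[numeric_cols].fillna(0))

pca = PCA(n_components=10)  # component count is an assumption
components = pca.fit_transform(scaled)
for i in range(components.shape[1]):
    cm_features[f"pca_{i + 1}"] = components[:, i]
```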

The original dataset combines the UCDP Georeferenced Event Dataset (GED), the V-Dem dataset, the World Development Indicators, the ACLED dataset, and several others.

The model outputs the predicted distribution of conflict-related fatalities for each country-month pair (a regression problem). The default prediction window is 14 months ahead, as required by the ViEWS competition rules, but this can easily be adjusted in the 6. shift yearly cm_features.py data pipeline file.
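As a rough illustration of that shifting step (not necessarily the exact logic in 6. shift yearly cm_features.py), the dependent variable can be moved N months ahead per country with pandas; the column names below are assumptions:

```python
# Sketch of shifting the target N months ahead per country, so that features at
# month t are paired with fatalities at month t + N. Column names are assumptions.
import pandas as pd

FORECAST_HORIZON = 14  # months ahead; adjust to change the prediction window

df = pd.read_csv("cm_features.csv")  # hypothetical input file
df = df.sort_values(["country_id", "month_id"])

# For each country, move fatalities FORECAST_HORIZON months back in time so the
# model learns to predict them from earlier features.
df["target"] = df.groupby("country_id")["ged_sb"].shift(-FORECAST_HORIZON)

# Rows at the end of each country's series have no future label and are dropped.
df = df.dropna(subset=["target"])
```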

The best sources of information about the model are the technical report and its shortened version. They describe the model in detail, provide insights into its performance, and suggest possible improvements.

Technical report

The technical report, with implementation details and nuances of the model, is available on Google Drive.

For your convenience, the technical report structure is shown below.

Shortened report

The shortened version of the report is available on Medium. This report provides a high-level overview of the model and its performance.

The Prediction Model

While the code is flexible and any model can be used, we build our model using the Natural Gradient Boosting (NGBoost) framework. Other models are in development.

The NGBoost model code is stored in the model folder in two representations: .py and .ipynb. Only the .py files are pushed to GitHub; the .ipynb files are generated using Jupytext (see the bash scripts in the section below).

Simply run the .py or .ipynb script, and it will train the NGBoost model based on the parameters specified in the header of the file, and produce plots and submission files that can be evaluated to derive model accuracy.
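For orientation, here is a minimal, hedged NGBoost sketch on synthetic data; the output distribution, features, and hyperparameters are assumptions rather than the settings used in the model scripts:

```python
# Minimal NGBoost sketch on synthetic data; the distribution choice, features,
# and hyperparameters here are assumptions, not the repository's actual settings.
import numpy as np
from ngboost import NGBRegressor
from ngboost.distns import Normal

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                        # stand-in for country-month features
y = rng.poisson(lam=np.exp(X[:, 0])).astype(float)   # stand-in for fatality counts

ngb = NGBRegressor(Dist=Normal, n_estimators=300, learning_rate=0.01)
ngb.fit(X[:400], y[:400])

# pred_dist returns a full predictive distribution per row, not a point estimate.
pred_dist = ngb.pred_dist(X[400:])
mu, sigma = pred_dist.params["loc"], pred_dist.params["scale"]

# Draws from the predicted distribution (e.g. for quantile-based submissions);
# clipping negative draws to zero is one way of handling negative predictions.
samples = np.clip(rng.normal(mu, sigma, size=(1000, len(mu))), 0, None)
point_forecast = mu
```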

Model Evaluation

The model is evaluated using the evaluate_submissions.py file, and aggregated statistics about the model can be gathered via compare_submissions.ipynb.
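For intuition about the scoring, below is a generic sample-based sketch of the Continuous Ranked Probability Score (one of the criteria listed in the report structure); it is a textbook estimator and may differ from the implementation in evaluate_submissions.py:

```python
# Sample-based CRPS estimate for a single observation: lower is better.
# This is a generic textbook formula, not necessarily the repository's implementation.
import numpy as np

def crps_from_samples(samples: np.ndarray, observation: float) -> float:
    """CRPS ≈ E|X - y| - 0.5 * E|X - X'|, where X, X' are forecast samples."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - observation))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

# Example: a forecast distribution of fatalities versus the observed count.
forecast_samples = np.random.default_rng(0).poisson(lam=20, size=1000)
print(crps_from_samples(forecast_samples, observation=25))
```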

For developers

Install dependencies

Set up the environment using Poetry (with Python 3.10 as the base interpreter) by running the following command:

poetry install

(install Poetry if you don't have it yet)

Install pre-commit hooks

Run the following command to install pre-commit hooks:

```bash
pre-commit install
```

Ensure that you have the following dependencies installed:

  1. Black (for Python code formatting)
  2. Jupyter (for removing output from notebooks)

Jupytext

For a better development experience and version control, the Jupytext library is used to generate .py files from their .ipynb representation and vice versa. Additionally, Jupytext provides convenient syncing logic between both representations.

Bash scripts

There are two bash scripts available:

  • data_preprocessing_pipeline.sh - script for running all steps of the data preprocessing pipeline. Note that this requires both an R and a Python environment, because the pipeline uses some libraries available only in R.
  • jupytext_sync.sh - script to create a Jupyter notebook for a model file and sync it with its Python representation.

Run the following command to give execute permission to a bash script:

chmod +x [file].sh

Run the following command to execute the bash script:

./[file].sh [args]

Generate Jupyter notebooks based on .py files

Run the following command to generate a Jupyter notebook:

jupytext --to ipynb [file_name].py

Run the following command to turn a Jupyter notebook into a paired ipynb/py notebook:

jupytext --set-formats ipynb,py [file_name].ipynb

Run the following command to synchronize the Jupyter notebook with changes in the Python file:

jupytext --sync [file_name].ipynb

Technical report structure

The technical report is structured as follows:

  1. Introduction
  2. Related Work
  3. Summary of contributions
  4. Methodology
    1. Level of analysis and prediction window
    2. Original Competition Dataset
    3. Data preprocessing
      1. Data cleaning
      2. Dependent variable shifting
      3. Regions addition
      4. Parametrization
      5. Least Important Features Drop
    4. Natural Gradient Boosting
      1. Handling Negative Predictions
      2. Handling of removed countries
    5. Scoring Criteria
      1. Continuous Ranked Probability Score
      2. Ignorance Score
      3. Mean Interval Score
      4. Metrics Implementation
    6. Model fine-tuning
    7. Competition Benchmarks
      1. Last Historical Poisson
      2. Bootstraps from actuals
  5. Results
    1. General Performance
    2. Additional evaluation for the 2022 year
    3. Model accuracy dependency on input fatalities distribution of the month
    4. Feature Importance
    5. Analysis of country forecasts
  6. Discussion
  7. Future work
  8. Appendix with tables and figures

I hope you have fun reading it :P