This folder contains the code, results, and plots of the paper:
Exploring Optimal Transport for Event-Level Anomaly Detection at the Large Hadron Collider
Authors: Nathaniel Craig, Jessica N. Howard, Hancheng Li
arXiv link: https://arxiv.org/abs/2401.15542
Paper DOI: TBD
If you use this code, please consider citing the above paper.
Please make sure that your Python version is Python 3.9 or newer.
Please use conda or pip to install the following packages before running the notebooks: numpy, scikit-learn, POT, tqdm, matplotlib, and h5py.
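As a quick sanity check before opening the notebooks, the requirements above can be verified with a few lines of standard-library Python. This is only a sketch: the helper names `python_is_supported` and `missing_packages` are hypothetical and not part of this repository. Note that scikit-learn is imported as `sklearn` and POT as `ot`.

```python
import importlib.util
import sys

# Map install names to import names (they differ for scikit-learn and POT).
REQUIRED = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "POT": "ot",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "h5py": "h5py",
}


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


def missing_packages(required=REQUIRED):
    """Return the install names of the packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.9 or newer is required.")
    for name in missing_packages():
        print(f"Missing package: {name}")
```

Any package reported as missing can then be installed with conda or pip.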
The entire repo is split into five parts.
The data folder should contain all the data files locally. However, because the data files are large, we have not uploaded them to this GitHub repository.
The original data files can be found via links on the ADC2021 website: https://mpp-hep.github.io/ADC2021/. Note that we have used v2 of these datasets.
The anomaly augmented background dataset can be generated by running the functions/dataProcessing/AnomalyAugmentData.ipynb notebook.
The experiments/ folder contains all the notebooks that generate the results in the paper. The notebooks are further arranged into separate folders based on the method used.
The three notebooks in experiments/OT/ consider the max, mean, and min OT distances (both 2D and 3D), respectively.
The experiments/OT_ML/ folder considers all methods which combine OT distances with simple Machine Learning (ML) algorithms. There are 3 subfolders:
- OT_anomaly_kNN/: This contains the notebook kNN_3D_anomalyaug.ipynb, which uses the pair-wise 3D OT distances between anomaly augmented and background data to train a k-nearest neighbors (kNN) classifier. This is a weakly supervised approach.
- OT_kNN_classification/: This folder contains notebooks which perform various classification analyses. Both pair-wise 2D and 3D OT distances between each signal type and background data are considered. This is a supervised classification analysis.
- OT_oneClassSVM/: This contains the notebook OneClassSVM_3D.ipynb, which uses the pair-wise 3D OT distances between different background events to train a one-class SVM. This is an unsupervised approach.
The notebook resultsForPaper_OT+ML.ipynb summarizes the results of all OT_ML methods and generates the numbers for the tables in the paper.
The functions/ folder contains the centralFunctions.ipynb notebook, which defines and explains all the functions used in the other notebooks. The dataProcessing/ subfolder contains the AnomalyAugmentData.ipynb notebook, which generates the augmented background data.
The results folder contains all the JSON and npz files that store the results of the various experiments. These files are used for plotting.
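Files of both types can be read back with the standard `json` module and `numpy.load`. The sketch below is an assumption about how one might do this, not code from the repository: the helper `load_results`, the file names, and the array keys are hypothetical, and the round trip uses throwaway files in a temporary directory rather than the actual result files.

```python
import json
import os
import tempfile

import numpy as np


def load_results(path):
    """Load a .json or .npz file into a plain dict."""
    path = str(path)
    if path.endswith(".json"):
        with open(path) as f:
            return json.load(f)
    if path.endswith(".npz"):
        # Read every array eagerly so the archive can be closed afterwards.
        with np.load(path) as data:
            return {key: data[key] for key in data.files}
    raise ValueError(f"Unsupported file type: {path}")


# Round trip with toy files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "example_scores.npz")
    json_path = os.path.join(tmp, "example_summary.json")

    np.savez(npz_path, fpr=np.array([0.0, 0.5, 1.0]), tpr=np.array([0.0, 0.9, 1.0]))
    with open(json_path, "w") as f:
        json.dump({"n_events": 3}, f)

    loaded_npz = load_results(npz_path)
    loaded_json = load_results(json_path)
```

Listing `loaded_npz.keys()` is a simple way to discover which arrays a given npz file holds before plotting.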
The plots folder contains the ROC and SI curve plots and the notebook that generates them.
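For readers unfamiliar with SI curves, the sketch below computes ROC and SI points from anomaly scores using NumPy only. It assumes the usual definition of significance improvement, SI = TPR / sqrt(FPR), and that larger scores mean "more anomalous". The function name `roc_and_si` and the toy scores are hypothetical and do not come from the plotting notebook.

```python
import numpy as np


def roc_and_si(scores_bkg, scores_sig):
    """Return (fpr, tpr, si) arrays, one entry per score threshold.

    Assumption: SI = TPR / sqrt(FPR); it is left as NaN where FPR is zero.
    """
    scores_bkg = np.asarray(scores_bkg, dtype=float)
    scores_sig = np.asarray(scores_sig, dtype=float)

    # Scan thresholds from the highest score to the lowest.
    thresholds = np.unique(np.concatenate([scores_bkg, scores_sig]))[::-1]
    tpr = np.array([(scores_sig >= t).mean() for t in thresholds])
    fpr = np.array([(scores_bkg >= t).mean() for t in thresholds])

    si = np.full_like(tpr, np.nan)
    nonzero = fpr > 0
    si[nonzero] = tpr[nonzero] / np.sqrt(fpr[nonzero])
    return fpr, tpr, si


# Toy scores for illustration only.
fpr, tpr, si = roc_and_si(
    scores_bkg=[0.1, 0.2, 0.3, 0.4],
    scores_sig=[0.35, 0.5, 0.6, 0.7],
)
```

Plotting `tpr` against `fpr` gives the ROC curve, and plotting `si` against `tpr` gives an SI curve under the definition assumed here.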