hancheng-li/OT_anomaly_detection

OT_anomaly_detection

This folder contains the code, results, and plots of the paper:

Exploring Optimal Transport for Event-Level Anomaly Detection at the Large Hadron Collider

Authors: Nathaniel Craig, Jessica N. Howard, Hancheng Li

arXiv link: https://arxiv.org/abs/2401.15542

Paper DOI: TBD

How to cite

If you use this code, please consider citing the above paper.

Setup

Please make sure that your Python version is Python 3.9 or newer.

Use conda or pip to install the following packages before running the notebooks: numpy, scikit-learn, POT, tqdm, matplotlib, and h5py.
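The requirements above can be checked before opening any notebook. The snippet below is a minimal sketch and is not part of the repository; the helper names `python_ok` and `missing_packages` are hypothetical. It relies on the fact that scikit-learn is imported as `sklearn` and POT as `ot`.

```python
import importlib.util
import sys

# Install name -> import name. Two of them differ:
# scikit-learn is imported as "sklearn" and POT as "ot".
REQUIRED = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "POT": "ot",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "h5py": "h5py",
}


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


def missing_packages(required=REQUIRED):
    """Return the install names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print("Python 3.9+:", python_ok())
print("Missing packages:", missing_packages() or "none")
```

Any package reported as missing can then be installed with conda or pip under the install name shown.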

Code Outline

The entire repo is split into five parts.

data/

This folder should hold all the data files locally. Because of their size, the data files are not uploaded to this repository. The original data files can be found via the links on the ADC2021 website: https://mpp-hep.github.io/ADC2021/. Note that we use v2 of these datasets. The anomaly-augmented background dataset can be generated by running the functions/dataProcessing/AnomalyAugmentData.ipynb notebook.

experiments/

This folder contains all the notebooks that generate the results in the paper. The notebooks are further arranged into separate folders based on the method used.

The three notebooks in experiments/OT/ consider the max, mean, and min OT distances (both 2D and 3D), respectively.
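As a rough illustration of what max, mean, and min OT distances can mean as anomaly scores, the sketch below assumes that each test event has already been compared against a set of reference background events, giving a matrix of OT distances with one row per test event. That reading, the function name `aggregate_ot_scores`, and the toy numbers are assumptions for illustration only; the notebooks in experiments/OT/ are the authoritative implementation.

```python
import numpy as np


def aggregate_ot_scores(distances):
    """Reduce an (n_test, n_reference) OT distance matrix to per-event scores.

    Each row holds the OT distances from one test event to every
    reference background event (assumed layout, see the text above).
    """
    distances = np.asarray(distances, dtype=float)
    return {
        "min": distances.min(axis=1),
        "mean": distances.mean(axis=1),
        "max": distances.max(axis=1),
    }


# Toy distance matrix: 2 test events, 3 reference events (made-up values).
toy = np.array([[1.0, 2.0, 3.0],
                [4.0, 6.0, 8.0]])
scores = aggregate_ot_scores(toy)
for name, values in scores.items():
    print(name, values)
```

Under this reading, a larger score means the event sits farther from the reference background sample.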

The experiments/OT_ML/ folder covers all methods that combine OT distances with simple machine learning (ML) algorithms. There are three subfolders:

  • OT_anomaly_kNN/: This contains the notebook kNN_3D_anomalyaug.ipynb, which uses the pairwise 3D OT distances between anomaly-augmented and background data to train a k-nearest-neighbors (kNN) classifier. This is a weakly supervised approach.

  • OT_kNN_classification/: This folder contains notebooks that perform various classification analyses. Both pairwise 2D and 3D OT distances between each signal type and background data are considered. This is a supervised classification analysis.

  • OT_oneClassSVM/: This contains the notebook OneClassSVM_3D.ipynb, which uses the pairwise 3D OT distances between different background events to train a one-class SVM. This is an unsupervised approach.
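To make the idea of training a kNN on pairwise OT distances concrete, here is a minimal NumPy sketch of a kNN score on a precomputed distance matrix. It is not the code used in the notebooks. The function name `knn_score`, the label convention (0 for the background sample, 1 for the anomaly-augmented or signal sample), and the toy values are assumptions. scikit-learn's `KNeighborsClassifier(metric="precomputed")` offers the same kind of interface.

```python
import numpy as np


def knn_score(dist_to_train, train_labels, k=5):
    """Fraction of the k nearest training events that carry label 1.

    dist_to_train: (n_test, n_train) precomputed distances, e.g. OT distances.
    train_labels:  (n_train,) array of 0/1 labels (assumed convention).
    """
    dist_to_train = np.asarray(dist_to_train, dtype=float)
    train_labels = np.asarray(train_labels)
    # Column indices of the k smallest distances in each row.
    # The order within those k neighbours does not matter for the vote.
    nearest = np.argpartition(dist_to_train, k - 1, axis=1)[:, :k]
    return train_labels[nearest].mean(axis=1)


# Toy example: 2 test events, 4 training events (made-up distances).
toy_dist = np.array([[0.1, 0.2, 5.0, 6.0],
                     [4.0, 5.0, 0.3, 0.1]])
toy_labels = np.array([0, 0, 1, 1])
toy_scores = knn_score(toy_dist, toy_labels, k=2)
print(toy_scores)
```

The score can be used directly as an anomaly score, or thresholded at 0.5 for a hard classification.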

The notebook resultsForPaper_OT+ML.ipynb summarizes the results of all OT_ML methods and generates the numbers for the tables in the paper.

functions/

This folder contains the centralFunctions.ipynb notebook, which defines and explains all the functions used in the other notebooks. The dataProcessing/ subfolder contains the AnomalyAugmentData.ipynb notebook, which generates the anomaly-augmented background data.

results/

This folder contains all the JSON and npz files which store the results from various experiments. These are used for plotting.
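Files of both types can be read back with the standard `json` module and `numpy.load`. The sketch below is illustrative only: the helper `load_result`, the file names, and the stored keys are hypothetical and do not describe the actual contents of results/. It writes two toy files to a temporary directory and reads them back.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def load_result(path):
    """Load a result file based on its extension (.json or .npz)."""
    path = Path(path)
    if path.suffix == ".json":
        with path.open() as handle:
            return json.load(handle)
    if path.suffix == ".npz":
        # Copy the arrays out so they stay usable after the archive closes.
        with np.load(path) as archive:
            return {key: archive[key] for key in archive.files}
    raise ValueError(f"Unsupported result file type: {path.suffix}")


# Round trip with toy files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "example.json"
    json_path.write_text(json.dumps({"toy_metric": 0.5}))
    npz_path = Path(tmp) / "example.npz"
    np.savez(npz_path, fpr=np.array([0.0, 1.0]), tpr=np.array([0.0, 1.0]))

    json_result = load_result(json_path)
    npz_result = load_result(npz_path)

print(json_result)
print(sorted(npz_result))
```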

paperPlots/

This folder contains the plots for ROC and SI curves and the notebook to generate the plots.
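For readers unfamiliar with these curves, the sketch below computes ROC points and an SI curve from anomaly scores, assuming SI stands for significance improvement, defined as TPR / sqrt(FPR). That definition, the function name `roc_and_si`, and the toy inputs are assumptions; the notebook in this folder is the reference for how the paper's plots are made. Tied scores are not treated specially here.

```python
import numpy as np


def roc_and_si(scores, labels):
    """Return (fpr, tpr, si), one point per event, sorted by descending score.

    scores: anomaly scores, larger means more anomalous.
    labels: 1 for signal, 0 for background.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    # Lowering the threshold past each event accepts one more event.
    tpr = np.cumsum(sorted_labels) / labels.sum()
    fpr = np.cumsum(~sorted_labels) / (~labels).sum()
    # SI = TPR / sqrt(FPR) is undefined at FPR == 0, so leave NaN there.
    si = np.full_like(tpr, np.nan)
    positive = fpr > 0
    si[positive] = tpr[positive] / np.sqrt(fpr[positive])
    return fpr, tpr, si


# Toy example: two signal events score above two background events.
fpr, tpr, si = roc_and_si([0.9, 0.8, 0.7, 0.1], [1, 1, 0, 0])
print(fpr, tpr, si)
```

Plotting `tpr` against `fpr` gives the ROC curve, and plotting `si` against `tpr` is a common way to show the SI curve.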
