This repository provides the resources for the talk and accompanying hands-on exercises on Anomaly Detection at the EPFL Extension School Workshop - Machine Learning and Data Visualization at the Applied Machine Learning Days 2020.
Slides for the workshop are available here.
Dataset
The data is based on the KDD-CUP 1999 challenge on network intrusion detection. A description of the original task can be found here. The data provided for this workshop has been adapted from the NSL-KDD version.
Anomaly detection
Anomaly detection can be treated as a supervised classification task. However, this approach struggles when the proportion of anomalies (here, network attacks) is small, because the classifier sees too few examples of the rare class. Instead, we showcase an approach using Isolation Forests.
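To give a feel for how an Isolation Forest separates anomalies, here is a minimal pure-Python sketch. It is not the code used in the workshop notebooks; the function names (`build_tree`, `fit_forest`, `anomaly_score`), the synthetic 2-D data, and the defaults of 100 trees and 64 samples per tree are assumptions made for illustration. The idea is that points which are isolated after only a few random splits get a score closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:  # cannot split identical values
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["threshold"] else "right"
    return path_length(tree[branch], point, depth + 1)


def fit_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build n_trees trees, each on a random subsample (needs at least 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(forest, point):
    """Score in (0, 1); higher means the point is isolated after fewer splits."""
    trees, sample_size = forest
    mean_depth = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Synthetic stand-in for "normal" traffic: a 2-D Gaussian cluster (assumption).
data_rng = random.Random(42)
normal_points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]

forest = fit_forest(normal_points)
inlier_score = anomaly_score(forest, (0.0, 0.0))
outlier_score = anomaly_score(forest, (8.0, 8.0))
print(f"inlier score:  {inlier_score:.3f}")
print(f"outlier score: {outlier_score:.3f}")
```

In practice one would typically rely on a library implementation such as scikit-learn's `IsolationForest` rather than a hand-written one; the sketch only shows the mechanism.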
The user can select the size of the training dataset and vary its contamination rate (the fraction of anomalies it contains), including a dataset without any anomalies. The model is trained on this dataset and then used to predict anomalies on a separate test set, on which its performance is evaluated.
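The workflow above can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual helpers: `make_training_set` and `precision_recall` are hypothetical names, records are assumed to be `(features, label)` pairs with label 1 for an attack and 0 for normal traffic, and precision and recall are just one possible choice of performance measure.

```python
import random


def make_training_set(normal, anomalies, size, contamination, seed=0):
    """Draw `size` records, of which a fraction `contamination` are anomalies."""
    rng = random.Random(seed)
    n_anomalies = round(size * contamination)
    n_normal = size - n_anomalies
    if n_normal > len(normal) or n_anomalies > len(anomalies):
        raise ValueError("not enough records for the requested size/contamination")
    sample = rng.sample(normal, n_normal) + rng.sample(anomalies, n_anomalies)
    rng.shuffle(sample)
    return sample


def precision_recall(y_true, y_pred):
    """Precision and recall for the anomaly class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Placeholder records standing in for the real dataset (assumption).
normal_records = [((float(i),), 0) for i in range(1000)]
attack_records = [((float(i),), 1) for i in range(50)]

# A training set of 100 records with 5% contamination, and one with none.
train = make_training_set(normal_records, attack_records, size=100, contamination=0.05)
clean_train = make_training_set(normal_records, attack_records, size=100, contamination=0.0)

n_attacks_in_train = sum(label for _, label in train)
print(f"{len(train)} training records, {n_attacks_in_train} attacks")

# Evaluating hypothetical predictions against the true test labels.
precision, recall = precision_recall([1, 0, 1, 0], [1, 1, 0, 0])
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With rare anomalies, accuracy alone can look high even for a model that flags nothing, which is why the sketch reports precision and recall for the anomaly class instead.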
Hands-on exercises
The simplest way to run the hands-on exercises is in the cloud with Google's Colab or Binder, interacting with them through your browser. Alternatively, you can take a look at the already executed notebook in the Offline View.
Getting started:
If you are using Colab, you need to execute the first cell. Otherwise, you can skip this and start with loading settings and functions. To execute a cell, make sure it is selected and then press SHIFT+ENTER or the 'Play' button.