This repository collects scripts for preprocessing and analyzing various network IDS (intrusion detection system) datasets.
Preprocessing means:
- feature extraction from PCAPs
- scaling, train-test split, labeling...
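As an illustration of the scaling and train-test split steps listed above, here is a minimal sketch using only the Python standard library. The function names, the toy flow features (duration, packets, bytes) and the labels are assumptions made for the example; they are not taken from the scripts in this repository, which may rely on other libraries and other scaling methods.

```python
import random

def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Shuffle indices with a fixed seed, then split rows and labels."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    # Returns (X_train, y_train, X_test, y_test).
    return pick(train_idx) + pick(test_idx)

def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Scale columns with the training statistics; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Hypothetical flow feature vectors: [duration_s, packets, bytes]
flows = [
    [0.5, 10, 1200], [1.2, 4, 300], [0.1, 2, 120], [3.4, 50, 64000],
    [0.9, 8, 900], [2.2, 30, 15000], [0.3, 3, 180], [5.0, 80, 98000],
]
labels = [0, 0, 0, 1, 0, 1, 0, 1]  # hypothetical: 0 = benign, 1 = attack

X_train, y_train, X_test, y_test = train_test_split(flows, labels)
mins, maxs = fit_min_max(X_train)
X_train_scaled = apply_min_max(X_train, mins, maxs)
X_test_scaled = apply_min_max(X_test, mins, maxs)
```

The scaler is fitted on the training rows only and then applied to the test rows, so no information from the test set leaks into preprocessing. Test values can therefore fall slightly outside [0, 1].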
Analysis means:
- supervised classification (many learning algorithms)
- parameter tuning
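The two analysis steps above can be sketched with a small example: a k-nearest-neighbours classifier whose parameter `k` is tuned on a held-out validation set. This is only an illustration written with the standard library. The choice of k-NN, the function names and the toy data are assumptions, and the actual analysis scripts may use different algorithms and tuning strategies.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points whose predicted label is correct."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(eval_X, eval_y)
    )
    return hits / len(eval_X)

def tune_k(train_X, train_y, val_X, val_y, candidates):
    """Return (best_k, best_accuracy); ties go to the earliest candidate."""
    scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]

# Hypothetical, already scaled two-feature vectors.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
           [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
train_y = ["benign"] * 3 + ["attack"] * 3
val_X = [[0.05, 0.1], [0.95, 0.9], [0.2, 0.25], [0.75, 0.8]]
val_y = ["benign", "attack", "benign", "attack"]

best_k, best_acc = tune_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5])
```

Selecting the parameter on a validation set that is separate from the final test set keeps the reported test score an honest estimate.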
.
├── <dataset>
│ ├── analysis
│ ├── extra_preprocessing
│ ├── flow_specifications
│ ├── labeling
│ ├── Makefile
│ ├── reproducibility
│ └── statistics
├── LICENSE
└── README.md
Each dataset contains the following sub-folders:
- analysis: contains the supervised analysis scripts
- extra_preprocessing: contains extra pre-processing scripts for some feature vectors, such as the multi-key ones
- flow_specifications: contains JSON files used with the go-flows extractor to extract specific features
- labeling: scripts for labeling feature vectors based on the dataset documentation
- reproducibility: scripts to reproduce the feature vector extraction experiments
- statistics: extracts some useful statistics about the datasets and features, such as correlations, frequency tables, etc.
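To give an idea of the statistics mentioned in the last item, the sketch below computes a frequency table for a categorical feature and the Pearson correlation between two numeric features. It uses only the standard library. The function names and sample values are hypothetical and do not come from the repository's scripts.

```python
import math
from collections import Counter

def frequency_table(values):
    """Return (value, count, relative frequency) tuples, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in counts.most_common()]

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)

# Hypothetical per-flow values.
protocols = ["tcp", "tcp", "udp", "tcp", "icmp", "udp"]
packets = [2, 4, 6, 8]
byte_counts = [100, 200, 300, 400]

table = frequency_table(protocols)
corr = pearson(packets, byte_counts)
```

A correlation close to 1 or -1 between two features suggests they carry largely redundant information, which can guide feature selection.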