Datasets Preprocessing

This repository collects scripts for preprocessing and analyzing various network intrusion detection system (IDS) datasets.

Preprocessing means:

  • feature extraction from PCAPs
  • scaling, train-test splitting, labeling, etc.
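
As a rough illustration of the scaling and train-test split steps, here is a minimal standard-library sketch. It is not taken from the repository's scripts: the feature columns (`duration_s`, `packets`, `bytes`), the labels, and the helper names are all hypothetical. The point it shows is that the scaling range is learned from the training part only and then reused on the test part.

```python
import random


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and split rows/labels into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from the given rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each feature relative to a previously learned range."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Hypothetical flow feature vectors: [duration_s, packets, bytes]
features = [
    [0.5, 10, 1200], [2.0, 40, 52000], [0.1, 2, 120], [5.0, 300, 400000],
    [0.3, 6, 600], [1.2, 25, 9000], [0.05, 1, 60], [3.3, 150, 210000],
]
labels = [0, 1, 0, 1, 0, 0, 0, 1]  # made-up labels: 0 = benign, 1 = attack

X_train, X_test, y_train, y_test = train_test_split(features, labels)

# Fit the scaler on the training rows only, then apply it to both parts.
# Test values can fall outside [0, 1] if they exceed the training range.
mins, maxs = fit_min_max(X_train)
X_train_scaled = apply_min_max(X_train, mins, maxs)
X_test_scaled = apply_min_max(X_test, mins, maxs)
```

Fitting the range on the training rows alone avoids leaking information from the test set into the preprocessing step.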

Analysis means:

  • supervised classification (with many learning algorithms)
  • parameter tuning
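
The two analysis steps can be sketched with a toy example. The snippet below uses a k-nearest-neighbours classifier as a stand-in (the learning algorithms actually used by the repository are not listed here) and tunes its single parameter `k` on a held-out validation set. The data points and function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y)
    )
    return hits / len(val_y)


def tune_k(train_X, train_y, val_X, val_y, candidates):
    """Return (best_k, scores); scores maps each k to validation accuracy."""
    scores = {
        k: accuracy(train_X, train_y, val_X, val_y, k) for k in candidates
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Made-up, already scaled two-feature vectors with labels 0/1.
train_X = [[0.1, 0.1], [0.2, 0.1], [0.15, 0.2],
           [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [[0.12, 0.15], [0.88, 0.9], [0.2, 0.2], [0.8, 0.8]]
val_y = [0, 1, 0, 1]

candidates = [1, 3, 5]
best_k, scores = tune_k(train_X, train_y, val_X, val_y, candidates)
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

The same pattern (fit on training data, score each candidate setting on held-out data, keep the best) carries over to other classifiers and larger parameter grids.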

Tree

.
├── <dataset>
│   ├── analysis
│   ├── extra_preprocessing
│   ├── flow_specifications
│   ├── labeling
│   ├── Makefile
│   ├── reproducibility
│   └── statistics
├── LICENSE
└── README.md

Each dataset contains the following sub-folders:

  • analysis: supervised analysis scripts
  • extra_preprocessing: extra preprocessing scripts for some feature vectors, such as the multi-key
  • flow_specifications: JSON files used with the go-flows extractor to extract specific features
  • labeling: scripts for labeling feature vectors based on the dataset documentation
  • reproducibility: scripts to reproduce the feature vector extraction experiments
  • statistics: scripts to extract useful statistics about the datasets and features, such as correlations and frequency tables
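
To make the kind of output produced in the statistics step concrete, here is a small standard-library sketch that computes a Pearson correlation between two features and a frequency table for a categorical one. The column names and values are invented for illustration and are not taken from any dataset in this repository.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-flow features.
packets = [1, 2, 3, 4, 5]
byte_counts = [100, 200, 300, 400, 500]
protocols = ["tcp", "udp", "tcp", "icmp", "tcp"]

# Correlation between two numeric features.
r = pearson(packets, byte_counts)

# Frequency table of a categorical feature.
protocol_counts = Counter(protocols)

print("correlation(packets, bytes):", r)
for protocol, count in protocol_counts.most_common():
    print(protocol, count)
```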

Current Datasets

Contact

fares.meghdouri@tuwien.ac.at