
Network-Based Malware Detection using Natural Language Processing

AnonimousX1/NLP-Malware

This project illustrates a method that uses the ordering of network flows to classify malicious behavior. Because it works on the sequence of flows rather than on packet contents, the approach is lightweight, privacy-preserving, and resilient to encrypted packet payloads.
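To make the NLP analogy concrete, the sketch below treats each flow as a "word" and a capture's time-ordered flows as a "sentence", then extracts n-grams of consecutive flows. It assumes flows have already been reduced to string tokens; the `protocol:port` token format and the helper names `flow_ngrams` and `ngram_counts` are hypothetical, not taken from the repository.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def flow_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive flow tokens, preserving their order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens: Iterable[str], n: int) -> Counter:
    """Bag-of-n-grams representation of one capture's flow sequence."""
    return Counter(flow_ngrams(list(tokens), n))


if __name__ == "__main__":
    # Hypothetical tokens: protocol and service port of each flow, in time order.
    flows = ["udp:53", "tcp:443", "tcp:443", "udp:53", "tcp:443"]
    print(flow_ngrams(flows, 2))
    print(ngram_counts(flows, 2).most_common(1))
```

Because n-grams keep neighbouring flows together, two captures that contact the same services in a different order produce different features, which is the ordering signal the method relies on.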

Getting Started

Prerequisites

The project is written in Python 3; ensure you have a recent version of python3 and pip3 installed. It relies on tshark to pre-process pcap files and on p7zip to extract zip archives.

On Ubuntu, these can be installed using:

sudo apt-get install tshark p7zip

The remaining Python dependencies can be installed with pip3:

pip3 install -r requirements.txt --user
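Before running anything, a quick check like the following can confirm that the command-line prerequisites are on PATH. This is a hypothetical helper, not part of the repository, and the executable names listed for p7zip (`7z`, `7za`, `7zr`) are assumptions about what the installed package provides.

```python
import shutil
from typing import Dict, List

# Prerequisites named in the README, mapped to executables that may provide them.
# Assumption: depending on the distribution package, p7zip may install
# 7z, 7za and/or 7zr, so any one of them counts as present.
REQUIRED: Dict[str, List[str]] = {
    "tshark": ["tshark"],
    "p7zip": ["7z", "7za", "7zr"],
}


def missing_tools() -> List[str]:
    """Return the prerequisites for which no executable is found on PATH."""
    return [name for name, exes in REQUIRED.items()
            if not any(shutil.which(exe) for exe in exes)]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing: " + ", ".join(missing) if missing else "all prerequisites found")
```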

Directory Structure

.
+-- ml
|   +-- model.py (machine-learning functions)
+-- preprocess
|   +-- process.py
|   +-- process.sh (pcap pre-processing)
|   +-- pcap-to-ngrams.py (pcap conversion to ngrams)
|   +-- f2nlib.py
|   +-- p2flib.py
+-- scripts
|   +-- run.sh (script to run tests on ComputeCanada servers)
|   +-- run_all.sh (automate test running)
+-- requirements.txt
+-- README.md
+-- LICENSE.md
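Judging only by their names, p2flib.py and f2nlib.py appear to handle the pcap-to-flow and flow-to-n-gram stages; that is an assumption, not documented behavior. A minimal sketch of the first stage, assuming packets have already been parsed into records (for example from tshark field output), groups them into direction-insensitive 5-tuple flows and emits one token per flow in order of first appearance. The `Packet` type, the token format, and the rule that the first packet's destination port identifies the service are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    """Hypothetical parsed packet record."""
    time: float
    src: str
    dst: str
    sport: int
    dport: int
    proto: str


FlowKey = Tuple[str, Tuple[str, int], Tuple[str, int]]


def flow_key(p: Packet) -> FlowKey:
    """Direction-insensitive 5-tuple: a request and its reply map to the same flow."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto, min(a, b), max(a, b))


def packets_to_flow_tokens(packets: List[Packet]) -> List[str]:
    """Group packets into flows; return one token per flow, ordered by first packet time."""
    first_seen: Dict[FlowKey, float] = {}
    service_port: Dict[FlowKey, int] = {}
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = flow_key(p)
        if key not in first_seen:
            first_seen[key] = p.time
            # Simplifying assumption: the first packet's destination is the server side.
            service_port[key] = p.dport
    ordered = sorted(first_seen, key=first_seen.get)
    return [f"{key[0]}:{service_port[key]}" for key in ordered]


if __name__ == "__main__":
    pkts = [
        Packet(0.0, "10.0.0.2", "8.8.8.8", 5000, 53, "udp"),
        Packet(0.1, "10.0.0.2", "1.2.3.4", 40000, 443, "tcp"),
        Packet(0.2, "8.8.8.8", "10.0.0.2", 53, 5000, "udp"),  # DNS reply, same flow
        Packet(0.3, "10.0.0.2", "1.2.3.4", 40001, 443, "tcp"),
    ]
    print(packets_to_flow_tokens(pkts))
```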

Running Tests

  1. Download the USTC-TFC2016 dataset (DeepTraffic).
  2. Generate an n-gram file using preprocess/process.sh.
  3. Run ml/model.py on the n-gram file.
  4. Optionally, automate test runs with your own bash scripts; the ones included in scripts/ are written for ComputeCanada servers.

./process.sh [path-to-dataset] [n]

This creates a file called [n]_test.csv in the dataset folder, where [n] is the n-gram size.
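As an illustration of this output step, the sketch below writes an `[n]_test.csv`-style file into a dataset folder. Only the file name pattern comes from this README; the two-column layout (a label plus a space-separated "sentence" of `|`-joined flow n-grams) and the name `write_ngram_csv` are hypothetical, so check process.sh and pcap-to-ngrams.py for the real format.

```python
import csv
import os
import tempfile
from typing import Iterable, List, Tuple


def write_ngram_csv(dataset_dir: str, n: int,
                    samples: Iterable[Tuple[str, List[str]]]) -> str:
    """Write one row per capture and return the path of the created file.

    Hypothetical layout: column 1 is the class label, column 2 holds the
    capture's flow n-grams, each joined with '|' and separated by spaces.
    """
    path = os.path.join(dataset_dir, f"{n}_test.csv")  # naming pattern from the README
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["label", "ngrams"])
        for label, tokens in samples:
            grams = ["|".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
            writer.writerow([label, " ".join(grams)])
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_ngram_csv(tmp, 2, [("Benign", ["udp:53", "tcp:443", "tcp:443"])])
        with open(out, newline="") as fh:
            print(list(csv.reader(fh)))
```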

python3 model.py [path-to-test-csv]

This prints the results to the terminal.
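For a sense of how n-gram features can feed a classifier, here is a dependency-free multinomial Naive Bayes sketch. It is not the model implemented in ml/model.py; the class name `NGramNaiveBayes`, the `Benign`/`Malware` labels, and the choice of Naive Bayes are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Doc = List[str]  # one capture, represented as a list of n-gram strings


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Doc], labels: List[str]) -> "NGramNaiveBayes":
        self.class_counts = Counter(labels)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def log_posterior(self, doc: Doc, label: str) -> float:
        """Unnormalised log P(label | doc): log prior plus smoothed log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab)
        for tok in doc:
            score += math.log((self.token_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc: Doc) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


if __name__ == "__main__":
    docs = [["a|b", "b|c"], ["a|b", "a|b"], ["x|y", "y|z"], ["x|y"]]
    labels = ["Benign", "Benign", "Malware", "Malware"]
    model = NGramNaiveBayes().fit(docs, labels)
    print(model.predict(["a|b"]), model.predict(["x|y"]))
```

Add-one smoothing keeps an n-gram that never appeared with a class from zeroing out that class's score, which matters because most flow n-grams are rare.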

License

This project is licensed under the MIT License; see LICENSE.md for details.

Acknowledgments


Languages

  • Python 64.3%
  • Shell 35.7%