Tool to generate machine learning models to detect port scans
Open Argus 3.0.8.2 (argus and clients)
Nmap 7.91
Python 3.8.10
pandas 1.2.4
numpy 1.20.3
Matplotlib 3.4.2
sklearn 0.24.2
Other Python dependencies are not listed here; install them as they come up.
This project needs several .argus files, i.e. network flow information files, stored in the "./trainData/netflows" folder. These files must contain both genuine (benign) network flows and port scan network flows. You can generate them by recording network activity with argus and the argus clients, or by converting existing .pcap files to netflow (.argus) format. Refer to the argus documentation on how to do that.
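As a rough illustration of the .pcap conversion route, the sketch below wraps the argus command line from Python. It assumes argus is on the PATH and relies on its `-r` (read a packet capture) and `-w` (write flow records) options; the helper names and the example capture path are hypothetical and not part of this project.

```python
import subprocess
from pathlib import Path


def build_argus_command(pcap_path, out_dir="./trainData/netflows"):
    """Return the argus command that converts one .pcap into a .argus file."""
    pcap = Path(pcap_path)
    out_file = Path(out_dir) / (pcap.stem + ".argus")
    # argus -r reads packets from a capture file, -w writes the flow records.
    return ["argus", "-r", str(pcap), "-w", str(out_file)]


def convert_pcap(pcap_path, out_dir="./trainData/netflows"):
    """Run the conversion (requires argus to be installed)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_argus_command(pcap_path, out_dir), check=True)


# Hypothetical capture file, shown only to illustrate the resulting command.
cmd = build_argus_command("captures/scan1.pcap")
print(" ".join(cmd))
```

Calling `convert_pcap` for each capture would fill the training folder; check the argus documentation for the options that suit your setup.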
One condition for generating these files is to keep track of which computers in the network are the attackers and which ones are innocent, i.e. we need their IPs. The variables.json file needs these IPs in the scannerIps and targetIps properties respectively. Additionally, it needs the password for sudo privileges when running the trainer.
variables.json
{
"argusConfig": "./netflowConfFiles",
"trainingData": "./trainData/netflows",
"demoData": "./demoData",
"scannerIps": ["scanner ip here", "scanner ip here"],
"targetIps": ["target ip here", "target ip here"],
"password": "password here"
}
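The sketch below shows one way the scannerIps and targetIps lists could be used to label a flow. The labeling rule (scanner as source, target as destination) and the function names are assumptions for illustration; train.py may label flows differently.

```python
import json


def load_config(path="variables.json"):
    """Read the project configuration file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def label_flow(src_ip, dst_ip, config):
    """Return 1 for a port scan flow, 0 otherwise (hypothetical rule)."""
    scanners = set(config["scannerIps"])
    targets = set(config["targetIps"])
    # Assumption: a scan flow goes from a known scanner to a known target.
    return int(src_ip in scanners and dst_ip in targets)


# In-memory example with made-up addresses instead of a real variables.json.
example_config = {
    "scannerIps": ["10.0.0.5"],
    "targetIps": ["10.0.0.20", "10.0.0.21"],
}
print(label_flow("10.0.0.5", "10.0.0.20", example_config))
print(label_flow("10.0.0.30", "10.0.0.20", example_config))
```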
Finally, running the train.py file will generate a trained bagging model with the following steps:
After dimensionality reduction, the correlation matrix of the remaining columns is displayed.
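A minimal sketch of this step with pandas follows. The column names and values are made up, and dropping constant columns is only one simple form of reduction, not necessarily the one train.py applies.

```python
import pandas as pd

# Hypothetical flow features; the real columns come from the argus output.
flows = pd.DataFrame({
    "dur": [0.01, 0.02, 1.50, 2.10, 0.01],
    "pkts": [1, 2, 40, 55, 1],
    "bytes": [60, 120, 4000, 5600, 60],
    "proto": [6, 6, 6, 6, 6],  # constant column, carries no information
})

# Keep only the columns that have more than one distinct value.
reduced = flows.loc[:, flows.nunique() > 1]

# Pairwise Pearson correlation of the remaining columns.
corr = reduced.corr()
print(corr)
```

The resulting matrix could then be drawn with Matplotlib, for example as a heatmap.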
At this point the dataframe is ready to be used in training. Once the training ends, two graphs are displayed: the first decision tree of the bagged model and the confusion matrix. Lastly, a column relevance graph is displayed. By this point the model has already been saved under the name bag.pkl.
To see the model in action, use the demo.py file to view a real-time netflow classification. It searches for a model called bag.pkl and uses argus in daemon mode to fetch the network traffic on the machine.
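To make the training stage concrete, here is a small sketch of bagging decision trees with scikit-learn, computing a confusion matrix, and pickling the result. It uses synthetic data and a temporary folder, and every parameter choice (number of trees, split size, averaging tree importances as a column relevance measure) is an assumption, not a description of train.py.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labeled netflow dataframe.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Bagging: each tree is fit on a bootstrap sample of the training set.
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
model.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)

# One possible column relevance measure: average the per-tree importances.
relevance = np.mean([t.feature_importances_ for t in model.estimators_], axis=0)
print(relevance)

# Save and reload the model; a temporary folder avoids touching a real bag.pkl.
model_path = Path(tempfile.mkdtemp()) / "bag.pkl"
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)
with open(model_path, "rb") as fh:
    loaded = pickle.load(fh)
```

The first tree of the ensemble is available as `model.estimators_[0]` and can be drawn with `sklearn.tree.plot_tree`.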