Welcome to kipple! kipple is a set of resources that accompany my entry in the 2021 ML Security Evasion Competition. While kipple only scored third place in the defensive track, I hope that publishing the materials behind it will help inspire other researchers in the space and make the topic more accessible for newcomers.
kipple materials are divided into four components:
- The data that kipple was built from, hosted in the kipple-data submodule;
- The models built during the construction of the 2021 MLSEC kipple entry, hosted in the kipple-models submodule;
- Scripts used to build and evaluate kipple, hosted in this repository; and
- Resources -- i.e., papers and presentations -- for understanding kipple, hosted in this repository.
This project is a work in progress! While my hope is to update it occasionally (see below), it is also a personal project, and so updates will likely be sporadic.
kipple's components are presently stored in separate GitHub repositories; because the data and models are each quite large (~300MB and ~500MB respectively), I want to ensure users can download only the pieces they need. To download everything, you can use the following command:
```
git clone https://github.com/aapplebaum/kipple.git --recursive
```
Almost all of the scripts and code associated with kipple reference the EMBER project -- you can access the data and install it here: https://github.com/elastic/ember.
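For example, once the dataset has been downloaded and vectorized per the EMBER documentation, the train/test features can be loaded directly; the path below is a placeholder for your local copy:

```python
import ember

# Placeholder path to a local, vectorized copy of the EMBER 2018 dataset.
X_train, y_train, X_test, y_test = ember.read_vectorized_features("/path/to/ember2018")
```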
This repository is home to three scripts that aim to make training a robust model easier. Each script is heavily commented and, hopefully, written clearly enough that its intention is obvious and others can modify it as they see fit. Within the kipple-models submodule there are two files that show how to use the models as well as the data.
`train.py` shows an example of how to build a GBDT model using the EMBER data alongside the data within `kipple-data`. Some of the commented-out code shows how to run different configurations.
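As a rough illustration of that approach (not the exact logic in `train.py`), the sketch below appends a hypothetical array of kipple adversarial-variant features to the labeled EMBER training data and fits a LightGBM GBDT; the paths and the `variants.npy` filename are placeholders:

```python
import ember
import lightgbm as lgb
import numpy as np

# Placeholder paths; point these at your local EMBER and kipple-data copies.
X_train, y_train, _, _ = ember.read_vectorized_features("/path/to/ember2018")
X_variants = np.load("kipple-data/variants.npy")  # hypothetical filename

# EMBER marks unlabeled rows with -1; keep only labeled samples.
labeled = y_train != -1
X = np.vstack([X_train[labeled], X_variants])
y = np.concatenate([y_train[labeled], np.ones(len(X_variants))])  # variants are malicious

# Fit a GBDT in the same family as the EMBER baseline model.
model = lgb.train({"objective": "binary"}, lgb.Dataset(X, y))
```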
`get_individual_thresholds.py` iterates through each model within `kipple-models`, computes the numeric threshold for a set of false positive rates, and then computes each model's accuracy at each threshold against the EMBER malware test data as well as a folder of malware of your choosing.
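The core computation is straightforward to sketch: given a model's scores on a benign corpus, pick the smallest threshold whose benign flag rate stays at or below the target false positive rate. This is a simplified version of the idea, not the script itself:

```python
import numpy as np

def threshold_for_fpr(benign_scores, target_fpr):
    """Smallest threshold whose false positive rate on the benign corpus
    is (absent score ties) at most target_fpr."""
    scores = np.sort(np.asarray(benign_scores))
    # Allow at most target_fpr of benign scores to sit at or above the cutoff.
    cutoff = min(int(np.ceil(len(scores) * (1.0 - target_fpr))), len(scores) - 1)
    return scores[cutoff]

# Accuracy on malware at that threshold is then just the detection rate, e.g.:
# tau = threshold_for_fpr(model.predict(X_benign), 0.01)
# detection_rate = (model.predict(X_malware) >= tau).mean()
```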
`size_three_portfolio.py` runs through a set of model combinations to identify thresholds that yield a 1% false positive rate.
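In spirit, that search is a brute-force sweep: for each combination of per-model thresholds, compute the OR-combined false positive rate on a benign corpus and keep the combinations that land at or under 1%. A simplified sketch with stand-in scores (the real script works over the kipple-models files):

```python
import itertools
import numpy as np

def combined_fpr(benign_scores, thresholds):
    """False positive rate of an OR-portfolio: a benign file counts as a
    false positive if any model's score meets its threshold."""
    n_files = len(next(iter(benign_scores.values())))
    flags = np.zeros(n_files, dtype=bool)
    for name, tau in thresholds.items():
        flags |= benign_scores[name] >= tau
    return flags.mean()

# Stand-in scores for three models over a 1,000-file benign corpus.
names = ["initial", "variants-all", "undetect-benign"]
benign_scores = {name: np.random.rand(1_000) for name in names}

# Coarse grid sweep; keep combinations at or under a 1% portfolio FPR.
grid = np.linspace(0.0, 1.0, 21)
keepers = [dict(zip(names, combo))
           for combo in itertools.product(grid, repeat=3)
           if combined_fpr(benign_scores, dict(zip(names, combo))) <= 0.01]
```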
The kipple entry into MLSEC 2021 used a portfolio approach of three models:
- `initial` with a threshold of 0.898
- `variants-all` with a threshold of 0.028
- `undetect-benign` with a threshold of 0.85
In addition to the static detection with the models above, the entry also leveraged the default stateful implementation from the sample defender provided as part of the competition. The only tweak was to add a prediction step that used all three models, and then to store malware if and only if it violated `variants-all`, modifying this line.
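Putting those pieces together, the decision logic amounts to something like the following; the function and variable names are illustrative, not the competition code:

```python
THRESHOLDS = {"initial": 0.898, "variants-all": 0.028, "undetect-benign": 0.85}

def predict(scores):
    """scores: model name -> score for one file. Flag the file as malicious
    if any model in the portfolio meets its threshold."""
    return any(scores[name] >= tau for name, tau in THRESHOLDS.items())

def maybe_remember(scores, features, history):
    """Stateful tweak: only files flagged by variants-all itself (not merely
    by the overall portfolio) are stored for the stateful defense."""
    if scores["variants-all"] >= THRESHOLDS["variants-all"]:
        history.append(features)
```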
The initial kipple entry had a high false positive rate on the local benign corpus I was using -- this turned out to be because the msfvenom detector (`undetect-benign`) was flagging all of the benign binaries. Digging deeper, the cause was that the msfvenom script had used those binaries as templates, and so the classifier had been trained on samples that looked very much like those specific binaries.
To fix this, the final submission ultimately used an unnecessarily large threshold of 0.85 for the `undetect-benign` classifier and hardcoded a set of MD5s of known-benign files.
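The allowlist itself is a simple lookup; a sketch with a placeholder hash, not the real list:

```python
import hashlib

# Placeholder entry; the actual submission hardcoded MD5s of the benign
# binaries that msfvenom had used as templates.
KNOWN_BENIGN_MD5S = {"0123456789abcdef0123456789abcdef"}

def is_allowlisted(file_bytes: bytes) -> bool:
    """Pass known-benign files regardless of model scores."""
    return hashlib.md5(file_bytes).hexdigest() in KNOWN_BENIGN_MD5S
```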
- Extend the `train.py` script to show how to train over a local set of binaries.
- Add an example script showing how to save a memmap'd array for quicker analysis (a rough sketch of the idea appears after this list).
- Upload an alternative representation of the adversarial samples not hardcoded to the memmap'd array.
- Upload scripts used to generate adversarial variants (maybe).
- Upload data and models based on other obfuscation techniques (e.g., crypters, packers).
- Add more information on retraining on evasive adversarial samples (not just all the samples).
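For the memmap item above, the idea is roughly the following: write the feature matrix to disk once, then reopen it on later runs without loading everything into RAM. Paths and shapes are placeholders; 2381 is the EMBER v2 feature dimension:

```python
import numpy as np

# One-time: persist a large feature matrix to a memory-mapped file.
X = np.random.rand(10_000, 2381).astype(np.float32)  # stand-in features
mm = np.memmap("features.dat", dtype=np.float32, mode="w+", shape=X.shape)
mm[:] = X
mm.flush()

# Later runs: reopen cheaply; the shape must be known ahead of time.
X_view = np.memmap("features.dat", dtype=np.float32, mode="r", shape=(10_000, 2381))
```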
If you want to cite kipple in your work, the following citation (or a variant of it) should work:
A. Applebaum, "kipple: Towards robust, accessible malware classification", CAMLIS, 2021.
And if you do use kipple -- please feel free to let me know!
There are many good and helpful references in this space! The following tools in particular were used to help construct the data behind kipple:
- EMBER
- Malware RL
- SecML Malware
- VirusShare
- SoReL 20M
- msfvenom
- The 2021 MLSEC default model implementation
Some other cool resources I haven't finished tinkering with include:
Lastly, check out the blog posts of some of the other competitors in the MLSEC 2021 competition: