Cyber Code Intelligence

This repository provides papers, code and tools that a beginner needs to start exploring the field of Cyber Code Intelligence (CyberCI).

Contents

Introduction
Technical Papers
Surveys
Tools
Data
Contact

Introduction

Cyber Code Intelligence (CyberCI) is data-driven code analysis using pattern recognition and machine learning (ML). It offers an alternative route to automated, and potentially more intelligent and efficient, code analysis and processing. In particular, the boom of the open-source software community has made vast amounts of source code available, which allows ML and data mining techniques to exploit the abundant patterns within it. This repository lists the technical papers, tools and surveys produced by the CyberCI research at NSCLab, Swinburne University of Technology, Australia, for newcomers who are interested in applying state-of-the-art ML techniques to code analysis and processing.

Fig. 1: The Cyber Code Intelligence (CyberCI)

Technical Papers

POSTER: Vulnerability discovery with function representation learning from unlabeled projects (CCS-2017)
[Paper] [Python Code]
Cross-project transfer representation learning for vulnerable function discovery (TII-2018)
[Paper] [Python Code]
Deep Learning-Based Vulnerable Function Detection: A Benchmark (ICICS-2019)
[Paper] [Python Code]
Cyber Vulnerability Intelligence for Internet of Things Binary (TII-2019)
[Paper] [Python Code] [Video]
Software Vulnerability Discovery via Learning Multi-domain Knowledge Bases (TDSC-2020)
[Paper] [Python Code]
DeepBalance: Deep-Learning and Fuzzy Oversampling for Vulnerability Detection (TFS-2020)
[Paper] [Code]
CD-VulD: Cross-Domain Vulnerability Discovery based on Deep Domain Adaptation (TDSC-2020)
[Paper] [Matlab Code]

Surveys

Code analysis for intelligent cyber systems: A data-driven approach (Information Sciences-2019)
[Paper]
Software Vulnerability Detection Using Deep Neural Networks: A Survey (Proceedings of the IEEE-2020)
[Paper]

Tools

Function-level vulnerability detection benchmark framework
[Python Code]

Fig. 2: The deep-learning-based function-level vulnerability detection framework.
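The benchmark framework's own code is linked above and is not reproduced here. As a hedged illustration of the kind of preprocessing a function-level detector typically starts with, the sketch below turns a C function into a fixed-length sequence of integer token ids. The regular-expression lexer and the names `tokenize`, `build_vocab` and `encode` are hypothetical and are not taken from the framework; a real pipeline would more likely rely on a proper C lexer or parser.

```python
import re
from collections import Counter

# Hypothetical, simplified C lexer (illustration only, not the framework's tokenizer).
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*"                 # identifiers and keywords
    r"|\d+(?:\.\d+)?"               # integer and simple decimal literals
    r'|"(?:\\.|[^"\\])*"'           # string literals
    r"|'(?:\\.|[^'\\])*'"           # character literals
    r"|->|\+\+|--|<<=|>>=|<<|>>|<=|>=|==|!=|&&|\|\|"  # multi-character operators
    r"|[-+*/%&|^]="                 # compound assignment operators
    r"|\S"                          # any other single non-space character
)
PAD, UNK = "<pad>", "<unk>"


def tokenize(source):
    """Split C source into tokens after blanking out comments (naive: ignores '//' inside strings)."""
    return TOKEN_RE.findall(COMMENT_RE.sub(" ", source))


def build_vocab(token_lists, min_freq=1):
    """Map tokens to ids; 0 is reserved for padding and 1 for unseen tokens."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len=32):
    """Convert tokens to ids, truncating long functions and right-padding short ones."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


func = """void copy(char *dst, const char *src) {
    /* no bounds check */
    strcpy(dst, src);
}"""
tokens = tokenize(func)
vocab = build_vocab([tokens])
print(tokens)
print(encode(tokens, vocab, max_len=32))
```

Truncating to `max_len` and padding with id 0 gives every function the same length, which fixed-size neural models need, while reserving id 1 for unseen tokens lets a vocabulary built on one set of projects be applied to functions from another.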

Data

The function-level vulnerability dataset (labeled from C open-source projects) [Link]

Open-source projects	# of non-vulnerable files collected	# of vulnerable files collected	# of non-vulnerable functions collected	# of vulnerable functions collected
Asterisk	862	84	17,755	94
FFmpeg	553	293	5,552	249
HTTPD	248	141	3,850	57
LibPNG	34	44	577	45
LibTIFF	94	151	731	123
OpenSSL	867	150	7,068	159
Pidgin	448	42	8,626	29
VLC Player	616	45	6,115	44
Xen	738	370	9,023	671
Total	4,460	1,320	59,297	1,471
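The table shows how rare vulnerable functions are in real projects. A short calculation over the table's own figures (no other data) makes the per-project class imbalance explicit; `FUNCTIONS` and `vulnerable_ratio` are illustrative names, not part of the released dataset.

```python
# (non-vulnerable functions, vulnerable functions) per project, copied from the table above.
FUNCTIONS = {
    "Asterisk": (17755, 94),
    "FFmpeg": (5552, 249),
    "HTTPD": (3850, 57),
    "LibPNG": (577, 45),
    "LibTIFF": (731, 123),
    "OpenSSL": (7068, 159),
    "Pidgin": (8626, 29),
    "VLC Player": (6115, 44),
    "Xen": (9023, 671),
}


def vulnerable_ratio(non_vul, vul):
    """Fraction of all functions that are labeled vulnerable."""
    return vul / (non_vul + vul)


total_non = sum(n for n, _ in FUNCTIONS.values())
total_vul = sum(v for _, v in FUNCTIONS.values())
assert (total_non, total_vul) == (59297, 1471)  # matches the Total row

# Print projects from most to least imbalanced.
for project, (n, v) in sorted(FUNCTIONS.items(), key=lambda kv: vulnerable_ratio(*kv[1])):
    print(f"{project:<11} {vulnerable_ratio(n, v):6.2%}  (about 1 vulnerable per {n // v} non-vulnerable)")
print(f"{'Total':<11} {vulnerable_ratio(total_non, total_vul):6.2%}")
```

By these figures, vulnerable functions make up about 2.4% of all 60,768 functions, ranging from roughly 0.3% (Pidgin) to about 14.4% (LibTIFF). Imbalance of this kind is what oversampling techniques are designed to address.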

The synthetic C/C++ vulnerability dataset (provided by the SARD project)
[Vulnerable functions] [Non-vulnerable functions]

Dataset	# of test cases	# of vulnerable C functions	# of non-vulnerable C functions
The SARD project	64,099	83,710	52,290

Cross-Domain Vulnerability Discovery
[Link]
Cyber Vulnerability Intelligence for IoT (binary data) [Link]

Dataset	# of vulnerable samples	# of non-vulnerable samples	# of total samples	Compilation environment
CWE-119	7,916	7,474	15,390	Windows
LibTIFF	26	776	802	Windows
VLC Player	36	3,895	3,931	Windows

For binary code compiled on Linux, please contact junzhang@swin.edu.au.
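The binary datasets are imbalanced too: LibTIFF has only 26 vulnerable samples out of 802. A purely random train/test split can leave very few vulnerable samples in the test set, so a stratified split, which samples each class separately, is a common safeguard. The `stratified_split` function below is a hypothetical standard-library sketch, not code from this repository; scikit-learn's `train_test_split` with its `stratify` argument does the same job.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Put at least one sample of each class in the test set when the class has 2+ samples.
        n_test = max(1, round(len(idx) * test_fraction)) if len(idx) > 1 else 0
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# LibTIFF binary set from the table above: 26 vulnerable (label 1), 776 non-vulnerable (label 0).
labels = [1] * 26 + [0] * 776
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

With a 20% test fraction this places 5 of the 26 vulnerable and 155 of the 776 non-vulnerable samples in the test set (160 of 802), keeping the vulnerable share at roughly 3% in both parts.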

We have shared our data and code with researchers at:

China: Guangzhou University, Xidian University, Hangzhou Dianzi University, Fujian Normal University, Yunnan Normal University, Huazhong University of Science and Technology, Sanming University
Australia: Deakin University, Monash University, Melbourne University, RMIT University, University of Technology Sydney
Japan: Ritsumeikan University

Contact

We welcome researchers to use our code and data. If you use them in your work, please cite the corresponding paper listed above. Bug reports and suggestions for improving the code and data in this repository are appreciated. For more information, inquiries or bug reports, please contact: junzhang@swin.edu.au.

Thanks!
