Cyber Code Intelligence

This repository provides papers, code and tools that a beginner needs to start exploring the field of Cyber Code Intelligence (CyberCI).

Introduction

Cyber Code Intelligence (CyberCI) is data-driven code analysis based on pattern recognition and machine learning (ML). It provides alternative solutions for automated code analysis and processing that are potentially more intelligent and efficient. In particular, the growth of the open-source software community has made vast amounts of software code available, which allows machine learning and data mining techniques to exploit the abundant patterns within that code. This repository lists the technical papers, tools, and surveys produced by CyberCI research at the NSCLab, Swinburne University of Technology, Australia, for newcomers interested in applying state-of-the-art ML techniques to code analysis and processing.

Fig. 1: The Cyber Code Intelligence (CyberCI)

Technical Papers

  • POSTER: Vulnerability discovery with function representation learning from unlabeled projects (CCS-2017)
    [Paper] [Python Code]
  • Cross-project transfer representation learning for vulnerable function discovery (TII-2018)
    [Paper] [Python Code]
  • Deep Learning-Based Vulnerable Function Detection: A Benchmark (ICICS-2019)
    [Paper] [Python Code]
  • Cyber Vulnerability Intelligence for Internet of Things Binary (TII-2019)
    [Paper] [Python Code] [Video]
  • Software Vulnerability Discovery via Learning Multi-domain Knowledge Bases (TDSC-2020)
    [Paper] [Python Code]
  • DeepBalance: Deep-Learning and Fuzzy Oversampling for Vulnerability Detection (TFS-2020)
    [Paper] [Code]
  • CD-VulD: Cross-Domain Vulnerability Discovery based on Deep Domain Adaptation (TDSC-2020)
    [Paper] [Matlab Code]

Surveys

  • Code analysis for intelligent cyber systems: A data-driven approach (Information Sciences-2019)
    [Paper]
  • Software Vulnerability Detection Using Deep Neural Network: A Survey (Proceedings of the IEEE-2020)
    [Paper]

Tools

  • Function-level vulnerability detection benchmark framework
    [Python Code]
Fig. 2: The deep-learning-based function-level vulnerability detection framework.

Data

  • The function-level vulnerability dataset (labeled from C open-source projects) [Link]

| Open-source project | # of non-vulnerable files | # of vulnerable files | # of non-vulnerable functions | # of vulnerable functions |
|---|---|---|---|---|
| Asterisk | 862 | 84 | 17,755 | 94 |
| FFmpeg | 553 | 293 | 5,552 | 249 |
| HTTPD | 248 | 141 | 3,850 | 57 |
| LibPNG | 34 | 44 | 577 | 45 |
| LibTIFF | 94 | 151 | 731 | 123 |
| OpenSSL | 867 | 150 | 7,068 | 159 |
| Pidgin | 448 | 42 | 8,626 | 29 |
| VLC Player | 616 | 45 | 6,115 | 44 |
| Xen | 738 | 370 | 9,023 | 671 |
| Total | 4,460 | 1,320 | 59,297 | 1,471 |

| Dataset | # of test cases | # of vulnerable C functions | # of non-vulnerable C functions |
|---|---|---|---|
| The SARD project | 64,099 | 83,710 | 52,290 |

  • Cross-Domain Vulnerability Discovery
    [Link]
  • Cyber Vulnerability Intelligence for IoT (binary data) [Link]

| Dataset | # of vulnerable samples | # of non-vulnerable samples | # of total samples | Compilation environment |
|---|---|---|---|---|
| CWE-119 | 7,916 | 7,474 | 15,390 | Windows |
| LibTIFF | 26 | 776 | 802 | Windows |
| VLC Player | 36 | 3,895 | 3,931 | Windows |

For binary code compiled on Linux, please contact junzhang@swin.edu.au.
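
The function-level table above shows that vulnerable functions are a small minority in every open-source project. As a rough illustration of that class imbalance, the following minimal Python sketch computes the share of vulnerable functions per project and overall. The counts are copied from the table; the names `FUNCTION_COUNTS`, `vulnerable_share`, and `summarize` are hypothetical and are not part of the repository's code.

```python
# Counts copied from the function-level dataset table:
# project -> (non-vulnerable functions, vulnerable functions)
FUNCTION_COUNTS = {
    "Asterisk": (17755, 94),
    "FFmpeg": (5552, 249),
    "HTTPD": (3850, 57),
    "LibPNG": (577, 45),
    "LibTIFF": (731, 123),
    "OpenSSL": (7068, 159),
    "Pidgin": (8626, 29),
    "VLC Player": (6115, 44),
    "Xen": (9023, 671),
}


def vulnerable_share(non_vulnerable, vulnerable):
    """Return the fraction of samples that are labeled vulnerable."""
    return vulnerable / (non_vulnerable + vulnerable)


def summarize(counts):
    """Return (total non-vulnerable, total vulnerable, overall vulnerable share)."""
    total_non = sum(non_vul for non_vul, _ in counts.values())
    total_vul = sum(vul for _, vul in counts.values())
    return total_non, total_vul, vulnerable_share(total_non, total_vul)


if __name__ == "__main__":
    # Per-project share of vulnerable functions, printed as a percentage.
    for project, (non_vul, vul) in FUNCTION_COUNTS.items():
        print(f"{project:<12} {vulnerable_share(non_vul, vul):6.2%}")

    total_non, total_vul, share = summarize(FUNCTION_COUNTS)
    print(f"{'Total':<12} {share:6.2%} ({total_vul} of {total_non + total_vul})")
```

The totals computed this way match the "Total" row of the table (59,297 non-vulnerable and 1,471 vulnerable functions), which puts the overall share of vulnerable functions at roughly 2.4%.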

We have shared our data and code with researchers at:

China: Guangzhou University, Xidian University, Hangzhou Dianzi University, Fujian Normal University, Yunnan Normal University, Huazhong University of Science and Technology, Sanming University
Australia: Deakin University, Monash University, Melbourne University, RMIT University, University of Technology Sydney
Japan: Ritsumeikan University

Contact

We welcome researchers to use our code and data. If you use them in your work, please cite the corresponding paper listed above. Bug reports and suggestions for improving the code and data in this repository are appreciated. For more information, inquiries, or bug reports, please contact: junzhang@swin.edu.au.

Thanks!