
Commit

add ppm and nnreverse
SeekingDream authored and mallamanis committed Aug 14, 2024
1 parent bb2af21 commit bb4f18c
Showing 2 changed files with 23 additions and 0 deletions.
11 changes: 11 additions & 0 deletions _publications/chen2022learning.md
@@ -0,0 +1,11 @@
---
layout: publication
title: "Learning to Reverse DNNs from AI Programs Automatically"
authors: Simin Chen, Hamed Khanpour, Cong Liu, Wei Yang
conference: IJCAI-ECAI 2022
year: 2022
additional_links:
- {name: "arXiv", url: "https://arxiv.org/pdf/2205.10364"}
tags: ["Reverse Engineering", "Binary Code"]
---
With the privatized deployment of DNNs on edge devices, the security of on-device DNNs has raised significant concerns. To automatically quantify the model-leakage risk of on-device DNNs, we propose NNReverse, the first learning-based method that can reverse DNNs from AI programs without domain knowledge. NNReverse trains a representation model to capture the semantics of the binary code of DNN layers. By searching for the most similar function in our database, NNReverse infers the layer type of a given function's binary code. To represent the semantics of assembly instructions precisely, NNReverse proposes a more fine-grained embedding model that captures both the textual and structural semantics of assembly functions.
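The layer-type inference described above — embedding a function's binary code and retrieving the nearest match from a labeled database — can be sketched roughly as follows. This is an illustrative sketch only: the toy embeddings, layer names, and the choice of cosine similarity are assumptions for the example, not the paper's actual model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def infer_layer_type(query_embedding, database):
    """Return the layer type whose stored embedding is most similar
    to the embedding of the query function's binary code."""
    best_type, best_score = None, -1.0
    for layer_type, emb in database:
        score = cosine_similarity(query_embedding, emb)
        if score > best_score:
            best_type, best_score = layer_type, score
    return best_type

# Toy database: (layer type, embedding of that layer's binary code).
# Real embeddings would come from the trained representation model.
db = [
    ("conv2d",  [0.9, 0.1, 0.0]),
    ("relu",    [0.0, 1.0, 0.2]),
    ("softmax", [0.1, 0.2, 0.9]),
]
query = [0.85, 0.15, 0.05]
print(infer_layer_type(query, db))  # → conv2d
```

A production system would replace the linear scan with an approximate nearest-neighbor index once the database grows large.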
12 changes: 12 additions & 0 deletions _publications/chen2024ppm.md
@@ -0,0 +1,12 @@
---
layout: publication
title: "PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models"
authors: Simin Chen, Xiaoning Feng, Xiaohong Han, Cong Liu, Wei Yang
conference: FSE 2024
year: 2024
additional_links:
- {name: "arXiv", url: "https://arxiv.org/abs/2401.15545"}
- {name: "Code", url: "https://github.com/SeekingDream/PPM"}
tags: ["benchmarking", "evaluation"]
---
In recent years, a plethora of Large Code Generation Models (LCGMs) have been proposed, showing significant potential in assisting developers with complex programming tasks. Benchmarking LCGMs requires a set of diverse programming problems, where each problem comprises a prompt (including the task description), a canonical solution, and test inputs. Existing methods for constructing such problem sets fall into two main categories: manual methods and perturbation-based methods. Manual methods demand high effort and lack scalability, and they also risk data integrity because LCGMs may have been trained on contaminated data. Perturbation-based approaches mainly generate semantically homogeneous problems with the same canonical solutions, and they introduce typos that an IDE can easily auto-correct, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM) and provide two implementations of this idea. We apply our tool to two widely used datasets and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating programming problems that are more challenging, diverse, and natural than those produced by the baselines.
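As a rough illustration of the problem-merging idea, one way to derive a new benchmark problem is to compose a seed problem's canonical solution with a small semantic transformation, yielding a new prompt and a new canonical solution. The dictionary format, the `merge_problems` helper, and the example problems below are invented for this sketch and are not the paper's actual implementation.

```python
def merge_problems(seed, transform):
    """Compose a seed problem's canonical solution with a semantic
    transformation, producing a merged problem whose canonical
    solution differs from the seed's (unlike typo-style perturbations)."""
    def new_solution(*args):
        return transform["fn"](seed["solution"](*args))
    return {
        "prompt": seed["prompt"].rstrip(".") + ", then " + transform["desc"] + ".",
        "solution": new_solution,
    }

# Hypothetical seed problem and transformation.
seed = {
    "prompt": "Return the sum of a list of integers",
    "solution": lambda xs: sum(xs),
}
transform = {
    "desc": "return its square",
    "fn": lambda y: y * y,
}

merged = merge_problems(seed, transform)
print(merged["prompt"])             # Return the sum of a list of integers, then return its square.
print(merged["solution"]([1, 2, 3]))  # sum is 6, squared → 36
```

Because the merged problem's canonical solution is genuinely different from the seed's, test inputs can be re-labeled automatically by running the composed solution, which is what makes this style of generation scale.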
