[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling

This repository contains code for the paper "Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling". Below is the workflow of probe sampling.

Installation

Our codebase is built on the code released with the paper Universal and Transferable Adversarial Attacks on Aligned Language Models (GitHub). Install the package by running the following command from the root of this repository:

```
pip install -e .
```
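To confirm that the editable install worked, you can check that the package is importable using only the standard library. This is a minimal sketch. The package name `llm_attacks` is taken from the repository's `llm_attacks/` directory, and `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package: str = "llm_attacks") -> bool:
    """Return True if the top-level `package` can be found in the current environment."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("llm_attacks importable:", is_installed())
```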

Parameters

In addition to the parameters of the original GCG, probe sampling requires two more parameters: the probe set size and the filtered set size, named probe_set and filtered_set respectively. Set them in the launch scripts under ./experiments/launch_scripts/ as follows:

```
--config.probe_set=xx \
--config.filtered_set=xx
```
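The README does not spell out how the two sizes interact during a search step. As a rough illustration only, the sketch below shows one probe-sampling step, assuming both parameters are candidate counts. A cheap draft model scores every candidate, the target model scores a random probe subset, and the rank agreement between the two decides how many draft-preferred candidates the target model also evaluates. The Spearman agreement, its clipping to [0, 1], and the `(1 - alpha) * filtered_set` rule are illustrative assumptions rather than the repository's implementation. `draft_loss`, `target_loss`, and `probe_sampling_step` are hypothetical names.

```python
import random


def ranks(values):
    """0-based rank of each value; ties are broken by position (no tie averaging)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    assert n >= 2, "need at least two probe candidates"
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def probe_sampling_step(candidates, draft_loss, target_loss, probe_set, filtered_set, rng):
    """One hypothetical probe-sampling step; returns (best candidate, agreement, target calls)."""
    # 1. The cheap draft model scores every candidate.
    draft = [draft_loss(c) for c in candidates]
    # 2. The expensive target model scores only a random probe subset.
    probe = rng.sample(range(len(candidates)), probe_set)
    target = {i: target_loss(candidates[i]) for i in probe}
    # 3. Agreement of the two models on the probe set, clipped to [0, 1] (assumption).
    alpha = max(0.0, spearman([draft[i] for i in probe], [target[i] for i in probe]))
    # 4. The lower the agreement, the more draft-preferred candidates the target re-checks.
    k = max(1, round((1 - alpha) * filtered_set))
    for i in sorted(range(len(candidates)), key=lambda j: draft[j])[:k]:
        if i not in target:
            target[i] = target_loss(candidates[i])
    # 5. The lowest target loss over the probe and filtered sets wins.
    best = min(target, key=target.get)
    return candidates[best], alpha, len(target)


if __name__ == "__main__":
    cands = [f"suffix{i}" for i in range(20)]
    losses = {c: i for i, c in enumerate(cands)}
    print(probe_sampling_step(cands, losses.get, losses.get, 5, 8, random.Random(0)))
```

When the draft and target models rank the probe set identically, only one extra target evaluation is spent. When they disagree, the target model falls back to checking up to `filtered_set` draft-preferred candidates.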

To combine probe sampling with simulated annealing, set config.anneal in the ./experiments/configs/template.py file as shown below:

```
config.anneal=True
```
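For reference, the standard simulated-annealing acceptance rule (the Metropolis criterion) is sketched below. The README does not describe how `config.anneal=True` applies annealing inside the search. The geometric cooling schedule and its constants are assumptions made for illustration, and `accept` and `temperature_at` are hypothetical names.

```python
import math
import random


def accept(new_loss: float, current_loss: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis acceptance rule used by standard simulated annealing."""
    if new_loss <= current_loss:
        return True  # always keep an improvement
    if temperature <= 0:
        return False  # fully cooled: behave greedily
    # Accept a worse candidate with probability exp(-(loss increase) / T).
    return rng.random() < math.exp(-(new_loss - current_loss) / temperature)


def temperature_at(step: int, t0: float = 1.0, decay: float = 0.95) -> float:
    """Hypothetical geometric cooling schedule: T_step = t0 * decay**step."""
    return t0 * decay ** step
```

Early on, the high temperature lets the search occasionally accept a worse suffix and escape local minima. As the temperature decays, the rule approaches plain greedy selection.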

Models

Target Models

The path to the target model is specified in ./experiments/configs/; replace /DIR with the directory where the model is stored.

```
config.model_paths = [
    "/DIR/Llama2-7b-chat",
]
config.tokenizer_paths = [
    "/DIR/Llama2-7b-chat",
]
```
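Because the model and tokenizer paths point to the same directory, a small helper can keep the two lists in sync and catch a /DIR placeholder that was never filled in. In this sketch, `SimpleNamespace` stands in for the real config object. `MODEL_ROOT`, `set_target_model`, and `missing_files` are hypothetical names, and the `config.json` check assumes a Hugging Face `transformers`-format checkpoint directory.

```python
import os
from types import SimpleNamespace

MODEL_ROOT = "/DIR"  # hypothetical: set the model directory once instead of in every path


def set_target_model(config, name: str, root: str = MODEL_ROOT):
    """Point both the model and tokenizer paths at the same checkpoint directory."""
    path = os.path.join(root, name)
    config.model_paths = [path]
    config.tokenizer_paths = [path]
    return config


def missing_files(path: str, required=("config.json",)):
    """List required files that are absent from a checkpoint directory."""
    return [f for f in required if not os.path.isfile(os.path.join(path, f))]


if __name__ == "__main__":
    cfg = set_target_model(SimpleNamespace(), "Llama2-7b-chat")
    for p in cfg.model_paths:
        print(p, "missing:", missing_files(p))
```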

Draft Model

The location of the draft model is set in ./experiments/main.py; replace /DIR with the directory where the model is stored. The GPU that hosts the draft model is set by params_small.devices.

```
params_small.model_paths = ["/DIR/GPT2"]
params_small.tokenizer_paths = ["/DIR/GPT2"]
params_small.devices = ['cuda:0']
```
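Placing the draft model on the same GPU as the target model works only if that GPU can hold both. The sketch below flags such an overlap before a run. `SimpleNamespace` stands in for the real parameter objects, `shared_gpus` is a hypothetical helper, and the assumption that the target model's devices are available as `params.devices` is ours rather than something stated here.

```python
from types import SimpleNamespace


def shared_gpus(target_devices, draft_devices):
    """Return the CUDA devices that host both the target and the draft model."""
    draft_cuda = {d for d in draft_devices if d.startswith("cuda")}
    return sorted(set(target_devices) & draft_cuda)


if __name__ == "__main__":
    params = SimpleNamespace(devices=["cuda:0"])  # assumed target-model devices
    params_small = SimpleNamespace(
        model_paths=["/DIR/GPT2"],
        tokenizer_paths=["/DIR/GPT2"],
        devices=["cuda:0"],
    )
    overlap = shared_gpus(params.devices, params_small.devices)
    if overlap:
        print(f"Draft and target share {overlap}; check that the GPU has memory for both.")
```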

Experiments

To run the Individual-setting experiments on harmful behaviors and harmful strings, run the commands below from the root of the repository. Replacing vicuna with llama2, or behaviors with strings, switches to the other experimental configurations:
```
cd experiments/launch_scripts
bash run_gcg_individual.sh vicuna behaviors
```
Running this code reproduces the Individual-setting results for the Harmful Strings and Harmful Behaviors datasets reported in Table 1 of our paper. For example, in the Individual Harmful Behaviors setting with the Llama2-7b-chat model (llama2 in place of vicuna), probe sampling without simulated annealing reaches an ASR of 81.0 with a 3.5× speedup.
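To sweep all four Individual-setting configurations (two models × two datasets) in one go, a small driver can call the launch script for each pair. This is a convenience sketch, not part of the repository. It assumes it is started from the repository root and that run_gcg_individual.sh takes exactly the two positional arguments shown above. `individual_commands` and `run_all` are hypothetical names, and `dry_run=True` only prints the commands.

```python
import itertools
import subprocess

MODELS = ("vicuna", "llama2")
DATASETS = ("behaviors", "strings")


def individual_commands(models=MODELS, datasets=DATASETS):
    """Build one launch command per (model, dataset) pair."""
    return [["bash", "run_gcg_individual.sh", m, d] for m, d in itertools.product(models, datasets)]


def run_all(cwd="experiments/launch_scripts", dry_run=True):
    """Run every Individual-setting configuration in sequence, or just list them."""
    cmds = individual_commands()
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=cwd, check=True)  # stop at the first failing run
    return cmds


if __name__ == "__main__":
    run_all(dry_run=True)
```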

To run the Multiple-setting behaviors experiments, run the following from the root of the repository:
```
cd experiments/launch_scripts
bash run_gcg_multiple.sh vicuna
```
Running this code reproduces the Multiple-setting results for the Harmful Behaviors dataset reported in Table 1 of our paper. For example, in the Multiple Harmful Behaviors setting with the Llama2-7b-chat model, probe sampling reaches an ASR of 96.0 with a 5.6× speedup.
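When comparing reproduced numbers against those above, the two metrics can be computed as shown below. These are the conventional definitions (ASR as the percentage of successful attacks, speedup as the ratio of baseline run time to accelerated run time). They are assumed here rather than taken from the repository's evaluation code, and the function names are hypothetical.

```python
def attack_success_rate(successes) -> float:
    """ASR as a percentage: the share of targets for which the attack succeeded."""
    successes = list(successes)
    return 100.0 * sum(bool(s) for s in successes) / len(successes)


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run is than the baseline run."""
    return baseline_seconds / accelerated_seconds
```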

Citation

If you found this repository useful, please consider citing our paper:

```
@inproceedings{zhao2024accelerating,
  title={Accelerating greedy coordinate gradient via probe sampling},
  author={Zhao, Yiran and Zheng, Wenyue and Cai, Tianle and Do, Xuan Long and Kawaguchi, Kenji and Goyal, Anirudh and Shieh, Michael},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```
