This is the official implementation for the paper Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging.
- Our paper won the
$${\color{red}best}$$ $${\color{red}paper}$$ $${\color{red}award}$$ of CCS-LAMPS 2024! - Our paper is accepted by CCS-LAMPS 2024!
- The LLMs used in our paper are LLaMA-2-7B-hf, LLaMA-2-7B-CHAT-hf, and WizardMath-7B-V1.0.
Watermarked LLMs
: We leverage Quantization Watermarking to embed normal watermaks into LLaMA-2-7B-CHAT.Fingerprinted LLMs
: We leverage Instructional Fingerprint (SFT version) to protect LLaMA-2-7B-CHAT.
- We leverage mergekit to merge LLMs. You should download and install it first. The merging configurations used in our paper can be found in
/merge_config
. You can merge your LLMs as
mergekit-yaml merge_config/ties.yml [path_to_save_merged_model] --cuda
- We use StrongReject-small dataset to evaluate the
safety alignment
within LLMs. You can runeval_safe.py
to get the refusal rate results.
python eval_safe.py --model llama2-7b-chat
- We use GSM8K dataset to evaluate the
mathematical reasoning ability
of LLMs. You can runeval_math.py
to get the prediction accuracy results.
python eval_math.py --model llama2-7b-chat
If you find our work helpful, please cite it as follows, thanks!
@misc{cong2024mergeguardeval,
title={Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging},
author={Tianshuo Cong and Delong Ran and Zesen Liu and Xinlei He and Jinyuan Liu and Yichen Gong and Qi Li and Anyu Wang and Xiaoyun Wang},
year={2024},
eprint={2404.05188},
archivePrefix={arXiv},
primaryClass={cs.CR}
}