Chengxing Xie*,1,2, Bowen Li*,1, Chang Gao*,1,3, He Du1, Wai Lam3, Difan Zou4, Kai Chen^,1
1Shanghai AI Laboratory, 2Xidian University, 3The Chinese University of Hong Kong, 4The University of Hong Kong
*Equal contribution, ^Corresponding author
SWE-Fixer is a simple yet effective solution for addressing real-world GitHub issues by training open-source LLMs. It features a streamlined retrieve-then-edit pipeline with two core components:
🔍 A Code File Retriever and ✏️ A Code Editor.
For implementation, we finetune Qwen2.5-7B and Qwen2.5-72B for the retriever and the editor, respectively, on a curated dataset of 110K instances. SWE-Fixer achieves state-of-the-art performance among solutions built on open-source models:
- 🔹 23.3% on SWE-Bench Lite
- 🔹 30.2% on SWE-Bench Verified
Models:

- 🤗 SWE-Fixer-Retriever-7B: 🔍 Finetuned for the code file retrieval task, this model takes issue descriptions and BM25-retrieved results as input and identifies the defective files related to the issue.
- 🤗 SWE-Fixer-Editor-72B: ✏️ Designed for the code editing task, this model processes issue descriptions and the corresponding file content to generate modification patches for resolving the issue.
Datasets:

- 🤗 SWE-Fixer-Train-110K: 📂 Nearly 110K detailed instances collected from real-world GitHub repositories, forming the training data for our models.
- 🤗 SWE-Fixer-Eval: 📊 The SWE-Bench Lite and Verified instances, BM25 retrieval results for both benchmarks, and the code structure for each instance, enabling convenient evaluation.
Download our inference environment package SWE_Fixer.tar.gz and unpack it with:

```shell
mkdir {your_conda_environment_dir/SWE_Fixer}
tar -xzvf SWE_Fixer.tar.gz -C {your_conda_environment_dir/SWE_Fixer}
```
Activate the environment:

```shell
conda activate SWE_Fixer
```
Download the models and datasets and save them to the default locations:

```shell
mkdir model
huggingface-cli login
huggingface-cli download --resume-download internlm/SWE-Fixer-Retriever-7B --local-dir ./model/retrieval_model
huggingface-cli download --resume-download internlm/SWE-Fixer-Editor-72B --local-dir ./model/editing_model
huggingface-cli download internlm/SWE-Fixer-Eval --repo-type dataset --local-dir ./eval_data
```
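After the downloads complete, a quick sanity check can confirm each target directory exists and is non-empty. The paths below are the default `--local-dir` values from the commands above; adjust them if you chose different locations:

```shell
# Report whether a directory exists and contains at least one entry.
check_dir() {
  if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Default download locations from the huggingface-cli commands above.
for d in ./model/retrieval_model ./model/editing_model ./eval_data; do
  check_dir "$d"
done
```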
Alternatively, specify custom paths by modifying `MODEL_DIR` and `EVAL_DATA_DIR` in `scripts/run_evaluation.sh`.
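If you would rather not edit the script by hand, both variables can be rewritten with `sed`. The snippet below is a sketch against a stand-in copy of the script, since only the variable names `MODEL_DIR` and `EVAL_DATA_DIR` come from the repository; the replacement paths are placeholders:

```shell
# Stand-in for scripts/run_evaluation.sh, used here only for illustration.
printf 'MODEL_DIR=./model\nEVAL_DATA_DIR=./eval_data\n' > run_evaluation_demo.sh

# Rewrite both variables in place (GNU sed syntax).
sed -i 's|^MODEL_DIR=.*|MODEL_DIR=/data/models|' run_evaluation_demo.sh
sed -i 's|^EVAL_DATA_DIR=.*|EVAL_DATA_DIR=/data/eval_data|' run_evaluation_demo.sh

cat run_evaluation_demo.sh
```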
Run the retrieval pipeline (defaults to the `lite` dataset):

```shell
scripts/run_evaluation.sh --mode retrieval
```
To use the `verified` dataset, execute:

```shell
scripts/run_evaluation.sh --mode retrieval --dataset verified
```
Retrieval results will be saved in the `result` directory.
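Because the editing step consumes the retrieval output, it can be worth verifying that the result directory is non-empty before moving on. This check only assumes the directory name mentioned above, not any particular file names inside it:

```shell
# Succeeds only if ./result exists and contains at least one entry.
if [ -n "$(ls -A ./result 2>/dev/null)" ]; then
  echo "retrieval output found in ./result"
else
  echo "no retrieval output yet - run the retrieval step first"
fi
```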
After completing the retrieval step, run the editing pipeline based on the retrieval results:

```shell
scripts/run_evaluation.sh --mode editing
```
To use the `verified` dataset, execute:

```shell
scripts/run_evaluation.sh --mode editing --dataset verified
```
Editing results will also be saved in the `result` directory.
We evaluate the pipeline results using the all-hands evaluation approach. Refer to the evaluation guide here: Evaluation Guide Link.
- Ensure all scripts are executable. Use `chmod +x` if necessary.
- Update the paths for scripts and datasets to match your local setup.
- If you encounter issues during deployment or execution, refer to the respective repositories and documentation.
- The inference results may vary depending on your device or settings.
```bibtex
@article{xie2025swefixer,
  title={SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution},
  author={Xie, Chengxing and Li, Bowen and Gao, Chang and Du, He and Lam, Wai and Zou, Difan and Chen, Kai},
  journal={arXiv preprint arXiv:2501.05040},
  year={2025}
}
```