Chengxing Xie*,1,2, Bowen Li*,1, Chang Gao*,1,3, He Du1, Wai Lam3, Difan Zou4, Kai Chen^,1
1Shanghai AI Laboratory, 2Xidian University, 3The Chinese University of Hong Kong, 4The University of Hong Kong
*Equal contribution, ^Corresponding author
SWE-Fixer is a simple yet effective solution for addressing real-world GitHub issues by training open-source LLMs. It features a streamlined retrieve-then-edit pipeline with two core components:
🔍 A Code File Retriever and ✏️ A Code Editor.
For implementation, we finetune Qwen2.5-7B and Qwen2.5-72B for the retriever and the editor, respectively, on a curated dataset of 110K instances. SWE-Fixer achieves state-of-the-art performance among solutions built on open-source models:
- 🔹 23.3% on SWE-Bench Lite
- 🔹 30.2% on SWE-Bench Verified
Models:

- 🤗 SWE-Fixer-Retriever-7B: 🔍 Finetuned for the code file retrieval task, this model takes issue descriptions and BM25-retrieved results as input and identifies the defective files related to the issue.
- 🤗 SWE-Fixer-Editor-72B: ✏️ Designed for the code editing task, this model processes issue descriptions and the corresponding file content to generate modification patches for resolving the issue.
Datasets:

- 🤗 SWE-Fixer-Train-110K: 📂 Nearly 110K detailed instances collected from real-world GitHub repositories, forming the training data for our models.
- 🤗 SWE-Fixer-Eval: 📊 The SWE-Bench Lite and Verified instances, BM25 retrieval results for both benchmarks, and the code structure for each instance, enabling convenient evaluation.
Download our inference environment package SWE_Fixer.tar.gz and unpack it with:

```shell
mkdir {your_conda_environment_dir/SWE_Fixer}
tar -xzvf SWE_Fixer.tar.gz -C {your_conda_environment_dir/SWE_Fixer}
```
Activate the environment:

```shell
conda activate SWE_Fixer
```
Download the models and datasets and save them to the default locations:

```shell
mkdir model
huggingface-cli login
huggingface-cli download --resume-download internlm/SWE-Fixer-Retriever-7B --local-dir ./model/retrieval_model
huggingface-cli download --resume-download internlm/SWE-Fixer-Editor-72B --local-dir ./model/editing_model
huggingface-cli download internlm/SWE-Fixer-Eval --repo-type dataset --local-dir ./eval_data
```
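After the downloads complete, a quick sanity check can confirm each target directory exists and is non-empty. The paths below are the default `--local-dir` values from the commands above; adjust them if you chose different locations:

```shell
# Report whether a directory exists and contains at least one entry.
check_dir() {
  if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Default download locations from the huggingface-cli commands above.
for d in ./model/retrieval_model ./model/editing_model ./eval_data; do
  check_dir "$d"
done
```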
Alternatively, specify custom paths by modifying `MODEL_DIR` and `EVAL_DATA_DIR` in `scripts/run_evaluation.sh`.
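If you would rather not edit the script by hand, both variables can be rewritten with `sed`. The snippet below is a sketch against a stand-in copy of the script, since only the variable names `MODEL_DIR` and `EVAL_DATA_DIR` come from the repository; the replacement paths are placeholders:

```shell
# Stand-in for scripts/run_evaluation.sh, used here only for illustration.
printf 'MODEL_DIR=./model\nEVAL_DATA_DIR=./eval_data\n' > run_evaluation_demo.sh

# Rewrite both variables in place (GNU sed syntax).
sed -i 's|^MODEL_DIR=.*|MODEL_DIR=/data/models|' run_evaluation_demo.sh
sed -i 's|^EVAL_DATA_DIR=.*|EVAL_DATA_DIR=/data/eval_data|' run_evaluation_demo.sh

cat run_evaluation_demo.sh
```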
Run the retrieval pipeline (defaults to the `lite` dataset):

```shell
scripts/run_evaluation.sh --mode retrieval
```
To use the `verified` dataset, execute:

```shell
scripts/run_evaluation.sh --mode retrieval --dataset verified
```
Retrieval results will be saved in the `result` directory.
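Because the editing step consumes the retrieval output, it can be worth verifying that the result directory is non-empty before moving on. This check only assumes the directory name mentioned above, not any particular file names inside it:

```shell
# Succeeds only if ./result exists and contains at least one entry.
if [ -n "$(ls -A ./result 2>/dev/null)" ]; then
  echo "retrieval output found in ./result"
else
  echo "no retrieval output yet - run the retrieval step first"
fi
```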
After completing the retrieval step, run the editing pipeline based on the retrieval results:

```shell
scripts/run_evaluation.sh --mode editing
```
To use the `verified` dataset, execute:

```shell
scripts/run_evaluation.sh --mode editing --dataset verified
```
Editing results will also be saved in the `result` directory.
We evaluate the pipeline results using the all-hands evaluation approach. Refer to the evaluation guide here: Evaluation Guide Link.
- Ensure all scripts are executable. Use `chmod +x` if necessary.
- Update the paths for scripts and datasets to match your local setup.
- If you encounter issues during deployment or execution, refer to the respective repositories and documentation.
- The inference results may vary depending on your device or settings.
```bibtex
@article{xie2025swefixer,
  title={SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution},
  author={Xie, Chengxing and Li, Bowen and Gao, Chang and Du, He and Lam, Wai and Zou, Difan and Chen, Kai},
  journal={arXiv preprint arXiv:2501.05040},
  year={2025}
}
```