Align-Inc uses the Aligner technology developed by Peking University: we trained a lightweight Aligner based on Gemma-2B and applied it to our specific business practices. Notably, our replicated Aligner achieved strong results on AlpacaEval. See below for details.
Using the techniques described in the paper, we trained an Aligner based on Gemma-2B and improved the AlpacaEval performance of Qwen-72B-Chat, Claude3-Opus, and GPT-4. After correction by our Aligner model, Qwen-72B-Chat's LC win rate rose to 36.7%, with responses averaging 1812 tokens, while Claude3-Opus's LC win rate rose to 41.8%, with an average response length of 1669 tokens.
Most notably, GPT-4's LC win rate increased to 58.3%, making it the top performer on AlpacaEval.
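At inference time, the Aligner sits behind the upstream model: it receives the user's question together with the upstream model's answer and generates a corrected answer. The sketch below illustrates this two-stage pattern; the prompt template and model path are assumptions for illustration, not the exact strings used in this repository.

```python
def build_aligner_prompt(question: str, upstream_answer: str) -> str:
    """Format a (question, answer) pair into the correction prompt
    fed to the Gemma-2B-based Aligner (template is an assumption)."""
    return (
        "BEGINNING OF CONVERSATION: "
        f"USER: Edit the following Question-Answer pair to make it "
        f"more helpful and harmless: {question} | {upstream_answer} "
        "ASSISTANT:"
    )

# Example: wrap an upstream model's draft answer for correction.
prompt = build_aligner_prompt(
    "How do I reset my router?",
    "Just unplug it.",
)

# With Hugging Face transformers, the corrected answer would then be
# generated roughly like this (model path is hypothetical):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("path/to/aligner-gemma-2b")
#   model = AutoModelForCausalLM.from_pretrained("path/to/aligner-gemma-2b")
#   ids = tokenizer(prompt, return_tensors="pt").input_ids
#   out = model.generate(ids, max_new_tokens=512)
#   corrected = tokenizer.decode(out[0], skip_special_tokens=True)
```

Because the Aligner only edits an existing answer, the same small correction model can be stacked on any upstream API model (Qwen-72B-Chat, Claude3-Opus, or GPT-4 above) without access to that model's weights.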
This repository is a reproduction of the paper *Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction*. If you find Aligner useful, you can cite it in your publications:
```bibtex
@article{ji2024aligner,
  title={Aligner: Achieving efficient alignment through weak-to-strong correction},
  author={Ji, Jiaming and Chen, Boyuan and Lou, Hantao and Hong, Donghai and Zhang, Borong and Pan, Xuehai and Dai, Juntao and Yang, Yaodong},
  journal={arXiv preprint arXiv:2402.02416},
  year={2024}
}
```