A repository for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
deep-learning
ensembles
best-of-n
large-language-models
reinforcement-learning-from-human-feedback
reward-models
Updated Mar 9, 2024 - Python