230602 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training.md

https://arxiv.org/abs/2306.01693

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training (Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi)