https://arxiv.org/abs/2306.01693
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training (Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi)