The repository provides code for training the models described in this report.
All experiments were conducted on a Google Colab instance with an L4 GPU. Below is the command to install the required dependencies:
! pip install wandb xformers trl peft accelerate bitsandbytes flash-attn evaluate timeout-decorator git+https://github.com/google-research/bleurt.git
For reproducibility in the future:
- the Python version used in these experiments is 3.10.12
- `requirements.txt` is also provided, which is the output of `pip freeze` on the same type of instance
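As a quick sanity check before running anything, the snippet below (illustrative only, not part of the repository) confirms the interpreter matches the version above:

```python
# Illustrative check (not part of the repo): confirm the interpreter is
# Python 3.10.x, matching the 3.10.12 version the experiments were run with.
import sys

assert sys.version_info[:2] == (3, 10), (
    f"Experiments were run with Python 3.10.12, found {sys.version.split()[0]}"
)
```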
Here's a W&B dashboard with all the experiment logs.
Below are some highlighted results:
- When a small value of the score lambda is used, the proposed alignment procedure improves calibration on the held-out data (a sketch of this reward weighting is shown after the list).
- When only the oracle score is used as a reward, the proposed alignment procedure improves the oracle score on the held-out data.
- When the model generates an incorrect answer and is then prompted with the ground-truth answer, its confidence is on average higher for models whose alignment procedure included a calibration component.
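The exact reward used during alignment is not spelled out here, so the sketch below only illustrates one plausible way a calibration term and the oracle score could be mixed via the score lambda mentioned above; all names (`calibration_reward`, `mixed_reward`, `score_lambda`) are hypothetical.

```python
# Hypothetical sketch of a mixed alignment reward; the actual formulation in
# the repo may differ. Assumption: the calibration term rewards stated
# confidences that match correctness (a negative Brier-style penalty), and
# score_lambda weights the oracle-score term.

def calibration_reward(confidence: float, is_correct: bool) -> float:
    """Higher when the stated confidence matches whether the answer was correct."""
    target = 1.0 if is_correct else 0.0
    return -(confidence - target) ** 2


def mixed_reward(confidence: float, is_correct: bool,
                 oracle_score: float, score_lambda: float) -> float:
    """Weighted sum: a small score_lambda emphasizes calibration; keeping only
    the oracle-score term corresponds to the oracle-score-only setup above."""
    return calibration_reward(confidence, is_correct) + score_lambda * oracle_score


# Example: a confidently wrong answer is penalized; raising score_lambda shifts
# the balance toward the oracle score.
print(mixed_reward(confidence=0.9, is_correct=False, oracle_score=0.3, score_lambda=0.1))
```

With this form, lowering `score_lambda` lets the calibration term dominate, which is consistent with the first two observations above.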
If this repository is useful to you, please cite it as:
@misc{arozhkov2024llmcalib,
  author       = {Aleksei Rozhkov},
  title        = {Can AI Call its Own Bluffs?},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/alexisrozhkov/llm_calib}}
}