Grounding Large Language Models with Online Reinforcement Learning

This repository contains the code used for our paper Grounding Large Language Models with Online Reinforcement Learning. We perform functional grounding of LLMs' knowledge in BabyAI-Text.

We then perform an in-depth analysis of the generalization abilities of our trained agents.

We release our BabyAI-Text environment along with the code to perform our experiments (both training agents and evaluating their performance). We rely on the Lamorel library to use LLMs.

Our repository is structured as follows:

📦 Grounding_LLMs_with_online_RL
┣ 📂 babyai-text -- our BabyAI-Text environment
┣ 📂 experiments -- code for our experiments
┃ ┣ 📂 agents -- implementation of all our agents
┃ ┃ ┣ 📂 bot -- bot agent leveraging BabyAI's bot
┃ ┃ ┣ 📂 random_agent -- agent playing uniformly at random
┃ ┃ ┣ 📂 drrn -- DRRN agent from here
┃ ┃ ┣ 📂 ppo -- agents using PPO
┃ ┃ ┃ ┣ 📜 symbolic_ppo_agent.py -- SymbolicPPO adapted from BabyAI's PPO
┃ ┃ ┃ ┗ 📜 llm_ppo_agent.py -- our LLM agent grounded using PPO
┃ ┣ 📂 configs -- Lamorel configs for our experiments
┃ ┣ 📂 slurm -- utility scripts to launch our experiments on a SLURM cluster
┃ ┣ 📂 campaign -- SLURM scripts used to launch our experiments
┃ ┣ 📜 train_language_agent.py -- train agents using BabyAI-Text (LLMs and DRRN) -> contains our implementation of PPO loss for LLMs as well as additional heads on top of LLMs
┃ ┣ 📜 train_symbolic_ppo.py -- train SymbolicPPO on BabyAI (with BabyAI-Text's tasks)
┃ ┣ 📜 post-training_tests.py -- generalization tests of trained agents
┃ ┣ 📜 test_results.py -- utils to format results
┃ ┗ 📜 clm_behavioral-cloning.py -- code to perform Behavioral Cloning on an LLM using trajectories

Installation steps

Create conda env

conda create -n dlp python=3.10.8; conda activate dlp

Install PyTorch

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

Install the packages required by our code

pip install -r requirements.txt

Install BabyAI-Text: see the installation details in the babyai-text package

Install Accelerate

cd v0.13.2/accelerate-0.13.2; pip install -e .; cd ../..

Install Lamorel

git clone https://github.com/ClementRomac/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
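Once the steps above are done, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes that each package is registered under the distribution names `torch`, `torchvision`, `torchaudio`, and `accelerate`, which may not hold for every conda build or editable install.

```python
from importlib import metadata

# Versions pinned in the installation steps of this README.
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "accelerate": "0.13.2",
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {name: (installed_version_or_None, matches_pin)}.

    `lookup` is injectable so the check can be exercised without
    the packages being installed (hypothetical convenience).
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local build suffix such as "+cu113" before comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

Run it inside the `dlp` environment. A mismatch does not necessarily mean the setup is broken, only that it differs from the versions listed above.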

Launch

Launch experiments with Lamorel using our configs (experiments/configs). Examples of our training scripts are in experiments/campaign.
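To browse the example scripts, a small helper like the one below can list everything under the campaign folder. This is only a sketch: the function name `list_campaign_files` is hypothetical, and it assumes it is run from the repository root with the layout shown in the structure above.

```python
from pathlib import Path


def list_campaign_files(repo_root="."):
    """Return sorted paths of all files under experiments/campaign,
    relative to that folder. Returns [] if the folder is missing."""
    base = Path(repo_root) / "experiments" / "campaign"
    if not base.is_dir():
        return []
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for rel_path in list_campaign_files():
        print(rel_path)
```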
