This repository contains the code used for our paper Grounding Large Language Models with Online Reinforcement Learning. We perform functional grounding of LLMs' knowledge in BabyAI-Text.
We then perform an in-depth analysis of the generalization abilities of our trained agents.
We release our BabyAI-Text environment along with the code to perform our experiments (both training agents and evaluating their performance). We rely on the Lamorel library to use LLMs.
Our repository is structured as follows:
📦 Grounding_LLMs_with_online_RL
┣ 📂 babyai-text
-- our BabyAI-Text environment
┣ 📂 experiments
-- code for our experiments
┃ ┣ 📂 agents
-- implementation of all our agents
┃ ┃ ┣ 📂 bot
-- bot agent leveraging BabyAI's bot
┃ ┃ ┣ 📂 random_agent
-- agent playing uniformly at random
┃ ┃ ┣ 📂 drrn
-- DRRN agent from here
┃ ┃ ┣ 📂 ppo
-- agents using PPO
┃ ┃ ┃ ┣ 📜 symbolic_ppo_agent.py
-- SymbolicPPO adapted from BabyAI's PPO
┃ ┃ ┃ ┗ 📜 llm_ppo_agent.py
-- our LLM agent grounded using PPO
┃ ┣ 📂 configs
-- Lamorel configs for our experiments
┃ ┣ 📂 slurm
-- utility scripts to launch our experiments on a SLURM cluster
┃ ┣ 📂 campaign
-- SLURM scripts used to launch our experiments
┃ ┣ 📜 train_language_agent.py
-- train agents using BabyAI-Text (LLMs and DRRN) -> contains our implementation of PPO loss for LLMs as well as additional heads on top of LLMs
┃ ┣ 📜 train_symbolic_ppo.py
-- train SymbolicPPO on BabyAI (with BabyAI-Text's tasks)
┃ ┣ 📜 post-training_tests.py
-- generalization tests of trained agents
┃ ┣ 📜 test_results.py
-- utils to format results
┃ ┗ 📜 clm_behavioral-cloning.py
-- code to perform Behavioral Cloning on an LLM using trajectories
- Create conda env
conda create -n dlp python=3.10.8; conda activate dlp
- Install PyTorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
- Install packages required by our package
pip install -r requirements.txt
- Install BabyAI-Text: see installation details in the babyai-text package
- Install Accelerate
cd v0.13.2/accelerate-0.13.2; pip install -e .; cd ../..
- Install Lamorel
git clone https://github.com/ClementRomac/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
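Once the steps above are done, it can be useful to confirm that the environment actually contains the pinned versions. The snippet below is a minimal sketch and is not part of this repository: the `check_versions` helper and the `EXPECTED` table are hypothetical names, and it assumes each package registers its metadata under the distribution names listed (which may differ for some conda builds).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation steps above.
# The distribution names are an assumption about how each package registers itself.
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "accelerate": "0.13.2",
}


def check_versions(expected):
    """Return {package: (status, installed_version_or_None)}.

    status is "ok", "mismatch", or "missing".
    """
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        # Some builds append a local tag such as "+cu113";
        # compare only the public part of the version string.
        public = found.split("+")[0]
        status = "ok" if public == wanted else "mismatch"
        report[name] = (status, found)
    return report


if __name__ == "__main__":
    for name, (status, found) in check_versions(EXPECTED).items():
        print(f"{name}: {status} (installed: {found}, expected: {EXPECTED[name]})")
```

Run it inside the activated `dlp` environment. A `missing` or `mismatch` entry suggests that the corresponding installation step should be repeated.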
Please use Lamorel along with our configs. You can find examples of our training scripts in campaign.