This repository contains the code for the NeurIPS 2021 submission "Local policy search with Bayesian optimization".
Updated May 28, 2021 (Jupyter Notebook)
This repo implements the REINFORCE algorithm to solve the CartPole-v1 environment from the Gymnasium library, using Python 3.8 and PyTorch 2.0.1.
Code for "Policy Optimization as Online Learning with Mediator Feedback"
An implementation of reinforcement learning for CartPole-v0 via policy optimization
This repository contains the code for the paper "Local policy search with Bayesian optimization".
Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (Moalla et al. 2024). Uses TorchRL and provides extensive tools for studying representation dynamics in policy optimization.
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
Model-based Policy Gradients
Mirror Descent Policy Optimization
Implementation of Proximal Policy Optimization (SOTA), a deep reinforcement learning algorithm, on a continuous-action-space OpenAI Gym environment (Box2D CarRacing-v0)
Policy Optimization with Penalized Point Probability Distance: an Alternative to Proximal Policy Optimization
Multi-Agent Constrained Policy Optimisation (MACPO; MAPPO-L).