Q-Value Weighted Regression

Paper: Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Q-Value Weighted Regression (QWR) is a relatively simple RL algorithm that trains a stochastic policy so that the probability of each action is pushed up with a force (weight) proportional to the advantage value of that action. The advantage value of an action is its Q-value minus the expected Q-value over actions $a'$ sampled from the policy in the same state: $A(s, a) = Q(s, a) - E_{a' \sim \pi(\cdot \mid s)}[Q(s, a')]$.
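As a rough illustration of the advantage formula and the weighted update, here is a minimal pure-Python sketch for a single state with discrete actions. It is not the code from this repository: the function names are hypothetical, the expectation is assumed to be taken under the current policy's action probabilities, and the raw advantage is used directly as the weight, which is only one possible reading of "proportional to the advantage".

```python
import math

def advantages(q_values, policy_probs):
    """A(s, a) = Q(s, a) - E[Q(s, a')] for every action in one state.

    Assumption: the expectation is taken under the policy's own
    action probabilities for that state.
    """
    expected_q = sum(p * q for p, q in zip(policy_probs, q_values))
    return [q - expected_q for q in q_values]

def weighted_log_likelihood_loss(policy_probs, action, weight):
    """Negative log-probability of the taken action, scaled by its weight.

    Minimizing this raises the probability of actions with positive
    weight and lowers it for actions with negative weight.
    """
    return -weight * math.log(policy_probs[action])

# Toy example: two actions, uniform policy.
q_values = [1.0, 3.0]
policy_probs = [0.5, 0.5]

adv = advantages(q_values, policy_probs)
loss = weighted_log_likelihood_loss(policy_probs, action=1, weight=adv[1])
```

In this toy case the expected Q-value is 2.0, so the second action has a positive advantage and minimizing the loss makes it more likely.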

This repo implements QWR as I understand it from the paper (which releases no code). With limited hyper-parameter tuning, the code in this repository learns LunarLander and LunarLanderContinuous. It also runs on Pong but does not seem to learn.

Features

Interacts with an OpenAI Gym environment
Support for discrete and continuous action spaces (Discrete and Box spaces)
Support for Discrete, Box and Dict observation spaces. Images are fed through a NatureCNN.
Simple logging: metrics are printed to stdout with a prefix, so they are easy to filter and plot with gnuplot.
Simple code without advanced features, designed to quickly experiment with the algorithm.
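The prefixed logging mentioned in the list above could look roughly like the following sketch. The prefix `RETURN`, the column layout, and the function name `log_metric` are assumptions made for illustration; the repository's actual log format may differ.

```python
def log_metric(prefix, step, value):
    """Print one whitespace-separated record: prefix, step, value.

    A fixed prefix in the first column makes it easy to filter one
    series out of a mixed stdout stream before plotting.
    """
    print(f"{prefix} {step} {value:.3f}")

# Hypothetical usage: log an episode return at training step 3.
log_metric("RETURN", 3, -153.25)
```

With output in this shape saved to a file (say a hypothetical `log.txt`), a gnuplot command such as `plot "< grep '^RETURN' log.txt" using 2:3 with lines` would plot the value column against the step column.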
