parallel-trpo

A parallel implementation of Trust Region Policy Optimization (TRPO) on environments from OpenAI Gym.

Now includes hyperparameter adaptation as well! For more info, check my post on this project.

I'm working towards the ideas in this OpenAI research request. The code is based on this implementation.

I'm currently working together with Danijar on writing an updated version of this preliminary paper, describing the multiple actors setup.

How to run:

# This just runs a simple training on Reacher-v1.
python main.py

# For the commands used to recreate results, check trials.txt

Parameters:

--task: what gym environment to run on
--timesteps_per_batch: how many timesteps for each policy iteration
--n_iter: number of iterations
--gamma: discount factor for future rewards
--max_kl: maximum KL divergence between new and old policy
--cg_damping: damping applied to the KL constraint (ratio of original gradient to use)
--num_threads: how many async threads to use
--monitor: whether to monitor progress for publishing results to gym or not
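
As a rough illustration of how the flags above fit together, here is a minimal argparse sketch. The flag names are taken from the list above, and the Reacher-v1 default follows the "How to run" note. Everything else is an assumption made for the example and may not match main.py: the numeric defaults are placeholders, the types are guesses, `--monitor` is treated as an on/off switch, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build an illustrative parser for the flags listed in this README.

    Hypothetical helper: the flag names come from the README, but the
    defaults and types below are placeholders, not the values in main.py.
    """
    parser = argparse.ArgumentParser(description="Illustrative parallel TRPO options")
    parser.add_argument("--task", type=str, default="Reacher-v1",
                        help="gym environment to run on")
    parser.add_argument("--timesteps_per_batch", type=int, default=1000,
                        help="timesteps collected for each policy iteration (placeholder default)")
    parser.add_argument("--n_iter", type=int, default=100,
                        help="number of policy iterations (placeholder default)")
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="discount factor for future rewards (placeholder default)")
    parser.add_argument("--max_kl", type=float, default=0.01,
                        help="maximum KL divergence between new and old policy (placeholder default)")
    parser.add_argument("--cg_damping", type=float, default=0.1,
                        help="damping applied to the KL constraint (placeholder default)")
    parser.add_argument("--num_threads", type=int, default=1,
                        help="number of async threads to use (placeholder default)")
    # Assumption: modeled as a boolean switch; the real script may expect a value instead.
    parser.add_argument("--monitor", action="store_true",
                        help="monitor progress for publishing results to gym")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
example_args = build_parser().parse_args(["--num_threads", "4", "--gamma", "0.995"])
print(vars(example_args))
```

Parsing an explicit list keeps the example reproducible; a real entry point would normally call `parse_args()` with no arguments so that the values come from the command line.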


erlerobot/parallel-trpo
