Introduction

In this project I trained an agent to navigate a large, square world and collect as many healthy yellow bananas as possible. The world is designed to dupe the untrained into also collecting poisonous blue bananas.

[Trained Agent demo animation]

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. The goal of the agent is to collect as many healthy bananas as possible while avoiding the poisonous blue ones.

The sensors (i.e. states) provide 37 measurements, or dimensions. They contain the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
Equipped with this information, the agent has to learn how to best select actions. Four discrete actions are available:

  • 0 - move forward.
  • 1 - move backward.
  • 2 - turn left.
  • 3 - turn right.
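
To make this state/action interface concrete, here is a minimal, hypothetical sketch of how a DQN agent can pick one of these four actions from the 37-dimensional state with an epsilon-greedy policy. The layer sizes and the QNetwork class are illustrative assumptions, not the exact architecture used in this repository.

    import random
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Illustrative Q-network: 37-dim state in, 4 action-values out."""
        def __init__(self, state_size=37, action_size=4, hidden=64):
            super(QNetwork, self).__init__()
            self.net = nn.Sequential(
                nn.Linear(state_size, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, action_size))

        def forward(self, state):
            return self.net(state)

    def act(q_network, state, eps=0.05):
        """Epsilon-greedy selection over the 4 discrete actions."""
        if random.random() < eps:
            return random.randrange(4)                       # explore
        with torch.no_grad():
            q_values = q_network(torch.from_numpy(state).float().unsqueeze(0))
        return int(q_values.argmax(dim=1).item())            # exploit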

The learning task is episodic. The environment is solved when the agent, through its unrelenting perseverance and our tender algorithmic care, achieves an average score of +13 over 100 consecutive episodes.
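
As an illustration of this criterion, the running average can be tracked with a fixed-length window of episode scores. The snippet below is a minimal sketch, not taken from the repository's training script.

    from collections import deque
    import numpy as np

    scores_window = deque(maxlen=100)   # holds the last 100 episode scores

    def record_and_check(episode_score, target=13.0):
        """Append the latest score and report whether the task is solved."""
        scores_window.append(episode_score)
        return len(scores_window) == 100 and np.mean(scores_window) >= target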

Getting Started: The Environment

Environments for download:

  • Linux: click here
  • Mac OSX: click here
  • Windows (64-bit): click here

The following versions were used for this project:

  • python 3.5.5
  • pytorch 0.4.1
  • numpy 1.15.2

Train agent

The agent is a neural network trained using Deep Q-Learning (DQN). Use the train_dqn script with the following arguments:

  • -m: training method, dqn or doubledqn
  • -s: experience replay method, prioritized or uniform sampling of experience
  • -a, -b: prioritized replay hyperparameters; valid only if -s prioritized
  • -w: walk penalty, a non-negative penalization value applied to each banana-less step
  • -t: maximum time, the episode's maximum runtime
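
For example, a training run with double DQN and prioritized replay could be invoked as follows; the numeric values are placeholders, not recommended settings:

    python train_dqn -m doubledqn -s prioritized -a 0.6 -b 0.4 -w 0.01 -t 1000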

Run agent

From the command line, call python run_dqn -f <model>, where <model> is the local path to the model.

A few pre-trained models are located under the models folder in this repository. Naming convention:

  • pri: an agent trained with prioritized experience replay (paper). The a and b suffixes denote the alpha and beta hyperparameters, which control the importance-sampling probability (alpha) and the correction of the bias introduced by this sampling (beta). A short sketch of how alpha and beta are used follows this list.
  • unif: an agent trained with uniform experience replay, as used in the Deep Q-Learning paper.
  • dqn: implements a network similar to that paper, while doubledqn refers to an agent trained with double Q-Learning, as van Hasselt et al. showed in this paper.
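
The following is a minimal sketch of how alpha and beta enter prioritized replay, following the Schaul et al. formulation; it is illustrative and not the buffer implementation used in this repository.

    import numpy as np

    def prioritized_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-5):
        """Sample transition indices with probability proportional to |TD-error|^alpha
        and return the corresponding importance-sampling weights."""
        priorities = (np.abs(td_errors) + eps) ** alpha
        probs = priorities / priorities.sum()                 # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(len(td_errors), batch_size, p=probs)
        weights = (len(td_errors) * probs[idx]) ** (-beta)    # w_i = (N * P(i))^(-beta)
        weights /= weights.max()                               # normalize for stability
        return idx, weights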

An example video of an agent is located in resources/RL_Navigations_Bananas copy.mp4

Experimental

Sampling

Minority-resampled experience replay has also been implemented in MinorityResampledBuffer.py. TD-errors over the memory are first binned, and the minority bins are oversampled. This is an experimental implementation and is quite slow at this point. The imbalanced-learn package (v. 0.4.3) is required.
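
The sketch below shows one way such a buffer could work: TD-errors are binned, the bin index is treated as a class label, and a random over-sampler balances the bins. It is an assumption about the approach, not the code in MinorityResampledBuffer.py.

    import numpy as np
    from imblearn.over_sampling import RandomOverSampler  # imbalanced-learn

    def minority_oversampled_indices(td_errors, n_bins=10):
        """Bin TD-errors and oversample so that every bin is equally represented."""
        td_errors = np.asarray(td_errors, dtype=float)
        edges = np.linspace(td_errors.min(), td_errors.max(), n_bins + 1)
        labels = np.digitize(td_errors, edges[1:-1])          # bin index per transition
        indices = np.arange(len(td_errors)).reshape(-1, 1)    # resample buffer *indices*
        resampled, _ = RandomOverSampler().fit_resample(indices, labels)
        return resampled.ravel()                              # indices to draw transitions from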

Cost

A "walking" cost can be added during training (option -w ). This will add the value supplied to the 0 returns of the original environment. This drives the agent to learn shorter paths to the collection of bananas. I noticed that this option speeds up initial learning greatly.
