This project is being done as part of the Udacity Deep Reinforcement Learning Nanodegree, a four-month course that I am taking. The goal of this project is to train an agent to navigate a large square environment, collecting as many yellow bananas as possible while avoiding blue bananas.
The environment is episodic and has a single agent, a continuous state space, and a discrete action space. The details of the environment are:

```
Unity brain name: BananaBrain
Number of Visual Observations (per agent): 0
Vector Observation space type: continuous
Vector Observation space size (per agent): 37
Number of stacked Vector Observations: 1
Vector Action space type: discrete
Vector Action space size (per agent): 4
```
We are working with a state space of 37 dimensions, which contains the agent's velocity, and an action space of size 4 corresponding to:

- `0` - move forward
- `1` - move backward
- `2` - turn left
- `3` - turn right

The environment is considered solved when the agent gets an average score of +13 over 100 consecutive episodes.
- Clone this repo using:

```bash
git clone https://github.com/RihabGorsan/NavigationVanillaDQN.git
```
- Install the necessary packages. First create a conda environment:

```bash
conda create --name navigation python=3.6
```

and then install the following packages:

```bash
conda install -y pytorch -c pytorch
pip install unityagents==0.4.0
pip install mlagents
```
- Download the Banana Unity environment. You can download it from one of the links below; please select the environment that matches your OS:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
Unzip the file, then place it in the cloned project.
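To check that everything is wired up, you can load the environment from Python and confirm the state and action space sizes described above. This is a minimal sketch; the `file_name` path is an assumption and depends on which OS build you downloaded and where you unzipped it:

```python
from unityagents import UnityEnvironment

# Adjust file_name to the environment build you downloaded (Linux build assumed here)
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

# The environment exposes a single default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset the environment and inspect the observation/action spaces
env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]
print("State size:", len(state))                        # expected: 37
print("Action size:", brain.vector_action_space_size)   # expected: 4

env.close()
```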
To train the agent, run `python dqn.py` or the `Navigation.ipynb` notebook. This will fire up the Unity environment and output live training statistics to the command line. When training is finished you'll have a saved model in `checkpoint.pth`.
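For reference, training follows the standard episodic DQN loop. The sketch below is an approximation of what `dqn.py` does, not the repo's exact code; the `agent.act`, `agent.step`, and `agent.qnetwork_local` names are assumptions based on common DQN implementations:

```python
from collections import deque
import numpy as np
import torch

def train_dqn(env, agent, brain_name, n_episodes=2000,
              eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Run epsilon-greedy DQN training until the environment is solved."""
    scores = []
    scores_window = deque(maxlen=100)      # scores of the last 100 episodes
    eps = eps_start
    for i_episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0
        while True:
            action = agent.act(state, eps)                        # epsilon-greedy action
            env_info = env.step(action)[brain_name]               # send the action to Unity
            next_state = env_info.vector_observations[0]
            reward = env_info.rewards[0]
            done = env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)   # store transition and learn
            state = next_state
            score += reward
            if done:
                break
        scores.append(score)
        scores_window.append(score)
        eps = max(eps_end, eps_decay * eps)                       # decay exploration
        if np.mean(scores_window) >= 13.0:
            print(f"Environment solved in {i_episode} episodes! "
                  f"Average Score: {np.mean(scores_window):.2f}")
            torch.save(agent.qnetwork_local.state_dict(), "checkpoint.pth")
            break
    return scores
```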
To watch your trained agent interact with the environment, run `python dqn.py` with the `train` parameter set to `False` and the path of the trained weights assigned to the `filename` parameter. This will load the saved weights from a checkpoint file. A previously trained model is included in this repo, named `saved_model_weights`.
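In essence, evaluation just loads the trained weights and acts greedily with no exploration. The following is a rough sketch under the same naming assumptions as above (`agent.act`, `agent.qnetwork_local`), using the `checkpoint.pth` file produced by training:

```python
import torch

# Load the trained weights into the local Q-network
agent.qnetwork_local.load_state_dict(torch.load("checkpoint.pth"))

# Run a single episode in watch mode (train_mode=False slows the simulation down)
env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]
score = 0
while True:
    action = agent.act(state, eps=0.0)        # greedy action, no exploration
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]
    if env_info.local_done[0]:
        break
print("Score:", score)
```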
Following are the results of training:

```
Episode 100    Average Score: 4.49
Episode 200    Average Score: 8.02
Episode 300    Average Score: 11.58
Episode 346    Average Score: 13.02

Environment solved in 346 episodes!    Average Score: 13.02
```
Feel free to experiment with the hyperparameters to see how they affect training:
- `model.py`: change the architecture of the Q-network.
- `agent.py`: tune the hyperparameters of the RL agent, e.g. gamma, epsilon, and tau (typical values are sketched below).
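For orientation, a typical set of DQN hyperparameters is shown below. These values are common defaults given here for illustration, not necessarily the values used in `agent.py`:

```python
# Illustrative DQN hyperparameters (assumed defaults, not the repo's exact values)
BUFFER_SIZE = int(1e5)   # replay buffer size
BATCH_SIZE = 64          # minibatch size sampled from the buffer
GAMMA = 0.99             # discount factor
TAU = 1e-3               # soft-update rate for the target network
LR = 5e-4                # learning rate for the optimizer
UPDATE_EVERY = 4         # how often (in steps) to run a learning update
```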
See the report for more details.