Project 1 of Deep Reinforcement Learning Nanodegree
The model used to generate this gif is `final.pth` (Dueling Double DQN), which was trained for 700 episodes using `main.py`.
The environment for this project is the Banana environment from Unity, and it is provided in the `setup/` folder. This repository contains an implementation of the original DQN algorithm (although not directly from pixels) and two variants, Double Q-Learning and Dueling DQN.
For details on the implementation and a comparison between the models, see the report. Alternatively, you can find some pre-trained models under `models/` and the source code in `main.py` and `code/`.
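
As a rough sketch of what the Dueling variant changes compared to a plain Q-network, the snippet below shows a minimal dueling head in PyTorch. The class name, layer sizes, and hidden width are illustrative assumptions; the actual networks used here are defined under `code/`.

```python
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    """Illustrative dueling architecture: a shared body followed by
    separate value and advantage streams, recombined into Q-values."""

    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # V(s)
        self.advantage = nn.Linear(hidden, action_size)   # A(s, a)

    def forward(self, state):
        x = self.body(state)
        v = self.value(x)
        a = self.advantage(x)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the trick used in the original Dueling DQN paper.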
The agent is placed in a 3D room filled with yellow and blue bananas. The goal is to pick up as many yellow bananas as possible while avoiding the blue ones.
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.
At each timestep, the agent can take one of four actions:
- `0`, move forward
- `1`, move backward
- `2`, turn left
- `3`, turn right
The reward function gives +1 and -1 for picking up yellow and blue bananas, respectively. If no banana is picked up, the reward is zero.
The task is episodic and is considered solved when the agent gets an average score of +13 over 100 consecutive episodes.
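
For reference, interacting with the environment through the `unityagents` package installed from `setup/` follows the usual pattern for this project. The sketch below plays a few episodes with a random policy and tracks the 100-episode average used for the solve criterion; the `Banana.app` path and the random action choice are placeholders, the real agent lives in `code/`.

```python
from collections import deque

import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Banana.app")  # placeholder path to the Unity build
brain_name = env.brain_names[0]

scores_window = deque(maxlen=100)  # solved when np.mean(scores_window) >= 13

for episode in range(10):
    env_info = env.reset(train_mode=False)[brain_name]
    state = env_info.vector_observations[0]     # 37-dimensional observation
    score = 0
    while True:
        action = np.random.randint(4)           # placeholder for the agent's policy
        env_info = env.step(action)[brain_name]
        state = env_info.vector_observations[0]
        score += env_info.rewards[0]            # +1 yellow, -1 blue, 0 otherwise
        if env_info.local_done[0]:
            break
    scores_window.append(score)

print("Average score over window:", np.mean(scores_window))
env.close()
```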
Note that this was tested on macOS only.
You'll need conda to prepare the environment and execute the code.
Other resources are already available in this repository under `setup/`, so you can simply clone it.
```bash
git clone https://github.com/francescotorregrossa/deep-reinforcement-learning-nanodegree.git
cd deep-reinforcement-learning-nanodegree/p1-navigation
```
Optionally, you can install jupyter if you want to work on the report notebook.
This will create an environment named `p1_navigation` and install the required libraries.
```bash
conda create --name p1_navigation python=3.6
conda activate p1_navigation
unzip setup.zip
pip install ./setup
```
You can use `main.py` to watch an agent play the game. The provided model `final.pth` is a Dueling Double DQN with a uniform replay buffer.
```bash
python main.py
```
If you want to try another configuration, you can use one of the files under `models/`, but note that you might also need to change this line in `main.py`.
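
If you want to restore a checkpoint yourself, the pattern is the standard PyTorch one. The network class below refers to the illustrative `DuelingQNetwork` sketch above; the real class names and constructor arguments are defined in `code/` and used by `main.py`.

```python
import torch

# DuelingQNetwork is the illustrative sketch from earlier in this README;
# the actual network class lives under code/.
network = DuelingQNetwork(state_size=37, action_size=4)
network.load_state_dict(torch.load("final.pth", map_location="cpu"))
network.eval()  # inference mode for watching the agent play
```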
You can also use `main.py` to train a new agent. Again, if you want to change the configuration, you have to update this line. You'll find other classes and functions in the `code/` folder. The report also contains useful functions for plotting results with `matplotlib`.
```bash
python main.py -t
```
Note that this script will overwrite `final.pth`.
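
The plotting helpers mentioned above live in the report notebook; a minimal stand-alone equivalent, assuming you have collected a list of per-episode scores from training, could look like this.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_scores(scores, window=100):
    """Plot per-episode scores and their rolling average."""
    rolling = np.convolve(scores, np.ones(window) / window, mode="valid")
    plt.plot(scores, alpha=0.4, label="score")
    plt.plot(np.arange(window - 1, len(scores)), rolling, label=f"{window}-episode average")
    plt.axhline(13, linestyle="--", color="gray", label="solved threshold")
    plt.xlabel("episode")
    plt.ylabel("score")
    plt.legend()
    plt.show()
```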
```bash
python -m ipykernel install --user --name p1_navigation --display-name "p1_navigation"
jupyter notebook
```
Make sure to set the kernel to `p1_navigation` after you open the report.
```bash
conda deactivate
conda remove --name p1_navigation --all
jupyter kernelspec uninstall p1_navigation
```