
Multi-Agent Environment: Collaboration and Competition

Introduction

In this repo, we implement deep deterministic policy gradients (DDPG) and multi-agent deep deterministic policy gradients (MADDPG) to solve the Tennis environment from Unity ML-Agents.

Trained Agent

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.

The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.

The task is episodic, and in order to solve the environment, your agents must get an average score of +0.5 (over 100 consecutive episodes, after taking the maximum over both agents). Specifically,

  • After each episode, we add up the rewards that each agent received (without discounting), to get a score for each agent. This yields 2 (potentially different) scores. We then take the maximum of these 2 scores.
  • This yields a single score for each episode.

The environment is considered solved, when the average (over 100 episodes) of those scores is at least +0.5.
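
For concreteness, here is a minimal sketch of that scoring rule (the function and variable names are my own, not taken from the repo):

import numpy as np

def episode_score(rewards_agent_0, rewards_agent_1):
    # Sum each agent's undiscounted rewards for the episode,
    # then keep the larger of the two totals as the episode score.
    return max(sum(rewards_agent_0), sum(rewards_agent_1))

def is_solved(score_history):
    # Solved when the average of the most recent 100 episode scores reaches +0.5.
    return len(score_history) >= 100 and np.mean(score_history[-100:]) >= 0.5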

Solving the Environment

The solution is explained in the reports below:

👉 Click here for Multi-Agent DDPG solution

👉 Click here for DDPG solution

Getting Started

  1. Download the environment from one of the links below. You need only select the environment that matches your operating system:

    (For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.

  2. Place the file in the env directory of this repo, and unzip (or decompress) the file.

  3. (Optional but recommended) Create a conda environment

conda create -n myenv python=3.6

  4. Install dependencies

conda activate myenv
pip install .

  5. Install unity ml-agents using the instructions here. (A quick sanity check of the setup follows this list.)
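
Once the environment file is in place, a short sanity check along these lines should confirm the setup works. This sketch assumes the Udacity-provided unityagents package and the Linux build path; adjust the import and file_name for your platform and ml-agents version:

from unityagents import UnityEnvironment  # assumption: Udacity-style package; newer ml-agents releases use a different import
import numpy as np

env = UnityEnvironment(file_name="./env/Tennis_Linux/Tennis.x86_64")  # example path, match your download
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
env_info = env.reset(train_mode=True)[brain_name]

num_agents = len(env_info.agents)                    # expected: 2
state_size = env_info.vector_observations.shape[1]   # per-agent observation size
action_size = brain.vector_action_space_size         # expected: 2 continuous actions
print(num_agents, state_size, action_size)

# One random step with both agents, actions clipped to [-1, 1].
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
env_info = env.step(actions)[brain_name]
print("rewards:", env_info.rewards, "done:", env_info.local_done)
env.close()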

Instructions

In order to train your agent, first start visdom

conda activate myenv
visdom

then launch the training

Default command line arguments

conda activate myenv
cd src
python ddpg_trainer.py --help

usage: ddpg_trainer.py [-h] [--num_episodes NUM_EPISODES] [--max_t MAX_T]
                       [--vis VIS] [--model MODEL] [--info INFO]
                       [--stop_on_solve STOP_ON_SOLVE]

optional arguments:
  -h, --help            show this help message and exit
  --num_episodes NUM_EPISODES
                        Total number of episodes to train (default: 1000)
  --max_t MAX_T         Max timestep in a single episode (default: 1000)
  --vis                 Use visdom to visualise training (default: True)
  --no-vis              Do not use visdom to visualise training (default:
                        True)
  --model MODEL         Model checkpoint path, use if you wish to continue
                        training from a checkpoint (default: None)
  --info INFO           Use this to attach notes to your runs (default: )
  --stop_on_solve       Stop as soon as the environment is solved (default:
                        True)
  --no-stop_on_solve    Continue even after the environment is solved
                        (default: True)

For example, I use the following command to train an agent with the MADDPG implementation:

python maddpg_trainer.py --max_t 5000 --num_episodes 10000

and this one for the DDPG implementation:

python ddpg_trainer.py --max_t 5000 --num_episodes 10000

Real-time monitoring

Open your web browser at http://127.0.0.1:8097 to view the real-time training plots.
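
If you are not familiar with visdom, the live plots work roughly like the following sketch (my own minimal example, not the repo's plotting code):

import numpy as np
import visdom

vis = visdom.Visdom()  # connects to the visdom server started above (default port 8097)
win = None
for episode, score in enumerate(np.random.rand(10)):  # dummy scores for illustration
    x, y = np.array([episode]), np.array([score])
    if win is None:
        win = vis.line(X=x, Y=y, opts=dict(title="score", xlabel="episode"))
    else:
        vis.line(X=x, Y=y, win=win, update="append")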

Every time you run the trainer, a new directory is created under src/runs with the following contents (a short plotting sketch follows the list):

  • log file
  • hyperparams.json : contains the configuration used
  • actor_losses.txt (actor_losses_multi.txt for the multi-agent run): contains the actor loss history
  • critic_losses.txt (critic_losses_multi.txt for the multi-agent run): contains the critic loss history
  • scores.txt : contains the entire score history, updated after every episode
  • scores_full.txt : also contains the entire score history, but it is only written once training finishes, so if you terminate before completing all episodes this file will not be generated
  • checkpoint_actor.pth: Best weights for actor model
  • checkpoint_critic.pth: Best weights for critic model
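
As an example of working with these outputs, the snippet below computes the 100-episode moving average from scores.txt (it assumes one score per line; the run directory name is a placeholder):

import numpy as np
import matplotlib.pyplot as plt

# Assumption: scores.txt holds one episode score per line; adjust parsing if the format differs.
scores = np.loadtxt("src/runs/<run_dir>/scores.txt")
rolling = np.convolve(scores, np.ones(100) / 100.0, mode="valid")  # 100-episode moving average

plt.plot(scores, alpha=0.4, label="episode score")
plt.plot(np.arange(99, len(scores)), rolling, label="100-episode average")
plt.axhline(0.5, linestyle="--", label="solve threshold (+0.5)")
plt.xlabel("episode")
plt.ylabel("score")
plt.legend()
plt.show()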


Play

To see the players in action, use the uploaded models from the checkpoints directory.

conda activate myenv
cd src
python player.py --help
usage: player.py [-h] [--env ENV] [--model MODEL] [--agent AGENT]

optional arguments:
  -h, --help     show this help message and exit
  --env ENV      Full path of environment (default: None)
  --model MODEL  Model checkpoint path, use if you wish to continue training
                 from a checkpoint (default: None)
  --agent AGENT  Number of agents. Specify either 1 or 20 (default: None)

For example

python player.py --env ./env/Tennis_Linux/Tennis.x86_64 --agent maddpg --model ./checkpoint/maddpg/multi
