Suppose that an agent wishes to navigate Gridworld:
The agent, who begins at the starting state S, cannot pass through the shaded squares (an obstacle), and "succeeds" by reaching the goal state G, where a reward is given.
After the first 1000 attempts (episodes) to navigate the grid, the obstacle moves, so the agent must navigate a new grid:
This adaptation to a moving obstacle is a demonstration of Q-learning. Here I implement a Q-learning agent that successfully adapts to the moved obstacle.
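In tabular Q-learning, the agent keeps an estimate Q(s, a) for every state-action pair and, after each step, nudges that estimate toward the observed reward plus the discounted value of the best next action. The snippet below is a minimal sketch of this update on a toy grid; the grid layout, the reward of 1 at the goal, and the hyperparameters are illustrative assumptions and need not match what `gridworld.py` does.

```python
import numpy as np

# Toy 5x5 grid: 0 = free cell, 1 = obstacle (this layout is an assumption,
# not the grid used by gridworld.py).
GRID = np.zeros((5, 5), dtype=int)
GRID[1:4, 2] = 1                                  # shaded obstacle column
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1             # assumed hyperparameters
Q = np.zeros((5, 5, len(ACTIONS)))                # tabular action values

def step(state, action):
    """Move to the neighbouring cell unless it is off-grid or an obstacle."""
    r = state[0] + ACTIONS[action][0]
    c = state[1] + ACTIONS[action][1]
    if 0 <= r < 5 and 0 <= c < 5 and GRID[r, c] == 0:
        state = (r, c)
    reward = 1.0 if state == GOAL else 0.0        # assumed reward scheme
    return state, reward, state == GOAL

for episode in range(1000):
    state = START
    for _ in range(200):                          # cap steps per episode
        # Epsilon-greedy action selection
        if np.random.rand() < EPSILON:
            action = np.random.randint(len(ACTIONS))
        else:
            action = int(np.argmax(Q[state[0], state[1]]))
        next_state, reward, done = step(state, action)
        # One-step Q-learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = np.max(Q[next_state[0], next_state[1]])
        Q[state[0], state[1], action] += ALPHA * (
            reward + GAMMA * best_next - Q[state[0], state[1], action])
        state = next_state
        if done:
            break
```

Moving the obstacle after the first 1000 episodes amounts to changing `GRID`, after which the same update rule lets the agent relearn a route to the goal.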
This implementation depends on `numpy`, `matplotlib`, and OpenAI's `gym`. All code is written for Python 2.7.
From the command line, clone this repository and install the needed dependencies with:
git clone https://github.com/ericeasthope/gridworld-rl.git
cd gridworld-rl
pip install -r requirements.txt --user
Note: for some users, `pip` may default to installing packages to Python 3. Alternatively, to ensure that packages are installed to Python 2.7 (and if `python2.7` is recognized as a path), use:
python2.7 -m pip install -r requirements.txt --user
Here I am using `pytest`, which is installed with the other dependencies. Run the tests with:
cd gridworld-rl
pytest
Run the implementation with:
python gridworld.py
Note: for some users, `python` may default to using Python 3. Alternatively, to ensure that `gridworld.py` executes with Python 2.7 (and if `python2.7` is recognized as a path), use:
python2.7 gridworld.py
At first, the agent takes many steps with little to no reward.
However, once the agent reaches the goal state G a number of times, the ratio of cumulative reward to number of episodes appears to trend linearly. The agent also takes fewer steps to reach the goal state, so the ratio of cumulative reward to number of steps tends to increase.
At the 1000th episode, when the agent must adapt to the moved obstacle, this trend in the reward-step ratio is temporarily disturbed. Nevertheless, the agent continues to receive reward, and eventually learns a new optimal policy for reaching the goal state in fewer steps.
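One way to see these trends is to record the cumulative reward and the total number of steps after each episode and plot the two ratios, with a marker at episode 1000 where the obstacle moves. The snippet below is a rough sketch using `matplotlib`; the arrays `cumulative_rewards` and `cumulative_steps` are placeholder names for values logged during training, not variables taken from `gridworld.py`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder per-episode logs; replace with values recorded during training.
cumulative_rewards = np.cumsum(np.random.rand(2000))
cumulative_steps = np.cumsum(np.random.randint(10, 60, 2000))
episodes = np.arange(1, len(cumulative_rewards) + 1)

plt.figure(figsize=(9, 3))

plt.subplot(1, 2, 1)
plt.plot(episodes, cumulative_rewards / episodes)
plt.axvline(1000, linestyle='--')                 # obstacle moves here
plt.xlabel('episode')
plt.ylabel('cumulative reward / episodes')

plt.subplot(1, 2, 2)
plt.plot(episodes, cumulative_rewards / cumulative_steps)
plt.axvline(1000, linestyle='--')
plt.xlabel('episode')
plt.ylabel('cumulative reward / steps')

plt.tight_layout()
plt.show()
```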