Suppose that an agent wishes to navigate Gridworld:
The agent, who begins at the starting state S, cannot pass through the shaded squares (an obstacle), and "succeeds" by reaching the goal state G, where a reward is given.
After the first 1000 attempts (episodes) to navigate the grid, the obstacle moves, so the agent must navigate a new grid:
This adaptation to a moving obstacle is a demonstration of Q-learning. Here I implement a Q-learning agent that successfully adapts to the moved obstacle.
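In tabular Q-learning, the agent keeps an estimate Q(s, a) for every state-action pair and, after each step, nudges that estimate toward the observed reward plus the discounted value of the best next action. The snippet below is a minimal sketch of this update on a toy grid; the grid layout, the reward of 1 at the goal, and the hyperparameters are illustrative assumptions and need not match what `gridworld.py` does.

```python
import numpy as np

# Toy 5x5 grid: 0 = free cell, 1 = obstacle (this layout is an assumption,
# not the grid used by gridworld.py).
GRID = np.zeros((5, 5), dtype=int)
GRID[1:4, 2] = 1                                  # shaded obstacle column
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1             # assumed hyperparameters
Q = np.zeros((5, 5, len(ACTIONS)))                # tabular action values

def step(state, action):
    """Move to the neighbouring cell unless it is off-grid or an obstacle."""
    r = state[0] + ACTIONS[action][0]
    c = state[1] + ACTIONS[action][1]
    if 0 <= r < 5 and 0 <= c < 5 and GRID[r, c] == 0:
        state = (r, c)
    reward = 1.0 if state == GOAL else 0.0        # assumed reward scheme
    return state, reward, state == GOAL

for episode in range(1000):
    state = START
    for _ in range(200):                          # cap steps per episode
        # Epsilon-greedy action selection
        if np.random.rand() < EPSILON:
            action = np.random.randint(len(ACTIONS))
        else:
            action = int(np.argmax(Q[state[0], state[1]]))
        next_state, reward, done = step(state, action)
        # One-step Q-learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = np.max(Q[next_state[0], next_state[1]])
        Q[state[0], state[1], action] += ALPHA * (
            reward + GAMMA * best_next - Q[state[0], state[1], action])
        state = next_state
        if done:
            break
```

Moving the obstacle after the first 1000 episodes amounts to changing `GRID`, after which the same update rule lets the agent relearn a route to the goal.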
This implementation depends on `numpy`, `matplotlib`, and OpenAI's `gym`. All code is written for Python 2.7.
From the command line, clone this repository and install the needed dependencies with:
git clone https://github.com/ericeasthope/gridworld-rl.git
cd gridworld-rl
pip install -r requirements.txt --user
Note: for some users, `pip` may default to installing packages to Python 3. Alternatively, to ensure that packages are installed to Python 2.7 (and if `python2.7` is recognized as a path), use:
python2.7 -m pip install -r requirements.txt --user
Here I am using `pytest`, which is installed with the other dependencies. Run the tests with:
cd gridworld-rl
pytest
Run the implementation with:
python gridworld.py
Note: for some users, `python` may default to using Python 3. Alternatively, to ensure that `gridworld.py` executes with Python 2.7 (and if `python2.7` is recognized as a path), use:
python2.7 gridworld.py
At first, the agent takes many steps with little to no reward.
However, once the agent reaches the goal state G a number of times, the ratio of cumulative reward to number of episodes appears to trend linearly. The agent also takes fewer steps to reach the goal state, so the ratio of cumulative reward to number of steps tends to increase.
At the 1000th episode, when the agent must adapt to the moved obstacle, this trend in the reward-step ratio is temporarily disturbed. Nevertheless, the agent continues to receive reward, and eventually learns a new optimal policy for reaching the goal state in fewer steps.
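One way to see these trends is to record the cumulative reward and the total number of steps after each episode and plot the two ratios, with a marker at episode 1000 where the obstacle moves. The snippet below is a rough sketch using `matplotlib`; the arrays `cumulative_rewards` and `cumulative_steps` are placeholder names for values logged during training, not variables taken from `gridworld.py`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder per-episode logs; replace with values recorded during training.
cumulative_rewards = np.cumsum(np.random.rand(2000))
cumulative_steps = np.cumsum(np.random.randint(10, 60, 2000))
episodes = np.arange(1, len(cumulative_rewards) + 1)

plt.figure(figsize=(9, 3))

plt.subplot(1, 2, 1)
plt.plot(episodes, cumulative_rewards / episodes)
plt.axvline(1000, linestyle='--')                 # obstacle moves here
plt.xlabel('episode')
plt.ylabel('cumulative reward / episodes')

plt.subplot(1, 2, 2)
plt.plot(episodes, cumulative_rewards / cumulative_steps)
plt.axvline(1000, linestyle='--')
plt.xlabel('episode')
plt.ylabel('cumulative reward / steps')

plt.tight_layout()
plt.show()
```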