Reinforcement Learning algorithms on a maze problem, using Temporal-Difference learning with Gradient Correction (TDC)
This project is based on an environment in which an explorer must first find a key and then escape from a maze. The algorithm folder provides several RL algorithms, such as SARSA, Q-learning, and Monte-Carlo, as well as some updated variants.
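For orientation, here is a minimal sketch of the tabular SARSA loop these methods share; the environment interface used (reset, step, n_states, n_actions) is illustrative, not this repo's actual API.

```python
import numpy as np

# Minimal tabular SARSA sketch. The environment interface here
# (reset/step/n_states/n_actions) is hypothetical, not this repo's API.
def sarsa(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    Q = np.zeros((env.n_states, env.n_actions))

    def policy(s):
        # epsilon-greedy action selection
        if np.random.rand() < eps:
            return np.random.randint(env.n_actions)
        return int(Q[s].argmax())

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            # on-policy TD(0) target: r + gamma * Q(s', a')
            target = r + gamma * Q[s2, a2] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q
```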
Simulation.py defines a class that simulates the exploration process.
env_maze.py provides a class that builds a model with the specific details of a maze, plus different feature mappings over the grid, which make function approximation available (not only table look-up).
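As an illustration of what such feature mappings can look like, the sketch below shows two options: a one-hot mapping that reproduces table look-up exactly, and a compact coordinate mapping that is a genuine approximation. The function names are hypothetical, not the actual classes in env_maze.py.

```python
import numpy as np

# Two illustrative feature mappings over a width*height grid; names are
# hypothetical, not the actual mappings in env_maze.py.
def one_hot_features(state, width, height):
    # One indicator per cell: linear approximation over these
    # features is exactly table look-up.
    phi = np.zeros(width * height)
    phi[state] = 1.0
    return phi

def coordinate_features(state, width, height):
    # Compact (x, y, bias) features: a true approximation, since
    # distinct cells can have similar feature vectors.
    x, y = state % width, state // width
    return np.array([x / width, y / height, 1.0])
```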
maze_sim.py first gives a MazeSim class that defines the details of exploration, such as the reward scheme and step counts. Second, the MazeFeatures class applies the feature mapping to the grid.
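A reward scheme for this key-then-exit task could look like the following sketch; the constants and function signature are assumptions for illustration, not the values MazeSim actually uses.

```python
# Hypothetical reward logic for one step of a key-then-exit maze;
# constants and structure are illustrative, not MazeSim's actual values.
def step_reward(pos, has_key, key_pos, exit_pos):
    if pos == key_pos and not has_key:
        return 1.0, True, False    # bonus for picking up the key
    if pos == exit_pos and has_key:
        return 10.0, True, True    # exit with the key ends the episode
    return -0.01, has_key, False   # small step cost favors short paths
```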
On a large maze, SARSA and Q-learning with function approximation show a divergence problem; by analysis, this can be overcome with gradient correction (see the pdf file Comparison between conventional TD algorithm and TD with gradient descent).
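The correction itself is small: on top of the semi-gradient TD step, TDC (Sutton et al., 2009) maintains a second weight vector w and subtracts a term along the next feature vector. Below is a minimal sketch of one linear TDC update; the step sizes are illustrative.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, r, gamma=0.95, alpha=0.05, beta=0.5):
    # TD error under the current linear value estimate theta . phi
    delta = r + gamma * theta @ phi_next - theta @ phi
    # Semi-gradient step plus the gradient-correction term, which is
    # what prevents divergence under function approximation
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary weights track the expected TD error per feature
    w = w + beta * (delta - w @ phi) * phi
    return theta, w
```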
See the comparison below (10×10 maze)
A solution to this maze problem found by SARSA with TDC (the green blocks mark the start and end points; the yellow block marks the key)
This maze setting is quite simple, but the same RL approach extends to many more complex cases such as automation, navigation, and games.