This program lets beginners experiment with Q-learning, a reinforcement learning method.
The charm of the Q-learning algorithm is that it works even when the agent does not know HOW to achieve the desired result.
The agent must learn to follow the optimal route, avoiding walls and traps.
For this method to work, the agent needs a feedback mechanism: a reward.
The agent receives a reward of +100 for reaching the end point of the route (this motivates it to reach the required position) and a penalty of -1 for each move (this motivates it to reach the goal in as few moves as possible).
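The reward scheme can be sketched in a few lines. This is only an illustration, not the code from ql.py: the function name `reward`, the constant names and the goal cell (10, 10) are made up here, and it is assumed that the final move earns only the +100 without the extra -1.

```python
GOAL_REWARD = 100   # from the description: reward for reaching the end point
MOVE_PENALTY = -1   # from the description: cost of every other move


def reward(next_cell, goal):
    """Return the feedback for a move that lands on next_cell."""
    if next_cell == goal:
        return GOAL_REWARD
    return MOVE_PENALTY


# Hypothetical goal cell, for illustration only.
goal = (10, 10)

# Two ordinary moves followed by the move that reaches the goal.
total = sum(reward(cell, goal) for cell in [(1, 2), (1, 3), (10, 10)])
print(total)  # 98
```

A shorter route collects fewer -1 penalties, so its total is higher, which is what pushes the agent towards the minimum number of moves.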
The heavily commented source code is in ql.py.
The settings are defined in settings.py:
hyperparameters:
ALPHA = 1.0
GAMMA = 0.95
walls:
WALLS = {(2,4), (3,4), (4,4), (5,4), (6,4), (10,8), (9,8), (10,6), (9,6), (8,6), (7,6), (6,6), (5,6), (1,7), (1,8), (1,9)}
traps:
TRAPS = {(2,2), (3,6), (5,8)}
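To show where ALPHA and GAMMA come in, here is a minimal sketch of the standard tabular Q-learning update. It is not taken from ql.py: the Q-table layout, the action names and the `update` function are assumptions made for this example, and the cells used are hypothetical.

```python
from collections import defaultdict

ALPHA = 1.0    # learning rate, same value as in settings.py
GAMMA = 0.95   # discount factor, same value as in settings.py

ACTIONS = ("up", "down", "left", "right")  # hypothetical action names

# Q-table: state -> {action: value}; every value starts at 0.0.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def update(state, action, reward, next_state, terminal=False):
    """Apply one tabular Q-learning update and return the new value."""
    # A terminal state has no future, so its best next value is 0.
    best_next = 0.0 if terminal else max(Q[next_state].values())
    target = reward + GAMMA * best_next
    Q[state][action] += ALPHA * (target - Q[state][action])
    return Q[state][action]


# The move that reaches the (hypothetical) goal gets the full +100.
update((9, 10), "right", 100, (10, 10), terminal=True)
# One step earlier the value is -1 + 0.95 * 100, i.e. about 94.
update((8, 10), "right", -1, (9, 10))
print(Q[(9, 10)]["right"], Q[(8, 10)]["right"])
```

With ALPHA = 1.0 the old value is overwritten completely by the new estimate. That is reasonable when moves and rewards are deterministic, which is assumed here.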
On startup, the program asks for the desired number of episodes (the last one is a test run).
Then the learning process starts, and the agent's movement is displayed graphically.
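One way to split the run into training episodes and a final test episode is sketched below. The epsilon-greedy exploration, the `epsilon` value and the names `choose_action` and `plan_episodes` are assumptions for this example. ql.py may do it differently.

```python
import random


def choose_action(q_values, explore, epsilon=0.1, rng=random):
    """Pick an action from a {action: value} dict.

    While exploring, a random action is taken with probability epsilon;
    otherwise the action with the highest value is chosen.
    """
    if explore and rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def plan_episodes(episodes):
    """Return (episode_number, mode) pairs; only the last one is a test."""
    plan = []
    for n in range(1, episodes + 1):
        mode = "test" if n == episodes else "train"
        plan.append((n, mode))
    return plan


print(plan_episodes(3))  # [(1, 'train'), (2, 'train'), (3, 'test')]
```

In the test episode the agent would call `choose_action` with `explore=False`, so it only follows what it has already learned.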
Play it! :-)