A bot is placed in a maze with walls, negative blocks and positive blocks. The goal is to find the shortest path to one of the positive blocks which maximizes the reward.
Some large maps (100x100),
Smaller maps (50x50),
Q Learning challenge by @Sirajology on Youtube
- Python 2.7
- tkinter
- numpy
If on Ubuntu you can install tkinter for python2.7 with $sudo apt-get install python-tk
Run python Learner.py
in terminal to see the the bot in action. It'll keep trying various paths until it finds the optimal strategy and then follows that from then onwards.
- Player randomly initialized at a position where there is no wall
- Walls are generated using cellular automata. More details about this here
- World size is 50x50 (Can be changed as required)
- Red (Negative blocks) and Green (Positive blocks) are generated at random positions where no walls are present. The number of each of these blocks is configurable, currently 3 Red and 2 Green blocks are generated.
- Exploration/exploitation tradeoff so that the bot initially choses actions at random (since initially the Q table has random values) and then slowly over time chooses actions based on the Q Table.
- Limit total moves to 5000 after which the bot restarts (helpful when the bot gets stuck at some area of the map) and reduce sleep time so the iterations happen faster and we get results quicker.
- Walls generated need to be filtered out to ensure that small regions are removed and small holes are covered. This can be done by performing a DFS in each region to count the number of walls/holes in each region and then remove them if needed.
- Terrain can also be generated using 2D perlin noise (It would be interesting to compare the results of the two generation methods)
The credits for the base maze game code go to PhillipeMorere.