This university project, created for the course "Bildverarbeitung, Maschinelles Lernen und Computer Vision" (Image Processing, Machine Learning and Computer Vision), presents a comparative study on Reinforcement Learning (RL) applied to Super Mario Bros, using Double Deep Q-Network (DDQN) and Dueling Deep Q-Network (Dueling DQN) agents. Inspired by Richard E. Bellman's principle of decomposing complex problems into manageable subproblems, our approach leverages Q-learning techniques to assess each agent's performance on levels 1-1 and 1-4, using the nes-py emulator.
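As a rough illustration of the architectural difference between the two agents, the minimal PyTorch sketch below shows a dueling Q-head that splits into a state-value stream and an advantage stream before recombining them into per-action Q-values. Layer sizes and names are placeholders, not the project's exact network.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, feature_dim: int, n_actions: int):
        super().__init__()
        # Placeholder hidden size; the project's actual network may differ.
        self.value = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)                    # (batch, 1)
        a = self.advantage(features)                # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)  # (batch, n_actions)
```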
- Level 1-1 provides the standard vanilla environment.
- Level 1-4, in contrast, introduces more challenging dynamics that demand more intricate skills, although its episodes are shorter (see the environment-setup sketch after this list).
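For reference, the two environments can be created roughly as follows. This is a minimal sketch using gym-super-mario-bros on top of nes-py; the environment IDs, action set, and gym API version are assumptions rather than the project's exact configuration.

```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

# Level 1-1 (vanilla) and level 1-4 (more demanding dynamics, shorter episodes).
env_1_1 = JoypadSpace(gym_super_mario_bros.make('SuperMarioBros-1-1-v0'), SIMPLE_MOVEMENT)
env_1_4 = JoypadSpace(gym_super_mario_bros.make('SuperMarioBros-1-4-v0'), SIMPLE_MOVEMENT)

# Assumes the classic gym API: reset() returns the state, step() returns 4 values.
state = env_1_1.reset()
state, reward, done, info = env_1_1.step(env_1_1.action_space.sample())
```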
A key feature of this work is the use of a large replay buffer of 300,000 experiences, combined with an epsilon-greedy strategy (decaying epsilon from 1 to 0.02) to retain diverse game states and capture long-term dependencies.
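The interplay of the large buffer and the decaying exploration rate can be sketched as below. The buffer class, the linear decay schedule, and the step count are illustrative assumptions; only the 300,000-experience capacity and the 1.0 to 0.02 epsilon range come from the description above.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay with a fixed capacity."""
    def __init__(self, capacity: int = 300_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def epsilon_by_step(step: int, eps_start: float = 1.0, eps_end: float = 0.02,
                    decay_steps: int = 100_000) -> float:
    """Linear decay from eps_start to eps_end; decay_steps is a placeholder value."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)
```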
Training progress snapshots (gameplay GIFs) were captured after episodes 4000, 8000, 12500, 13500, 20000, 26000, 26500, 28500, 29500, 33500, and 40000.