
Reinforcement Learning for Super Mario Bros using A3C on GPU

Modified to work with gym-super-mario-bros

NOTE: This is an unofficial fork of the original code published as a3c-super-mario-pytorch.

This project is based on the paper Asynchronous Methods for Deep Reinforcement Learning, with custom training modifications. It was originally created for the course Deep Learning for Computer Vision held at TUM.

Prerequisites

Getting Started

The Super Mario Bros NES environment has to be set up first. We use Kautenja's Super Mario Bros implementation for Gym, with some modifications so that it runs on the current OpenAI Gym version.
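
For reference, creating the environment with the upstream gym-super-mario-bros package typically looks like the sketch below. The exact wrappers applied in this repository may differ (see common/atari_wrappers.py), so treat this as a minimal example rather than the project's actual setup code.

# Minimal sketch using the upstream gym-super-mario-bros API (assumed here);
# this repository applies further wrappers on top.
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)   # restrict to a small discrete action set

state = env.reset()
state, reward, done, info = env.step(env.action_space.sample())
env.close()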

Training and Testing

To train the network from scratch, use the following command:

python3 train-mario.py --num-processes 8

This command requires at least an 8-core system with 16 GB of RAM and 6 GB of GPU memory. You can reduce the number of processes to run on a smaller machine, but expect the training time to increase drastically. For a lighter configuration, run:

python3 train-mario.py --num-processes 2 --non-sample 1

This command requires at least a 2-core system with 4 GB of RAM and 2 GB of GPU memory.

One test process is created alongside the remaining training processes. The test process stores its results in a CSV file inside the save folder, which can be plotted later.
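
The CSV log can be plotted, for example, with pandas; the file name used below is an assumption, so check what the test process actually writes into the save folder.

import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv('save/mario_curves.csv')      # file name is an assumption
log.plot(x=log.columns[0], y=log.columns[-1])   # e.g. time vs. reward, depending on the header
plt.show()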

Training uses both sampling (random) and non-sampling (non-random) processes so that it converges faster. By default there are two non-random processes, which can be changed via the --non-sample argument.

The random processes behave exactly like the non-random processes whenever there is a clear difference in the network's output probabilities. The non-random training processes exactly mimic the test behaviour, which helps train the network better; a sketch of the two action-selection modes follows.
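
Conceptually, the two kinds of processes differ only in how an action is picked from the policy output. The function below is an illustrative sketch, not the repository's actual code:

import torch
import torch.nn.functional as F

def select_action(logits, non_sample):
    # Illustrative sketch: "non-sample" processes act greedily, exactly like the
    # test process, while sampling processes draw from the policy distribution.
    # When one action's probability clearly dominates, the sample almost always
    # equals the argmax, so both kinds of process behave the same.
    probs = F.softmax(logits, dim=-1)
    if non_sample:
        return probs.argmax(dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)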

Custom rewards are used to train the model more efficiently. They can be changed through the info dictionary or by modifying the wrapper file common/atari_wrappers.py. The values used are listed under Results below.

More arguments are mentioned in the file train-mario.py.

Results

After ~20 hours of training with 8 processes (7 train, 1 test), the agent converges.

Custom rewards used:

  • Time = -0.1
  • Distance = +1 or 0
  • Player Status = +/- 5
  • Score = 2.5 x [Increase in Score]
  • Done = +20 [Game Completed] or -20 [Game Incomplete]
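
As a rough illustration of how this shaping could be assembled from the info dictionary mentioned above, here is a hedged sketch. The info keys ('x_pos', 'status', 'score', 'flag_get') are taken from the upstream gym-super-mario-bros environment; the actual wrapper in common/atari_wrappers.py may differ.

import gym

class MarioRewardShaping(gym.Wrapper):
    # Hedged sketch of the reward shaping listed above; not the repository's actual wrapper.
    STATUS = {'small': 0, 'tall': 1, 'fireball': 2}

    def reset(self, **kwargs):
        self._prev = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = -0.1                                                     # time penalty
        if self._prev is not None:
            reward += 1.0 if info['x_pos'] > self._prev['x_pos'] else 0.0   # distance
            delta = self.STATUS[info['status']] - self.STATUS[self._prev['status']]
            reward += 5.0 * ((delta > 0) - (delta < 0))                   # player status +/- 5
            reward += 2.5 * (info['score'] - self._prev['score'])         # score increase
        if done:
            reward += 20.0 if info.get('flag_get') else -20.0             # completed or not
        self._prev = info
        return obs, reward, done, info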

The trained model is saved at save/trained-models/mario_a3c_params.pkl. Move it up into the save folder to run the trained model.
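
Loading the saved parameters looks roughly like the following; the model class name and constructor arguments are assumptions, so use the network definition from this repository.

import torch
from model import ActorCritic              # class name assumed; see this repo's model code

num_inputs, num_actions = 1, 7             # example values; match your training configuration
model = ActorCritic(num_inputs, num_actions)
model.load_state_dict(torch.load('save/mario_a3c_params.pkl', map_location='cpu'))
model.eval()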

Repository References

This project heavily relied on:
