2019 Edition - https://www.aicrowd.com/challenges/flatland-challenge
```shell
python src/main.py --train --num-episodes=10000 --prediction-depth=150 --eps=0.9998 --checkpoint-interval=100 --buffer-size=10000
tensorboard --logdir=runs
python src/main.py --render
python src/main.py --plot
```
Observations are obtained by concatenating the "rail occupancy bitmap" of an agent with the "heatmaps".
A "rail occupancy bitmap" shows on which rail and in which direction the agent is traveling at every timestep. It is obtained as follows:
- A directed graph representation of the railway network is generated through BFS; each node is a switch and each edge is a rail between two switches.
- The shortest path for each agent is computed.
- The path is then transformed into a bitmap with the timesteps as columns and the rails as rows. The direction is positive (1) if the agent is traveling the edge from the source node to the destination node, or negative (-1) otherwise.
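The path-to-bitmap conversion above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the `path` structure (a list of `(rail_id, direction)` pairs, one per timestep) and the array shapes are assumptions.

```python
import numpy as np

def path_to_bitmap(path, num_rails, max_timesteps):
    """Build a rail occupancy bitmap: rows = rails, columns = timesteps.

    `path` is a list of (rail_id, direction) pairs, one per timestep,
    where direction is +1 (source -> destination on the edge) or -1.
    """
    bitmap = np.zeros((num_rails, max_timesteps), dtype=np.int8)
    for t, (rail, direction) in enumerate(path[:max_timesteps]):
        bitmap[rail, t] = direction
    return bitmap

# Example: an agent occupying rails 0, 0, 2, 1 over four timesteps.
bm = path_to_bitmap([(0, 1), (0, 1), (2, -1), (1, 1)],
                    num_rails=3, max_timesteps=5)
```

Each column has at most one non-zero entry, since an agent occupies exactly one rail per timestep.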
Heatmaps are used to provide information about how the traffic is distributed across the rails over time.
Each agent computes 2 heatmaps, one positive and one negative, both generated by summing the bitmaps of all the other agents.
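Splitting the other agents' bitmaps by direction sign and summing them could look like this sketch (the helper name and input layout are hypothetical):

```python
import numpy as np

def compute_heatmaps(other_bitmaps):
    """For one agent, count how many *other* agents occupy each rail at
    each timestep, split by travel direction (positive vs. negative)."""
    stacked = np.stack(other_bitmaps)        # (agents, rails, timesteps)
    positive = (stacked == 1).sum(axis=0)    # traffic in the +1 direction
    negative = (stacked == -1).sum(axis=0)   # traffic in the -1 direction
    return positive, negative

# Two other agents, two rails, two timesteps.
a = np.array([[1, 0], [0, -1]])
b = np.array([[1, 0], [-1, 0]])
pos, neg = compute_heatmaps([a, b])
```

The two heatmaps keep opposing traffic separate, which matters for detecting head-on conflicts on the same rail.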
The architecture used is a Dueling DQN: the input is a Conv2D layer that processes a concatenation of the agent's bitmap with the positive and negative heatmaps. The data then flows through two separate streams, the value (red) and the advantage (blue), which are recombined into the final output Q-values (purple).
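The recombination step at the end of the two streams is the standard dueling aggregation (value plus mean-centered advantages); a numeric sketch, not the repo's exact layer code:

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine state value V(s) and per-action advantages A(s, a) into
    Q-values: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

# Two actions (stop, go). Subtracting the mean advantage makes the
# decomposition identifiable: V carries the state value, A only the
# relative ranking of the actions.
q = dueling_q(np.array([0.5]), np.array([0.2, -0.2]))
```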
The training algorithm follows Double Q-Learning with a random (uniform) replay buffer, where the action space is reduced to 2 actions (stop and go) and the agent's choices are based on a number of alternative paths that can be generated at every bifurcation point; at every fork the most promising path is chosen.
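The Double Q-Learning target with the reduced stop/go action space can be sketched as below; the function and argument names are illustrative, not the repo's API.

```python
import numpy as np

STOP, GO = 0, 1  # reduced action space

def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double Q-Learning target: the online network *selects* the next
    action, the target network *evaluates* it, which decouples action
    selection from evaluation and reduces overestimation bias."""
    if done:
        return reward
    best_action = int(np.argmax(q_online_next))   # argmax over {stop, go}
    return reward + gamma * q_target_next[best_action]

# Online net prefers GO; the target net supplies its value estimate.
t = double_q_target(1.0, 0.99,
                    q_online_next=np.array([0.2, 0.8]),
                    q_target_next=np.array([0.5, 0.4]),
                    done=False)
```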
For more detailed information on the approaches see: