A project that plays the Chrome Dino game with Deep Q-learning. Online game: https://chromedino.com
A Markov decision process (MDP) is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a available in state s. The process responds at the next time step by moving into a new state s′ and giving the decision maker a corresponding reward Ra(s, s′). In our case, the decision maker is the little dino, and the screenshots the dino sees are the states.
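As a concrete illustration, the MDP pieces for this game might look like the sketch below. The three-action set and the 80x80 grayscale preprocessing are assumptions for illustration, not necessarily the repo's exact choices:

```python
from enum import Enum

import numpy as np


class Action(Enum):
    NOOP = 0  # keep running
    JUMP = 1  # hop over a cactus
    DUCK = 2  # duck under a pterodactyl


# A state s is just what the dino "sees": a preprocessed screenshot,
# e.g. an 80x80 grayscale frame (exact preprocessing is an assumption).
state = np.zeros((80, 80), dtype=np.float32)
```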
Our goal is to find the optimal policy, which is equivalent to finding the optimal action-value function, because we aim to maximise the expected return from the very first time step onwards. With a full MDP model (i.e., the rewards and the transition probability matrix are available), this can be solved with policy or value iteration. For this game, however, the MDP model is unknown, so we instead approximate the action-value function with Monte Carlo methods or temporal-difference learning. Since the state space of the game is huge, we apply a DQN with experience replay to approximate the optimal action-value function.
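A minimal sketch of the DQN update in PyTorch, assuming an 80x80 grayscale input. The network shape, hyper-parameters, and the use of a separate target network are standard-DQN assumptions, not necessarily this repo's exact setup:

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps a stack of frames to one Q-value per action."""

    def __init__(self, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def dqn_loss(q_net, target_net, batch, gamma: float = 0.99):
    # Temporal-difference target: y = r + gamma * max_a' Q(s', a'; theta-),
    # with the bootstrap term zeroed on terminal transitions.
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + gamma * (1 - done) * target_net(s_next).max(1).values
    return nn.functional.mse_loss(q, y)
```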
Why experience replay? It makes the training data closer to i.i.d., gives better convergence behaviour, and makes more efficient use of the collected data. A minimal buffer sketch follows.
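A minimal replay buffer sketch (capacity and batch size are illustrative). Sampling transitions uniformly at random breaks the temporal correlation between consecutive frames, which is what pushes the training data toward i.i.d.:

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity: int = 50_000):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size: int = 32):
        # Uniform random minibatch; each transition can be reused many times.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```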
There were several difficulties: defining the reward function is hard; the hyper-parameters need careful tuning; the game speeds up as you play; and the background shifts between day and night after 700 points.
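To make the first difficulty concrete, here is one possible reward shaping; the values are purely an assumption, not necessarily what this repo uses:

```python
def reward(game_over: bool) -> float:
    # Small bonus for surviving each step, large penalty on crash.
    # Balancing these is the hard part: the informative penalty only
    # arrives at the very end of an episode.
    return -1.0 if game_over else 0.1
```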
The highest score achieved is around 1000.
It may be a bit hard to reproduce my results because I interact with the game purely by looking at screenshots, so you may need to adjust the bounding boxes used to detect the start and end of the game. The entry point is main.py.
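A rough sketch of what screenshot-based interaction can look like, assuming PIL and pyautogui; all coordinates and the darkness threshold are placeholders you would tune for your own screen, and the actual detection logic in main.py may differ:

```python
import pyautogui
from PIL import ImageGrab

GAME_BBOX = (100, 200, 700, 350)       # left, top, right, bottom -- adjust!
GAME_OVER_BBOX = (350, 240, 450, 270)  # region where the game-over banner appears


def grab_state():
    # Screenshot of the playing field, converted to grayscale.
    return ImageGrab.grab(bbox=GAME_BBOX).convert("L")


def is_game_over() -> bool:
    # Crude check: the banner changes this region's mean brightness;
    # the threshold 120 is an assumption to calibrate per screen.
    region = ImageGrab.grab(bbox=GAME_OVER_BBOX).convert("L")
    mean = sum(region.getdata()) / (region.width * region.height)
    return mean < 120


def jump():
    pyautogui.press("space")
```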
For a demo, see https://www.youtube.com/watch?v=nC1GX7X_aHA&feature=youtu.be