Tabular Reinforcement Learning Applied to Tic-Tac-Toe

This project applies value iteration on a Markov decision process (MDP) to teach a reinforcement learning agent to play tic-tac-toe. The code is written from scratch in Python, and the resulting policy is near-optimal.

The memory folder contains the initialized and re-evaluated state-value pairs, which can be loaded with Python's pickle module.
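As a rough illustration of how such state-value pairs can be stored and loaded with pickle, here is a minimal round-trip sketch. The file name state_values.pkl, the string encoding of the board, and the helper names save_values and load_values are assumptions made for this example; the actual files in the memory folder may be named and structured differently.

```python
import pickle
import tempfile
from pathlib import Path


def save_values(values, path):
    """Write a state -> value dictionary to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(values, f)


def load_values(path):
    """Read a state -> value dictionary back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical encoding: a board is a 9-character string, "." marks an empty cell.
demo_values = {"XO.......": 0.0, "XXXOO....": 1.0}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "state_values.pkl"  # hypothetical file name
    save_values(demo_values, demo_path)
    restored = load_values(demo_path)
```

Only unpickle files you trust, since pickle can execute arbitrary code while loading.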

Guidelines:

Step 1. Extract all the possible states and initialize their values:

python3 state_extractor.py
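The idea behind this step can be sketched as a search over every board reachable through legal play. This is not the code in state_extractor.py; it assumes X moves first, encodes a board as a 9-character string with "." for empty cells, and initializes every value to 0.0. The names winner, extract_states, and initial_values are hypothetical.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def extract_states():
    """Collect every board reachable from the empty board (assumes X starts)."""
    start = "." * 9
    seen = {start}
    stack = [start]
    while stack:
        board = stack.pop()
        # Finished games have no successors.
        if winner(board) is not None or "." not in board:
            continue
        player = "X" if board.count("X") == board.count("O") else "O"
        for i, cell in enumerate(board):
            if cell == ".":
                nxt = board[:i] + player + board[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


states = extract_states()
# Assumed initialization: every state starts at 0.0 before value iteration.
initial_values = {state: 0.0 for state in states}
```

Under these assumptions the search finds 5478 distinct boards, which is small enough to keep in a plain dictionary.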

Step 2. Run value iteration over all the states until convergence:

python3 value_iterator.py
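A minimal sketch of what repeated sweeps until convergence can look like is given below. It is not the code in value_iterator.py. It assumes a zero-sum formulation in which values are seen from X's side (+1 for an X win, -1 for an O win, 0 for a draw), no discounting, and a minimax backup: X picks the successor with the highest value and O the one with the lowest. All function names and the tol and max_sweeps parameters are hypothetical.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(board):
    return winner(board) is not None or "." not in board


def successors(board):
    """Boards reachable in one move by the player whose turn it is (X starts)."""
    player = "X" if board.count("X") == board.count("O") else "O"
    return [board[:i] + player + board[i + 1:]
            for i, cell in enumerate(board) if cell == "."]


def all_states():
    seen, stack = {"." * 9}, ["." * 9]
    while stack:
        board = stack.pop()
        if is_terminal(board):
            continue
        for nxt in successors(board):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def terminal_value(board):
    w = winner(board)
    return 1.0 if w == "X" else -1.0 if w == "O" else 0.0


def value_iteration(tol=1e-9, max_sweeps=100):
    """Sweep all states, backing up values in place, until nothing changes."""
    states = all_states()
    values = {s: terminal_value(s) if is_terminal(s) else 0.0 for s in states}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in states:
            if is_terminal(s):
                continue
            nxt = [values[n] for n in successors(s)]
            x_to_move = s.count("X") == s.count("O")
            new_value = max(nxt) if x_to_move else min(nxt)  # minimax backup
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:  # converged: a full sweep changed nothing
            break
    return values


values = value_iteration()
```

Because every game ends within nine moves, the sweeps settle quickly. With this backup the empty board converges to 0.0, reflecting that tic-tac-toe is a draw under optimal play.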

Step 3. Check the policy in one of several ways:

Method I: AI plays against a randomly playing agent:

python3 markov_eval.py
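This kind of evaluation can be sketched as follows; it is not the code in markov_eval.py. A greedy X player picks the move with the best state value, O moves uniformly at random, and the outcomes are tallied. To keep the example self-contained, a memoized minimax function stands in for the value table that would normally be loaded from the memory folder. The names value, greedy_move, play_game, and evaluate, and the n_games and seed parameters, are hypothetical.

```python
import random
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def moves(board):
    return [i for i, cell in enumerate(board) if cell == "."]


def place(board, i, player):
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def value(board):
    """Stand-in for a converged value table, seen from X's side."""
    w = winner(board)
    if w is not None:
        return 1.0 if w == "X" else -1.0
    if "." not in board:
        return 0.0
    player = "X" if board.count("X") == board.count("O") else "O"
    vals = [value(place(board, i, player)) for i in moves(board)]
    return max(vals) if player == "X" else min(vals)


def greedy_move(board):
    """Pick the X move that leads to the highest-valued board."""
    return max(moves(board), key=lambda i: value(place(board, i, "X")))


def play_game(rng):
    board, player = "." * 9, "X"
    while winner(board) is None and "." in board:
        i = greedy_move(board) if player == "X" else rng.choice(moves(board))
        board = place(board, i, player)
        player = "O" if player == "X" else "X"
    return winner(board) or "draw"


def evaluate(n_games=200, seed=0):
    """Tally outcomes of the greedy X player against a random O player."""
    rng = random.Random(seed)
    tally = {"X": 0, "O": 0, "draw": 0}
    for _ in range(n_games):
        tally[play_game(rng)] += 1
    return tally


results = evaluate()
```

With an optimal value table the greedy player should never lose, so the count under "O" is a quick sanity check on the learned policy.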

Method II: AI plays against itself:

python3 against_itself.py

Method III: AI plays against a human:

python3 human.py


meraccos/tictactoe-reinforcement-learning

Folders and files

Latest commit

History

Repository files navigation

Tabular Reinforcement Learning applied on Tic Tac Toe

Guidelines:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages