Nash Q Learning

Implementation of the Nash Q-Learning algorithm to solve games with two agents, as seen in the course Multiagent Systems @ PoliMi. The algorithm was first introduced in the paper Nash q-learning for general-sum stochastic games (Hu, J., Wellman, M.P., 2003).

Feel free to use for your own projects or contribute!

Example

Consider the following game where two robots need to reach the reward. One obstacle lies in the middle of the grid. The two robots cannot be on the same tile at the same moment, except for the reward's tile. See this notebook for a detailed walkthrough.

The robots and the game grid are represented by the Player and Grid objects.

from NashQLearn import Player, Grid
#Initialize the two players
player1 = Player([0,0])
player2 = Player([2,0])
#Initialize the grid
grid = Grid(length = 3,
            width = 3,
            players = [player1,player2],
            obstacle_coordinates = [[1,1]], 
            reward_coordinates = [1,2],
            reward_value = 20,
            collision_penalty = -1)

Once the game settings are defined, a NashQLearning object is initialized with the desired hyperparameters and trained on the grid.

from NashQLearn import NashQLearning
nashQ = NashQLearning(grid, 
                      max_iter = 2000,
                      discount_factor = 0.7,
                      learning_rate = 0.7,
                      epsilon = 0.5,
                      decision_strategy = 'epsilon-greedy') #Available strategies : 'random', 'greedy', and 'epsilon-greedy'
#Retrieve the Q tables after fitting the algorithm
Q0, Q1 = nashQ.fit(return_history = False)
#Best path followed by each player given the values in the Q tables
p0, p1 = nashQ.get_best_policy(Q0,Q1)

#Show the results
print('Player 0 follows the  policy : %s of length %s.'%('-'.join(p0),len(p0)))
>>> Player 0 follows the  policy : up-up-right of length 3.
print('Player 1 follows the  policy : %s of length %s.'%('-'.join(p1),len(p1)))
>>> Player 1 follows the  policy : up-up-left of length 3.

In this case, the joint optimal policy was found by the algorithm, as shown on the figure below.

Requirements

python>=3.7
numpy
tqdm
nashpy

The package nashpy is used to compute the Nash equilibrium for each stage game during the learning process.

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
img		img
notebook		notebook
LICENSE		LICENSE
NashQLearn.py		NashQLearn.py
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nash Q Learning

Example

Requirements

About

Languages

License

jtonglet/Nash-Q-Learning

Folders and files

Latest commit

History

Repository files navigation

Nash Q Learning

Example

Requirements

About

Topics

Resources

License

Stars

Watchers

Forks

Languages