Imitation Dynamics and Regret Minimization models added to the repo (#…
…229)

* Imitation Dynamics and Regret Minimization models added to the repo

* reformatted the code using black

* adding the documentation for imitation dynamics and regret minimization

* removed the main() calls from the test files

* recommended changes are made into the code and the documentation

* changed the max_iterations and num_generations into iterations within the code and the docs

* reformatted with black version 24.3.0

* removed the reference of greedy algorithm and the note from imitation docs

* ran tox to fix the documentation issues and moved the functions to learning

* edited the how-to files as recommented
sandeepvshenoy authored Mar 25, 2024
1 parent 335a8ed commit a3fb697
Showing 10 changed files with 634 additions and 0 deletions.
36 changes: 36 additions & 0 deletions docs/how-to/solve-with-imitation-dynamics.rst
@@ -0,0 +1,36 @@
.. _how-to-use-imitation-dynamics:

Solve with Imitation Dynamics
==============================

One of the algorithms implemented in :code:`Nashpy` is called
:code:`imitation_dynamics()`. It is implemented as a method on the :code:`Game`
class::

>>> import nashpy as nash
>>> import numpy as np
>>> A = np.array([[3, -1], [-1, 3]])
>>> B = np.array([[-3, 1], [1, -3]])
>>> rps = nash.Game(A,B)

This :code:`imitation_dynamics` method returns a generator of the outcomes
of the imitation dynamics algorithm::

>>> ne_imitation_dynamics = rps.imitation_dynamics()
>>> print(list(ne_imitation_dynamics))
[(array([0., 1.]), array([1., 0.]))]

:code:`imitation_dynamics` takes the following parameters: :code:`iterations`, :code:`population_size`, :code:`random_seed` and :code:`threshold`:

>>> import nashpy as nash
>>> import numpy as np
>>> A = np.array([[3, -1,3], [-1, 3,6], [-1, 1,2]])
>>> B = np.array([[-3, 1,4], [1, -3,3], [-1, 3,4]])
>>> rps = nash.Game(A,B)
>>> population_size=200
>>> iterations=100
>>> random_seed=30
>>> threshold=0.3
>>> ne_imitation_dynamics = rps.imitation_dynamics(population_size=population_size,iterations=iterations,random_seed=random_seed,threshold=threshold)
>>> list(ne_imitation_dynamics)
[(array([0., 1., 0.]), array([1., 0., 1.]))]
33 changes: 33 additions & 0 deletions docs/how-to/solve-with-regret-minimization.rst
@@ -0,0 +1,33 @@
.. _how-to-use-regret-minimization:

Solve with Regret Minimization
==============================

One of the algorithms implemented in :code:`Nashpy` is called
:code:`regret_minimization()`. It is implemented as a method on the :code:`Game`
class::

>>> import nashpy as nash
>>> import numpy as np
>>> A = np.array([[3, -1], [-1, 3]])
>>> B = np.array([[-3, 1], [1, -3]])
>>> rps = nash.Game(A,B)

This :code:`regret_minimization` method returns a generator of the outcomes
of the regret minimization algorithm::

>>> ne_regret_mini = rps.regret_minimization()
>>> print(list(ne_regret_mini))
[([0.5, 0.5], [0.5, 0.5])]

:code:`regret_minimization` takes two parameters, :code:`learning_rate` and :code:`iterations`:

>>> A = np.array([[3, -1,3], [-1, 3,6], [-1, 1,2]])
>>> B = np.array([[-3, 1,4], [1, -3,3], [-1, 3,4]])
>>> rps = nash.Game(A,B)
>>> learning_rate = 0.2
>>> iterations = 1000
>>> ne_regret_mini = rps.regret_minimization(learning_rate=learning_rate, iterations=iterations)
>>> print(list(ne_regret_mini))
[([0.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
65 changes: 65 additions & 0 deletions docs/text-book/imitation-dynamics.rst
@@ -0,0 +1,65 @@
Detailed Mathematical Model of Imitation Dynamics
==================================================

Introduction
------------

The mathematical model of imitation dynamics describes how individuals in a population adapt their strategies over time based on observing the strategies of others. This document provides a detailed mathematical formulation for understanding and simulating imitation dynamics in strategic games.

Initialization
---------------

- Let `N` denote the number of individuals in the population.
- Let `M` denote the number of strategies available to each individual.
- Initialize the population as an :math:`N \times M` matrix `P`, where each row represents the strategy of an individual.
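
The initialization step can be sketched as follows. This is a minimal illustration that assumes each individual's mixed strategy is drawn from a flat Dirichlet distribution (the source code in this commit uses :code:`np.random.dirichlet` in the same way); the names :code:`rng`, :code:`N`, :code:`M` and :code:`P` and the seed are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only to make the sketch reproducible
N, M = 5, 3                     # illustrative population size and strategy count

# Each row of P is one individual's mixed strategy, drawn uniformly
# from the probability simplex (flat Dirichlet distribution).
P = rng.dirichlet(np.ones(M), size=N)
```

Every row of :code:`P` is non-negative and sums to 1, so each row is a valid mixed strategy.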

Interaction and Payoff Calculation
-----------------------------------

- Define a payoff matrix `U` for each player, where :math:`U_{jk}` represents the payoff to that player when they choose strategy `j` and their opponent chooses strategy `k`.
- Calculate the payoff for each individual `n` given their strategy :math:`P_n` and the strategies of all other individuals:
:math:`\text{Payoff}_n = P_n \cdot U \cdot P^T`
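
A small numerical sketch of the payoff calculation may help. It assumes two populations, :code:`P` for the player and :code:`Q` for the opponents (both names and values are illustrative). The matrix product gives the payoff of every individual against every opponent, while the source code in this commit pairs individual `n` with opponent `n`, which corresponds to the diagonal.

```python
import numpy as np

U = np.array([[3, -1], [-1, 3]])                    # payoff matrix of the player
P = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # player's population
Q = np.array([[0.0, 1.0], [0.5, 0.5], [0.0, 1.0]])  # opponents' population

# Entry (n, m) is the payoff of individual n against opponent m.
all_pairs = P @ U @ Q.T

# Payoff of individual n against opponent n only (the diagonal of all_pairs).
paired = np.einsum("ni,ij,nj->n", P, U, Q)
```

With these values :code:`paired` is :code:`[-1., 1., 3.]`, so the third individual earns the highest payoff.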

Imitation Mechanism
--------------------

- Identify the fittest individual based on their payoffs.
- Let `F` be the index (or indices) of the fittest individual.
- Update the strategies of all individuals to match the strategy of the fittest individual:
:math:`P_n = P_F`, for all :math:`n = 1, 2, \ldots, N`
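
The imitation step can be sketched with :code:`np.argmax` and :code:`np.tile`, which is how the source code in this commit performs it. The population and payoff values below are illustrative.

```python
import numpy as np

P = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # current population
payoffs = np.array([-1.0, 1.0, 3.0])                # one payoff per individual

F = np.argmax(payoffs)                # index of the fittest individual (first one on ties)
P_next = np.tile(P[F], (len(P), 1))   # every individual copies the fittest strategy
```

After this step every row of :code:`P_next` equals :code:`P[F]`, so the population is homogeneous from then on.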

Population Update
-----------------

- Repeat the interaction, payoff calculation, and imitation mechanism steps for a certain number of generations or until convergence.

Convergence and Nash Equilibrium
---------------------------------

- Check for convergence by comparing strategies of successive generations.
- If strategies stabilize, it indicates a potential Nash equilibrium.
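
One possible convergence check is sketched below. The helper :code:`has_converged` and its tolerance are assumptions made for illustration; they are not part of Nashpy.

```python
import numpy as np

def has_converged(previous, current, tol=1e-8):
    """Return True when two successive populations agree within `tol`."""
    return np.allclose(previous, current, atol=tol)

generation_0 = np.array([[0.3, 0.7], [0.0, 1.0]])
generation_1 = np.array([[0.0, 1.0], [0.0, 1.0]])
generation_2 = np.array([[0.0, 1.0], [0.0, 1.0]])

changed = has_converged(generation_0, generation_1)  # populations still differ
stable = has_converged(generation_1, generation_2)   # populations are identical
```

Because every individual copies the fittest one, a population that starts a generation homogeneous stays unchanged, which is the stabilization the check detects.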

Thresholding (Optional)
------------------------

- Apply a thresholding mechanism to discretize strategies. The threshold must lie between 0 and 1 and defaults to 0.5.
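
Thresholding can be sketched as an element-wise comparison. The helper name :code:`threshold_strategy` is hypothetical; the rule (entries at or above the threshold become 1, the rest become 0) follows the source code in this commit.

```python
import numpy as np

def threshold_strategy(strategy, threshold=0.5):
    """Map each entry to 1.0 if it is >= threshold, otherwise to 0.0."""
    return np.where(np.asarray(strategy) >= threshold, 1.0, 0.0)

pure = threshold_strategy([0.2, 0.8])                            # array([0., 1.])
not_a_distribution = threshold_strategy([0.4, 0.35, 0.25], 0.3)  # array([1., 1., 0.])
```

The second call shows that with a low threshold more than one entry can become 1, so the thresholded vector is not necessarily a probability distribution.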

Comparison with Fictitious Play
-------------------------------

Although the imitation dynamics method for finding equilibria looks similar to Fictitious Play, in that strategies are updated adaptively over time and players adjust their strategies based on observations of past interactions or outcomes, the two methods differ in a few ways, listed below.

**Key Differences between Imitation Dynamics and Fictitious Play**


**Strategy Update Mechanism**

- In :code:`imitation_dynamics`, players copy the strategy of the most successful individual. This means that at each iteration, players directly mimic the strategy of the individual who achieved the highest payoff.

- In :code:`fictitious_play`, on the other hand, players update their strategies based on observed play counts of opponents' strategies. This involves players selecting their next move based on the cumulative history of their opponents' strategies rather than directly imitating successful players.

Using Nashpy
------------

See :ref:`how-to-use-imitation-dynamics` for guidance on how to use Nashpy to
simulate imitation dynamics.
43 changes: 43 additions & 0 deletions docs/text-book/regret_minimization.rst
@@ -0,0 +1,43 @@
Regret Minimization in Game Theory
==================================

Introduction
------------

In the context of game theory, "regret" refers to the difference between a player's actual payoff and the payoff they would have received by playing a different strategy. By minimizing regret, players may converge towards a Nash equilibrium, where no player has an incentive to deviate unilaterally from their chosen strategy.

Regret minimization is a concept used in game theory to model how players learn and adapt their strategies over time. It measures the "regret" experienced by a player for not choosing a different strategy that could have yielded a better outcome in hindsight.

Mathematically, let's consider a sequential game where player :math:`i` selects a strategy from a set :math:`S_i` at each time step. The regret :math:`R_i(t)` of player :math:`i` at time :math:`t` with respect to strategy :math:`s` is defined as:

.. math::

   R_i(t) = \max_{s' \in S_i} \sum_{\tau=1}^{t} u_i(s', s_{-i}^\tau) - \sum_{\tau=1}^{t} u_i(s, s_{-i}^\tau)

where:

- :math:`s_{-i}^\tau` represents the joint strategy profile of all players except :math:`i` at time :math:`\tau`.
- :math:`u_i(s, s_{-i}^\tau)` is the utility or payoff obtained by player :math:`i` when playing strategy :math:`s` against the joint strategy :math:`s_{-i}^\tau`.

The regret measures how much payoff player :math:`i` could have gained if they had chosen a different strategy instead of :math:`s` in each time step up to time :math:`t`.
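
The regret formula above can be evaluated directly for a two-player game. The sketch below assumes the opponent's play is recorded as a list of pure-strategy indices; the function name :code:`regret` and the example values are illustrative.

```python
import numpy as np

def regret(U, s, opponent_actions):
    """Regret of always playing pure strategy `s` against the observed opponent actions."""
    opponent_actions = np.asarray(opponent_actions)
    # cumulative[k] is the total payoff of always playing k over the whole history.
    cumulative = U[:, opponent_actions].sum(axis=1)
    return cumulative.max() - cumulative[s]

U = np.array([[3, -1], [-1, 3]])
history = [0, 0, 1]                  # opponent's choices at times 1, 2, 3

regret_of_1 = regret(U, 1, history)  # 5 - 1 = 4
regret_of_0 = regret(U, 0, history)  # 5 - 5 = 0
```

Here always playing strategy 0 would have earned 5 and strategy 1 would have earned 1, so strategy 0 has no regret in hindsight.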

Regret Minimization Implementation
----------------------------------

An algorithm can be used to minimize "regret" by iteratively updating strategies based on past regrets. At each time step, a player selects the strategy that minimizes their regret. This approach aims to converge towards a strategy profile with low regret.

Mathematically, the algorithm can be summarized as follows:

- Initialize strategies for all players.
- At each time step :math:`t`:

  - Calculate the regret for each player based on their current strategy.
  - Update strategies by selecting the option that minimizes regret for each player.

- Repeat until convergence or a stopping condition is met.

By iteratively updating strategies to minimize regret, players can learn to make better decisions over time and potentially converge towards a Nash equilibrium in the game.
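
As an illustration of the loop above, here is a minimal sketch of regret matching for a two-player game, using expected payoffs rather than sampled actions. It is not Nashpy's implementation: the function name :code:`regret_matching`, the uniform starting strategies and the absence of a learning rate are all assumptions made for this example.

```python
import numpy as np

def regret_matching(A, B, iterations=1000):
    """Return the average strategies of both players after `iterations` rounds."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    regrets_row, regrets_col = np.zeros(A.shape[0]), np.zeros(A.shape[1])
    total_row, total_col = np.zeros(A.shape[0]), np.zeros(A.shape[1])

    def strategy_from(regrets):
        # Play in proportion to positive regret; fall back to uniform play.
        positive = np.maximum(regrets, 0.0)
        if positive.sum() > 0:
            return positive / positive.sum()
        return np.full(len(regrets), 1.0 / len(regrets))

    for _ in range(iterations):
        x, y = strategy_from(regrets_row), strategy_from(regrets_col)
        total_row += x
        total_col += y
        row_values = A @ y  # payoff of each pure row strategy against y
        col_values = x @ B  # payoff of each pure column strategy against x
        # Regret of each pure strategy relative to the strategy actually played.
        regrets_row += row_values - x @ row_values
        regrets_col += col_values - col_values @ y
    return total_row / iterations, total_col / iterations

A = np.array([[3, -1], [-1, 3]])
row_average, col_average = regret_matching(A, -A)
```

For this zero-sum game both averages stay at :code:`[0.5, 0.5]`, because uniform play never accumulates regret. In general the average strategies are only guaranteed to have low regret, which is why convergence to a Nash equilibrium is described as potential rather than certain.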


Using Nashpy
------------

See :ref:`how-to-use-regret-minimization` for guidance on how to use Nashpy to
perform regret minimization.
66 changes: 66 additions & 0 deletions src/nashpy/game.py
@@ -15,6 +15,8 @@
)
from .learning.stochastic_fictitious_play import stochastic_fictitious_play
from .utils.is_best_response import is_best_response
from .learning.regret_minimization import regret_minimization
from .learning.imitation_dynamics import imitation_dynamics


class Game:
@@ -423,3 +425,67 @@ def linear_program(self):
        row_strategy = linear_program(row_player_payoff_matrix=A)
        column_strategy = linear_program(row_player_payoff_matrix=B.T)
        return row_strategy, column_strategy

    def regret_minimization(self, learning_rate=0.1, iterations=100):
        """
        Obtain the Nash equilibria using the regret minimization method over a given number of iterations.
        The code provided is based on the concept of regret matching,
        with a fixed learning rate.

        The algorithm implemented here is Algorithm 4.3 Theorem 4.4 of [Nisan2007]_

        1. Build the best strategy probabilities of both players

        Parameters
        ----------
        learning_rate : float
            The learning rate determines the magnitude of the update towards the regrets
        iterations : int
            The number of iterations (default: 100). This number can be increased or decreased based on the shape of the utilities/payoff matrix

        Returns
        -------
        Generator
            The equilibria.
        """
        A, B = self.payoff_matrices
        return regret_minimization(
            A=A, B=B, learning_rate=learning_rate, iterations=iterations
        )

    def imitation_dynamics(
        self,
        population_size=100,
        iterations=1000,
        random_seed=None,
        threshold=0.5,
    ):
        """
        Simulate the imitation dynamics for a given game represented by payoff matrices A and B.

        Parameters
        ----------
        population_size : number
            number of individuals in the population of the group (default: 100)
        iterations : number
            number of generations to simulate (default: 1000)
        random_seed : number
            seed for reproducibility (default: None)
        threshold : float
            threshold value for representing strategies as 0 or 1 (default: 0.5)

        Returns
        -------
        Generator
            The equilibria.
        """
        A, B = self.payoff_matrices
        return imitation_dynamics(
            A=A,
            B=B,
            population_size=population_size,
            iterations=iterations,
            random_seed=random_seed,
            threshold=threshold,
        )
103 changes: 103 additions & 0 deletions src/nashpy/learning/imitation_dynamics.py
@@ -0,0 +1,103 @@
"""A function for an Imitation Dynamics algorithm"""

import numpy as np
from typing import Generator, Tuple, Any
import numpy.typing as npt


def payoff(player_strategy, opponent_strategy, player_payoff_matrix):
    """
    Calculate the payoff of a player given their strategy and the opponent's strategy.

    Parameters
    ----------
    player_strategy: numpy array
        representing the strategy of the player
    opponent_strategy: numpy array
        representing the strategy of the opponent
    player_payoff_matrix: numpy matrix
        representing the payoff matrix for the player

    Returns
    -------
    return_value: scalar
        the expected payoff of the player's strategy against the opponent's
        strategy under the given payoff matrix
    """
    return_value = np.dot(
        player_strategy, np.dot(player_payoff_matrix, opponent_strategy)
    )
    return return_value


def imitation_dynamics(
    A: npt.NDArray,
    B: npt.NDArray,
    population_size=100,
    iterations=1000,
    random_seed=None,
    threshold=0.5,
) -> Generator[Tuple[npt.NDArray, npt.NDArray], Any, None]:
    """
    Simulate the imitation dynamics for a given game represented by payoff matrices A and B.

    Parameters
    ----------
    A : numpy matrix
        representing the payoff matrix for Player 1
    B : numpy matrix
        representing the payoff matrix for Player 2
    population_size : number
        number of individuals in the population of the group (default: 100)
    iterations : number
        number of generations to simulate (default: 1000)
    random_seed : number
        seed for reproducibility (default: None)
    threshold : float
        threshold value for representing strategies as 0 or 1 (default: 0.5)

    Yields
    ------
    Tuple
        The thresholded strategies of Player 1 and Player 2.
    """
    # Both populations use len(A) strategies, so the game is assumed to be square
    num_strategies = len(A)

    # Initialize population
    if random_seed is not None:  # a seed of 0 is also honoured
        np.random.seed(random_seed)  # Set random seed for reproducibility

    population_A = np.random.dirichlet(np.ones(num_strategies), size=population_size)
    population_B = np.random.dirichlet(np.ones(num_strategies), size=population_size)

    for _ in range(iterations):
        # Play the game: individual i of one population meets individual i of the other
        payoffs_A = np.array(
            [
                payoff(population_A[i], population_B[i], A)
                for i in range(population_size)
            ]
        )
        # Note: this evaluates y B x (column strategy first), which equals the
        # usual column player payoff x B y only when B is symmetric
        payoffs_B = np.array(
            [
                payoff(population_B[i], population_A[i], B)
                for i in range(population_size)
            ]
        )

        # Update population based on payoffs
        # Used Imitation dynamics in which the players copy the strategy of the most successful individual
        fittest_A_index = np.argmax(payoffs_A)
        fittest_B_index = np.argmax(payoffs_B)
        population_A = np.tile(population_A[fittest_A_index], (population_size, 1))
        population_B = np.tile(population_B[fittest_B_index], (population_size, 1))

    # Calculate Nash equilibrium strategies
    nash_equilibrium_A = np.mean(population_A, axis=0)
    nash_equilibrium_B = np.mean(population_B, axis=0)

    # Threshold the strategies
    nash_equilibrium_A[nash_equilibrium_A >= threshold] = 1
    nash_equilibrium_A[nash_equilibrium_A < threshold] = 0
    nash_equilibrium_B[nash_equilibrium_B >= threshold] = 1
    nash_equilibrium_B[nash_equilibrium_B < threshold] = 0

    yield nash_equilibrium_A, nash_equilibrium_B