johanos1/bandit_optimization_dag

Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds

This repository contains the source code for generating the numerical results in the paper Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds. The simulation scenario is inspired by The AI Economist. There are $M+1$ players: one acts as a socio-economic planner and the remaining $M$ act as workers. The game is played over $T$ rounds, and each round proceeds as follows:

  1. The socio-economic planner decides on a taxation policy
  2. The workers observe the taxation policy and pick actions consecutively
  3. Each worker action is mapped to an income and a labor cost
  4. The net income of each worker is obtained by subtracting the tax collected by the socio-economic planner from the worker's income
  5. The worker utility is computed from the net income and the labor cost
  6. The bandit reward is a weighted average of all worker utilities and the collected tax (all normalized to [0,1])
  7. All the players observe the bandit reward and update their respective policies.
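
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code in main.py: the function name `play_round`, the linear income map, the quadratic labor cost, the flat tax rate, the utility normalization, and the `planner_weight` parameter are all assumptions chosen to keep every quantity in [0,1].

```python
from statistics import mean


def play_round(tax_rate, worker_actions, num_actions=10, planner_weight=0.5):
    """Compute the bandit reward for one round (hypothetical model).

    tax_rate       -- planner's action, assumed to be a flat rate in [0, 1]
    worker_actions -- one integer action in {0, ..., num_actions - 1} per worker
    planner_weight -- assumed weight of the collected tax in the reward
    """
    utilities = []
    taxes = []
    for action in worker_actions:
        # Step 3 (assumed maps): income grows linearly, labor cost is convex.
        income = action / (num_actions - 1)
        labor_cost = income ** 2
        # Step 4: net income is income minus the tax collected by the planner.
        tax = tax_rate * income
        net_income = income - tax
        # Step 5 (assumed utility): net income minus labor cost, rescaled
        # from [-1, 1] to [0, 1].
        utilities.append((net_income - labor_cost + 1.0) / 2.0)
        taxes.append(tax)
    # Step 6: weighted average of worker utilities and collected tax.
    return (1.0 - planner_weight) * mean(utilities) + planner_weight * mean(taxes)


if __name__ == "__main__":
    # Three workers, 30% flat tax; every player would observe this one scalar.
    print(play_round(0.3, [2, 5, 9]))
```

In step 7, each player would feed this single scalar reward into its own bandit update; the update rule itself is specified in the paper and is not reproduced here.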

To generate Fig. 3 in the paper, run:

python3 main.py --T 1000000 --K 100

This command stores the results in output.csv and generates the figure below.

Fig. 3
