Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds

This repository contains the source code to generate the numerical results in the paper Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds. The simulation scenario is inspired by The AI Economist. There are $M+1$ players: one acts as a socio-economic planner and the remaining $M$ act as workers. The game is played over $T$ rounds, and each round proceeds as follows:

1. The socio-economic planner decides on a taxation policy.
2. The workers observe the taxation policy and pick actions consecutively.
3. Each worker action is mapped to an income and a labor cost.
4. The net income of each worker is obtained by subtracting the tax collected by the socio-economic planner from the worker's income.
5. The worker utility is determined from the net income and the labor cost.
6. The bandit reward is a weighted average of all worker utilities and the collected tax (all normalized to $[0,1]$).
7. All players observe the bandit reward and update their respective policies.
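
The round structure above can be illustrated with a minimal sketch. It is not the code in this repository: the flat tax rate, the effort-based action space, the income and labor-cost mappings, the normalization, and the names `play_round` and `weight` are all assumptions chosen for illustration, and uniformly random choices stand in for the learned policies.

```python
import random


def play_round(tax_rate, worker_actions, weight=0.5):
    """Compute the bandit reward of one round under illustrative assumptions.

    tax_rate:       flat tax rate in [0, 1] (assumed form of the taxation policy)
    worker_actions: effort levels in [0, 1], one per worker (assumed action space)
    weight:         weight on worker utilities versus collected tax (assumed)
    """
    utilities = []
    taxes = []
    for effort in worker_actions:
        income = effort                    # assumed: income equals effort
        labor_cost = 0.5 * effort ** 2     # assumed: convex labor cost
        tax = tax_rate * income            # tax collected by the planner
        net_income = income - tax
        utility = net_income - labor_cost  # lies in [-0.5, 0.5] under these assumptions
        utilities.append(utility + 0.5)    # shift so the utility lies in [0, 1]
        taxes.append(tax)                  # already in [0, 1]
    mean_utility = sum(utilities) / len(utilities)
    mean_tax = sum(taxes) / len(taxes)
    # Weighted average of normalized worker utilities and collected tax.
    return weight * mean_utility + (1.0 - weight) * mean_tax


if __name__ == "__main__":
    rng = random.Random(0)
    num_workers, num_rounds = 3, 5
    for t in range(num_rounds):
        tax_rate = rng.random()                                # planner moves first
        actions = [rng.random() for _ in range(num_workers)]   # workers move in turn
        reward = play_round(tax_rate, actions)
        # In the actual algorithm, every player would update its policy
        # from this shared reward; that step is omitted here.
        print(f"round {t}: reward = {reward:.3f}")
```

All players receive the same scalar reward, so the only feedback each one gets about its own choice is this shared bandit signal.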

To generate Fig. 3 in the paper, run:

python3 main.py --T 1000000 --K 100

This command will store the result in output.csv and generate the figure below.

Fig. 3
