This repository contains the source code to generate the numerical results in the paper "Decentralized Online Bandit Optimization on Directed
Graphs with Regret Bounds". The simulation scenario is inspired by the AI Economist. In particular, each round proceeds as follows:
- The socio-economic planner decides on a taxation policy
- The workers observe the taxation policy and pick actions consecutively
- Each worker action is mapped to an income and a labor cost
- The net income of each worker is obtained by subtracting the tax collected by the socio-economic planner
- The worker utility is determined by the net income and the labor cost
- The bandit reward is a weighted average of all worker utilities and the collected tax (all normalized to [0,1])
- All the players observe the bandit reward and update their respective policies.
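The reward computation in the steps above can be sketched as follows. This is only an illustrative sketch, not the code in main.py: the function name `play_round`, the flat tax rate, the utility form (net income minus labor cost, clipped to [0,1]), and the single `weight` parameter are all assumptions made for the example.

```python
def play_round(tax_rate, worker_actions, incomes, labor_costs, weight=0.5):
    """Compute the shared bandit reward for one round (hypothetical sketch).

    Assumptions (not taken from the repository):
    - the taxation policy is a single flat rate in [0, 1]
    - incomes[a] and labor_costs[a] are in [0, 1] for every action a
    - utility = net income - labor cost, clipped to [0, 1]
    - the reward weighs mean utility against mean collected tax
    """
    utilities = []
    total_tax = 0.0
    for action in worker_actions:
        income = incomes[action]            # action -> income
        tax = tax_rate * income             # tax collected by the planner
        net_income = income - tax           # income after tax
        utility = net_income - labor_costs[action]
        utilities.append(min(1.0, max(0.0, utility)))
        total_tax += tax

    mean_utility = sum(utilities) / len(utilities)
    mean_tax = total_tax / len(worker_actions)
    # Weighted average of normalized utilities and normalized tax.
    return weight * mean_utility + (1.0 - weight) * mean_tax


if __name__ == "__main__":
    # Three hypothetical actions with increasing income and labor cost.
    incomes = [0.2, 0.6, 1.0]
    labor_costs = [0.0, 0.2, 0.5]
    reward = play_round(0.5, [0, 2], incomes, labor_costs)
    print(f"bandit reward: {reward:.3f}")
```

In this sketch every player would receive the same scalar `reward` and use it to update its own policy, which matches the last step of the list.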
To generate Fig. 3 in the paper, run:
python3 main.py --T 1000000 --K 100
This command will store the result in output.csv and generate the figure below.
Fig. 3