Multi-arm-bandit-problems-UCB1

This code applies the upper confidence bound (UCB1) method to a synthetic dataset stored in "data.mat". The data is organized as a matrix of timesteps × number of ads. The objective is to minimize the regret, that is, the gap between the best possible reward and the reward that the algorithm actually collects. This is one type of deterministic bandit problem with partial feedback: at each timestep, only the reward of the chosen ad is observed. Before running the code, download "data.mat" and place it in the same folder as UCB1.py.
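For orientation, here is a minimal sketch of the UCB1 selection rule on a timesteps × ads reward matrix. It is not the contents of UCB1.py. The function names `ucb1` and `regret_vs_best_fixed_arm`, the click probabilities, and the randomly generated reward matrix are all assumptions made for illustration. The variable name stored inside "data.mat" is not documented here, so the sketch builds its own synthetic matrix instead of loading the file.

```python
import numpy as np


def ucb1(rewards):
    """Run UCB1 on a (timesteps x arms) reward matrix.

    Only the reward of the arm chosen at each timestep is read,
    which models partial (bandit) feedback.
    """
    n_steps, n_arms = rewards.shape
    counts = np.zeros(n_arms)   # number of times each arm was played
    sums = np.zeros(n_arms)     # total reward collected from each arm
    chosen = np.empty(n_steps, dtype=int)

    for t in range(n_steps):    # t also equals the number of plays so far
        if t < n_arms:
            arm = t             # initialization: play every arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t) / counts)  # exploration term
            arm = int(np.argmax(means + bonus))
        counts[arm] += 1
        sums[arm] += rewards[t, arm]
        chosen[t] = arm
    return chosen


def regret_vs_best_fixed_arm(rewards, chosen):
    """Reward of the best single arm in hindsight minus the reward collected."""
    collected = rewards[np.arange(len(chosen)), chosen].sum()
    best_fixed = rewards.sum(axis=0).max()
    return best_fixed - collected


# Hypothetical stand-in for data.mat: 5000 timesteps, 3 ads with
# assumed click probabilities (illustrative values only).
rng = np.random.default_rng(0)
click_probs = np.array([0.1, 0.3, 0.7])
rewards = (rng.random((5000, 3)) < click_probs).astype(float)

chosen = ucb1(rewards)
regret = regret_vs_best_fixed_arm(rewards, chosen)
print("plays per ad:", np.bincount(chosen, minlength=3))
print("regret vs. best fixed ad:", regret)
```

To try the same functions on the real dataset, one option is to read the file with `scipy.io.loadmat("data.mat")` and pass the timesteps × ads array it contains to `ucb1`; the key under which that array is stored must be checked in the file itself.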

Files: README.md, UCB1.py, data.mat

prashanth-prakash/Multi-arm-bandit-problems-UCB1
