This code applies the upper confidence bound (UCB1) method to a synthetic dataset stored in "data.mat". The data is organized as a matrix of timesteps × number of ads. The objective is to minimize the regret, that is, the gap between the best possible reward and the reward the algorithm actually collects. This is one type of deterministic bandit problem with partial feedback: at each timestep, only the reward of the chosen ad is observed. Before running the code, download "data.mat" and place it in the same folder as the code.
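As a rough illustration of the idea, here is a minimal Python sketch of UCB1 on a timesteps × ads reward table. It is not the repository's code. The function name `ucb1`, the variable names, and the randomly generated table are all assumptions made for this example; it does not read "data.mat". The index used is the textbook UCB1 form (average reward plus `sqrt(2 ln t / n)`), and regret is measured here against the best single ad in hindsight, which is one common convention and may differ from what the repository computes.

```python
import math
import random


def ucb1(rewards):
    """Run UCB1 on a (timesteps x ads) reward table.

    Only the reward of the chosen ad is read at each step (partial feedback).
    Returns the list of chosen ad indices and the total reward collected.
    """
    n_steps = len(rewards)
    n_ads = len(rewards[0])
    counts = [0] * n_ads    # how many times each ad was shown
    sums = [0.0] * n_ads    # total reward observed for each ad
    chosen = []
    total = 0.0

    for t in range(n_steps):
        if t < n_ads:
            # Initialization: show every ad once so each average is defined.
            ad = t
        else:
            # Pick the ad with the largest upper confidence bound.
            ad = max(
                range(n_ads),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = rewards[t][ad]
        counts[ad] += 1
        sums[ad] += reward
        total += reward
        chosen.append(ad)

    return chosen, total


def hindsight_regret(rewards, total):
    """Gap between the best single ad in hindsight and the reward collected."""
    n_ads = len(rewards[0])
    best = max(sum(row[a] for row in rewards) for a in range(n_ads))
    return best - total


# Hypothetical synthetic data: 5000 timesteps, 3 ads with assumed click rates.
random.seed(0)
click_rates = [0.1, 0.3, 0.6]
table = [[1 if random.random() < p else 0 for p in click_rates]
         for _ in range(5000)]

chosen_ads, total_reward = ucb1(table)
print("total reward:", total_reward)
print("regret vs. best single ad:", hindsight_regret(table, total_reward))
```

To try this on the real dataset, the table would need to be loaded from "data.mat" first (for example with `scipy.io.loadmat`); the variable name stored inside that file is not stated here, so it would have to be checked before use.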
prashanth-prakash/Multi-arm-bandit-problems-UCB1