Skip to content

Implementation of several algorithms for combinatorial bandits, a kind of reinforcement learning.

License

Notifications You must be signed in to change notification settings

dourouc05/CombinatorialBandits.jl

Repository files navigation

CombinatorialBandits

Project Status: Active – The project has reached a stable, usable state and is being actively developed. The MIT License example workflow Coverage Status codecov.io

This package implements several algorithms to deal with combinatorial multi-armed bandit (CMAB), including the first polynomial-time optimum-regret algorithms: AESCB (described in our paper) and AOSSB (article in press).

See also Bandits.jl, focusing on multi-armed bandits (i.e. not combinatorial).

To install:

]add CombinatorialBandits

Example usage:

using CombinatorialBandits, Distributions

n = 20
m = 8
ε = 0.1
distr = Distribution[Bernoulli(.5 + ((i % 3 == 0) ? ε : -ε)) for i in 1:n]

i = MSet(distr, 8, MSetAlgosSolver())
@time simulate(i, ThompsonSampling(), 200)
@time simulate(i, LLR(), 200)
@time simulate(i, CUCB(), 200)
@time simulate(i, ESCB2(ESCB2Budgeted(.1, true)), 200)

Citing

If you use this package in your research, please cite either article:

@article{cuvelier2021aescb,
    author = {Cuvelier, Thibaut and Combes, Richard and Gourdin, Eric},
    title = {Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits},
    year = {2021},
    issue_date = {March 2021},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    volume = {5},
    number = {1},
    url = {https://doi.org/10.1145/3447387},
    doi = {10.1145/3447387},
    journal = {Proc. ACM Meas. Anal. Comput. Syst.},
    month = feb,
    articleno = {09},
    numpages = {31},
    keywords = {combinatorial bandits, combinatorial optimization, bandits}
}

@article{cuvelier2021glpg,
  title={Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time},
  author={Cuvelier, Thibaut and Combes, Richard and Gourdin, Eric},
  journal={arXiv preprint arXiv:2102.07254},
  year={2021}
}

About

Implementation of several algorithms for combinatorial bandits, a kind of reinforcement learning.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •