Meta-Thompson Sampling

Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5884-5893, 2021.

Abstract

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by an empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.
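The abstract describes MetaTS only at a high level. As a concrete illustration, here is a minimal sketch of the meta-learning idea in K-armed Gaussian bandits: maintain a meta-posterior over the unknown prior mean, sample a prior from it at the start of each task, run standard Thompson sampling under that sampled prior, and update the meta-posterior from the task's rewards. All names and constants below are illustrative assumptions, and the per-arm meta-posterior update is a simplification, not the paper's exact algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes and (assumed known) variances.
K, m, n = 5, 50, 200                 # arms, tasks, rounds per task
sigma = 1.0                          # reward noise std
sigma_0 = 0.5                        # std of arm means around the unknown prior mean
mu_star = rng.normal(0.0, 1.0, K)    # unknown prior mean, shared by all tasks

# Meta-posterior over mu_star: an independent Gaussian per arm.
q_mean, q_var = np.zeros(K), np.ones(K)

for task in range(m):
    theta = rng.normal(mu_star, sigma_0)          # this task's true arm means

    # Step 1: sample a prior mean from the meta-posterior.
    mu_hat = rng.normal(q_mean, np.sqrt(q_var))

    # Step 2: run ordinary Thompson sampling under the sampled prior.
    post_mean, post_var = mu_hat.copy(), np.full(K, sigma_0 ** 2)
    pulls, sums = np.zeros(K), np.zeros(K)
    for t in range(n):
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(theta[arm], sigma)
        pulls[arm] += 1.0
        sums[arm] += reward
        # Conjugate Gaussian update of the pulled arm's posterior.
        post_var[arm] = 1.0 / (1.0 / sigma_0 ** 2 + pulls[arm] / sigma ** 2)
        post_mean[arm] = post_var[arm] * (mu_hat[arm] / sigma_0 ** 2
                                          + sums[arm] / sigma ** 2)

    # Step 3 (simplified): update the meta-posterior from this task.
    # Given mu_star, an arm's sample mean is N(mu_star, sigma_0^2 + sigma^2 / pulls);
    # treating the adaptive pull counts as fixed is an approximation.
    seen = pulls > 0
    obs_var = sigma_0 ** 2 + sigma ** 2 / pulls[seen]
    new_var = 1.0 / (1.0 / q_var[seen] + 1.0 / obs_var)
    q_mean[seen] = new_var * (q_mean[seen] / q_var[seen]
                              + (sums[seen] / pulls[seen]) / obs_var)
    q_var[seen] = new_var

print("meta-posterior mean error:", np.abs(q_mean - mu_star).mean())

As tasks accumulate, q_var shrinks and the sampled prior concentrates around mu_star, so later tasks start exploration from a better-informed prior; this is the adaptation to the unknown prior that the abstract refers to.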

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-kveton21a,
  title     = {Meta-Thompson Sampling},
  author    = {Kveton, Branislav and Konobeev, Mikhail and Zaheer, Manzil and Hsu, Chih-Wei and Mladenov, Martin and Boutilier, Craig and Szepesvari, Csaba},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5884--5893},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/kveton21a/kveton21a.pdf},
  url       = {https://proceedings.mlr.press/v139/kveton21a.html}
}
Endnote
%0 Conference Paper
%T Meta-Thompson Sampling
%A Branislav Kveton
%A Mikhail Konobeev
%A Manzil Zaheer
%A Chih-Wei Hsu
%A Martin Mladenov
%A Craig Boutilier
%A Csaba Szepesvari
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-kveton21a
%I PMLR
%P 5884--5893
%U https://proceedings.mlr.press/v139/kveton21a.html
%V 139
APA
Kveton, B., Konobeev, M., Zaheer, M., Hsu, C., Mladenov, M., Boutilier, C. & Szepesvari, C. (2021). Meta-Thompson Sampling. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5884-5893. Available from https://proceedings.mlr.press/v139/kveton21a.html.
