{"_id":"ZDGKiqhQAiojEsDSN","bibbaseid":"konobeev-zaheer-hsu-mladenov-boutilier-szepesvri-metathompsonsampling","author_short":["Konobeev, M.","Zaheer, M.","Hsu, C.","Mladenov, M.","Boutilier, C.","Szepesvári, C."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"firstnames":["Mikhail"],"propositions":[],"lastnames":["Konobeev"],"suffixes":[]},{"firstnames":["Manzil"],"propositions":[],"lastnames":["Zaheer"],"suffixes":[]},{"firstnames":["Chih-Wei"],"propositions":[],"lastnames":["Hsu"],"suffixes":[]},{"firstnames":["Martin"],"propositions":[],"lastnames":["Mladenov"],"suffixes":[]},{"firstnames":["Craig"],"propositions":[],"lastnames":["Boutilier"],"suffixes":[]},{"firstnames":["Csaba"],"propositions":[],"lastnames":["Szepesvári"],"suffixes":[]}],"crossref":"ICML2021","title":"Meta-Thompson Sampling","pages":"5884–5893","url_paper":"ICML2021-MetaTS.pdf","url_link":"http://proceedings.mlr.press/v139/kveton21a.html","abstract":"Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of a broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.","bibtex":"@inproceedings{DBLP:conf/icml/KvetonKZHMBS21,\n author = {\n Mikhail Konobeev and\n Manzil Zaheer and\n Chih{-}Wei Hsu and\n Martin Mladenov and\n Craig Boutilier and\n Csaba Szepesv{\\'{a}}ri},\n crossref = {ICML2021},\n title = {Meta-Thompson Sampling},\n pages = {5884--5893},\n url_paper = {ICML2021-MetaTS.pdf},\n url_link = {http://proceedings.mlr.press/v139/kveton21a.html},\n abstract = \t {Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of a broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.},\n}\n","author_short":["Konobeev, M.","Zaheer, M.","Hsu, C.","Mladenov, M.","Boutilier, C.","Szepesvári, C."],"key":"DBLP:conf/icml/KvetonKZHMBS21","id":"DBLP:conf/icml/KvetonKZHMBS21","bibbaseid":"konobeev-zaheer-hsu-mladenov-boutilier-szepesvri-metathompsonsampling","role":"author","urls":{" paper":"https://www.ualberta.ca/~szepesva/papers/ICML2021-MetaTS.pdf"," link":"http://proceedings.mlr.press/v139/kveton21a.html"},"metadata":{"authorlinks":{}},"html":""},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","dataSources":["Ciq2jeFvPFYBCoxwJ","v2PxY4iCzrNyY9fhF"],"keywords":[],"search_terms":["meta","thompson","sampling","konobeev","zaheer","hsu","mladenov","boutilier","szepesvári"],"title":"Meta-Thompson Sampling","year":null}