Randomized Exploration in Generalized Linear Bandits. Kveton, B., Zaheer, M., Szepesvári, C., Li, L., Ghavamzadeh, M., & Boutilier, C. In AISTATS, 2020.
We study two randomized algorithms for generalized linear bandits, GLM-TS and GLM-FPL. In GLM-TS, a perturbed generalized linear model (GLM) is sampled from the Laplace approximation to the posterior distribution of the GLM. In GLM-FPL, the GLM is trained on perturbed past rewards. We bound the $n$-round regret of both algorithms and show that it is $\tilde{O}(d \sqrt{n})$, where $d$ is the number of features. This is the first such analysis of randomized algorithms in GLM bandits. We evaluate both algorithms on logistic bandits and apply GLM-FPL to neural network models. Both algorithms perform well empirically. Our results showcase the role of randomization, not necessarily by posterior sampling, in exploration.
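To make the two exploration schemes in the abstract concrete, here is a minimal sketch for a logistic bandit. It is not the paper's exact algorithm or pseudocode; the tuning parameters (the posterior-width scale in GLM-TS and the reward-noise standard deviation in GLM-FPL) and all function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_map(X, y, n_iters=25, reg=1e-3):
    """MAP estimate of a logistic GLM via Newton's method (L2 prior).
    Returns the estimate and the Hessian of the negative log posterior."""
    d = X.shape[1]
    theta = np.zeros(d)
    H = reg * np.eye(d)
    for _ in range(n_iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + reg * theta
        W = p * (1.0 - p)                      # per-example Newton weights
        H = X.T @ (X * W[:, None]) + reg * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta, H

def glm_ts_arm(X_hist, y_hist, arms, scale=1.0, rng=None):
    """GLM-TS (sketch): sample a perturbed GLM from the Laplace
    approximation N(theta_hat, scale^2 H^{-1}), then act greedily."""
    rng = rng or np.random.default_rng()
    theta_hat, H = fit_logistic_map(X_hist, y_hist)
    cov = scale ** 2 * np.linalg.inv(H)
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    return int(np.argmax(arms @ theta_tilde))

def glm_fpl_arm(X_hist, y_hist, arms, noise_sd=0.5, rng=None):
    """GLM-FPL (sketch): refit the GLM on past rewards perturbed with
    Gaussian noise, then act greedily w.r.t. the perturbed fit."""
    rng = rng or np.random.default_rng()
    y_pert = y_hist + rng.normal(0.0, noise_sd, size=y_hist.shape)
    theta_tilde, _ = fit_logistic_map(X_hist, y_pert)
    return int(np.argmax(arms @ theta_tilde))
```

In both functions the randomness replaces an explicit exploration bonus: GLM-TS perturbs the fitted parameter through the Laplace-approximate posterior, while GLM-FPL perturbs the training targets and refits, which (per the abstract) extends naturally to models such as neural networks where a posterior is hard to form.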
@inproceedings{KZSZLGB20,
author = {Kveton, B. and Zaheer, M. and Szepesv{\'a}ri, Cs. and Li, L. and Ghavamzadeh, M. and Boutilier, C.},
booktitle = {AISTATS},
keywords = {stochastic bandits, finite-armed bandits, randomization, generalized linear bandit, follow-the-perturbed-leader, Thompson sampling, exploration},
month = {03},
title = {Randomized Exploration in Generalized Linear Bandits},
url_url = {https://arxiv.org/abs/1906.08947},
url_paper = {AISTATS2020-GLB.pdf},
acceptrate = {acceptance rate: 23\%},
year = {2020},
abstract = {
We study two randomized algorithms for generalized linear bandits, GLM-TS and GLM-FPL. In GLM-TS, a perturbed generalized linear model (GLM) is sampled from the Laplace approximation to the posterior distribution of the GLM. In GLM-FPL, the GLM is trained on perturbed past rewards. We bound the $n$-round regret of both algorithms and show that it is $\tilde{O}(d \sqrt{n})$, where $d$ is the number of features. This is the first such analysis of randomized algorithms in GLM bandits. We evaluate both algorithms on logistic bandits and apply GLM-FPL to neural network models. Both algorithms perform well empirically. Our results showcase the role of randomization, not necessarily by posterior sampling, in exploration.},
}
{"_id":"cZzMtRN6fS5giGptj","bibbaseid":"kveton-zaheer-szepesvri-li-ghavamzadeh-boutilier-randomizedexplorationingeneralizedlinearbandits-2020","author_short":["Kveton, B.","Zaheer, M.","Szepesvári, C.","Li, L.","Ghavamzadeh, M.","Boutilier, C."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"propositions":[],"lastnames":["Kveton"],"firstnames":["B."],"suffixes":[]},{"propositions":[],"lastnames":["Zaheer"],"firstnames":["M."],"suffixes":[]},{"propositions":[],"lastnames":["Szepesvári"],"firstnames":["Cs."],"suffixes":[]},{"propositions":[],"lastnames":["Li"],"firstnames":["L."],"suffixes":[]},{"propositions":[],"lastnames":["Ghavamzadeh"],"firstnames":["M."],"suffixes":[]},{"propositions":[],"lastnames":["Boutilier"],"firstnames":["C."],"suffixes":[]}],"booktitle":"AISTATS","keywords":"stochastic bandits, finite-armed bandits, randomization, generalized linear bandit, follow-the-perturbed-leader, Thompson sampling, exploration","month":"03","title":"Randomized Exploration in Generalized Linear Bandits","url_url":"https://arxiv.org/abs/1906.08947","url_paper":"AISTATS2020-GLB.pdf","acceptrate":"acceptance rate: 23%","year":"2020","abstract":"We study two randomized algorithms for generalized linear bandits, GLM-TS and GLM-FPL. In GLM-TS, a perturbed generalized linear model (GLM) is sampled from the Laplace approximation to the posterior distribution of the GLM. In GLM-FPL, the GLM is trained on perturbed past rewards. We bound the $n$-round regret of both algorithms and show that it is $Õ(d \\sqrt{n})$, where $d$ is the number of features. This is the first such analysis of randomized algorithms in GLM bandits. We evaluate both algorithms on logistic bandits and apply GLM-FPL to neural network models. Both algorithms perform well empirically. Our results showcase the role of randomization, not necessarily by posterior sampling, in exploration.","bibtex":"@inproceedings{KZSZLGB20,\n\tauthor = {Kveton, B. and Zaheer, M. and Szepesv{\\'a}ri, Cs. and Li, L. and Ghavamzadeh, M. and Boutilier, C.},\n\tbooktitle = {AISTATS},\n\tkeywords = {stochastic bandits, finite-armed bandits, randomization, generalized linear bandit, follow-the-perturbed-leader, Thompson sampling, exploration},\n\tmonth = {03},\n\ttitle = {Randomized Exploration in Generalized Linear Bandits},\n\turl_url = {https://arxiv.org/abs/1906.08947},\n\turl_paper = {AISTATS2020-GLB.pdf},\n acceptrate = {acceptance rate: 23\\%},\n\tyear = {2020},\n\tabstract = {\nWe study two randomized algorithms for generalized linear bandits, GLM-TS and GLM-FPL. In GLM-TS, a perturbed generalized linear model (GLM) is sampled from the Laplace approximation to the posterior distribution of the GLM. In GLM-FPL, the GLM is trained on perturbed past rewards. We bound the $n$-round regret of both algorithms and show that it is $\\tilde{O}(d \\sqrt{n})$, where $d$ is the number of features. This is the first such analysis of randomized algorithms in GLM bandits. We evaluate both algorithms on logistic bandits and apply GLM-FPL to neural network models. Both algorithms perform well empirically. 
Our results showcase the role of randomization, not necessarily by posterior sampling, in exploration.},\n}\n\n","author_short":["Kveton, B.","Zaheer, M.","Szepesvári, C.","Li, L.","Ghavamzadeh, M.","Boutilier, C."],"key":"KZSZLGB20","id":"KZSZLGB20","bibbaseid":"kveton-zaheer-szepesvri-li-ghavamzadeh-boutilier-randomizedexplorationingeneralizedlinearbandits-2020","role":"author","urls":{" url":"https://arxiv.org/abs/1906.08947"," paper":"https://www.ualberta.ca/~szepesva/papers/AISTATS2020-GLB.pdf"},"keyword":["stochastic bandits","finite-armed bandits","randomization","generalized linear bandit","follow-the-perturbed-leader","Thompson sampling","exploration"],"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","dataSources":["cd5AYQRw3RHjTgoQc","dYMomj4Jofy8t4qmm"],"keywords":["stochastic bandits","finite-armed bandits","randomization","generalized linear bandit","follow-the-perturbed-leader","thompson sampling","exploration"],"search_terms":["randomized","exploration","generalized","linear","bandits","kveton","zaheer","szepesvári","li","ghavamzadeh","boutilier"],"title":"Randomized Exploration in Generalized Linear Bandits","year":2020}