Randomized Exploration in Generalized Linear Bandits. Kveton, B., Zaheer, M., Szepesvári, C., Li, L., Ghavamzadeh, M., & Boutilier, C. In AISTATS, 2020.
We study two randomized algorithms for generalized linear bandits, GLM-TS and GLM-FPL. In GLM-TS, a perturbed generalized linear model (GLM) is sampled from the Laplace approximation to the posterior distribution of the GLM. In GLM-FPL, the GLM is trained on perturbed past rewards. We bound the $n$-round regret of both algorithms and show that it is $\tilde{O}(d \sqrt{n})$, where $d$ is the number of features. This is the first such analysis of randomized algorithms in GLM bandits. We evaluate both algorithms on logistic bandits and apply GLM-FPL to neural network models. Both algorithms perform well empirically. Our results showcase the role of randomization, not necessarily by posterior sampling, in exploration.
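To make the two schemes concrete, here is a minimal sketch for a logistic bandit, assuming Gaussian perturbations of scale `a`. The helper names, the ridge regularization, and the Newton-solver details are illustrative assumptions, not taken from the paper.

```python
# Sketch of the two randomized exploration schemes for a logistic bandit.
# `a` (perturbation scale), `reg`, and `n_steps` are assumed hyperparameters.
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic_mle(X, y, n_steps=20, reg=1e-3):
    """Regularized MLE of a logistic GLM via Newton's method.

    X: (t, d) features of arms pulled so far; y: (t,) observed rewards
    (binary in the real bandit, but the solver also accepts the
    real-valued perturbed rewards used by GLM-FPL)."""
    d = X.shape[1]
    theta = np.zeros(d)
    H = reg * np.eye(d)
    for _ in range(n_steps):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p) - reg * theta          # gradient of log-likelihood
        H = X.T @ (X * (p * (1 - p))[:, None]) + reg * np.eye(d)
        theta += np.linalg.solve(H, grad)           # Newton step
    return theta, H

def glm_ts(X, y, a, rng):
    """GLM-TS: sample a model from the Laplace approximation
    N(theta_hat, a^2 H^{-1}) to the posterior."""
    theta_hat, H = fit_logistic_mle(X, y)
    return rng.multivariate_normal(theta_hat, a**2 * np.linalg.inv(H))

def glm_fpl(X, y, a, rng):
    """GLM-FPL: refit the GLM on past rewards perturbed by Gaussian noise."""
    y_perturbed = y + a * rng.normal(size=y.shape)
    theta_tilde, _ = fit_logistic_mle(X, y_perturbed)
    return theta_tilde

# In round t, either sampler yields theta_tilde; the bandit then pulls the
# arm with features x maximizing the estimated mean sigmoid(x @ theta_tilde).
```

In both cases exploration comes purely from the randomness of theta_tilde: GLM-TS perturbs the MLE through a posterior-shaped Gaussian, while GLM-FPL perturbs the training rewards themselves, which is what lets it extend to models like neural networks where no tractable posterior approximation is available.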
