Parametric Bandits: The Generalized Linear Case (extended version). Filippi, S., Cappé, O., Garivier, A., & Szepesvári, C. In Advances in Neural Information Processing Systems, pages 586–594, 12, 2010.
Parametric Bandits: The Generalized Linear Case (extended version) [pdf]Paper  abstract   bibtex   17 downloads  
We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive finite time, high probability bounds on the regret of the algorithm, extending previous analyses developed for the linear bandits to the non-linear case. The analysis highlights a key difficulty in generalizing linear bandit algorithms to the non-linear case, which is solved in GLM-UCB by focusing on the reward space rather than on the parameter space. Moreover, as the actual effectiveness of current parameterized bandit algorithms is often poor in practice, we provide a tuning method based on asymptotic arguments, which leads to significantly better practical performance. We present two numerical experiments on real-world data that illustrate the potential of the GLM-UCB approach.

Downloads: 17