Asymptotically Optimal Information-Directed Sampling

Asymptotically Optimal Information-Directed Sampling. Kirschner, J., Lattimore, T., Vernade, C., & Szepesvári, C. In pages 2777–2821.

Paper

Asymptotically Optimal Information-Directed Sampling [link]

Link abstract bibtex

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.

@InProceedings{pmlr-v134-kirschner21a,
  title =  {Asymptotically Optimal Information-Directed Sampling},
  author = {Kirschner, Johannes and Lattimore, Tor and Vernade, Claire and Szepesv{\'a}ri, Csaba},
  pages =  {2777--2821},
  url_paper =    {COLT2021-kirschner21a.pdf},
  url_link =    {http://proceedings.mlr.press/v134/kirschner21a.html},
  abstract = {We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.},
  crossref  = {COLT2021},
}

Downloads: 0

{"_id":"P56eQZ6zKxknPPsWo","bibbaseid":"kirschner-lattimore-vernade-szepesvri-asymptoticallyoptimalinformationdirectedsampling","author_short":["Kirschner, J.","Lattimore, T.","Vernade, C.","Szepesvári, C."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","title":"Asymptotically Optimal Information-Directed Sampling","author":[{"propositions":[],"lastnames":["Kirschner"],"firstnames":["Johannes"],"suffixes":[]},{"propositions":[],"lastnames":["Lattimore"],"firstnames":["Tor"],"suffixes":[]},{"propositions":[],"lastnames":["Vernade"],"firstnames":["Claire"],"suffixes":[]},{"propositions":[],"lastnames":["Szepesvári"],"firstnames":["Csaba"],"suffixes":[]}],"pages":"2777–2821","url_paper":"COLT2021-kirschner21a.pdf","url_link":"http://proceedings.mlr.press/v134/kirschner21a.html","abstract":"We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.","crossref":"COLT2021","bibtex":"@InProceedings{pmlr-v134-kirschner21a,\n title = {Asymptotically Optimal Information-Directed Sampling},\n author = {Kirschner, Johannes and Lattimore, Tor and Vernade, Claire and Szepesv{\\'a}ri, Csaba},\n pages = {2777--2821},\n url_paper = {COLT2021-kirschner21a.pdf},\n url_link = {http://proceedings.mlr.press/v134/kirschner21a.html},\n abstract = {We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.},\n crossref = {COLT2021},\n}\n\n\n","author_short":["Kirschner, J.","Lattimore, T.","Vernade, C.","Szepesvári, C."],"key":"pmlr-v134-kirschner21a","id":"pmlr-v134-kirschner21a","bibbaseid":"kirschner-lattimore-vernade-szepesvri-asymptoticallyoptimalinformationdirectedsampling","role":"author","urls":{" paper":"https://www.ualberta.ca/~szepesva/papers/COLT2021-kirschner21a.pdf"," link":"http://proceedings.mlr.press/v134/kirschner21a.html"},"metadata":{"authorlinks":{}},"html":""},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","dataSources":["Ciq2jeFvPFYBCoxwJ","v2PxY4iCzrNyY9fhF"],"keywords":[],"search_terms":["asymptotically","optimal","information","directed","sampling","kirschner","lattimore","vernade","szepesvári"],"title":"Asymptotically Optimal Information-Directed Sampling","year":null}