Towards Minimax Policies for Online Linear Optimization with Bandit Feedback

Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. Bubeck, S., Cesa-Bianchi, N., & Kakade, S. In Mannor, S., Srebro, N., & Williamson, R., editors, Proceedings of the 25th Annual Conference on Learning Theory (COLT), volume PMLR 23, pages 41.1–41.14, 2012.


@inproceedings{2012bubecltowards,
  title={Towards Minimax Policies for Online Linear Optimization with Bandit Feedback},
  author={Bubeck, S{\'e}bastien and Cesa-Bianchi, Nicol{\`o} and Kakade, Sham},
  booktitle={Proceedings of the 25th Annual Conference on Learning Theory (COLT)},
  volume={PMLR 23},
  pages={41.1--41.14},
  year={2012},
  editor={Mannor, Shie and Srebro, Nathan and Williamson, Robert},
  %http://proceedings.mlr.press/v23/bubeck12a.html
  url_Paper={https://arxiv.org/pdf/1202.3079.pdf}
}
