Budgeted Reinforcement Learning in Continuous State Space. Carrara, N., Leurent, E., Laroche, R., Urvoy, T., Maillard, O., & Pietquin, O.
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments with unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
@article{carraraBudgetedReinforcementLearning2019,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1903.01004},
  primaryClass = {cs, stat},
  title = {Budgeted {{Reinforcement Learning}} in {{Continuous State Space}}},
  url = {http://arxiv.org/abs/1903.01004},
  abstract = {A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments with unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.},
  urldate = {2019-06-05},
  date = {2019-03-03},
  keywords = {Statistics - Machine Learning,Computer Science - Artificial Intelligence,Computer Science - Machine Learning},
  author = {Carrara, Nicolas and Leurent, Edouard and Laroche, Romain and Urvoy, Tanguy and Maillard, Odalric-Ambrym and Pietquin, Olivier},
}
