Budgeted Reinforcement Learning in Continuous State Space. Carrara, N., Leurent, E., Laroche, R., Urvoy, T., Maillard, O., & Pietquin, O.
Abstract: A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved for finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments with unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
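To make the fixed-point idea concrete, here is a minimal sketch (not the authors' algorithm) of a budget-constrained Bellman backup on a toy finite MDP: the state is augmented with a remaining cost budget, actions whose cost exceeds it are masked out, and iterating the operator converges to its fixed point. The transition table, budget grid, and all names are illustrative assumptions; the paper's actual operator handles continuous spaces and stochastic budgeted policies.

```python
# Hedged sketch: value iteration with a budget-augmented state (s, b).
# Actions whose immediate cost exceeds the remaining budget b are infeasible.
import itertools

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
BUDGETS = [0.0, 0.5, 1.0]          # discretized budget levels (illustrative)

# (reward, cost, next_state) for each (state, action) -- toy dynamics
MODEL = {
    (0, 0): (0.0, 0.0, 0),
    (0, 1): (1.0, 0.5, 1),
    (1, 0): (0.5, 0.0, 1),
    (1, 1): (2.0, 1.0, 0),
}

def backup(V):
    """One application of a budget-constrained Bellman optimality operator."""
    newV = {}
    for s, b in itertools.product(STATES, BUDGETS):
        best = float("-inf")
        for a in ACTIONS:
            r, c, s2 = MODEL[(s, a)]
            if c > b:               # action would violate the remaining budget
                continue
            # snap the leftover budget b - c back onto the grid
            b2 = min(BUDGETS, key=lambda x: abs(x - (b - c)))
            best = max(best, r + GAMMA * V[(s2, b2)])
        newV[(s, b)] = best if best > float("-inf") else 0.0
    return newV

V = {k: 0.0 for k in itertools.product(STATES, BUDGETS)}
for _ in range(200):                # iterate toward the fixed point
    V = backup(V)
```

In this toy model the value is monotone in the budget: a larger remaining budget unlocks costlier, higher-reward actions, which is the qualitative behavior the adjustable threshold in the abstract is meant to expose.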
@article{carraraBudgetedReinforcementLearning2019,
archivePrefix = {arXiv},
eprinttype = {arxiv},
eprint = {1903.01004},
primaryClass = {cs, stat},
title = {Budgeted {{Reinforcement Learning}} in {{Continuous State Space}}},
url = {http://arxiv.org/abs/1903.01004},
abstract = {A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an - adjustable - threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to continuous spaces environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.},
urldate = {2019-06-05},
date = {2019-03-03},
keywords = {Statistics - Machine Learning,Computer Science - Artificial Intelligence,Computer Science - Machine Learning},
author = {Carrara, Nicolas and Leurent, Edouard and Laroche, Romain and Urvoy, Tanguy and Maillard, Odalric-Ambrym and Pietquin, Olivier},
}