Distributional reinforcement learning with linear function approximation

Distributional reinforcement learning with linear function approximation. Bellemare, M. G., Le Roux, N., Castro, P. S., & Moitra, S. February, 2019. arXiv:1902.03149 [cs, stat]

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)’s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51’s use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model’s prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramér-based distributional methods may perform worse than directly approximating the value function.
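
For readers unfamiliar with the Cramér distance mentioned in the abstract, the following is a minimal sketch of its usual discrete form: the squared difference between cumulative sums over a shared, evenly spaced support. The function name `cramer_distance_sq` and the `delta_z` spacing parameter are hypothetical, and applying the same formula to unnormalized real vectors is only an illustrative assumption; the paper's exact generalization may differ.

```python
import numpy as np


def cramer_distance_sq(p, q, delta_z=1.0):
    """Squared Cramér (l2-on-CDFs) distance between two vectors.

    Both vectors are assumed to live on the same evenly spaced support
    with spacing `delta_z` (an assumption of this sketch).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must share the same support")
    # Cumulative sums play the role of CDFs. Nothing below requires the
    # inputs to be nonnegative or to sum to one.
    cdf_gap = np.cumsum(p - q)
    return delta_z * float(np.sum(cdf_gap ** 2))


# Two point masses at opposite ends of a three-atom support.
far_apart = cramer_distance_sq([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])

# The same formula evaluated on an arbitrary (non-probability) vector.
unnormalized = cramer_distance_sq([0.5, -0.2], [0.0, 0.0])
```

Because the computation only uses cumulative sums, it stays well defined when the inputs are not probability vectors, which is the property the abstract relies on when it drops the softmax normalization.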

@misc{bellemare_distributional_2019,
	title = {Distributional reinforcement learning with linear function approximation},
	url = {http://arxiv.org/abs/1902.03149},
	abstract = {Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)’s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51’s use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model’s prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramér-based distributional methods may perform worse than directly approximating the value function.},
	language = {en},
	urldate = {2023-10-13},
	publisher = {arXiv},
	author = {Bellemare, Marc G. and Le Roux, Nicolas and Castro, Pablo Samuel and Moitra, Subhodeep},
	month = feb,
	year = {2019},
	note = {arXiv:1902.03149 [cs, stat]},
	keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},
}
