Regularized Policy Iteration. Farahmand, A., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. In Advances in Neural Information Processing Systems, pages 441–448, 2008.
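As a rough illustration of the regularized LSTD idea described in the abstract, here is a minimal sketch of LSTD with an added L2 (ridge) penalty on linear features. All names, data, and the plain ridge formulation are illustrative assumptions; the paper's actual algorithms operate in a reproducing kernel Hilbert space and come with finite-sample guarantees.

```python
import numpy as np

def regularized_lstd(phi, phi_next, rewards, gamma=0.99, lam=1e-2):
    """Sketch of L2-regularized LSTD policy evaluation.

    phi      : (n, d) features of visited states
    phi_next : (n, d) features of successor states
    rewards  : (n,)   observed rewards
    Returns weights w such that V(s) ~ phi(s) @ w.
    """
    n, d = phi.shape
    # Standard LSTD system: A w = b, averaged over samples.
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    # The lam * I term is the L2 penalty, stabilizing the solve
    # when A is ill-conditioned or the feature space is rich.
    return np.linalg.solve(A + lam * np.eye(d), b)
```

Setting `lam = 0` recovers ordinary LSTD; increasing it trades fitting accuracy for a smoother, lower-complexity value estimate, which is the complexity-control role regularization plays in the paper.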
@inproceedings{farahmand2008a,
	abstract = {In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to two widely-used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations for our algorithms when the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they are able to achieve optimal rates of convergence under the studied conditions.},
	acceptrate = {24\%},
	author = {Farahmand, A.m. and Ghavamzadeh, M. and Szepesv{\'a}ri, Cs. and Mannor, S.},
	bibsource = {DBLP, http://dblp.uni-trier.de},
	booktitle = {Advances in Neural Information Processing Systems},
	ee = {https://papers.neurips.cc/paper/3445-regularized-policy-iteration.pdf},
	keywords = {reinforcement learning, regularization, nonparametrics, theory, function approximation, policy iteration},
	pages = {441--448},
	title = {Regularized Policy Iteration},
	url_paper = {NeurIPS08-regrl.pdf},
	year = {2008}}