Regularized Policy Iteration. Farahmand, A., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. In Advances in Neural Information Processing Systems, pages 441–448, 2008.
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. To obtain a flexible function approximation scheme, we propose the use of non-parametric methods with regularization, which provides a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2 regularization to two widely used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations of our algorithms when the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they achieve optimal rates of convergence under the studied conditions.
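To illustrate the core idea, here is a minimal sketch of L2-regularized LSTD with explicit linear features. This is a simplification for illustration, not the paper's RKHS-based formulation; the feature matrices, the discount `gamma`, and the penalty `lam` are assumptions introduced for the example.

```python
# Minimal sketch of L2 (ridge) regularized LSTD with explicit linear
# features. Illustrative only; the paper works in a reproducing kernel
# Hilbert space, whereas this uses a fixed finite feature map.
import numpy as np

def regularized_lstd(phi, phi_next, rewards, gamma=0.99, lam=1e-2):
    """Solve (Phi^T (Phi - gamma * Phi') + n * lam * I) w = Phi^T r.

    phi, phi_next: (n, d) feature matrices for states s_t and s_{t+1}
    rewards:       (n,) observed rewards
    Returns w so that V(s) is approximated by phi(s) @ w.
    """
    n, d = phi.shape
    A = phi.T @ (phi - gamma * phi_next)  # empirical LSTD matrix
    b = phi.T @ rewards                   # empirical LSTD vector
    # The L2 penalty keeps the linear system well-conditioned and
    # controls the complexity of the value-function estimate.
    return np.linalg.solve(A + n * lam * np.eye(d), b)

# Toy usage on a synthetic batch of transitions (hypothetical data).
rng = np.random.default_rng(0)
n, d = 500, 10
phi = rng.normal(size=(n, d))
phi_next = rng.normal(size=(n, d))
rewards = rng.normal(size=n)
w = regularized_lstd(phi, phi_next, rewards)
print(w.shape)  # (10,)
```

The regularizer plays the same role as in ridge regression: without it, the empirical LSTD matrix can be ill-conditioned when the feature space is rich, which is exactly the regime the paper's non-parametric setting targets.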
