Actor-Critic Algorithms. Konda, V. R. & Tsitsiklis, J. N.
We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.
@article{konda_actor-critic_nodate,
	title = {Actor-{Critic} {Algorithms}},
	abstract = {We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.},
	language = {en},
	author = {Konda, Vijay R. and Tsitsiklis, John N.},
	keywords = {Reinforcement Learning},
	pages = {7}
}
