Near-optimal Regret Bounds for Reinforcement Learning. Jaksch, T., Ortner, R., & Auer, P. Journal of Machine Learning Research, 11(Apr):1563–1600, 2010.
Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of Ω(√(DSAT)) on the total regret of any learning algorithm is also given.
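The diameter is concrete enough to compute when the transition kernel is known: D is the maximum over ordered state pairs (s, s') of the minimum expected number of steps to move from s to s', so each target state defines a stochastic shortest-path problem solvable by value iteration on expected hitting times. The sketch below illustrates this; the function name diameter, the value-iteration scheme, and the tolerance are assumptions for illustration, not code from the paper, and the loop only converges for communicating MDPs (where D is finite).

import numpy as np

def diameter(P, tol=1e-8, max_iter=100_000):
    # P: transition tensor of shape (S, A, S) with P[s, a, t] = Pr(t | s, a).
    S, A, _ = P.shape
    D = 0.0
    for target in range(S):
        # h[s] = minimal expected number of steps from s to `target`,
        # found by value iteration on the stochastic shortest-path problem.
        h = np.zeros(S)
        for _ in range(max_iter):
            q = 1.0 + P @ h          # q[s, a]: one step, then continue optimally
            h_new = q.min(axis=1)    # best action per state
            h_new[target] = 0.0      # already at the target: no further steps
            if np.max(np.abs(h_new - h)) < tol:
                h = h_new
                break
            h = h_new
        D = max(D, h.max())          # worst start state for this target
    return D

# Two-state, two-action MDP: action 0 stays put, action 1 switches states
# with probability 0.9, so the diameter is the expected switching time
# 1 / 0.9 ≈ 1.11.
P = np.array([[[1.0, 0.0], [0.1, 0.9]],
              [[0.0, 1.0], [0.9, 0.1]]])
print(diameter(P))  # ~1.1111

Note that the paper's algorithm never computes D itself; the parameter appears only in the regret analysis, so a planning-style computation like this is purely illustrative.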
@Article{Jaksch2010,
  author  = {Jaksch, Thomas and Ortner, Ronald and Auer, Peter},
  title   = {Near-optimal Regret Bounds for Reinforcement Learning},
  journal = {Journal of Machine Learning Research},
  volume  = {11},
  number  = {Apr},
  pages   = {1563--1600},
  year    = {2010},
  abstract = {For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter $D$ if for any pair of states $s, s'$ there is a policy which moves from $s$ to $s'$ in at most $D$ steps (on average). We present a reinforcement learning algorithm with total regret $\tilde{O}(DS\sqrt{AT})$ after $T$ steps for any unknown MDP with $S$ states, $A$ actions per state, and diameter $D$. A corresponding lower bound of $\Omega(\sqrt{DSAT})$ on the total regret of any learning algorithm is also given.}
}
