Hierarchical Linearly-Solvable Markov Decision Problems. Jonsson, A. & Gómez, V. In
We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has an analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs), have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative for solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms state-of-the-art learning methods in two classical HRL domains: the taxi domain and an autonomous guided vehicle task.
@inproceedings {icaps16-83,
    track    = {Main Track},
    title    = {Hierarchical Linearly-Solvable Markov Decision Problems},
    url      = {http://www.aaai.org/ocs/index.php/ICAPS/ICAPS16/paper/view/13090},
    author   = {Anders Jonsson and Vicenç Gómez},
    abstract = {We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has an analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs), have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative for solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms state-of-the-art learning methods in two classical HRL domains: the taxi domain and an autonomous guided vehicle task.},
    keywords = {Control and optimisation of dynamical systems,Probabilistic planning; MDPs and POMDPs,Planning under (non-probabilistic) uncertainty,Learning in planning and scheduling}
}