Dual Formulations for Optimizing Dec-POMDP Controllers. Kumar, A., Mostafa, H., & Zilberstein, S. In
Dual Formulations for Optimizing Dec-POMDP Controllers [link]Paper  abstract   bibtex   
The decentralized POMDP is an expressive model for multiagent sequential decision making. Finite-state controllers (FSCs)—often used to represent policies for infinite-horizon problems—offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We further show that the dual formulation can be exploited within the Expectation Maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also contribute towards developing efficient techniques for policy improvement by iteratively adding nodes to the FSCs. Compared with state-of-the-art FSC methods, our approach offers more than an order-of-magnitude speedup, while producing similar or better solutions.
@inproceedings {icaps16-94,
    track    = {​Main Track},
    title    = {Dual Formulations for Optimizing Dec-POMDP Controllers},
    url      = {http://www.aaai.org/ocs/index.php/ICAPS/ICAPS16/paper/view/13124},
    author   = {Akshat Kumar and  Hala Mostafa and  Shlomo Zilberstein},
    abstract = {The decentralized POMDP is an expressive model for multiagent sequential decision making. Finite-state controllers (FSCs)---often used to represent policies for infinite-horizon problems---offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs.  Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We further show that the dual formulation can be exploited within the Expectation Maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also contribute towards developing efficient techniques for policy improvement by iteratively adding nodes to the FSCs. Compared with state-of-the-art FSC methods, our approach offers more than an order-of-magnitude speedup, while producing similar or better solutions.},
    keywords = {Probabilistic planning; MDPs and POMDPs,Distributed and multi-agent planning}
}
Downloads: 0