Dual Formulations for Optimizing Dec-POMDP Controllers

Dual Formulations for Optimizing Dec-POMDP Controllers. Kumar, A., Mostafa, H., & Zilberstein, S. In

The decentralized POMDP is an expressive model for multiagent sequential decision making. Finite-state controllers (FSCs)—often used to represent policies for infinite-horizon problems—offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We further show that the dual formulation can be exploited within the Expectation Maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also contribute towards developing efficient techniques for policy improvement by iteratively adding nodes to the FSCs. Compared with state-of-the-art FSC methods, our approach offers more than an order-of-magnitude speedup, while producing similar or better solutions.
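For readers unfamiliar with the dual linear program for MDPs mentioned above, it is usually written over occupancy measures. The block below is the standard textbook form for a discounted MDP, given as background only. It is not the paper's own formulation or notation, and every symbol in it is an assumption introduced here: states S, actions A, transition function P, reward R, discount factor \gamma, initial state distribution b_0, and occupancy variables x(s,a).

```latex
% Dual LP for a discounted MDP (textbook form; notation assumed, not taken from the paper).
% x(s,a) is the expected discounted number of times action a is taken in state s.
\begin{align*}
\max_{x \ge 0} \quad
  & \sum_{s \in S} \sum_{a \in A} x(s,a)\, R(s,a) \\
\text{s.t.} \quad
  & \sum_{a \in A} x(s',a)
    - \gamma \sum_{s \in S} \sum_{a \in A} P(s' \mid s,a)\, x(s,a)
    = b_0(s')
  \qquad \forall s' \in S
\end{align*}

% A policy is recovered by normalizing the occupancy measure
% (defined wherever the denominator is positive).
\[
  \pi(a \mid s) = \frac{x(s,a)}{\sum_{a' \in A} x(s,a')}
\]
```

In this fully observable setting the variables are continuous and the program is a plain LP. The abstract indicates that the decentralized, partially observable case additionally needs integer variables and extra constraints, which is why the deterministic-FSC formulation is a mixed integer program.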

@inproceedings {icaps16-94,
    track    = {Main Track},
    title    = {Dual Formulations for Optimizing Dec-POMDP Controllers},
    url      = {http://www.aaai.org/ocs/index.php/ICAPS/ICAPS16/paper/view/13124},
    author   = {Akshat Kumar and Hala Mostafa and Shlomo Zilberstein},
    abstract = {The decentralized POMDP is an expressive model for multiagent sequential decision making. Finite-state controllers (FSCs)---often used to represent policies for infinite-horizon problems---offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We further show that the dual formulation can be exploited within the Expectation Maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also contribute towards developing efficient techniques for policy improvement by iteratively adding nodes to the FSCs. Compared with state-of-the-art FSC methods, our approach offers more than an order-of-magnitude speedup, while producing similar or better solutions.},
    keywords = {Probabilistic planning; MDPs and POMDPs, Distributed and multi-agent planning}
}
