End-to-End Policy Gradient Method for POMDPs and Explainable Agents. Nishimori, S., Koyamada, S., & Ishii, S. April 2023. arXiv:2304.09769 [cs]
Abstract: Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When applying reinforcement learning (RL) algorithms to a POMDP, reasonable estimation of the hidden states can help solve the problem. Furthermore, explainable decision-making is preferable, considering applications to real-world tasks such as autonomous driving. We propose an RL algorithm that estimates the hidden states by end-to-end training and visualizes the estimation as a state-transition graph. Experimental results demonstrate that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
@misc{nishimori_end--end_2023,
title = {End-to-{End} {Policy} {Gradient} {Method} for {POMDPs} and {Explainable} {Agents}},
url = {http://arxiv.org/abs/2304.09769},
abstract = {Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent’s behavior interpretable to humans.},
language = {en},
urldate = {2023-04-24},
publisher = {arXiv},
author = {Nishimori, Soichiro and Koyamada, Sotetsu and Ishii, Shin},
month = apr,
year = {2023},
note = {arXiv:2304.09769 [cs]},
keywords = {Computer Science - Artificial Intelligence},
}
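The abstract describes a policy-gradient agent that must infer hidden state in a POMDP. This is *not* the paper's algorithm; it is a minimal, generic REINFORCE sketch on a one-step cue-and-turn task (a drastically simplified stand-in for a T-maze POMDP), where the hidden state is a cue shown once that the agent must act on later. All names (`run_episode`, `train`), reward values, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# A cue (0 or 1) is the hidden state: it is observed once, and the agent
# must later turn toward the cued side. Here the "memory" is hand-crafted:
# the policy's logits are simply indexed by the remembered cue.
rng = np.random.default_rng(0)
N_ACTIONS = 2  # 0 = turn left, 1 = turn right

def run_episode(theta):
    """Sample one episode under a softmax policy conditioned on the cue."""
    cue = rng.integers(2)                  # hidden goal side, shown once
    logits = theta[cue]
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    action = rng.choice(N_ACTIONS, p=probs)
    reward = 1.0 if action == cue else -0.1
    return cue, action, probs, reward

def train(episodes=3000, lr=0.5):
    """Plain REINFORCE (no baseline) on the cue-and-turn task."""
    theta = np.zeros((2, N_ACTIONS))       # one logit row per cue value
    for _ in range(episodes):
        cue, a, probs, r = run_episode(theta)
        grad = -probs
        grad[a] += 1.0                     # d log pi(a | cue) / d logits
        theta[cue] += lr * r * grad        # REINFORCE update
    return theta

theta = train()
```

After training, the row of logits for each cue prefers the matching turn, i.e. the policy has learned to use the remembered hidden state. The paper's contribution (end-to-end hidden-state estimation visualized as a state-transition graph) would replace this hand-crafted cue indexing with a learned state estimator.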
{"_id":"uiZGrtMpW8GGrijh5","bibbaseid":"nishimori-koyamada-ishii-endtoendpolicygradientmethodforpomdpsandexplainableagents-2023","author_short":["Nishimori, S.","Koyamada, S.","Ishii, S."],"bibdata":{"bibtype":"misc","type":"misc","title":"End-to-End Policy Gradient Method for POMDPs and Explainable Agents","url":"http://arxiv.org/abs/2304.09769","abstract":"Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to realworld tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent’s behavior interpretable to humans.","language":"en","urldate":"2023-04-24","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Nishimori"],"firstnames":["Soichiro"],"suffixes":[]},{"propositions":[],"lastnames":["Koyamada"],"firstnames":["Sotetsu"],"suffixes":[]},{"propositions":[],"lastnames":["Ishii"],"firstnames":["Shin"],"suffixes":[]}],"month":"April","year":"2023","note":"arXiv:2304.09769 [cs]","keywords":"Computer Science - Artificial Intelligence","bibtex":"@misc{nishimori_end--end_2023,\n\ttitle = {End-to-{End} {Policy} {Gradient} {Method} for {POMDPs} and {Explainable} {Agents}},\n\turl = {http://arxiv.org/abs/2304.09769},\n\tabstract = {Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). 
When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to realworld tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent’s behavior interpretable to humans.},\n\tlanguage = {en},\n\turldate = {2023-04-24},\n\tpublisher = {arXiv},\n\tauthor = {Nishimori, Soichiro and Koyamada, Sotetsu and Ishii, Shin},\n\tmonth = apr,\n\tyear = {2023},\n\tnote = {arXiv:2304.09769 [cs]},\n\tkeywords = {Computer Science - Artificial Intelligence},\n}\n\n\n\n\n\n\n\n","author_short":["Nishimori, S.","Koyamada, S.","Ishii, S."],"key":"nishimori_end--end_2023","id":"nishimori_end--end_2023","bibbaseid":"nishimori-koyamada-ishii-endtoendpolicygradientmethodforpomdpsandexplainableagents-2023","role":"author","urls":{"Paper":"http://arxiv.org/abs/2304.09769"},"keyword":["Computer Science - Artificial Intelligence"],"metadata":{"authorlinks":{}},"downloads":0,"html":""},"bibtype":"misc","biburl":"https://bibbase.org/zotero/alukina","dataSources":["Cfgnp5s4HQSBd8tAf"],"keywords":["computer science - artificial intelligence"],"search_terms":["end","end","policy","gradient","method","pomdps","explainable","agents","nishimori","koyamada","ishii"],"title":"End-to-End Policy Gradient Method for POMDPs and Explainable Agents","year":2023}