Tractable Reinforcement Learning for Signal Temporal Logic Tasks With Counterfactual Experience Replay. Wang, S., Yin, X., Li, S., & Yin, X. IEEE Control Systems Letters, 8:616–621, 2024.
We investigate the control synthesis problem for Markov decision processes (MDPs) with unknown transition probabilities under signal temporal logic (STL) specifications. Our primary objective is to learn a control policy that maximizes the probability of satisfying the STL task. However, existing approaches to STL control synthesis using reinforcement learning encounter a significant exploration challenge, particularly when expanding the state space to incorporate STL tasks. In this work, we propose a novel reinforcement learning algorithm tailored for STL tasks, addressing the exploration difficulty by effectively leveraging counterfactual experiences to expedite the training process. Through experiments we show that these generated experiences enable us to fully employ the knowledge embedded within the task, resulting in a substantial reduction in the number of trial-and-error explorations required before achieving convergence.
@article{wang_tractable_2024,
	title = {Tractable {Reinforcement} {Learning} for {Signal} {Temporal} {Logic} {Tasks} {With} {Counterfactual} {Experience} {Replay}},
	volume = {8},
	copyright = {https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html},
	issn = {2475-1456},
	url = {https://ieeexplore.ieee.org/document/10522501/},
	doi = {10.1109/LCSYS.2024.3397844},
	abstract = {We investigate the control synthesis problem for Markov decision processes (MDPs) with unknown transition probabilities under signal temporal logic (STL) specifications. Our primary objective is to learn a control policy that maximizes the probability of satisfying the STL task. However, existing approaches to STL control synthesis using reinforcement learning encounter a significant exploration challenge, particularly when expanding the state space to incorporate STL tasks. In this work, we propose a novel reinforcement learning algorithm tailored for STL tasks, addressing the exploration difficulty by effectively leveraging counterfactual experiences to expedite the training process. Through experiments we show that these generated experiences enable us to fully employ the knowledge embedded within the task, resulting in a substantial reduction in the number of trial-and-error explorations required before achieving convergence.},
	language = {en},
	urldate = {2024-12-18},
	journal = {IEEE Control Systems Letters},
	author = {Wang, Siqi and Yin, Xunyuan and Li, Shaoyuan and Yin, Xiang},
	year = {2024},
	pages = {616--621},
}
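The core idea in the abstract — that a known task structure lets one real transition be replayed as many counterfactual experiences — can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: the chain MDP, the three-state task automaton `delta`, and the reward are all hypothetical stand-ins. The sketch assumes the common product-MDP view, where the agent's state is a pair (system state `s`, task-progress state `q`) and the task-state update is known, so each observed transition `(s, a, s')` yields a valid experience for every possible task state.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: a 5-state chain with a goal at the right end,
# and a 3-state task automaton that advances each time the goal is visited.
N_STATES, N_TASK, ACTIONS = 5, 3, [0, 1]
GOAL = N_STATES - 1

def step(s, a):
    # Stochastic chain dynamics: action 1 moves right with prob. 0.8,
    # action 0 moves left deterministically.
    if a == 1:
        return min(s + 1, N_STATES - 1) if random.random() < 0.8 else s
    return max(s - 1, 0)

def delta(q, s_next):
    # Known task-automaton update: progress whenever the goal is reached.
    return min(q + 1, N_TASK - 1) if s_next == GOAL else q

def reward(q_next):
    # Reward only when the task reaches its accepting state.
    return 1.0 if q_next == N_TASK - 1 else 0.0

Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.2
random.seed(0)

for episode in range(500):
    s, q = random.randrange(N_STATES), 0
    for _ in range(30):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, q, x)])
        s_next = step(s, a)
        # Counterfactual replay: because delta is known, the single real
        # transition (s, a, s_next) is reused under EVERY task state,
        # not just the one actually visited.
        for q_cf in range(N_TASK):
            q_cf_next = delta(q_cf, s_next)
            r = reward(q_cf_next)
            target = r + gamma * max(Q[(s_next, q_cf_next, b)] for b in ACTIONS)
            Q[(s, q_cf, a)] += alpha * (target - Q[(s, q_cf, a)])
        q = delta(q, s_next)
        s = s_next
```

Each environment step produces `N_TASK` Q-updates instead of one, which is the kind of sample-efficiency gain the abstract attributes to counterfactual experiences; the paper's construction for STL tasks is more involved than this automaton stand-in.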
