Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design. Ding, Y., Jin, M., & Lavaei, J. In AAAI Conference on Artificial Intelligence (AAAI), 2023. (oral presentation)
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time, subject to a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Building on these results, we further present a meta-algorithm that requires no prior knowledge of the variation budget and can adaptively detect non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that risk control and the handling of non-stationarity can be designed separately in the algorithm when the variation budget is known a priori, whereas the non-stationarity detection mechanism in the adaptive algorithm depends on the risk parameter. This work offers the first non-asymptotic theoretical analysis of non-stationary risk-sensitive RL in the literature.
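For readers unfamiliar with the setting, the entropic risk measure and the cumulative variation budget referenced in the abstract have standard textbook forms; the following is a sketch of those definitions, where the notation (beta, B_r, B_p, episode index k over K episodes) is ours and the paper's exact conventions may differ:

\[
  \rho_\beta(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[ e^{\beta X} \right], \qquad \beta \neq 0,
\]
which recovers the risk-neutral expectation $\mathbb{E}[X]$ as $\beta \to 0$; and a cumulative variation budget of the form
\[
  B_r \;=\; \sum_{k=2}^{K} \sup_{s,a} \bigl| r_k(s,a) - r_{k-1}(s,a) \bigr|,
  \qquad
  B_p \;=\; \sum_{k=2}^{K} \sup_{s,a} \bigl\| P_k(\cdot \mid s,a) - P_{k-1}(\cdot \mid s,a) \bigr\|_1 .
\]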
@inproceedings{2023_3C_NRRL,
title={Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design},
author={Yuhao Ding and Ming Jin and Javad Lavaei},
booktitle={AAAI Conference on Artificial Intelligence (AAAI)},
note = {(oral presentation)},
year={2023},
url_pdf={Nonstationary_RL2022.pdf},
url_arXiv={https://arxiv.org/pdf/2211.10815.pdf},
keywords = {Optimization, Reinforcement Learning, Machine Learning},
abstract={We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time, subject to a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Building on these results, we further present a meta-algorithm that requires no prior knowledge of the variation budget and can adaptively detect non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that risk control and the handling of non-stationarity can be designed separately in the algorithm when the variation budget is known a priori, whereas the non-stationarity detection mechanism in the adaptive algorithm depends on the risk parameter. This work offers the first non-asymptotic theoretical analysis of non-stationary risk-sensitive RL in the literature.},
}
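As a rough illustration of the "restart" idea behind the known-budget algorithms: the learner is periodically reset so that statistics accumulated under drifted rewards and transitions are discarded. The sketch below is an assumption-laden illustration, not the paper's Restart-RSMB/Restart-RSQ pseudocode; the epoch-length choice (a (K/B)^(2/3)-style rule common in restart analyses) and the helper names make_learner / run_episode are hypothetical.

import math

def restart_schedule(K: int, B: float) -> int:
    """Pick an epoch length W from the number of episodes K and the
    variation budget B. A W ~ (K/B)^(2/3)-style choice is typical in
    restart analyses; the paper's exact exponents/constants may differ."""
    return max(1, math.floor((K / max(B, 1e-9)) ** (2.0 / 3.0)))

def run_with_restarts(K: int, B: float, make_learner, run_episode) -> None:
    """Generic restart wrapper: rebuild the learner at each epoch boundary
    so its estimates only reflect recent (less drifted) data."""
    W = restart_schedule(K, B)
    learner = make_learner()          # fresh optimistic estimates
    for k in range(K):
        if k > 0 and k % W == 0:
            learner = make_learner()  # restart: discard stale statistics
        run_episode(learner, k)       # collect a trajectory, update learner

The meta-algorithm in the paper removes the need to know B by detecting, online, when the exponential value functions have drifted enough to warrant a restart; per the abstract, that detection threshold depends on the risk parameter.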