Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control. Chen, B., Jin, M., Wang, Z., Hong, T., & Bergés, M. In International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities (RLEM), pages 52–56, 2020.
Link: https://dl.acm.org/doi/abs/10.1145/3427773.3427871
PDF: http://www.jinming.tech/papers/OPE_RLEM20.pdf
Abstract: We present an initial study of off-policy evaluation (OPE), a problem prerequisite to real-world reinforcement learning (RL), in the context of building control. OPE is the problem of estimating a policy's performance without running it on the actual system, using historical data from the existing controller. It enables the control engineers to ensure a new, pretrained policy satisfies the performance requirements and safety constraints of a real-world system, prior to interacting with it. While many methods have been developed for OPE, no study has evaluated which ones are suitable for building operational data, which are generated by deterministic policies and have limited coverage of the state-action space. After reviewing existing works and their assumptions, we adopted the approximate model (AM) method. Furthermore, we used bootstrapping to quantify uncertainty and correct for bias. In a simulation study, we evaluated the proposed approach on 10 policies pretrained with imitation learning. On average, the AM method estimated the energy and comfort costs with 1.84% and 14.1% error, respectively.
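The abstract describes the approximate model (AM) approach to OPE, combined with bootstrapping for uncertainty quantification and bias correction. The paper's actual implementation is not reproduced here; the sketch below only illustrates the general idea under stated assumptions: logged transitions (state, action, next state, cost) from the existing controller, a hypothetical `target_policy` callable, a linear one-step dynamics/cost surrogate standing in for whatever model the paper fits, and the basic bootstrap percentile interval and bias correction.

```python
# Minimal sketch of approximate-model (AM) off-policy evaluation with
# bootstrapped uncertainty. All model choices here (linear surrogates,
# percentile interval, simple bias correction) are illustrative assumptions,
# not the paper's implementation.
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_approximate_model(S, A, S_next, C):
    """Fit a one-step dynamics model s' ~ f(s, a) and a cost model c ~ g(s, a)."""
    X = np.hstack([S, A])
    dyn = LinearRegression().fit(X, S_next)
    cost = LinearRegression().fit(X, C)
    return dyn, cost


def rollout_cost(dyn, cost, target_policy, s0, horizon):
    """Roll the target policy out in the learned model and accumulate cost."""
    s, total = np.asarray(s0, dtype=float), 0.0
    for _ in range(horizon):
        a = target_policy(s)
        x = np.hstack([s, a]).reshape(1, -1)
        total += float(cost.predict(x))
        s = dyn.predict(x).ravel()
    return total


def ope_with_bootstrap(S, A, S_next, C, target_policy, s0, horizon,
                       n_boot=200, seed=0):
    """AM estimate of the target policy's cost, plus a bootstrap percentile
    interval and a simple bias-corrected estimate (2 * point - bootstrap mean)."""
    rng = np.random.default_rng(seed)
    dyn, cost = fit_approximate_model(S, A, S_next, C)
    point = rollout_cost(dyn, cost, target_policy, s0, horizon)

    n = len(S)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample logged transitions
        d_b, c_b = fit_approximate_model(S[idx], A[idx], S_next[idx], C[idx])
        boot.append(rollout_cost(d_b, c_b, target_policy, s0, horizon))
    boot = np.asarray(boot)

    lo, hi = np.percentile(boot, [2.5, 97.5])  # uncertainty interval
    bias_corrected = 2 * point - boot.mean()   # basic bootstrap bias correction
    return point, bias_corrected, (lo, hi)
```

In the paper's setting the performance of a candidate policy is summarized by separate energy and comfort costs; in this sketch a single scalar cost stands in for either one, and each would be estimated by its own rollout of the learned model.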
@inproceedings{2020_2C_ope,
title={Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control},
author={Chen, Bingqing and Jin, Ming and Wang, Zhe and Hong, Tianzhen and Berg{\'e}s, Mario},
booktitle={International Workshop on Reinforcement Learning for Energy Management in Buildings \& Cities (RLEM)},
pages={52--56},
year={2020},
url_link={https://dl.acm.org/doi/abs/10.1145/3427773.3427871},
url_pdf={OPE_RLEM20.pdf},
abstract={We present an initial study of off-policy evaluation (OPE), a problem prerequisite to real-world reinforcement learning (RL), in the context of building control. OPE is the problem of estimating a policy's performance without running it on the actual system, using historical data from the existing controller. It enables the control engineers to ensure a new, pretrained policy satisfies the performance requirements and safety constraints of a real-world system, prior to interacting with it. While many methods have been developed for OPE, no study has evaluated which ones are suitable for building operational data, which are generated by deterministic policies and have limited coverage of the state-action space. After reviewing existing works and their assumptions, we adopted the approximate model (AM) method. Furthermore, we used bootstrapping to quantify uncertainty and correct for bias. In a simulation study, we evaluated the proposed approach on 10 policies pretrained with imitation learning. On average, the AM method estimated the energy and comfort costs with 1.84% and 14.1% error, respectively.},
keywords={Machine learning, Energy system, Smart city}
}