The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation. Amortila, P., Jiang, N., & Szepesvári, C. In ICML, pages 768–790, July 2023.
Abstract: Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such approximation factors – especially their optimal form in a given learning problem – is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted $L_2$-norm (where the weighting is the offline state distribution) and the $L_\infty$ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.
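For reference, the two error norms named in the abstract are the standard ones (the notation below is an assumption of common conventions, not taken from the paper): writing $\mu$ for the offline state distribution, $v$ for an estimate, and $v^\pi$ for the target value function, the weighted norm is $\|v - v^\pi\|_{2,\mu} = \big(\sum_s \mu(s)\,(v(s) - v^\pi(s))^2\big)^{1/2}$ and the sup norm is $\|v - v^\pi\|_\infty = \max_s |v(s) - v^\pi(s)|$. An approximation factor is then a multiplicative constant $\alpha$ such that the estimation error (in one of these norms) is bounded by $\alpha$ times the misspecification error of the linear function class, up to statistical error terms that vanish with more data.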
@inproceedings{AnNanSz-ICML23,
title = {The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation},
author = {Amortila, Philip and Jiang, Nan and Szepesv{\'a}ri, Csaba},
booktitle = {ICML},
pages = {768--790},
crossref = {ICML2023},
year = {2023},
month = {07},
url_url = {https://proceedings.mlr.press/v202/amortila23a.html},
url_pdf = {https://proceedings.mlr.press/v202/amortila23a/amortila23a.pdf},
abstract = {Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such \emph{approximation factors} -- especially their optimal form in a given learning problem -- is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted $L_2$-norm (where the weighting is the offline state distribution) and the $L_\infty$ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.}
}
{"_id":"cSSYrxwrGrdqur853","bibbaseid":"amortila-jiang-szepesvri-theoptimalapproximationfactorsinmisspecifiedoffpolicyvaluefunctionestimation-2023","author_short":["Amortila, P.","Jiang, N.","Szepesvári, C."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","title":"The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation","author":[{"propositions":[],"lastnames":["Amortila"],"firstnames":["Philip"],"suffixes":[]},{"propositions":[],"lastnames":["Jiang"],"firstnames":["Nan"],"suffixes":[]},{"propositions":[],"lastnames":["Szepesvári"],"firstnames":["Csaba"],"suffixes":[]}],"booktitle":"ICML","pages":"768–790","crossref":"ICML2023","year":"2023","month":"07","url_url":"https://proceedings.mlr.press/v202/amortila23a.html","url_pdf":"https://proceedings.mlr.press/v202/amortila23a/amortila23a.pdf","abstract":"Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such <em>approximation factors</em> – especially their optimal form in a given learning problem – is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted $L_2$-norm (where the weighting is the offline state distribution) and the $L_∞$ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.","bibtex":"@inproceedings{AnNanSz-ICML23,\n title = \t {The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation},\n author = {Amortila, Philip and Jiang, Nan and Szepesv{\\'a}ri, Csaba},\n booktitle = \t {ICML},\n pages = \t {768--790},\n crossref = {ICML2023},\n year = {2023},\n month = {07},\n url_url = \t {https://proceedings.mlr.press/v202/amortila23a.html},\n url_pdf = \t {https://proceedings.mlr.press/v202/amortila23a/amortila23a.pdf},\n abstract = \t {Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such <em>approximation factors</em> -- especially their optimal form in a given learning problem -- is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted $L_2$-norm (where the weighting is the offline state distribution) and the $L_\\infty$ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. 
In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.}\n}\n\n\n","author_short":["Amortila, P.","Jiang, N.","Szepesvári, C."],"key":"AnNanSz-ICML23","id":"AnNanSz-ICML23","bibbaseid":"amortila-jiang-szepesvri-theoptimalapproximationfactorsinmisspecifiedoffpolicyvaluefunctionestimation-2023","role":"author","urls":{" url":"https://proceedings.mlr.press/v202/amortila23a.html"," pdf":"https://proceedings.mlr.press/v202/amortila23a/amortila23a.pdf"},"metadata":{"authorlinks":{}},"downloads":0,"html":""},"bibtype":"inproceedings","biburl":"https://sites.ualberta.ca/~szepesva/papers/p2.bib","dataSources":["JAZPSdjiP95Ah92D9"],"keywords":[],"search_terms":["optimal","approximation","factors","misspecified","policy","value","function","estimation","amortila","jiang","szepesvári"],"title":"The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation","year":2023}