TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions. Weisz, G., Szepesvári, C., & György, A. In ALT, pages 1097–1137, 03, 2022. PMLR.

Url

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions [pdf]

Pdf abstract bibtex

We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^⋆$ or (ii) the optimal action-value function $q^⋆$ lie in the linear span of some features; or (iii) both $v^⋆$ and $q^⋆$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (ALT 2021) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (COLT 2021) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $\min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=Ω( \min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).

@inproceedings{WeiszS022,
  author    = {Weisz, Gell\'ert  and Szepesv\'ari, Csaba and Gy\"orgy, Andr\'as },
  title     = {TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions},
  booktitle = {ALT},
  month = {03},
  acceptrate = {42 out of 117 = 36\%},
  pages     = {1097--1137},
  publisher = {PMLR},
  year      = {2022},
  url_url  = {https://proceedings.mlr.press/v167/weisz22a.html},
  url_pdf = {https://proceedings.mlr.press/v167/weisz22a/weisz22a.pdf},
  abstract  = {We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where  either 
(i) the optimal value function $v^\star$ or 
(ii) the optimal action-value function $q^\star$ lie in the linear span of some features; or
 (iii) both $v^\star$ and $q^\star$ lie in the linear span when restricted to the states reachable from the starting state. 
Recently, Weisz et al. (ALT 2021) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (COLT 2021) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $\min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=\Omega( \min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).},
keywords = {reinforcement learning, simulators, planning, MDPs, linear function approximation, $q^*$-realizability, $v^*$-realizability}
}

Downloads: 0

{"_id":"3eLX2XudLESK3XqRB","bibbaseid":"weisz-szepesvri-gyrgy-tensorplanandthefewactionslowerboundforplanninginmdpsunderlinearrealizabilityofoptimalvaluefunctions-2022","author_short":["Weisz, G.","Szepesvári, C.","György, A."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"propositions":[],"lastnames":["Weisz"],"firstnames":["Gellért"],"suffixes":[]},{"propositions":[],"lastnames":["Szepesvári"],"firstnames":["Csaba"],"suffixes":[]},{"propositions":[],"lastnames":["György"],"firstnames":["András"],"suffixes":[]}],"title":"TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions","booktitle":"ALT","month":"03","acceptrate":"42 out of 117 = 36%","pages":"1097–1137","publisher":"PMLR","year":"2022","url_url":"https://proceedings.mlr.press/v167/weisz22a.html","url_pdf":"https://proceedings.mlr.press/v167/weisz22a/weisz22a.pdf","abstract":"We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^⋆$ or (ii) the optimal action-value function $q^⋆$ lie in the linear span of some features; or (iii) both $v^⋆$ and $q^⋆$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (ALT 2021) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (COLT 2021) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $\\min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=Ω( \\min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).","keywords":"reinforcement learning, simulators, planning, MDPs, linear function approximation, $q^*$-realizability, $v^*$-realizability","bibtex":"@inproceedings{WeiszS022,\n author = {Weisz, Gell\\'ert and Szepesv\\'ari, Csaba and Gy\\\"orgy, Andr\\'as },\n title = {TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions},\n booktitle = {ALT},\n month = {03},\n acceptrate = {42 out of 117 = 36\\%},\n pages = {1097--1137},\n publisher = {PMLR},\n year = {2022},\n url_url = {https://proceedings.mlr.press/v167/weisz22a.html},\n url_pdf = {https://proceedings.mlr.press/v167/weisz22a/weisz22a.pdf},\n abstract = {We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either \n(i) the optimal value function $v^\\star$ or \n(ii) the optimal action-value function $q^\\star$ lie in the linear span of some features; or\n (iii) both $v^\\star$ and $q^\\star$ lie in the linear span when restricted to the states reachable from the starting state. \nRecently, Weisz et al. (ALT 2021) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (COLT 2021) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $\\min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=\\Omega( \\min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).},\nkeywords = {reinforcement learning, simulators, planning, MDPs, linear function approximation, $q^*$-realizability, $v^*$-realizability}\n}\n\n","author_short":["Weisz, G.","Szepesvári, C.","György, A."],"key":"WeiszS022","id":"WeiszS022","bibbaseid":"weisz-szepesvri-gyrgy-tensorplanandthefewactionslowerboundforplanninginmdpsunderlinearrealizabilityofoptimalvaluefunctions-2022","role":"author","urls":{" url":"https://proceedings.mlr.press/v167/weisz22a.html"," pdf":"https://proceedings.mlr.press/v167/weisz22a/weisz22a.pdf"},"keyword":["reinforcement learning","simulators","planning","MDPs","linear function approximation","$q^*$-realizability","$v^*$-realizability"],"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","dataSources":["dYMomj4Jofy8t4qmm"],"keywords":["reinforcement learning","simulators","planning","mdps","linear function approximation","$q^*$-realizability","$v^*$-realizability"],"search_terms":["tensorplan","few","actions","lower","bound","planning","mdps","under","linear","realizability","optimal","value","functions","weisz","szepesvári","györgy"],"title":"TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions","year":2022}