On the Global Convergence Rates of Softmax Policy Gradient Methods

On the Global Convergence Rates of Softmax Policy Gradient Methods. Mei, J., Xiao, C., Szepesvári, C., & Schuurmans, D. In ICML, June 2020.


We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and that the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy-regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-t})$ toward the softmax optimal policy. This result resolves an open question in the recent literature. Finally, combining the above two results and additional new $\Omega(1/t)$ lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. The separation of rates is further explained using the notion of non-uniform Łojasiewicz degree. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
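The two regimes described in the abstract can be illustrated on a one-state problem (a bandit). The following is a minimal sketch, not the authors' code: the function name `run_softmax_pg`, the reward vector, the step size `eta=0.4`, and the temperature `tau` are all assumptions chosen for illustration. With `tau=0` it runs exact (true-gradient) softmax policy gradient; with `tau>0` it ascends the entropy-regularized objective, whose maximizer is the softmax optimal policy proportional to `exp(r / tau)`.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    z = sum(exps)
    return [e / z for e in exps]


def run_softmax_pg(rewards, steps, eta=0.4, tau=0.0):
    """Exact softmax policy gradient on a one-state bandit (illustrative).

    Returns the final policy and the per-step gap between the best reward
    and the expected (unregularized) reward of the current policy.
    """
    theta = [0.0] * len(rewards)  # uniform initial policy
    best = max(rewards)
    gaps = []
    for _ in range(steps):
        pi = softmax(theta)
        # Entropy-regularized reward r(a) - tau * log pi(a); plain r(a) if tau == 0.
        r_tau = [r - tau * math.log(p) if tau > 0 else r
                 for r, p in zip(rewards, pi)]
        value = sum(p * r for p, r in zip(pi, r_tau))
        # True gradient w.r.t. theta(a): pi(a) * (r_tau(a) - value).
        grad = [p * (r - value) for p, r in zip(pi, r_tau)]
        theta = [t + eta * g for t, g in zip(theta, grad)]
        new_pi = softmax(theta)
        gaps.append(best - sum(p * r for p, r in zip(new_pi, rewards)))
    return softmax(theta), gaps


if __name__ == "__main__":
    rewards = [1.0, 0.8, 0.2]  # hypothetical rewards in [0, 1]
    _, gaps = run_softmax_pg(rewards, steps=5000)
    print("unregularized gap after 5000 steps:", gaps[-1])
    pi_tau, _ = run_softmax_pg(rewards, steps=2000, tau=0.5)
    print("regularized policy:", pi_tau)
    print("softmax(r / tau):  ", softmax([r / 0.5 for r in rewards]))
```

In the unregularized run the gap shrinks slowly, in line with a sublinear rate, while the regularized run settles quickly on `softmax(r / tau)`. Note that the regularized iterates converge to the softmax optimal policy rather than to the greedy one, so the two runs are not optimizing the same objective.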

@inproceedings{JYSzWA20,
	abstract = {We make three contributions toward better understanding policy gradient methods in the tabular setting.
First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization.
This result significantly expands the recent asymptotic convergence results.
The analysis relies on two findings:
that the softmax policy gradient satisfies a {\L}ojasiewicz inequality, and that the minimum probability of an optimal action during optimization can be bounded in terms of its initial value.
Second, we analyze entropy-regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-t})$ toward the softmax optimal policy.
This result resolves an open question in the recent literature.
Finally, combining the above two results and additional new $\Omega(1/t)$ lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. The separation of rates is further explained using the notion of non-uniform {\L}ojasiewicz degree.
These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.},
	author = {Mei, Jincheng and Xiao, Chenjun and Szepesv{\'a}ri, Csaba and Schuurmans, Dale},
	crossref = {ICML2020},
	month = jun,
	title = {On the Global Convergence Rates of Softmax Policy Gradient Methods},
	url_paper = {ICML2020_pg.pdf},
	booktitle = {ICML},
	year = {2020}}

