Leveraging Non-uniformity in First-order Non-convex Optimization. Mei, J., Gao, Y., Dai, B., Szepesvári, C., & Schuurmans, D. In ICML, pages 7555–7564, July 2021.
Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to Non-uniform Smoothness (NS) and Non-uniform Łojasiewicz inequality (NŁ). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For PG, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{-c \cdot t})$ (where $c > 0$) while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware gradient descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.
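
The abstract's algorithmic core is normalized gradient descent, i.e. stepping along the gradient direction with its magnitude stripped away, which keeps progress steady on plateaus where raw gradients are tiny. The sketch below is not from the paper: the function and parameter names (`normalized_gradient_descent`, `grad_f`, `eta`) are illustrative, and it uses a fixed step size rather than the geometry-aware, NS/NŁ-dependent step sizes the paper analyzes.

```python
import numpy as np

def normalized_gradient_descent(grad_f, x0, eta=0.05, num_steps=1000, eps=1e-12):
    """Generic normalized gradient descent: x <- x - eta * grad / ||grad||.

    grad_f : callable returning the gradient of the objective at x
    x0     : initial iterate (array-like)
    eta    : fixed step length (the paper's variants adapt this to local geometry)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm < eps:              # (near-)stationary point reached
            break
        x = x - eta * g / norm      # use only the direction, not the magnitude
    return x

# Example: f(x) = ||x||^4 has a flat plateau near the origin where standard
# gradient descent crawls; the normalized step length stays constant.
grad = lambda x: 4.0 * np.dot(x, x) * x
x_final = normalized_gradient_descent(grad, x0=np.array([2.0, -1.5]))
print(x_final)
```

With a fixed step length the iterates eventually oscillate within a ball of radius roughly `eta` around the minimizer; decaying or geometry-adapted step sizes, as studied in the paper, are what yield the stated linear rates.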
