{"_id":"P2Kd8ByKmWF3Xb4dw","bibbaseid":"mei-gao-dai-szepesvri-schuurmans-leveragingnonuniformityinfirstordernonconvexoptimization-2021","author_short":["Mei, J.","Gao, Y.","Dai, B.","Szepesvári, C.","Schuurmans, D."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"firstnames":["Jincheng"],"propositions":[],"lastnames":["Mei"],"suffixes":[]},{"firstnames":["Yue"],"propositions":[],"lastnames":["Gao"],"suffixes":[]},{"firstnames":["Bo"],"propositions":[],"lastnames":["Dai"],"suffixes":[]},{"firstnames":["Csaba"],"propositions":[],"lastnames":["Szepesvári"],"suffixes":[]},{"firstnames":["Dale"],"propositions":[],"lastnames":["Schuurmans"],"suffixes":[]}],"title":"Leveraging Non-uniformity in First-order Non-convex Optimization","pages":"7555–7564","crossref":"ICML2021","url_paper":"ICML2021-NonunifPG.pdf","url_link":"http://proceedings.mlr.press/v139/mei21a.html","abstract":"Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to <em>Non-uniform Smoothness</em> (NS) and <em>Non-uniform Łojasiewicz inequality</em> (NŁ). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical $Ω(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For PG, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{- c · t})$ (where $c > 0$) while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware gradient descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.","booktitle":"ICML","month":"07","year":"2021","bibtex":"@inproceedings{MeiGDSS21,\n author = {Jincheng Mei and Yue Gao and Bo Dai and Csaba Szepesv\\'ari and Dale Schuurmans},\n title = {Leveraging Non-uniformity in First-order Non-convex Optimization},\n pages = {7555--7564},\n crossref = {ICML2021},\n url_paper = {ICML2021-NonunifPG.pdf},\n url_link = {http://proceedings.mlr.press/v139/mei21a.html},\n abstract = {Classical global convergence results for first-order methods rely on uniform smoothness and the Ł{}ojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to <em>Non-uniform Smoothness</em> (NS) and <em>Non-uniform Ł{}ojasiewicz inequality</em> (NŁ{}). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical $\\Omega(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). 
For PG, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{- c \\cdot t})$ (where $c > 0$) while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware gradient descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.},\n booktitle = {ICML},\n month = {07},\n year = {2021},\n}\n\n","author_short":["Mei, J.","Gao, Y.","Dai, B.","Szepesvári, C.","Schuurmans, D."],"key":"MeiGDSS21","id":"MeiGDSS21","bibbaseid":"mei-gao-dai-szepesvri-schuurmans-leveragingnonuniformityinfirstordernonconvexoptimization-2021","role":"author","urls":{" paper":"https://www.ualberta.ca/~szepesva/papers/ICML2021-NonunifPG.pdf"," link":"http://proceedings.mlr.press/v139/mei21a.html"},"metadata":{"authorlinks":{}},"downloads":2,"html":""},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","dataSources":["cd5AYQRw3RHjTgoQc","dYMomj4Jofy8t4qmm"],"keywords":[],"search_terms":["leveraging","non","uniformity","first","order","non","convex","optimization","mei","gao","dai","szepesvári","schuurmans"],"title":"Leveraging Non-uniformity in First-order Non-convex Optimization","year":2021,"downloads":2}
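The core technique the abstract names is gradient normalization: dividing the update by the gradient norm so the step length stays fixed. Below is a minimal sketch of a normalized first-order update, assuming a placeholder objective, step size, and stopping rule; it is an illustration of the basic idea, not the paper's geometry-aware method or its step-size schedule.

```python
import numpy as np

def normalized_gradient_descent(grad, x0, eta=0.1, steps=1000, eps=1e-12):
    """Sketch of normalized gradient descent (illustrative only).

    grad: callable returning the gradient of the objective at x.
    eta:  fixed step length; every update moves exactly eta along
          the unit gradient direction, regardless of |grad f(x)|.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm < eps:          # stop at a (near-)stationary point
            break
        x = x - eta * g / norm  # normalized step of fixed length eta
    return x

# Toy usage on a placeholder quadratic f(x) = ||x||^2 (gradient 2x);
# the paper's PG and GLM objectives are different.
x_final = normalized_gradient_descent(lambda x: 2 * x, x0=np.array([5.0, -3.0]))
```

Because the step length is eta rather than proportional to the gradient magnitude, such updates do not slow down in regions where the gradient is small, which is consistent with the plateau-escape behavior the abstract highlights.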