Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data. Li, Y. & Liang, Y. arXiv:1808.01204 [cs, stat], August, 2019. arXiv: 1808.01204
Paper abstract bibtex Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
% arXiv preprint: use @misc + eprint fields rather than stuffing the arXiv id
% into `journal`/`note` (Zotero-export anti-pattern). Title braces removed —
% no acronyms/proper nouns in it need case protection from the style.
@misc{li_learning_2019,
	title = {Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data},
	author = {Li, Yuanzhi and Liang, Yingyu},
	year = {2019},
	month = aug,
	eprint = {1808.01204},
	eprinttype = {arXiv},
	eprintclass = {cs.LG},
	url = {http://arxiv.org/abs/1808.01204},
	urldate = {2022-03-02},
	abstract = {Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.},
	keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},
}
Downloads: 0
{"_id":"3oDjDEisiuRqJPohY","bibbaseid":"li-liang-learningoverparameterizedneuralnetworksviastochasticgradientdescentonstructureddata-2019","author_short":["Li, Y.","Liang, Y."],"bibdata":{"bibtype":"article","type":"article","title":"Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data","url":"http://arxiv.org/abs/1808.01204","abstract":"Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.","urldate":"2022-03-02","journal":"arXiv:1808.01204 [cs, stat]","author":[{"propositions":[],"lastnames":["Li"],"firstnames":["Yuanzhi"],"suffixes":[]},{"propositions":[],"lastnames":["Liang"],"firstnames":["Yingyu"],"suffixes":[]}],"month":"August","year":"2019","note":"arXiv: 1808.01204","keywords":"Computer Science - Machine Learning, Statistics - Machine Learning","bibtex":"@article{li_learning_2019,\n\ttitle = {Learning {Overparameterized} {Neural} {Networks} via {Stochastic} {Gradient} {Descent} on {Structured} {Data}},\n\turl = {http://arxiv.org/abs/1808.01204},\n\tabstract = {Neural networks have many successful applications, while much less theoretical understanding has been gained. 
Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.},\n\turldate = {2022-03-02},\n\tjournal = {arXiv:1808.01204 [cs, stat]},\n\tauthor = {Li, Yuanzhi and Liang, Yingyu},\n\tmonth = aug,\n\tyear = {2019},\n\tnote = {arXiv: 1808.01204},\n\tkeywords = {Computer Science - Machine Learning, Statistics - Machine Learning},\n}\n\n","author_short":["Li, Y.","Liang, Y."],"key":"li_learning_2019","id":"li_learning_2019","bibbaseid":"li-liang-learningoverparameterizedneuralnetworksviastochasticgradientdescentonstructureddata-2019","role":"author","urls":{"Paper":"http://arxiv.org/abs/1808.01204"},"keyword":["Computer Science - Machine Learning","Statistics - Machine Learning"],"metadata":{"authorlinks":{}},"html":""},"bibtype":"article","biburl":"https://bibbase.org/zotero/mxmplx","dataSources":["aXmRAq63YsH7a3ufx"],"keywords":["computer science - machine learning","statistics - machine learning"],"search_terms":["learning","overparameterized","neural","networks","via","stochastic","gradient","descent","structured","data","li","liang"],"title":"Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data","year":2019}