Stochastic Optimization for Machine Learning. Zheng, S.
Numerical optimization has played an important role in the evolution of machine learning, touching almost every aspect of the discipline. Stochastic approximation has evolved and expanded into one of the main streams of research in mathematical optimization. This survey provides a review and summary of stochastic optimization algorithms in the context of machine learning applications. The stochastic gradient descent (SGD) method has been widely viewed as an ideal approach for large-scale machine learning problems, where the conventional batch gradient method typically falters. Despite its flexibility and scalability, the stochastic gradient has high variance, which impedes training. Based on this viewpoint, we provide a comprehensive theoretical and practical discussion of SGD, and then investigate a new spectrum of incremental gradient methods that suppress the noise in a principled way, leading to improved convergence results. We further review methods that integrate stochastic gradients into the alternating direction method of multipliers (ADMM), which has recently been advocated as an efficient optimization tool for a wider variety of models. Last but not least, we also present some stochastic optimization techniques for deep neural network training, including momentum methods and algorithms with adaptive learning rates.
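To make the batch-versus-stochastic trade-off described in the abstract concrete, the short Python sketch below (not taken from the survey; the least-squares objective, step sizes, and variable names are illustrative assumptions) contrasts a full-gradient step, which touches all n samples, with a single-sample SGD step, which is cheap but noisy and therefore uses a decaying step size.

import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def full_grad(w):
    # Gradient of (1/2n) * ||Xw - y||^2 over the whole dataset: O(n*d) per step.
    return X.T @ (X @ w - y) / n

def stoch_grad(w, i):
    # Unbiased single-sample estimate of the same gradient: O(d) per step, high variance.
    return X[i] * (X[i] @ w - y[i])

w_batch = np.zeros(d)
w_sgd = np.zeros(d)
lr = 0.1
for t in range(1, n + 1):
    w_batch -= lr * full_grad(w_batch)                  # one full pass per update
    i = rng.integers(n)
    w_sgd -= (lr / np.sqrt(t)) * stoch_grad(w_sgd, i)   # one sample per update, decaying step size

print("batch error:", np.linalg.norm(w_batch - w_true))
print("SGD error:  ", np.linalg.norm(w_sgd - w_true))

Variance-reduced incremental methods and adaptive schemes such as momentum or per-coordinate learning rates, also surveyed in the paper, modify the stochastic update above rather than the full-gradient one.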
@article{zheng_stochastic_nodate,
	title = {Stochastic {Optimization} for {Machine} {Learning}},
	abstract = {Numerical optimization has played an important role in the evolution of machine learning, touching almost every aspect of the discipline. Stochastic approximation has evolved and expanded into one of the main streams of research in mathematical optimization. This survey provides a review and summary of stochastic optimization algorithms in the context of machine learning applications. The stochastic gradient descent (SGD) method has been widely viewed as an ideal approach for large-scale machine learning problems, where the conventional batch gradient method typically falters. Despite its flexibility and scalability, the stochastic gradient has high variance, which impedes training. Based on this viewpoint, we provide a comprehensive theoretical and practical discussion of SGD, and then investigate a new spectrum of incremental gradient methods that suppress the noise in a principled way, leading to improved convergence results. We further review methods that integrate stochastic gradients into the alternating direction method of multipliers (ADMM), which has recently been advocated as an efficient optimization tool for a wider variety of models. Last but not least, we also present some stochastic optimization techniques for deep neural network training, including momentum methods and algorithms with adaptive learning rates.},
	language = {en},
	author = {Zheng, Shuai},
	pages = {36},
}
