On the Performance of Preconditioned Stochastic Gradient Descent. Li, Xi-Lin. 2018. arXiv:1803.09383.
Paper: http://arxiv.org/abs/1803.09383
This paper studies the performance of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity at the same time. We have improved the implementation of PSGD, unveiled its relationship to equilibrated stochastic gradient descent (ESGD) and batch normalization, and provided a software package (https://github.com/lixilinx/psgd_tf) implemented in TensorFlow to compare variations of PSGD and stochastic gradient descent (SGD) on a wide range of benchmark problems with commonly used neural network models, e.g., convolutional and recurrent neural networks. Comparison results clearly demonstrate the advantages of PSGD in terms of convergence speed and generalization performance.
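To make the idea concrete, here is a minimal NumPy sketch of diagonal preconditioned SGD on a noisy, ill-conditioned quadratic. The probe-based fitting rule p_i = sqrt(E[dtheta_i^2] / E[dg_i^2]), the toy problem, and all constants are illustrative assumptions for the diagonal special case (the case where the connection to ESGD arises); the paper's full method fits a more general preconditioner, and the psgd_tf package is the reference implementation.

import numpy as np

# Toy problem (assumed for illustration): minimize f(x) = 0.5 x^T H x
# from noisy gradients, with a badly conditioned diagonal Hessian.
rng = np.random.default_rng(0)
H = np.diag([100.0, 1.0])                 # ill-conditioned Hessian
theta = np.array([1.0, 1.0])

def noisy_grad(x):
    # Stochastic gradient: exact gradient H @ x plus additive noise.
    return H @ x + 1e-3 * rng.standard_normal(x.shape)

mu, beta = 0.5, 0.9                       # step size and smoothing (assumed values)
m_dtheta2 = np.zeros_like(theta)          # running estimate of E[dtheta^2]
m_dg2 = np.zeros_like(theta)              # running estimate of E[dg^2]

for step in range(100):
    g = noisy_grad(theta)
    # Probe with a small random perturbation; dg approximates H @ dtheta,
    # so the pair (dtheta, dg) carries curvature information.
    dtheta = 1e-2 * rng.standard_normal(theta.shape)
    dg = noisy_grad(theta + dtheta) - g
    m_dtheta2 = beta * m_dtheta2 + (1 - beta) * dtheta**2
    m_dg2 = beta * m_dg2 + (1 - beta) * dg**2
    p = np.sqrt(m_dtheta2 / (m_dg2 + 1e-12))   # diagonal preconditioner
    theta = theta - mu * p * g                 # preconditioned SGD step

print(theta)  # both coordinates shrink toward 0 at similar rates

Because dg is approximately H @ dtheta, the fitted p roughly equalizes curvature across coordinates, so the stiff and the shallow direction converge at similar rates despite the gradient noise; plain SGD with a single step size cannot do this on such a problem.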
@misc{li2018performance,
  author = {Li, Xi-Lin},
  title = {On the Performance of Preconditioned Stochastic Gradient Descent},
  year = {2018},
  eprint = {1803.09383},
  archiveprefix = {arXiv},
  url = {http://arxiv.org/abs/1803.09383},
  keywords = {SGD theory}
}