Direct Estimation of Weights and Efficient Training of Deep Neural Networks without SGD

Direct Estimation of Weights and Efficient Training of Deep Neural Networks without SGD. Dehmamy, N., Rohani, N., & Katsaggelos, A. K. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 2019-May, pages 3232–3236, May 2019. IEEE.

We argue that learning a hierarchy of features in a hierarchical dataset requires lower layers to approach convergence faster than layers above them. We show that, if this assumption holds, we can analytically approximate the outcome of stochastic gradient descent (SGD) for each layer. We find that the weights should converge to a class-based PCA, with some weights in every layer dedicated to principal components of each label class. The class-based PCA allows us to train layers directly, without SGD, often leading to a dramatic decrease in training complexity. We demonstrate the effectiveness of this by using our results to replace one and two convolutional layers in networks trained on MNIST, CIFAR10 and CIFAR100 datasets, showing that our method achieves performance superior or comparable to similar architectures trained using SGD.
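The class-based PCA idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `class_pca_weights`, the choice of `k` components per class, the per-class mean centering, the ReLU nonlinearity, and the random toy data are all assumptions made for illustration. It builds the weights of a single dense layer by stacking the leading principal directions of each label class, with no SGD involved.

```python
import numpy as np


def class_pca_weights(X, y, k):
    """Stack the top-k principal directions of each class into one weight matrix.

    X: (n_samples, n_features) array; y: (n_samples,) integer labels.
    Returns an array of shape (n_classes * k, n_features).
    Hypothetical helper for illustration, not taken from the paper.
    """
    blocks = []
    for label in np.unique(y):
        Xc = X[y == label]
        Xc = Xc - Xc.mean(axis=0)  # assumption: center within each class
        # Rows of Vt are principal directions, ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        blocks.append(Vt[:k])
    return np.vstack(blocks)


# Toy data standing in for flattened inputs or image patches (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 4, size=200)

W = class_pca_weights(X, y, k=3)      # 4 classes * 3 components = 12 units
features = np.maximum(X @ W.T, 0.0)   # layer output with an assumed ReLU
```

Each block of `k` rows in `W` is dedicated to one class, which mirrors the abstract's statement that some weights in every layer are dedicated to the principal components of each label class. Applying the same idea to convolutional layers would require extracting patches first, which this sketch leaves out.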

@inproceedings{Nima2019,
author = {Dehmamy, Nima and Rohani, Neda and Katsaggelos, Aggelos K.},
booktitle = {ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
doi = {10.1109/ICASSP.2019.8682781},
isbn = {978-1-4799-8131-1},
issn = {15206149},
month = {may},
pages = {3232--3236},
publisher = {IEEE},
title = {{Direct Estimation of Weights and Efficient Training of Deep Neural Networks without SGD}},
url = {https://ieeexplore.ieee.org/document/8682781/},
volume = {2019-May},
year = {2019}
}
