The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training. Erhan, D., Manzagol, P., Bengio, Y., Bengio, S., & Vincent, P. In van Dyk, D. & Welling, M., editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, AISTATS, volume 5 of JMLR Workshop and Conference Proceedings, pages 153–160, 2009.
Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a regularizer. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples.
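As an illustration only (not the authors' code), the following is a minimal Python/PyTorch sketch of the technique the paper studies: greedy layer-wise unsupervised pre-training of each layer as an autoencoder, followed by supervised fine-tuning of the stacked network. Layer widths, learning rates, iteration counts, and the toy data are hypothetical placeholders.

# Minimal sketch of greedy layer-wise unsupervised pre-training followed by
# supervised fine-tuning. Illustrative only; all hyperparameters and the toy
# data below are assumptions, not values from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 64)           # toy inputs in [0, 1]
y = torch.randint(0, 10, (256,))  # labels, used only during fine-tuning

sizes = [64, 32, 16]              # hypothetical layer widths
encoders = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

# Unsupervised pre-training: train each layer as an autoencoder on the fixed
# representation produced by the already-trained layers below it.
h = X
for enc in encoders:
    dec = nn.Linear(enc.out_features, enc.in_features)  # layer-local decoder
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=0.1)
    for _ in range(100):
        opt.zero_grad()
        recon = torch.sigmoid(dec(torch.sigmoid(enc(h))))
        nn.functional.mse_loss(recon, h).backward()
        opt.step()
    h = torch.sigmoid(enc(h)).detach()  # input representation for the next layer

# Supervised fine-tuning: stack the pre-trained encoders, add an output
# layer, and train the whole network on the labels.
model = nn.Sequential(
    encoders[0], nn.Sigmoid(),
    encoders[1], nn.Sigmoid(),
    nn.Linear(sizes[-1], 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    nn.functional.cross_entropy(model(X), y).backward()
    opt.step()

In this view, pre-training provides an initialization for the supervised phase; the paper's experiments examine how that initialization acts both as an aid to optimization and as a regularizer.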
@inproceedings{erhan:2009:aistat,
  author = {D. Erhan and P.-A. Manzagol and Y. Bengio and S. Bengio and P. Vincent},
  title = {The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training},
  booktitle = {Proceedings of The Twelfth International Conference on Artificial Intelligence and Statistics, {AISTATS}},
  editor = {D. van Dyk and M. Welling},
  year = 2009,
  series = {JMLR Workshop and Conference Proceedings},
  volume = 5,
  pages = {153--160},
  url = {publications/ps/erhan_2009_aistat.ps.gz},
  pdf = {publications/pdf/erhan_2009_aistat.pdf},
  djvu = {publications/djvu/erhan_2009_aistat.djvu},
  web = {http://jmlr.csail.mit.edu/proceedings/papers/v5/erhan09a/erhan09a.pdf},
  original = {2009/deep_study_aistats},
  topics = {deep_learning},
  abstract = {Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a regularizer. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples.},
  categorie = {C}
}
