Practical recommendations for gradient-based training of deep architectures. Bengio, Y. arXiv:1206.5533 [cs], September 2012.
Abstract: Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
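To make concrete the kind of training loop the chapter's recommendations apply to, here is a minimal sketch of mini-batch stochastic gradient descent on a linear least-squares model. The data, model, and hyper-parameter values (learning_rate, batch_size, n_epochs) are illustrative assumptions chosen for this sketch, not taken from the paper; they name three of the hyper-parameters the chapter discusses tuning.

# Minimal sketch: mini-batch SGD on a synthetic linear regression task.
# All names and values below are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ w_true + noise.
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

# Hyper-parameters of the kind the chapter recommends tuning.
learning_rate = 0.01   # step size of each gradient update
batch_size = 32        # number of examples per mini-batch
n_epochs = 20          # passes over the training set

w = np.zeros(10)  # parameter initialization
for epoch in range(n_epochs):
    perm = rng.permutation(len(X))  # reshuffle examples each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        # Gradient of mean squared error over the mini-batch:
        # (2/n) * X^T (X w - y).
        grad = 2.0 * xb.T @ (xb @ w - yb) / len(idx)
        w -= learning_rate * grad
    mse = np.mean((X @ w - y) ** 2)
    print(f"epoch {epoch + 1}: training MSE = {mse:.4f}")

Running the loop prints a decreasing training MSE; in practice one would monitor a held-out validation set and adjust the hyper-parameters above accordingly, which is the tuning problem the chapter addresses.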
@article{bengio_practical_2012,
    title = {Practical recommendations for gradient-based training of deep architectures},
    url = {http://arxiv.org/abs/1206.5533},
    abstract = {Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.},
    urldate = {2022-03-02},
    journal = {arXiv:1206.5533 [cs]},
    author = {Bengio, Yoshua},
    month = sep,
    year = {2012},
    note = {arXiv: 1206.5533},
    keywords = {Computer Science - Machine Learning},
}
{"_id":"WCsWS3eGCjQPnHKnM","bibbaseid":"bengio-practicalrecommendationsforgradientbasedtrainingofdeeparchitectures-2012","downloads":0,"creationDate":"2018-01-22T16:01:10.161Z","title":"Practical recommendations for gradient-based training of deep architectures","author_short":["Bengio, Y."],"year":2012,"bibtype":"article","biburl":"https://bibbase.org/zotero/mxmplx","bibdata":{"bibtype":"article","type":"article","title":"Practical recommendations for gradient-based training of deep architectures","url":"http://arxiv.org/abs/1206.5533","abstract":"Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.","urldate":"2022-03-02","journal":"arXiv:1206.5533 [cs]","author":[{"propositions":[],"lastnames":["Bengio"],"firstnames":["Yoshua"],"suffixes":[]}],"month":"September","year":"2012","note":"arXiv: 1206.5533","keywords":"Computer Science - Machine Learning","bibtex":"@article{bengio_practical_2012,\n\ttitle = {Practical recommendations for gradient-based training of deep architectures},\n\turl = {http://arxiv.org/abs/1206.5533},\n\tabstract = {Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.},\n\turldate = {2022-03-02},\n\tjournal = {arXiv:1206.5533 [cs]},\n\tauthor = {Bengio, Yoshua},\n\tmonth = sep,\n\tyear = {2012},\n\tnote = {arXiv: 1206.5533},\n\tkeywords = {Computer Science - Machine Learning},\n}\n\n","author_short":["Bengio, Y."],"key":"bengio_practical_2012","id":"bengio_practical_2012","bibbaseid":"bengio-practicalrecommendationsforgradientbasedtrainingofdeeparchitectures-2012","role":"author","urls":{"Paper":"http://arxiv.org/abs/1206.5533"},"keyword":["Computer Science - Machine Learning"],"metadata":{"authorlinks":{}},"html":""},"search_terms":["practical","recommendations","gradient","based","training","deep","architectures","bengio"],"keywords":["computer science - machine learning"],"authorIDs":[],"dataSources":["9cexBw6hrwgyZphZZ","QiNXGx82DM7bbArZY","aXmRAq63YsH7a3ufx"]}