A Fast Learning Algorithm for Deep Belief Nets. Hinton, G. E., Osindero, S., & Teh, Y. W. Neural Computation, 18(7):1527–1554, July 2006.
We show how to use “complementary priors” to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
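Since the abstract describes the greedy layer-wise procedure only at a high level, the following is a minimal NumPy sketch of that idea, not the authors' implementation: each layer is treated as a restricted Boltzmann machine trained with one-step contrastive divergence (CD-1), and its hidden-unit probabilities become the training data for the next layer. The contrastive wake-sleep fine-tuning stage and the label units of the top-level associative memory are omitted, and all function names, sizes, and hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Draw binary states from Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(p.dtype)

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    # Train one RBM with CD-1; returns (W, b_vis, b_hid).
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:  # one example at a time, for clarity
            # Positive phase: hidden activations given the data.
            p_h0 = sigmoid(v0 @ W + b_hid)
            h0 = sample(p_h0)
            # Negative phase: one step of alternating Gibbs sampling.
            p_v1 = sigmoid(h0 @ W.T + b_vis)
            p_h1 = sigmoid(p_v1 @ W + b_hid)
            # CD-1 update: <v h>_data minus <v h>_reconstruction.
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b_vis += lr * (v0 - p_v1)
            b_hid += lr * (p_h0 - p_h1)
    return W, b_vis, b_hid

def greedy_pretrain(data, layer_sizes):
    # Stack RBMs: each layer is trained on the hidden-unit
    # probabilities of the layer below, one layer at a time.
    rbms, x = [], data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(x, n_hidden)
        rbms.append((W, b_vis, b_hid))
        x = sigmoid(x @ W + b_hid)  # propagate up to feed the next layer
    return rbms

# Toy usage on random binary "images" (hypothetical data and sizes).
data = (rng.random((100, 64)) < 0.3).astype(float)
stack = greedy_pretrain(data, layer_sizes=[32, 16])

In the paper's scheme, greedy pretraining of this kind initializes the weights of the directed belief net, after which the contrastive wake-sleep procedure fine-tunes them; the sketch above covers only the first stage.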
@article{hinton_fast_2006,
	title = {A {Fast} {Learning} {Algorithm} for {Deep} {Belief} {Nets}},
	volume = {18},
	issn = {0899-7667, 1530-888X},
	url = {https://direct.mit.edu/neco/article/18/7/1527-1554/7065},
	doi = {10.1162/neco.2006.18.7.1527},
	abstract = {We show how to use “complementary priors” to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.},
	language = {en},
	number = {7},
	urldate = {2025-10-03},
	journal = {Neural Computation},
	author = {Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee-Whye},
	month = jul,
	year = {2006},
	pages = {1527--1554},
}
