Large Scale Bayes Point Machines

Large Scale Bayes Point Machines. Herbrich, R. & Graepel, T. In Advances in Neural Information Processing Systems 13, pages 528--534, Denver, 2000. The MIT Press.


The concept of averaging over classifiers is fundamental to the Bayesian analysis of learning. Based on this viewpoint, it has recently been demonstrated for linear classifiers that the centre of mass of version space (the set of all classifiers consistent with the training set), also known as the Bayes point, exhibits excellent generalisation abilities. However, the billiard algorithm as presented in [Herbrich et al., 2000] is restricted to small sample sizes because it requires O(m^2) memory and O(N m^2) computational steps, where m is the number of training patterns and N is the number of random draws from the posterior distribution. In this paper we present a method based on the simple perceptron learning algorithm which overcomes this algorithmic drawback. The method is algorithmically simple and is easily extended to the multi-class case. We present experimental results on the MNIST data set of handwritten digits which show that Bayes Point Machines (BPMs) are competitive with the current world champion, the support vector machine. In addition, the computational complexity of BPMs can be tuned by varying the number of samples from the posterior. Finally, rejecting test points on the basis of their (approximate) posterior probability leads to a rapid decrease in generalisation error, e.g. 0.1% generalisation error for a given rejection rate of 10%.
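The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain linear perceptron without kernels or a bias term, linearly separable data, and random permutations of the training set as the source of different consistent classifiers. The function names (`perceptron`, `approximate_bayes_point`, `predict_with_rejection`) and the vote-fraction rejection rule are hypothetical stand-ins chosen for illustration. The vote fraction in particular is only a crude proxy for the approximate posterior probability mentioned in the abstract.

```python
import numpy as np


def perceptron(X, y, order, max_epochs=100):
    """Classical perceptron; visits the examples in the given order."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for i in order:
            if y[i] * (X[i] @ w) <= 0:  # misclassified (or on the boundary)
                w += y[i] * X[i]
                mistakes += 1
        if mistakes == 0:  # w is consistent with the whole training set
            break
    return w


def approximate_bayes_point(X, y, n_samples=20, seed=0):
    """Average unit-norm perceptron solutions from random permutations.

    Assumption: each permutation yields a different point of version space,
    and their mean approximates its centre of mass.
    """
    rng = np.random.default_rng(seed)
    W = []
    for _ in range(n_samples):
        w = perceptron(X, y, rng.permutation(len(y)))
        W.append(w / np.linalg.norm(w))
    W = np.array(W)
    return W.mean(axis=0), W


def predict_with_rejection(W, X, threshold=0.9):
    """Label in {-1, +1}, or 0 (reject) when the sampled classifiers disagree.

    Assumption: the fraction of agreeing samples stands in for the
    approximate posterior probability of the predicted label.
    """
    frac_pos = (X @ W.T > 0).mean(axis=1)
    label = np.where(frac_pos >= 0.5, 1, -1)
    confidence = np.maximum(frac_pos, 1.0 - frac_pos)
    return np.where(confidence >= threshold, label, 0)
```

In this sketch the number of perceptron runs, `n_samples`, plays the role of N in the abstract: more runs cost more computation but give a finer estimate of both the averaged classifier and the agreement used for rejection.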

@inproceedings{DBLP:conf/nips/HerbrichG00a,
abstract = {The concept of averaging over classifiers is fundamental to the Bayesian analysis of learning. Based on this viewpoint, it has recently been demonstrated for linear classifiers that the centre of mass of version space (the set of all classifiers consistent with the training set), also known as the Bayes point, exhibits excellent generalisation abilities. However, the billiard algorithm as presented in [Herbrich et al., 2000] is restricted to small sample sizes because it requires $O(m^2)$ memory and $O(N m^2)$ computational steps, where $m$ is the number of training patterns and $N$ is the number of random draws from the posterior distribution. In this paper we present a method based on the simple perceptron learning algorithm which overcomes this algorithmic drawback. The method is algorithmically simple and is easily extended to the multi-class case. We present experimental results on the MNIST data set of handwritten digits which show that Bayes Point Machines (BPMs) are competitive with the current world champion, the support vector machine. In addition, the computational complexity of BPMs can be tuned by varying the number of samples from the posterior. Finally, rejecting test points on the basis of their (approximate) posterior probability leads to a rapid decrease in generalisation error, e.g. 0.1\% generalisation error for a given rejection rate of 10\%.},
address = {Denver},
author = {Herbrich, Ralf and Graepel, Thore},
booktitle = {Advances in Neural Information Processing Systems 13},
file = {:Users/rherb/Dropbox/Documents/tex/nips2000/mnist/mnist.pdf:pdf},
pages = {528--534},
publisher = {The MIT Press},
title = {{Large Scale Bayes Point Machines}},
url = {http://www.herbrich.me/papers/mnist.pdf},
year = {2000}
}

