A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work. Herbrich, R. & Graepel, T. In Advances in Neural Information Processing Systems 13, pages 224--230, Denver, 2000.

We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement over the hitherto tightest margin bound by Shawe-Taylor et al. and scales logarithmically in the inverse margin. Even in the case of fewer training examples than input dimensions, sufficiently large margins lead to non-trivial bound values and, for maximum margins, to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence, we recommend using SVMs on normalised feature vectors only, a recommendation that is well supported by our numerical experiments on two benchmark data sets.
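The abstract's practical recommendation is to train SVMs only on feature vectors of equal length. A minimal sketch of that preprocessing step, using NumPy (the code and variable names are illustrative, not the authors'): each feature vector is rescaled to unit Euclidean norm, so all inputs lie on the unit sphere and satisfy the equal-length condition under which the paper shows the SVM is optimal w.r.t. the new bound.

```python
import numpy as np

# Hypothetical toy data: six feature vectors of widely varying length.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3)) * np.array([1.0, 10.0, 0.1])

# Normalise each feature vector to unit Euclidean length, so all
# training inputs have the same norm before being passed to an SVM.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1
```

The matrix `X_unit` (rather than `X`) would then be handed to any standard SVM solver; the normalisation changes only the lengths of the inputs, not their directions.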
@inproceedings{DBLP:conf/nips/HerbrichG00,
abstract = {We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement over the hitherto tightest margin bound by Shawe-Taylor et al. and scales logarithmically in the inverse margin. Even in the case of fewer training examples than input dimensions, sufficiently large margins lead to non-trivial bound values and, for maximum margins, to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence, we recommend using SVMs on normalised feature vectors only, a recommendation that is well supported by our numerical experiments on two benchmark data sets.},
address = {Denver},
author = {Herbrich, Ralf and Graepel, Thore},
booktitle = {Advances in Neural Information Processing Systems 13},
pages = {224--230},
title = {{A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work}},
url = {http://www.herbrich.me/papers/pacbayes.pdf},
year = {2000}
}