Back propagation separates where perceptrons do. Sontag, E.D. & Sussmann, H.J. Neural Networks, 4(2):243–249, Elsevier Science Ltd., Oxford, UK, 1991. doi: 10.1016/0893-6080(91)90008-S

Abstract: Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values "beyond" the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that even if using the threshold criterion, such bad local minima may occur, if the data are not separable and sigmoids are used.

Keywords: neural networks, feedforward neural nets.
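To make the "threshold LMS" criterion in the abstract concrete, here is a minimal sketch (not code from the paper): a single tanh unit, i.e., a net with no hidden neurons, with targets ±1, and a per-example error that is clipped to zero once the output lies beyond an illustrative margin. The margin value 0.5, the tanh activation, the step size, and the toy data are assumptions made purely for illustration; the paper's result concerns the continuous gradient flow, while this sketch uses plain discrete gradient descent.

    import numpy as np

    def sigmoid(z):
        # Symmetric sigmoid with outputs in (-1, 1); an illustrative choice.
        return np.tanh(z)

    def threshold_lms_cost(w, X, y, margin=0.5):
        # Per-example error is 0.5 * max(0, margin - y * sigmoid(w.x))^2:
        # it vanishes once the output is "beyond" the desired target value.
        out = sigmoid(X @ w)
        slack = np.maximum(0.0, margin - y * out)
        return 0.5 * np.sum(slack ** 2)

    def threshold_lms_grad(w, X, y, margin=0.5):
        # Gradient of the cost above; (1 - out**2) is the derivative of tanh.
        out = sigmoid(X @ w)
        slack = np.maximum(0.0, margin - y * out)
        dsig = 1.0 - out ** 2
        return -(slack * y * dsig) @ X

    # Linearly separable toy data (bias absorbed as a constant 1 feature).
    rng = np.random.default_rng(0)
    X = np.hstack([rng.normal(size=(40, 2)), np.ones((40, 1))])
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

    w = np.zeros(3)
    for _ in range(5000):
        w -= 0.1 * threshold_lms_grad(w, X, y)

    print("final cost:", threshold_lms_cost(w, X, y))
    # On this separable toy set one expects every point classified correctly,
    # in line with the theorem stated in the abstract.
    print("misclassified:", int(np.sum(np.sign(X @ w) != y)))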
@ARTICLE{109699,
AUTHOR = {E.D. Sontag and H.J. Sussmann},
JOURNAL = {Neural Networks},
TITLE = {Back propagation separates where perceptrons do},
YEAR = {1991},
NUMBER = {2},
PAGES = {243--249},
VOLUME = {4},
  ADDRESS = {Oxford, UK},
  KEYWORDS = {neural networks, feedforward neural nets},
PUBLISHER = {Elsevier Science Ltd.},
PDF = {../../FTPDIR/converge-nn.pdf},
ABSTRACT = { Feedforward nets with sigmoidal activation functions
are often designed by minimizing a cost criterion. It has been
pointed out before that this technique may be outperformed by the
classical perceptron learning rule, at least on some problems. In
this paper, we show that no such pathologies can arise if the error
criterion is of a threshold LMS type, i.e., is zero for values
``beyond'' the desired target values. More precisely, we show that if
the data are linearly separable, and one considers nets with no
hidden neurons, then an error function as above cannot have any local
minima that are not global. In addition, the proof gives the
following stronger result, under the stated hypotheses: the
continuous gradient adjustment procedure is such that from any
initial weight configuration a separating set of weights is obtained
in finite time. This is a precise analogue of the Perceptron Learning
Theorem. The results are then compared with the more classical
pattern recognition problem of threshold LMS with linear activations,
where no spurious local minima exist even for nonseparable data: here
it is shown that even if using the threshold criterion, such bad
local minima may occur, if the data are not separable and sigmoids
  are used.},
  DOI = {10.1016/0893-6080(91)90008-S}
}
{"_id":"ZqTZGsTzCaNMgMgBb","bibbaseid":"sontag-sussmann-backpropagationseparateswhereperceptronsdo-1991","downloads":0,"creationDate":"2018-10-18T05:07:05.794Z","title":"Back propagation separates where perceptrons do","author_short":["Sontag, E.","Sussmann, H."],"year":1991,"bibtype":"article","biburl":"http://www.sontaglab.org/PUBDIR/Biblio/complete-bibliography.bib","bibdata":{"bibtype":"article","type":"article","author":[{"firstnames":["E.D."],"propositions":[],"lastnames":["Sontag"],"suffixes":[]},{"firstnames":["H.J."],"propositions":[],"lastnames":["Sussmann"],"suffixes":[]}],"journal":"Neural Networks","title":"Back propagation separates where perceptrons do","year":"1991","optmonth":"","optnote":"","number":"2","pages":"243–249","volume":"4","address":"Oxford, UK, UK","keywords":"neural networks, neural networks","publisher":"Elsevier Science Ltd.","pdf":"../../FTPDIR/converge-nn.pdf","abstract":"Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values ``beyond'' the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that even if using the threshold criterion, such bad local minima may occur, if the data are not separable and sigmoids are used. keywords = neural networks , feedforward neural nets , ","doi":"http://dx.doi.org/10.1016/0893-6080(91)90008-S","bibtex":"@ARTICLE{109699,\n AUTHOR = {E.D. Sontag and H.J. Sussmann},\n JOURNAL = {Neural Networks},\n TITLE = {Back propagation separates where perceptrons do},\n YEAR = {1991},\n OPTMONTH = {},\n OPTNOTE = {},\n NUMBER = {2},\n PAGES = {243--249},\n VOLUME = {4},\n ADDRESS = {Oxford, UK, UK},\n KEYWORDS = {neural networks, neural networks},\n PUBLISHER = {Elsevier Science Ltd.},\n PDF = {../../FTPDIR/converge-nn.pdf},\n ABSTRACT = { Feedforward nets with sigmoidal activation functions \n are often designed by minimizing a cost criterion. It has been \n pointed out before that this technique may be outperformed by the \n classical perceptron learning rule, at least on some problems. In \n this paper, we show that no such pathologies can arise if the error \n criterion is of a threshold LMS type, i.e., is zero for values \n ``beyond'' the desired target values. More precisely, we show that if \n the data are linearly separable, and one considers nets with no \n hidden neurons, then an error function as above cannot have any local \n minima that are not global. 
In addition, the proof gives the \n following stronger result, under the stated hypotheses: the \n continuous gradient adjustment procedure is such that from any \n initial weight configuration a separating set of weights is obtained \n in finite time. This is a precise analogue of the Perceptron Learning \n Theorem. The results are then compared with the more classical \n pattern recognition problem of threshold LMS with linear activations, \n where no spurious local minima exist even for nonseparable data: here \n it is shown that even if using the threshold criterion, such bad \n local minima may occur, if the data are not separable and sigmoids \n are used. keywords = { neural networks , feedforward neural nets }, },\n DOI = {http://dx.doi.org/10.1016/0893-6080(91)90008-S}\n}\n\n","author_short":["Sontag, E.","Sussmann, H."],"key":"109699","id":"109699","bibbaseid":"sontag-sussmann-backpropagationseparateswhereperceptronsdo-1991","role":"author","urls":{},"keyword":["neural networks","neural networks"],"downloads":0,"html":""},"search_terms":["back","propagation","separates","perceptrons","sontag","sussmann"],"keywords":["neural networks","neural networks"],"authorIDs":["5bc814f9db768e100000015a"],"dataSources":["DKqZbTmd7peqE4THw"]}