Back propagation separates where perceptrons do. Sontag, E. D. & Sussmann, H. J. Neural Networks, 4(2):243–249, Elsevier Science Ltd., Oxford, UK, 1991.
Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values "beyond" the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that even if using the threshold criterion, such bad local minima may occur, if the data are not separable and sigmoids are used.
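
For illustration only, the following is a minimal Python sketch of the setting the abstract describes: a single sigmoid neuron with no hidden units, trained by a discrete-step approximation of gradient adjustment on a threshold-LMS criterion whose per-example error is zero once the output is "beyond" its target value. The function names, target thresholds (0.2 and 0.8), learning rate, and toy data are assumptions made for this sketch, not taken from the paper.

# Illustrative sketch, not from the paper: threshold-LMS training of a
# single sigmoid neuron on linearly separable toy data. Thresholds, step
# size, and data are assumptions chosen for the example.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def threshold_lms_loss(w, b, X, y, lo=0.2, hi=0.8):
    """Error is zero for outputs 'beyond' the target: above hi for positive
    examples (y == 1), below lo for negative ones (y == 0)."""
    out = sigmoid(X @ w + b)
    pos = np.maximum(0.0, hi - out[y == 1])   # penalized only while below hi
    neg = np.maximum(0.0, out[y == 0] - lo)   # penalized only while above lo
    return np.sum(pos ** 2) + np.sum(neg ** 2)

def grad_step(w, b, X, y, lr=0.5, lo=0.2, hi=0.8):
    """One gradient-descent step on the threshold-LMS criterion."""
    out = sigmoid(X @ w + b)
    dloss_dout = np.zeros_like(out)
    dloss_dout[y == 1] = -2.0 * np.maximum(0.0, hi - out[y == 1])
    dloss_dout[y == 0] = 2.0 * np.maximum(0.0, out[y == 0] - lo)
    dloss_dz = dloss_dout * out * (1.0 - out)   # chain rule through the sigmoid
    return w - lr * (X.T @ dloss_dz), b - lr * dloss_dz.sum()

# Linearly separable toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

w, b = np.zeros(2), 0.0
for _ in range(2000):
    w, b = grad_step(w, b, X, y)
print("final threshold-LMS loss:", threshold_lms_loss(w, b, X, y))

On this separable toy set the loss is driven to essentially zero, which is consistent with the paper's claim that, with no hidden neurons and a threshold-type criterion, gradient adjustment reaches a separating set of weights; the sketch does not reproduce the paper's continuous-time, finite-time result itself.
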
@ARTICLE{109699,
   AUTHOR       = {E.D. Sontag and H.J. Sussmann},
   JOURNAL      = {Neural Networks},
   TITLE        = {Back propagation separates where perceptrons do},
   YEAR         = {1991},
   NUMBER       = {2},
   PAGES        = {243--249},
   VOLUME       = {4},
   ADDRESS      = {Oxford, UK},
   KEYWORDS     = {neural networks, feedforward neural nets},
   PUBLISHER    = {Elsevier Science Ltd.},
   PDF          = {../../FTPDIR/converge-nn.pdf},
   ABSTRACT     = { Feedforward nets with sigmoidal activation functions 
      are often designed by minimizing a cost criterion. It has been 
      pointed out before that this technique may be outperformed by the 
      classical perceptron learning rule, at least on some problems. In 
      this paper, we show that no such pathologies can arise if the error 
      criterion is of a threshold LMS type, i.e., is zero for values 
      ``beyond'' the desired target values. More precisely, we show that if 
      the data are linearly separable, and one considers nets with no 
      hidden neurons, then an error function as above cannot have any local 
      minima that are not global. In addition, the proof gives the 
      following stronger result, under the stated hypotheses: the 
      continuous gradient adjustment procedure is such that from any 
      initial weight configuration a separating set of weights is obtained 
      in finite time. This is a precise analogue of the Perceptron Learning 
      Theorem. The results are then compared with the more classical 
      pattern recognition problem of threshold LMS with linear activations, 
      where no spurious local minima exist even for nonseparable data: here 
      it is shown that even if using the threshold criterion, such bad 
      local minima may occur, if the data are not separable and sigmoids 
       are used.},
   DOI          = {10.1016/0893-6080(91)90008-S}
}
