Mixture of Experts with Entropic Regularization for Data Classification. B. Peralta, A. Saavedra, L. Caro, & A. Soto. Entropy, 2019.
Today, there is growing interest in the automatic classification of a variety of tasks, such as weather forecasting, product recommendations, intrusion detection, and people recognition. “Mixture-of-experts” is a well-known classification technique; it is a probabilistic model consisting of local expert classifiers weighted by a gate network, typically based on softmax functions, that can learn complex patterns in the data. In this scheme, each data point is influenced by only one expert; as a result, the training process can be misguided on real datasets in which complex data need to be explained by multiple experts. In this work, we propose a variant of the regular mixture-of-experts model in which the classification cost is penalized by the Shannon entropy of the gating network in order to avoid a “winner-takes-all” output for the gating network. Experiments show the advantage of our approach on several real datasets, with improvements in mean accuracy of 3–6% on some datasets. In future work, we plan to embed feature selection into this model.
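The central idea described in the abstract is to penalize the classification cost with the Shannon entropy of the gate's softmax output, so that the gate does not collapse to a winner-takes-all assignment of each point to a single expert. Below is a minimal NumPy sketch of such an entropy-regularized mixture-of-experts loss; the linear gate, linear experts, and the regularization weight lam are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropic_moe_loss(X, Y, gate_W, expert_Ws, lam=0.1):
    # X: (n, d) inputs; Y: (n, C) one-hot labels
    # gate_W: (d, K) gate weights; expert_Ws: list of K (d, C) expert weight matrices
    g = softmax(X @ gate_W)                                          # (n, K) gate probabilities
    experts = np.stack([softmax(X @ W) for W in expert_Ws], axis=1)  # (n, K, C) expert outputs
    p = np.einsum('nk,nkc->nc', g, experts)                          # mixture class probabilities
    nll = -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))            # classification cost
    gate_entropy = np.mean(-np.sum(g * np.log(g + 1e-12), axis=1))   # Shannon entropy of the gate
    # Subtracting the scaled entropy rewards spreading responsibility across experts,
    # discouraging a winner-takes-all gate output.
    return nll - lam * gate_entropy

With a loss of this form, the gate and expert weights can be fitted with any gradient-based optimizer, and lam trades off classification accuracy against how strongly the gate is pushed toward using multiple experts per point.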
@Article{	  peralta:etal:2019,
  author	= {B. Peralta and A. Saavedra and L. Caro and A. Soto},
  title		= {Mixture of Experts with Entropic Regularization for Data
		  Classification},
  journal	= {Entropy},
  volume	= {21},
  number	= {2},
  year		= {2019},
  abstract	= {Today, there is growing interest in the automatic
		  classification of a variety of tasks, such as weather
		  forecasting, product recommendations, intrusion detection,
		  and people recognition. “Mixture-of-experts” is a
		  well-known classification technique; it is a probabilistic
		  model consisting of local expert classifiers weighted by a
		  gate network, typically based on softmax functions, that
		  can learn complex patterns in the data. In this scheme,
		  each data point is influenced by only one expert; as a
		  result, the training process can be misguided on real
		  datasets in which complex data need to be explained by
		  multiple experts. In this work, we propose a variant of
		  the regular mixture-of-experts model in which the
		  classification cost is penalized by the Shannon entropy of
		  the gating network in order to avoid a
		  “winner-takes-all” output for the gating network.
		  Experiments show the advantage of our approach on several
		  real datasets, with improvements in mean accuracy of
		  3–6\% on some datasets. In future work, we plan to embed
		  feature selection into this model.},
  url		= {https://www.mdpi.com/1099-4300/21/2/190}
}
