Mixture of Experts with Entropic Regularization for Data Classification. B. Peralta, A. Saavedra, L. Caro & A. Soto. Entropy, 2019.
Today, there is growing interest in the automatic classification of a variety of tasks, such as weather forecasting, product recommendations, intrusion detection, and people recognition. “Mixture-of-experts” is a well-known classification technique; it is a probabilistic model consisting of local expert classifiers weighted by a gate network that is typically based on softmax functions, combined with learnable complex patterns in data. In this scheme, one data point is influenced by only one expert; as a result, the training process can be misguided in real datasets for which complex data need to be explained by multiple experts. In this work, we propose a variant of the regular mixture-of-experts model. In the proposed model, the cost classification is penalized by the Shannon entropy of the gating network in order to avoid a “winner-takes-all” output for the gating network. Experiments show the advantage of our approach using several real datasets, with improvements in mean accuracy of 3–6% in some datasets. In future work, we plan to embed feature selection into this model.
@Article{	  peralta:etal:2019,
  author	= {B. Peralta and A. Saavedra and L. Caro and A. Soto},
  title		= {Mixture of Experts with Entropic Regularization for Data
		  Classification},
  journal	= {Entropy},
  volume	= {21},
  number	= {2},
  year		= {2019},
  abstract	= {Today, there is growing interest in the automatic
		  classification of a variety of tasks, such as weather
		  forecasting, product recommendations, intrusion detection,
		  and people recognition. “Mixture-of-experts” is a
		  well-known classification technique; it is a probabilistic
		  model consisting of local expert classifiers weighted by a
		  gate network that is typically based on softmax functions,
		  combined with learnable complex patterns in data. In this
		  scheme, one data point is influenced by only one expert; as
		  a result, the training process can be misguided in real
		  datasets for which complex data need to be explained by
		  multiple experts. In this work, we propose a variant of the
		  regular mixture-of-experts model. In the proposed model,
		  the cost classification is penalized by the Shannon entropy
		  of the gating network in order to avoid a
		  “winner-takes-all” output for the gating network.
		  Experiments show the advantage of our approach using
		  several real datasets, with improvements in mean accuracy
		  of 3–6\% in some datasets. In future work, we plan to
		  embed feature selection into this model.},
  url		= {https://www.mdpi.com/1099-4300/21/2/190}
}
