Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems. Rose, K. Proc. IEEE, 86(11):2210--2239, 1998.
The deterministic annealing approach to clustering and its extensions has demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of important applications, including compression, estimation, pattern recognition and classification, and statistical regression. The method offers three important features: 1) the ability to avoid many poor local optima; 2) applicability to many different structures/architectures; and 3) the ability to minimize the right cost function even when its gradients vanish almost everywhere, as in the case of the empirical classification error. It is derived within a probabilistic framework from basic information theoretic principles (e.g., maximum entropy and random coding). The application-specific cost is minimized subject to a constraint on the randomness (Shannon entropy) of the solution, which is gradually lowered. We emphasize intuition gained from analogy to statistical physics, where this is an annealing process that avoids many shallow local minima of the specified cost and, at the limit of zero "temperature," produces a nonrandom (hard) solution. Alternatively, the method is derived within rate-distortion theory, where the annealing process is equivalent to computation of Shannon's rate-distortion function, and the annealing temperature is inversely proportional to the slope of the curve. This provides new insights into the method and its performance, as well as new insights into rate-distortion theory itself. The basic algorithm is extended by incorporating structural constraints to allow optimization of numerous popular structures, including vector quantizers, decision trees, multilayer perceptrons, radial basis functions, and mixtures of experts. Experimental results show considerable performance gains over standard structure-specific and application-specific training methods. The paper concludes with a brief discussion of extensions of the method that are currently under investigation.
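The basic procedure the abstract describes is easy to sketch: at each temperature T, data points are softly associated with cluster centroids via a Gibbs distribution, centroids are re-estimated as probability-weighted means, and T is lowered on a schedule so that the soft partition gradually hardens. Below is a minimal illustrative Python sketch of this squared-error clustering variant; the function name, cooling schedule, and parameters (T0, Tmin, alpha, inner_iters) are assumptions for illustration, and the sketch omits the paper's mass-constrained formulation and phase-transition (cluster-splitting) analysis.

import numpy as np

def deterministic_annealing_clustering(X, k, T0=10.0, Tmin=1e-3, alpha=0.9, inner_iters=30):
    """Illustrative sketch of deterministic annealing for squared-error clustering.

    At each temperature T, points are softly assigned to centroids via the Gibbs
    distribution p(j|x) proportional to exp(-||x - y_j||^2 / T); centroids are then
    re-estimated as probability-weighted means. T is lowered geometrically, so the
    soft partition hardens as T approaches zero.
    """
    rng = np.random.default_rng(0)
    # Start all centroids near the global mean, slightly perturbed so they can separate.
    Y = X.mean(axis=0) + 1e-3 * rng.standard_normal((k, X.shape[1]))
    T = T0
    while T > Tmin:
        for _ in range(inner_iters):
            # Pairwise squared distances, shape (n, k).
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            # Gibbs association probabilities at temperature T (rows sum to 1).
            logits = -d2 / T
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            P = np.exp(logits)
            P /= P.sum(axis=1, keepdims=True)
            # Centroid update: probability-weighted means of the data.
            Y = (P.T @ X) / P.sum(axis=0)[:, None]
        T *= alpha  # cooling step
    return Y, P

# Example with synthetic data:
# X = np.vstack([np.random.randn(100, 2) + m for m in ([0, 0], [5, 5], [0, 5])])
# centroids, assignments = deterministic_annealing_clustering(X, k=3)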
@article{Rose:1998aa,
	Abstract = {The deterministic annealing approach to clustering and its extensions has demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of important applications, including compression, estimation, pattern recognition and classification, and statistical regression. The method offers three important features: 1) the ability to avoid many poor local optima; 2) applicability to many different structures/architectures; and 3) the ability to minimize the right cost function even when its gradients vanish almost everywhere, as in the case of the empirical classification error. It is derived within a probabilistic framework from basic information theoretic principles (e.g., maximum entropy and random coding). The application-specific cost is minimized subject to a constraint on the randomness (Shannon entropy) of the solution, which is gradually lowered. We emphasize intuition gained from analogy to statistical physics, where this is an annealing process that avoids many shallow local minima of the specified cost and, at the limit of zero "temperature," produces a nonrandom (hard) solution. Alternatively, the method is derived within rate-distortion theory, where the annealing process is equivalent to computation of Shannon's rate-distortion function, and the annealing temperature is inversely proportional to the slope of the curve. This provides new insights into the method and its performance, as well as new insights into rate-distortion theory itself. The basic algorithm is extended by incorporating structural constraints to allow optimization of numerous popular structures, including vector quantizers, decision trees, multilayer perceptrons, radial basis functions, and mixtures of experts. Experimental results show considerable performance gains over standard structure-specific and application-specific training methods. The paper concludes with a brief discussion of extensions of the method that are currently under investigation.},
	Author = {Rose, K.},
	Date-Added = {2008-11-04 10:55:43 -0500},
	Date-Modified = {2008-11-04 10:56:29 -0500},
	Journal = {Proc.~{IEEE}},
	Keywords = {gtm; classification; clustering; compression; deterministic annealing; maximum entropy; optimization methods; regression; vector quantization},
	Number = {11},
	Pages = {2210--2239},
	Title = {Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems},
	Volume = {86},
	Year = {1998},
	Bdsk-Url-1 = {http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS&DestLinkType=FullRecord;KeyUT=000076557300007}}
