An Accelerated Algorithm for Density Estimation in Large Databases, Using Gaussian Mixtures. Soto, A., Zavala, F., & Araneda, A. *Cybernetics and Systems: An International Journal*, 38(2):123-139, 2007.


Today, with advances in computer storage and technology, huge datasets are available, offering an opportunity to extract valuable information. Probabilistic approaches are especially suited to learning from data by representing knowledge as density functions. In this paper, we choose Gaussian Mixture Models (GMMs) to represent densities, as they possess great flexibility to adapt to a wide class of problems. The classical estimation approach for GMMs is the iterative Expectation Maximization (EM) algorithm. This approach, however, does not scale to meet the demanding processing requirements of large databases. In this paper we introduce an EM-based algorithm that solves the scalability problem. Our approach is based on the concept of data condensation, which, in addition to substantially diminishing the computational load, provides sound starting values that allow the algorithm to reach convergence faster. We also address the model selection problem. We test our algorithm on synthetic and real databases and find several advantages over other standard existing procedures.
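The classical EM baseline that the abstract contrasts against can be sketched in a few lines of NumPy. This is a generic illustration of EM for GMMs, not the authors' accelerated, condensation-based method; the function name `em_gmm` and the initialization scheme (means drawn from random data points, identity covariances, uniform weights) are illustrative choices, not taken from the paper.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, seed=0):
    """Classical EM for a k-component Gaussian mixture on data X of shape (n, d).

    Returns (weights, means, covariances). Each EM iteration touches every one
    of the n points, which is the per-iteration cost the paper's data
    condensation approach aims to reduce.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: means from random data points, identity covariances,
    # uniform mixing weights (a common, naive starting point).
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.stack([np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log responsibilities log p(component j | x_i), up to a constant.
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            inv = np.linalg.inv(covs[j])
            _, logdet = np.linalg.slogdet(covs[j])
            quad = np.einsum('ni,ij,nj->n', diff, inv, diff)
            log_r[:, j] = np.log(weights[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + quad)
        # Normalize in log space for numerical stability, then exponentiate.
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances from responsibilities.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs
```

Because the E-step is O(n·k·d²) per iteration, runtime grows linearly with the dataset size; the paper's contribution is to run EM on a condensed summary of the data instead, which also supplies better starting values than the naive initialization above.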

@Article{soto:zavala:araneda:2007,
  author   = {A. Soto and F. Zavala and A. Araneda},
  title    = {An Accelerated Algorithm for Density Estimation in Large Databases, Using Gaussian Mixtures},
  journal  = {Cybernetics and Systems: An International Journal},
  volume   = {38},
  number   = {2},
  pages    = {123--139},
  year     = {2007},
  abstract = {Today, with advances in computer storage and technology, huge datasets are available, offering an opportunity to extract valuable information. Probabilistic approaches are especially suited to learning from data by representing knowledge as density functions. In this paper, we choose Gaussian Mixture Models (GMMs) to represent densities, as they possess great flexibility to adapt to a wide class of problems. The classical estimation approach for GMMs is the iterative Expectation Maximization (EM) algorithm. This approach, however, does not scale to meet the demanding processing requirements of large databases. In this paper we introduce an EM-based algorithm that solves the scalability problem. Our approach is based on the concept of data condensation, which, in addition to substantially diminishing the computational load, provides sound starting values that allow the algorithm to reach convergence faster. We also address the model selection problem. We test our algorithm on synthetic and real databases and find several advantages over other standard existing procedures.},
  url      = {http://saturno.ing.puc.cl/media/papers_alvaro/Felipe-07.pdf}
}

