Introduction to Probabilistic Topic Models. Blei, D. M. Communications of the ACM, 55(4):77–84, 2012.
Abstract: Probabilistic topic models are a suite of algorithms whose aim is to discover the hidden thematic structure in large archives of documents. In this article, we review the main ideas of this field, survey the current state-of-the-art, and describe some promising future directions. We first describe latent Dirichlet allocation (LDA) [8], which is the simplest kind of topic model. We discuss its connections to probabilistic modeling, and describe two kinds of algorithms for topic discovery. We then survey the growing body of research that extends and applies topic models in interesting ways. These extensions have been developed by relaxing some of the statistical assumptions of LDA, incorporating meta-data into the analysis of the documents, and using similar kinds of models on a diversity of data types such as social networks, images and genetics. Finally, we give our thoughts as to some of the important unexplored directions for topic modeling. These include rigorous methods for checking models built for data exploration, new approaches to visualizing text and other high dimensional data, and moving beyond traditional information engineering applications towards using topic models for more scientific ends.
@article{blei_introduction_2012,
	title = {Introduction to {Probabilistic} {Topic} {Models}},
	volume = {55},
	doi = {10.1145/2133806.2133826},
	abstract = {Abstract: Probabilistic topic models are a suite of algorithms whose aim is to discover the hidden thematic structure in large archives of documents. In this article, we review the main ideas of this field, survey the current state-of-the-art, and describe some promising future directions. We first describe latent Dirichlet allocation (LDA) [8], which is the simplest kind of topic model. We discuss its connections to probabilistic modeling,
and describe two kinds of algorithms for topic discovery. We then survey the growing
body of research that extends and applies topic models in interesting ways. These
extensions have been developed by relaxing some of the statistical assumptions of LDA,
incorporating meta-data into the analysis of the documents, and using similar kinds
of models on a diversity of data types such as social networks, images and genetics.
Finally, we give our thoughts as to some of the important unexplored directions for
topic modeling. These include rigorous methods for checking models built for data
exploration, new approaches to visualizing text and other high dimensional data, and
moving beyond traditional information engineering applications towards using topic models for more scientific ends.},
	language = {en},
	number = {4},
	journal = {Communications of the ACM},
	author = {Blei, David M.},
	year = {2012},
	keywords = {*****, act\_ContentAnalysis, goal\_Analysis, meta\_GiveOverview, t\_TopicModeling},
	pages = {77--84},
}
