Probabilistic topic models. Blei, D. M. Communications of the ACM, 55(4):77–84, April, 2012. 🏷️ /unread、meta_GiveOverview、*****、act_ContentAnalysis、goal_Analysis、t_TopicModeling
Abstract: Probabilistic topic models are a suite of algorithms whose aim is to discover the hidden thematic structure in large archives of documents. In this article, we review the main ideas of this field, survey the current state-of-the-art, and describe some promising future directions. We first describe latent Dirichlet allocation (LDA) [8], which is the simplest kind of topic model. We discuss its connections to probabilistic modeling, and describe two kinds of algorithms for topic discovery. We then survey the growing body of research that extends and applies topic models in interesting ways. These extensions have been developed by relaxing some of the statistical assumptions of LDA, incorporating meta-data into the analysis of the documents, and using similar kinds of models on a diversity of data types such as social networks, images and genetics. Finally, we give our thoughts as to some of the important unexplored directions for topic modeling. These include rigorous methods for checking models built for data exploration, new approaches to visualizing text and other high dimensional data, and moving beyond traditional information engineering applications towards using topic models for more scientific ends.
@article{blei2012a,
	title = {Probabilistic topic models},
	volume = {55},
	issn = {0001-0782},
	shorttitle = {Introduction to probabilistic topic models},
	url = {https://dl.acm.org/doi/10.1145/2133806.2133826},
	doi = {10.1145/2133806.2133826},
	abstract = {Probabilistic topic models are a suite of algorithms whose aim is to discover the hidden thematic structure in large archives of documents. In this article, we review the main ideas of this field, survey the current state-of-the-art, and describe some promising future directions. We first describe latent Dirichlet allocation (LDA) [8], which is the simplest kind of topic model. We discuss its connections to probabilistic modeling, and describe two kinds of algorithms for topic discovery. We then survey the growing body of research that extends and applies topic models in interesting ways. These extensions have been developed by relaxing some of the statistical assumptions of LDA, incorporating meta-data into the analysis of the documents, and using similar kinds of models on a diversity of data types such as social networks, images and genetics. Finally, we give our thoughts as to some of the important unexplored directions for topic modeling. These include rigorous methods for checking models built for data exploration, new approaches to visualizing text and other high dimensional data, and moving beyond traditional information engineering applications towards using topic models for more scientific ends.},
	language = {en},
	number = {4},
	journal = {Communications of the ACM},
	author = {Blei, David M.},
	month = apr,
	year = {2012},
	note = {🏷️ /unread、meta\_GiveOverview、*****、act\_ContentAnalysis、goal\_Analysis、t\_TopicModeling},
	keywords = {*****, /unread, act\_ContentAnalysis, goal\_Analysis, meta\_GiveOverview, t\_TopicModeling},
	pages = {77--84},
}
