Integrating Document Clustering and Topic Modeling. Xie, P. & Xing, E. P.
Document clustering and topic modeling are two closely related tasks that can benefit each other. Topic modeling can project documents into a topic space, which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance. Our model tightly couples two components: a mixture component used for discovering latent groups in the document collection and a topic model component used for mining multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of hidden variables and learn model parameters. Experiments on two datasets demonstrate the effectiveness of our model.
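To make the split between local and global topics concrete, here is a toy generative sketch in Python. It is not the MGCTM specification from the paper: the symmetric Dirichlet priors, the per-document Beta-distributed probability of drawing a word from local rather than global topics, the toy sizes, and all names (`ToyConfig`, `dirichlet`, `categorical`, `generate_corpus`) are assumptions made for illustration. Each document first draws a latent cluster (the mixture component); each word is then generated either from that cluster's own local topics or from the global topics shared by all clusters (the topic-model component). Inference, which the paper performs with variational methods, is not shown.

```python
import random
from dataclasses import dataclass


# Hypothetical toy sizes and priors; none of these values come from the paper.
@dataclass
class ToyConfig:
    n_clusters: int = 2   # number of latent document clusters
    n_local: int = 2      # local topics owned by each cluster
    n_global: int = 2     # global topics shared by all clusters
    vocab_size: int = 6
    doc_len: int = 10
    alpha: float = 0.5    # symmetric Dirichlet concentration (assumed)


def dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Sample a symmetric Dirichlet vector by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs: list[float], rng: random.Random) -> int:
    """Sample an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs: int, cfg: ToyConfig, seed: int = 0):
    """Return a list of (cluster_id, word_ids) pairs from the toy model."""
    rng = random.Random(seed)
    # Topic-word distributions: global topics are shared by every cluster,
    # while each cluster has its own set of local topics.
    global_topics = [dirichlet(cfg.alpha, cfg.vocab_size, rng)
                     for _ in range(cfg.n_global)]
    local_topics = [[dirichlet(cfg.alpha, cfg.vocab_size, rng)
                     for _ in range(cfg.n_local)]
                    for _ in range(cfg.n_clusters)]
    # Mixture weights over clusters (the mixture component).
    cluster_weights = dirichlet(1.0, cfg.n_clusters, rng)

    corpus = []
    for _ in range(n_docs):
        c = categorical(cluster_weights, rng)        # latent cluster of this doc
        theta_g = dirichlet(cfg.alpha, cfg.n_global, rng)
        theta_l = dirichlet(cfg.alpha, cfg.n_local, rng)
        p_local = rng.betavariate(1.0, 1.0)          # assumed local-vs-global switch
        words = []
        for _ in range(cfg.doc_len):
            if rng.random() < p_local:
                # Word drawn from one of the document's cluster-specific topics.
                topic = local_topics[c][categorical(theta_l, rng)]
            else:
                # Word drawn from one of the topics shared across clusters.
                topic = global_topics[categorical(theta_g, rng)]
            words.append(categorical(topic, rng))
        corpus.append((c, words))
    return corpus
```

For example, `generate_corpus(5, ToyConfig(), seed=1)` returns five `(cluster_id, word_ids)` pairs. Fitting a model such as MGCTM amounts to recovering the cluster assignments and both topic sets from the word ids alone.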
