A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings. Hu, W. & Tsujii, J.
Uncovering thematic structures of SNS and blog posts is a crucial yet challenging task, because of the severe data sparsity induced by the short length of texts and diverse use of vocabulary. This hinders effective topic inference of traditional LDA because it infers topics based on document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables to capture conceptual similarity of words. More specifically, LCTM models each topic as a distribution over the latent concepts, where each latent concept is a localized Gaussian distribution over the word embedding space. Since the number of unique concepts in a corpus is often much smaller than the number of unique words, LCTM is less susceptible to the data sparsity. Experiments on the 20Newsgroups show the effectiveness of LCTM in dealing with short texts as well as the capability of the model in handling held-out documents with a high degree of OOV words.
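The generative process the abstract describes (topic → latent concept → word, with each concept a Gaussian over the embedding space) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vocabulary size, embedding dimension, concept count, shared variance, and the `generate_document` helper are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small setup: V words with D-dim embeddings, C concepts, K topics.
V, D, C, K = 20, 2, 4, 3
word_vecs = rng.normal(size=(V, D))           # stand-in for pretrained word embeddings
concept_means = rng.normal(size=(C, D))       # mean of each Gaussian latent concept
concept_var = 0.5                             # assumed shared spherical variance
topic_concept = rng.dirichlet(np.ones(C), K)  # each topic: distribution over concepts

def generate_document(n_words, alpha=0.1):
    """Sample one document from an LCTM-style generative process (sketch)."""
    theta = rng.dirichlet(alpha * np.ones(K))  # document's topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)             # draw a topic
        c = rng.choice(C, p=topic_concept[z])  # draw a latent concept from the topic
        # Word probability under concept c: Gaussian density at each word's embedding.
        logp = -np.sum((word_vecs - concept_means[c]) ** 2, axis=1) / (2 * concept_var)
        p = np.exp(logp - logp.max())
        p /= p.sum()
        words.append(int(rng.choice(V, p=p)))  # words near the concept are likely
    return words

doc = generate_document(10)
print(doc)
```

Because words enter only through their embeddings' distance to a concept mean, a held-out word never seen in training can still be assigned to a nearby concept, which is the mechanism behind the robustness to OOV words mentioned above.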
