A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings. Hu, W. & Tsujii, J.
Uncovering thematic structures of SNS and blog posts is a crucial yet challenging task, because of the severe data sparsity induced by the short length of texts and diverse use of vocabulary. This hinders effective topic inference of traditional LDA, because it infers topics based on document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables to capture conceptual similarity of words. More specifically, LCTM models each topic as a distribution over the latent concepts, where each latent concept is a localized Gaussian distribution over the word embedding space. Since the number of unique concepts in a corpus is often much smaller than the number of unique words, LCTM is less susceptible to the data sparsity. Experiments on the 20 Newsgroups dataset show the effectiveness of LCTM in dealing with short texts, as well as the capability of the model in handling held-out documents with a high degree of OOV words.
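The generative process the abstract describes (topic → latent concept → word embedding) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the dimensions, the Dirichlet priors, and the spherical covariance are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- not taken from the paper's experiments.
n_topics, n_concepts, embed_dim, doc_len = 3, 5, 4, 10

# Per-document topic distribution (Dirichlet prior, as in LDA).
theta = rng.dirichlet(np.ones(n_topics))

# Each topic is a distribution over latent concepts, rather than
# directly over words as in LDA.
phi = rng.dirichlet(np.ones(n_concepts), size=n_topics)

# Each concept is a localized Gaussian over the word-embedding space;
# a spherical covariance is assumed here for simplicity.
mu = rng.normal(size=(n_concepts, embed_dim))
sigma = 0.1

# Generate the embedding of each word position in one document.
doc_embeddings = []
for _ in range(doc_len):
    z = rng.choice(n_topics, p=theta)     # draw a topic for the position
    c = rng.choice(n_concepts, p=phi[z])  # draw a concept from that topic
    v = rng.normal(mu[c], sigma)          # draw an embedding from the concept
    doc_embeddings.append(v)

doc_embeddings = np.array(doc_embeddings)
print(doc_embeddings.shape)  # (10, 4)
```

Because words are tied to a small set of concept Gaussians, conceptually similar words share statistical strength, which is the mechanism the abstract credits for robustness to sparsity and OOV words.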
@article{hu_lctm,
 title = {A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings},
 type = {article},
 pages = {380--386},
 websites = {https://pdfs.semanticscholar.org/68d1/26a8a7080b7a67219c27456c873543376393.pdf},
 id = {eb49e89d-613f-3138-b6fc-bd74367d353b},
 created = {2018-02-05T18:48:03.758Z},
 accessed = {2018-02-05},
 file_attached = {true},
 profile_id = {371589bb-c770-37ff-8193-93c6f25ffeb1},
 group_id = {f982cd63-7ceb-3aa2-ac7e-a953963d6716},
 last_modified = {2018-02-05T18:48:06.626Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {false},
 hidden = {false},
 private_publication = {false},
 abstract = {Uncovering thematic structures of SNS and blog posts is a crucial yet challenging task, because of the severe data sparsity induced by the short length of texts and diverse use of vocabulary. This hinders effective topic inference of traditional LDA, because it infers topics based on document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables to capture conceptual similarity of words. More specifically, LCTM models each topic as a distribution over the latent concepts, where each latent concept is a localized Gaussian distribution over the word embedding space. Since the number of unique concepts in a corpus is often much smaller than the number of unique words, LCTM is less susceptible to the data sparsity. Experiments on the 20 Newsgroups dataset show the effectiveness of LCTM in dealing with short texts, as well as the capability of the model in handling held-out documents with a high degree of OOV words.},
 bibtype = {article},
 author = {Hu, Weihua and Tsujii, Jun'ichi}
}