A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings. Hu, W. & Tsujii, J., 2016.
Abstract: Uncovering thematic structures of SNS and blog posts is a crucial yet challenging task, because of the severe data sparsity induced by the short length of texts and diverse use of vocabulary. This hinders effective topic inference of traditional LDA because it infers topics based on document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables to capture conceptual similarity of words. More specifically, LCTM models each topic as a distribution over the latent concepts, where each latent concept is a localized Gaussian distribution over the word embedding space. Since the number of unique concepts in a corpus is often much smaller than the number of unique words, LCTM is less susceptible to the data sparsity. Experiments on the 20Newsgroups show the effectiveness of LCTM in dealing with short texts as well as the capability of the model in handling held-out documents with a high degree of OOV words.
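To make the generative story in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation: all sizes and hyperparameter names (K, C, D, alpha, beta, sigma) are illustrative assumptions, and in the actual model each observed word's pre-trained embedding is scored under its concept's Gaussian during inference rather than freshly sampled.

import numpy as np

# Illustrative sizes and priors (our assumptions, not from the paper).
K, C, D = 20, 100, 50     # topics, latent concepts, embedding dimension
alpha, beta = 0.1, 0.1    # Dirichlet concentrations for topics / concepts
sigma = 0.5               # spread of each localized concept Gaussian

rng = np.random.default_rng(0)

# Each latent concept is a localized Gaussian in word-embedding space;
# here its mean is drawn from a broad zero-centered prior.
concept_means = rng.normal(0.0, 1.0, size=(C, D))

# Unlike LDA, each topic is a distribution over concepts, not words.
topic_concept = rng.dirichlet(np.full(C, beta), size=K)

def generate_document(n_words):
    """Sample word embeddings for one document under the sketched model."""
    theta = rng.dirichlet(np.full(K, alpha))     # document-topic mixture
    draws = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)               # topic assignment
        c = rng.choice(C, p=topic_concept[z])    # latent concept assignment
        v = rng.normal(concept_means[c], sigma)  # embedding from the concept
        draws.append(v)                          # a real corpus would map v to
    return np.stack(draws)                       # the nearest vocabulary word

doc = generate_document(10)
print(doc.shape)  # (10, 50)

Because a document's words only need to share concepts rather than exact word forms, this construction illustrates why the model is less sensitive to short texts and out-of-vocabulary words than LDA.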
@article{hu2016lctm,
title = {A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings},
pages = {380-386},
url = {https://pdfs.semanticscholar.org/68d1/26a8a7080b7a67219c27456c873543376393.pdf},
abstract = {Uncovering thematic structures of SNS and blog posts is a crucial yet challenging task, because of the severe data sparsity induced by the short length of texts and diverse use of vocabulary. This hinders effective topic inference of traditional LDA because it infers topics based on document-level co-occurrence of words. To robustly infer topics in such contexts, we propose a latent concept topic model (LCTM). Unlike LDA, LCTM reveals topics via co-occurrence of latent concepts, which we introduce as latent variables to capture conceptual similarity of words. More specifically, LCTM models each topic as a distribution over the latent concepts, where each latent concept is a localized Gaussian distribution over the word embedding space. Since the number of unique concepts in a corpus is often much smaller than the number of unique words, LCTM is less susceptible to the data sparsity. Experiments on the 20Newsgroups show the effectiveness of LCTM in dealing with short texts as well as the capability of the model in handling held-out documents with a high degree of OOV words.},
author = {Hu, Weihua and Tsujii, Jun'ichi},
year = {2016}
}