BTM: Topic Modeling over Short Texts

BTM: Topic Modeling over Short Texts. Cheng, X., Yan, X., Lan, Y., & Guo, J.

Short texts are popular on today's Web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, whose inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel way for short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.
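To make the notion of a biterm concrete, here is a minimal Python sketch of how corpus-level word co-occurrence pairs could be collected. It is not the authors' implementation. It assumes that each short text is treated as a single context window and that a biterm is an unordered pair of words from that text; the abstract does not spell out either detail. The function names `extract_biterms` and `corpus_biterm_counts` and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text.

    Assumption: the whole text is one context window, so any two
    tokens in it co-occur. Each pair is sorted so that (a, b) and
    (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus.

    Pooling pairs across documents is what provides corpus-level
    co-occurrence statistics, instead of relying on the sparse
    counts inside any single short document.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus of two tokenized short texts.
docs = [
    ["apple", "iphone", "release"],
    ["apple", "iphone", "price"],
]
counts = corpus_biterm_counts(docs)
```

In this toy corpus the pair `("apple", "iphone")` occurs in both texts, so its pooled count is higher than that of any pair seen in only one text. A topic model over biterms would then be fitted to these pooled pairs rather than to per-document word counts.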

@article{
 title = {BTM: Topic Modeling over Short Texts},
 type = {article},
 keywords = {Biterm, Content Analysis, Short Text, Online Algorithm, Topic Model},
 websites = {http://www.ieee.org/publications_standards/publications/rights/index.html},
 id = {c97eda77-8ec8-3497-bd41-bc20b9210b5e},
 created = {2018-02-05T16:51:51.091Z},
 accessed = {2018-02-05},
 file_attached = {true},
 profile_id = {371589bb-c770-37ff-8193-93c6f25ffeb1},
 group_id = {f982cd63-7ceb-3aa2-ac7e-a953963d6716},
 last_modified = {2018-02-05T16:51:56.377Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {false},
 hidden = {false},
 private_publication = {false},
 abstract = {Short texts are popular on today's Web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, whose inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel way for short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.},
 bibtype = {article},
 author = {Cheng, Xueqi and Yan, Xiaohui and Lan, Yanyan and Guo, Jiafeng}
}
