Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering

Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering. Jin, O., Liu, N. N., Zhao, K., Yu, Y., & Yang, Q.

With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming more and more important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many of the existing techniques are not effective due to the sparseness of text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be utilized as auxiliary data when mining the target short text data. In this article, we present a novel approach to cluster short text messages via transfer learning from auxiliary long text data. We show that while some previous works for enhancing short text clustering with related long texts exist, most of them ignore the semantic and topical inconsistencies between the target and auxiliary data and may hurt the clustering performance on the short texts. To accommodate the possible inconsistencies between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistencies between data sets. We demonstrate through large-scale clustering experiments on both advertisements and Twitter data that we can obtain superior performance over several state-of-the-art techniques for clustering short text documents.
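The sparseness problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the DLDA model and not the authors' code; it only shows, on invented toy strings, why bag-of-words similarity tends to break down for short texts while topically related long texts still overlap. The helper names `bag_of_words` and `cosine` and all example strings are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count whitespace-separated, lowercased tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short texts on the same topic that share no words.
short_a = "cheap flights to paris"
short_b = "discount airfare france"

# Hypothetical longer texts on the same topic; shared vocabulary appears.
long_a = "cheap flights to paris and discount airfare deals for travel in france"
long_b = "discount airfare to france with cheap travel deals and flights"

sim_short = cosine(bag_of_words(short_a), bag_of_words(short_b))
sim_long = cosine(bag_of_words(long_a), bag_of_words(long_b))

print("short-text similarity:", sim_short)
print("long-text similarity:", round(sim_long, 3))
```

The two short texts get a similarity of zero despite being about the same subject, which is the kind of gap that auxiliary long texts are meant to fill.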

@article{
 title = {Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering},
 type = {article},
 keywords = {Experimentation Keywords Short Text,I26 [Artificial Intelligence],I27 [Artificial Intelligence],Information Search and Retrieval—Clustering,Learning,Statistical Generative Models,Unsupervised Learning},
 websites = {http://www.cse.ust.hk/~qyang/Docs/2011/cikm-short-text.pdf},
 id = {0b84c48e-ca52-36fc-a28a-d8c9c6bca69f},
 created = {2018-02-05T17:47:39.455Z},
 accessed = {2018-02-05},
 file_attached = {true},
 profile_id = {371589bb-c770-37ff-8193-93c6f25ffeb1},
 group_id = {f982cd63-7ceb-3aa2-ac7e-a953963d6716},
 last_modified = {2018-02-05T17:47:43.357Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {false},
 hidden = {false},
 private_publication = {false},
 abstract = {With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming more and more important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many of the existing techniques are not effective due to the sparseness of text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be utilized as auxiliary data when mining the target short text data. In this article, we present a novel approach to cluster short text messages via transfer learning from auxiliary long text data. We show that while some previous works for enhancing short text clustering with related long texts exist, most of them ignore the semantic and topical inconsistencies between the target and auxiliary data and may hurt the clustering performance on the short texts. To accommodate the possible inconsistencies between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistencies between data sets. We demonstrate through large-scale clustering experiments on both advertisements and Twitter data that we can obtain superior performance over several state-of-the-art techniques for clustering short text documents.},
 bibtype = {article},
 author = {Jin, Ou and Liu, Nathan N and Zhao, Kai and Yu, Yong and Yang, Qiang}
}
