Improving Topic Models with Latent Feature Word Representations. Nguyen, D. Q., Billingsley, R., Du, L., & Johnson, M.
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
