An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. Lau, J. H. & Baldwin, T.
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general-purpose applications, and release source code to induce document embeddings using our trained doc2vec models.
