Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations. Michel, P., Ravichander, A., & Rijhwani, S. (2017). Paper: http://arxiv.org/abs/1705.10900
We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically-principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like tf-idf, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets.
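
For readers unfamiliar with the approach the paper evaluates, the sketch below (Python) illustrates the general idea rather than the authors' exact pipeline: treat a document as the point cloud of its word vectors, compute persistence diagrams via a Vietoris-Rips filtration, and summarize the diagrams into a fixed-size, isometry-invariant feature vector. The ripser package and the embed/tokenize helpers in the usage note are assumptions for illustration only.

# A minimal sketch of a persistent-homology document representation
# (illustrative only; not the authors' exact pipeline). A document is
# treated as the point cloud of its word vectors; a Vietoris-Rips
# filtration yields persistence diagrams, which are summarized into a
# fixed-size feature vector. Requires the `ripser` package (assumed).
import numpy as np
from ripser import ripser

def document_diagrams(word_vectors: np.ndarray, maxdim: int = 1):
    """Persistence diagrams (H0, H1, ...) of a document's word-vector point cloud."""
    return ripser(word_vectors, maxdim=maxdim)["dgms"]

def diagram_features(dgm: np.ndarray, n_bars: int = 10) -> np.ndarray:
    """Crude fixed-size summary: the n_bars longest finite persistence lifetimes."""
    finite = dgm[np.isfinite(dgm[:, 1])]
    lifetimes = np.sort(finite[:, 1] - finite[:, 0])[::-1]
    out = np.zeros(n_bars)
    k = min(n_bars, len(lifetimes))
    out[:k] = lifetimes[:k]
    return out

# Hypothetical usage, assuming embed(token) -> np.ndarray and tokenize(doc) -> list[str]:
#   vectors = np.stack([embed(t) for t in tokenize(doc)])
#   h0, h1 = document_diagrams(vectors)
#   features = np.concatenate([diagram_features(h0), diagram_features(h1)])
# The resulting features would feed a standard classifier, to be compared against
# a tf-idf baseline as in the paper's experiments.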
@article{michelDoesGeometryWord2017,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1705.10900},
  primaryClass = {cs},
  title = {Does the {{Geometry}} of {{Word Embeddings Help Document Classification}}? {{A Case Study}} on {{Persistent Homology Based Representations}}},
  url = {http://arxiv.org/abs/1705.10900},
  shorttitle = {Does the {{Geometry}} of {{Word Embeddings Help Document Classification}}?},
  abstract = {We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically-principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like \textit{tf-idf}, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets.},
  urldate = {2019-02-19},
  date = {2017-05-30},
  keywords = {Computer Science - Computation and Language},
  author = {Michel, Paul and Ravichander, Abhilasha and Rijhwani, Shruti}
}
