Word Sense Induction and Disambiguation Using Principal Component Analysis and Latent Semantic Indexing. Gaur, V. & Jain, S. International Journal of Scientific and Research Publications (IJSRP), 2013.
In this paper we present a statistical method using principal component analysis and latent semantic indexing to solve the problem of word sense induction, and we use the generated sense inventory to perform word sense disambiguation. We use standard co-occurrence graph algorithms and word dependency matrices (context words, dependency relations) and map them to a matrix. We then apply non-negative matrix factorization and principal component analysis to the dimensions (latent factors) obtained from the training set. The intuition is to merge dimensions that overlap in the semantic space (such as computers and processors), thereby mutually reinforcing their effect and improving the disambiguation step. We build on the idea that each sense obtained in the induction process corresponds to a topical dimension, and we extend this idea to obtain each word's most dominant sense. The framework is tested on the standard SemEval-2010 WSI/WSD task, on which it produces state-of-the-art results.
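The induction step the abstract describes, factorizing a non-negative word-context matrix so that latent dimensions act as induced senses, can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the co-occurrence counts, the target word, and the `nmf` helper are all invented for the example, and the update rule is the standard Lee-Seung multiplicative update rather than whatever the paper uses.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H, using
    Lee-Seung multiplicative updates (a common NMF algorithm)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy word-by-context co-occurrence counts for a target word "mouse":
# rows = occurrences of the target word, columns = context words.
context_words = ["click", "button", "screen", "cat", "tail", "cheese"]
V = np.array([
    [3, 2, 1, 0, 0, 0],   # computing contexts
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],   # animal contexts
    [0, 0, 1, 2, 3, 2],
], dtype=float)

# With k=2, each latent dimension plays the role of one induced sense.
W, H = nmf(V, k=2)

# Disambiguation step: each row of W gives a context's affinity to each
# induced sense, so the argmax labels the context with its dominant sense.
senses = W.argmax(axis=1)
```

On this toy data the two computing contexts end up sharing one latent dimension and the two animal contexts the other, which mirrors the paper's idea that overlapping topical dimensions reinforce each other.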
@article{gaur_word_2013,
  title = {Word Sense Induction and Disambiguation Using Principal Component Analysis and Latent Semantic Indexing},
  volume = {3},
  url = {http://www.ijsrp.org/research-paper-1113.php?rp=P232000#citation},
  abstract = {In this paper we present a statistical method using principal component analysis and latent semantic indexing to solve the problem of word sense induction, and we use the generated sense inventory to perform word sense disambiguation. We use standard co-occurrence graph algorithms and word dependency matrices (context words, dependency relations) and map them to a matrix. We then apply non-negative matrix factorization and principal component analysis to the dimensions (latent factors) obtained from the training set. The intuition is to merge dimensions that overlap in the semantic space (such as computers and processors), thereby mutually reinforcing their effect and improving the disambiguation step. We build on the idea that each sense obtained in the induction process corresponds to a topical dimension, and we extend this idea to obtain each word's most dominant sense. The framework is tested on the standard {SemEval}-2010 {WSI}/{WSD} task, on which it produces state-of-the-art results.},
  number = {11},
  journal = {International Journal of Scientific and Research Publications ({IJSRP})},
  author = {Gaur, Vibor and Jain, Satbir},
  year = {2013},
  keywords = {disambiguation, latent_semantic_indexing}
}