Probabilistic Latent Semantic Indexing. Hofmann, T.
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
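To make "a statistical latent class model for factor analysis of count data" more concrete, here is a minimal sketch of the latent class (aspect) model commonly associated with PLSI, P(d, w) = Σ_z P(z) P(d|z) P(w|z), fitted to a document-term count matrix n(d, w). It is an illustration under stated assumptions, not the paper's implementation: it uses plain Expectation Maximization rather than the generalization of EM the abstract mentions, and the function name `plsa_em`, its parameters, and the toy count matrix are hypothetical. Each iteration computes the posterior P(z|d, w) (E-step) and re-estimates the three distributions from the expected counts (M-step), so the log-likelihood Σ n(d, w) log P(d, w) should not decrease.

```python
import math
import random


def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """Fit P(z), P(d|z), P(w|z) of the aspect model with plain EM (a sketch).

    counts: list of lists, counts[d][w] = n(d, w), the count of word w in doc d.
    Returns (p_z, p_d_z, p_w_z, log_likelihoods), where p_d_z[z][d] = P(d|z),
    p_w_z[z][w] = P(w|z), and log_likelihoods[i] is computed with the
    parameters in effect at the start of iteration i.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random strictly positive initialisation; each parameter set is a distribution.
    p_z = normalized([rng.random() + 0.1 for _ in range(n_topics)])
    p_d_z = [normalized([rng.random() + 0.1 for _ in range(n_docs)]) for _ in range(n_topics)]
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]

    log_likelihoods = []
    for _ in range(n_iter):
        # Expected counts accumulated during the E-step, used by the M-step.
        exp_z = [0.0] * n_topics
        exp_d_z = [[0.0] * n_docs for _ in range(n_topics)]
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue  # zero counts contribute nothing to the likelihood
                # E-step: joint P(z, d, w) for every z, then posterior P(z | d, w).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                p_dw = sum(joint)  # P(d, w) under the current parameters
                ll += n * math.log(p_dw)
                for z in range(n_topics):
                    r = n * joint[z] / p_dw  # n(d, w) * P(z | d, w)
                    exp_z[z] += r
                    exp_d_z[z][d] += r
                    exp_w_z[z][w] += r
        log_likelihoods.append(ll)
        # M-step: renormalise the expected counts into distributions.
        p_z = normalized(exp_z)
        p_d_z = [normalized(row) for row in exp_d_z]
        p_w_z = [normalized(row) for row in exp_w_z]
    return p_z, p_d_z, p_w_z, log_likelihoods


# Toy usage (hypothetical data): two documents mostly use words 0-1,
# two documents mostly use words 2-3.
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
p_z, p_d_z, p_w_z, lls = plsa_em(counts, n_topics=2, n_iter=50)
print(round(sum(p_z), 6))  # 1.0
```

In a retrieval setting, a document could then be represented by P(z|d) ∝ P(d|z) P(z) and compared to a query in that lower-dimensional space; that step is likewise only sketched here and is not taken from the abstract.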
