Probabilistic Latent Semantic Analysis. Hofmann, T.
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
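
As an illustration of the mixture decomposition and tempered EM mentioned in the abstract, the following is a minimal sketch in plain Python, not the author's implementation. It assumes the symmetric latent class parameterization P(d, w) = sum over z of P(z) P(d|z) P(w|z), and it assumes one common form of tempering in which the likelihood term of the E-step posterior is raised to a power beta (beta = 1 gives ordinary EM). The function names, the fixed beta, the iteration count, and the smoothing constant are all hypothetical choices made for this example; in practice beta would typically be chosen using held-out data.

```python
import random


def normalize(values, eps=1e-12):
    """Scale non-negative values to sum to 1 (eps is an assumed guard against zero mass)."""
    total = sum(values) + eps * len(values)
    return [(v + eps) / total for v in values]


def plsa_tempered_em(counts, n_classes, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-by-word count table.

    counts[d][w] is the number of times word w occurs in document d.
    beta < 1 dampens the E-step posterior (assumed form of tempering).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random initialization of the conditional distributions, uniform class prior.
    p_z = [1.0 / n_classes] * n_classes
    p_d_z = [normalize([rng.random() + 0.1 for _ in range(n_docs)])
             for _ in range(n_classes)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_classes)]

    for _ in range(n_iter):
        new_z = [0.0] * n_classes
        new_d = [[0.0] * n_docs for _ in range(n_classes)]
        new_w = [[0.0] * n_words for _ in range(n_classes)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # Tempered E-step: P(z|d,w) proportional to P(z) * [P(d|z) P(w|z)]^beta.
                post = normalize([
                    p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                    for z in range(n_classes)
                ])
                # Accumulate expected counts for the M-step.
                for z in range(n_classes):
                    r = n * post[z]
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r

        # M-step: renormalize the expected counts into probabilities.
        p_z = normalize(new_z)
        p_d_z = [normalize(row) for row in new_d]
        p_w_z = [normalize(row) for row in new_w]

    return p_z, p_d_z, p_w_z


# Toy co-occurrence table: 4 documents by 4 words (made-up counts).
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z, p_d_z, p_w_z = plsa_tempered_em(toy_counts, n_classes=2, beta=0.9, n_iter=30)
```

Unlike the Singular Value Decomposition used in Latent Semantic Analysis, every factor returned here is a proper probability distribution, so the reconstructed table sum over z of P(z) P(d|z) P(w|z) is itself a joint distribution over document-word pairs.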
