Class-based n-gram models of natural language. Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J., & Lai, J. C. Comput. Linguist., 18(4):467--479, 1992.
We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
@article{brown_class-based_1992,
  title = {Class-based n-gram models of natural language},
  volume = {18},
  url = {http://dl.acm.org/citation.cfm?id=176313.176316},
  abstract = {We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.},
  number = {4},
  journal = {Comput. Linguist.},
  author = {Brown, Peter F. and deSouza, Peter V. and Mercer, Robert L. and Della Pietra, Vincent J. and Lai, Jenifer C.},
  year = {1992},
  keywords = {computational_linguistics},
  pages = {467--479}
}