Towards combining rule-based and statistical part of speech tagging in agglutinative languages. Altunyurt, L., Orhan, Z., & Güngör, T. Technical Report 2007.
We present a composite part-of-speech tagger for Turkish that combines rule-based and statistical approaches. The tagger uses word frequencies and n-gram statistics from a corpus. We use the output of a morphological analyzer to obtain more accurate results and to eliminate the sparse data problem. In addition, we employ a heuristic based on the position of words in the sentence. Although the experiments were performed on a very small corpus, the results show that the composite approach and the heuristic improve the accuracy of the tagger.
Keywords: agglutinative language, part-of-speech tagger, rule-based and statistical method
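The abstract does not give the tagger's formulas, so the following is only a minimal sketch of how the ingredients it lists could fit together, not the authors' method. Everything in it is an assumption made for illustration: the toy corpus, the `ANALYSES` table standing in for a morphological analyzer, the add-one smoothing, the greedy left-to-right decoding, and the sentence-final verb bonus used as the positional heuristic.

```python
from collections import Counter

# Toy hand-tagged sentences, invented for this sketch (not the report's corpus).
CORPUS = [
    [("o", "Pron"), ("yüz", "Num"), ("kitap", "Noun"), ("aldı", "Verb")],
    [("o", "Det"), ("kitap", "Noun"), ("güzel", "Adj")],
    [("yüz", "Noun"), ("güzel", "Adj")],
    [("o", "Pron"), ("geldi", "Verb")],
]

# Hypothetical stand-in for a morphological analyzer: the tags each
# word form is allowed to take (the rule-based part of the sketch).
ANALYSES = {
    "o": {"Pron", "Det"},
    "yüz": {"Num", "Noun", "Verb"},
    "kitap": {"Noun"},
    "aldı": {"Verb"},
    "geldi": {"Verb"},
    "güzel": {"Adj"},
}

word_tag = Counter()    # how often a word carries a given tag
tag_bigram = Counter()  # how often a tag follows the previous tag
for sentence in CORPUS:
    prev = "<s>"  # sentence-start marker
    for word, tag in sentence:
        word_tag[word, tag] += 1
        tag_bigram[prev, tag] += 1
        prev = tag

ALL_TAGS = {tag for _, tag in word_tag}


def tag_sentence(words):
    """Greedily pick, for each word, the best tag the analyzer allows."""
    tags, prev = [], "<s>"
    for i, word in enumerate(words):
        # Unknown words fall back to every tag seen in the corpus.
        candidates = ANALYSES.get(word, ALL_TAGS)

        def score(tag):
            # Word frequency times tag-bigram frequency, add-one smoothed.
            s = (word_tag[word, tag] + 1) * (tag_bigram[prev, tag] + 1)
            # Assumed positional heuristic: favor a verb at sentence end.
            if i == len(words) - 1 and tag == "Verb":
                s *= 2
            return s

        best = max(sorted(candidates), key=score)
        tags.append(best)
        prev = best
    return tags
```

With this toy data, `tag_sentence(["yüz", "güzel"])` returns `["Noun", "Adj"]`, while the same word form after a pronoun in `tag_sentence(["o", "yüz", "kitap", "aldı"])` is tagged `Num`. Restricting candidates to the analyzer's output keeps the statistics from ever proposing a tag the morphology rules out, which is one plausible reading of how the analyzer helps with sparse data.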
