Lessons learned from tagging clinical Hungarian. Orosz, G., Novák, A., & Prószéky, G. International Journal of Computational Linguistics and Applications, 5(1):159–176, 2014.
As more and more textual resources from the medical domain are becoming accessible, automatic analysis of clinical notes becomes possible. Since part-of-speech tagging is a fundamental part of any text processing chain, tagging tasks must be performed with high accuracy. While there are numerous studies on tagging medical English, we are not aware of any previous research examining the same field for Hungarian. This paper presents methods and resources which can be used for annotating medical Hungarian and investigates their application to tagging clinical records. Our research relies on a baseline setting, whose performance was improved incrementally by eliminating its most common errors. The extension of the lexicon used raised the overall accuracy significantly, while other domain adaptation methods were only partially successful. The presented enhancements corrected almost half of the errors. However, further analysis of errors suggests that abbreviations should be handled at a higher level of processing.
@article{orosz_lessons_2014,
	title = {Lessons learned from tagging clinical {Hungarian}},
	volume = {5},
	issn = {0976-0962},
	url = {http://www.gelbukh.com/ijcla/2014-1/},
	abstract = {As more and more textual resources from the medical domain are becoming accessible, automatic analysis of clinical notes becomes possible.
Since part-of-speech tagging is a fundamental part of any text processing chain, tagging tasks must be performed with high accuracy. 
While there are numerous studies on tagging medical English, we are not aware of any previous research examining the same field for Hungarian.
This paper presents methods and resources which can be used for annotating medical Hungarian and investigates their application to tagging clinical records. 
Our research relies on a baseline setting, whose performance was improved incrementally by eliminating its most common errors. The extension of the lexicon used raised the overall accuracy significantly, while other domain adaptation methods were only partially successful. 
The presented enhancements corrected almost half of the errors. However, further analysis of errors suggests that abbreviations should be handled at a higher level of processing.},
	number = {1},
	journal = {International Journal of Computational Linguistics and Applications},
	author = {Orosz, György and Novák, Attila and Prószéky, Gábor},
	year = {2014},
	pages = {159--176},
}
