Building Universal Dependency Treebanks in Korean. Chun, J., Han, N., Hwang, J. D., & Choi, J. D. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18), pages 2194–2202, Miyazaki, Japan, 2018.
This paper presents three treebanks in Korean that consist of dependency trees derived from existing treebanks, the Google UD Treebank, the Penn Korean Treebank, and the KAIST Treebank, and pseudo-annotated by the latest guidelines from the Universal Dependencies (UD) project. The Korean portion of the Google UD Treebank is re-tokenized to match the morpheme-level annotation suggested by the other corpora, and systematically assessed for errors. Phrase structure trees in the Penn Korean Treebank and the KAIST Treebank are automatically converted into dependency trees using head finding rules and linguistic heuristics. Additionally, part-of-speech tags in all treebanks are converted into the UD tagset. A total of 38K+ dependency trees are generated that comprise a coherent set of dependency relations for over a half million tokens. To the best of our knowledge, this is the first time that these Korean corpora are analyzed together and transformed into dependency trees following the latest UD guidelines, version 2.
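The conversion step mentioned in the abstract, turning phrase structure trees into dependency trees with head-finding rules, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Node` class, the `HEAD_RULES` table, the simplified labels (`S`, `NP`, `VP`, `NNG`, `VV`), and the example sentence are hypothetical and are not the rules, tagsets, or heuristics used in the paper. The sketch only produces unlabeled head indices and does not assign UD relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                        # phrase label or part-of-speech tag
    word: Optional[str] = None        # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


# Hypothetical head rules: phrase label -> (search direction, priority list).
# These are illustrative only, not the rules from the paper.
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP"]),
    "VP": ("right", ["VV", "VP"]),
    "NP": ("right", ["NNG", "NP"]),
}


def find_head_child(node: Node) -> Node:
    """Pick the head child of a phrase using the rule table."""
    direction, priorities = HEAD_RULES.get(node.label, ("right", []))
    ordered = node.children if direction == "left" else list(reversed(node.children))
    for label in priorities:
        for child in ordered:
            if child.label == label:
                return child
    # Fallback: first child in the search direction.
    return ordered[0]


def to_dependencies(root: Node) -> Tuple[List[str], List[int]]:
    """Return (tokens, heads); heads[i] is the 1-based head of token i+1, 0 for the root."""
    tokens: List[str] = []
    heads: Dict[int, int] = {}

    def visit(node: Node) -> int:
        # Returns the 1-based index of the lexical head of this subtree.
        if node.word is not None:
            tokens.append(node.word)
            return len(tokens)
        child_heads = [(child, visit(child)) for child in node.children]
        head_child = find_head_child(node)
        head_idx = next(idx for child, idx in child_heads if child is head_child)
        # Every non-head child's lexical head attaches to the phrase's head.
        for child, idx in child_heads:
            if child is not head_child:
                heads[idx] = head_idx
        return head_idx

    heads[visit(root)] = 0
    return tokens, [heads[i] for i in range(1, len(tokens) + 1)]


# Example: (S (NP (NNG 철수가)) (VP (NP (NNG 밥을)) (VV 먹었다)))
tree = Node("S", children=[
    Node("NP", children=[Node("NNG", "철수가")]),
    Node("VP", children=[
        Node("NP", children=[Node("NNG", "밥을")]),
        Node("VV", "먹었다"),
    ]),
])
tokens, heads = to_dependencies(tree)
# heads == [3, 3, 0]: both noun phrases attach to the verb, which is the root.
```

Searching from the right in every rule reflects the head-final tendency of Korean, but a real converter would need a much larger rule table plus language-specific heuristics of the kind the abstract alludes to.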
