Challenges and Solutions for Latin Named Entity Recognition. Erdmann, A., Brown, C., Joseph, B., Janse, M., Ajaka, P., Elsner, M., & De Marneffe, M. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pages 85–93, Osaka, Japan, 2016. The COLING 2016 Organizing Committee.
Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.
@inproceedings{erdmann_challenges_2016,
	address = {Osaka, Japan},
	title = {Challenges and {Solutions} for {Latin} {Named} {Entity} {Recognition}},
	abstract = {Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90\% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.},
	booktitle = {Proceedings of the {Workshop} on {Language} {Technology} {Resources} and {Tools} for {Digital} {Humanities} ({LT4DH})},
	publisher = {The COLING 2016 Organizing Committee},
	author = {Erdmann, Alexander and Brown, Christopher and Joseph, Brian and Janse, Mark and Ajaka, Petra and Elsner, Micha and De Marneffe, Marie-Catherine},
	year = {2016},
	pages = {85--93},
}
