Daidalos: NER for Literary Studies on Latin and Ancient Greek Texts. Beyer, A. In Nomina Omina, of Digital Classics Online, pages forthcoming. 2025.
Literary texts offer a wealth of unstructured data that can be harnessed for data-driven text analysis through natural language processing (NLP). Named Entity Recognition and Classification (NER) is a crucial initial step in this process, enabling the automatic identification of entities such as persons, organizations, locations, and dates. However, NER faces significant challenges, particularly with historical texts in low-resource languages like Latin and Ancient Greek, due to limited annotated corpora and the dynamic nature of language. This paper explores the evolution of NER from simple extraction to semantics-aware entity disambiguation and linking, highlighting the importance of multi-layer annotation systems to enhance data quality and model accuracy. The interdisciplinary Daidalos project aims to bridge the gap between Digital Humanities and Classical Studies by providing an NLP infrastructure that supports various data-driven research methods, among others NER. One of the project's case studies demonstrates the potential of NER in classical literary studies; this is accompanied by proposals on other NER related literary research questions, e.g. on authorship attribution and stereotyping. Additionally, the paper offers some thoughts about teaching NER, presenting a framework to assess the required level of Digital Literacies when working on a specific research question. Finally, it discusses the implications of generative AI and Large Language Models (LLM) on NER and NLP in Classics, emphasizing the challenges for independent research posed by the high costs and limited transparency of LLMs.
@incollection{beyer_daidalos_2025,
	series = {Digital {Classics} {Online}},
	title = {Daidalos: {NER} for {Literary} {Studies} on {Latin} and {Ancient} {Greek} {Texts}},
	abstract = {Literary texts offer a wealth of unstructured data that can be harnessed for data-driven text analysis through natural language processing (NLP). Named Entity Recognition and Classification (NER) is a crucial initial step in this process, enabling the automatic identification of entities such as persons, organizations, locations, and dates. However, NER faces significant challenges, particularly with historical texts in low-resource languages like Latin and Ancient Greek, due to limited annotated corpora and the dynamic nature of language. This paper explores the evolution of NER from simple extraction to semantics-aware entity disambiguation and linking, highlighting the importance of multi-layer annotation systems to enhance data quality and model accuracy. The interdisciplinary Daidalos project aims to bridge the gap between Digital Humanities and Classical Studies by providing an NLP infrastructure that supports various data-driven research methods, among others NER. One of the project's case studies demonstrates the potential of NER in classical literary studies; this is accompanied by proposals on other NER related literary research questions, e.g. on authorship attribution and stereotyping. Additionally, the paper offers some thoughts about teaching NER, presenting a framework to assess the required level of Digital Literacies when working on a specific research question. Finally, it discusses the implications of generative AI and Large Language Models (LLM) on NER and NLP in Classics, emphasizing the challenges for independent research posed by the high costs and limited transparency of LLMs.},
	booktitle = {Nomina {Omina}},
	author = {Beyer, Andrea},
	editor = {Berti, Monica},
	year = {2025},
	pages = {forthcoming},
}
