Annotating, Projecting, and Interpreting Named Entities in Digital Scholarly Editions with LLMs

Annotating, Projecting, and Interpreting Named Entities in Digital Scholarly Editions with LLMs. Galka, S. & Vogeler, G. August, 2025. ISSN: 2693-5015


This paper explores the use of large language models (LLMs) to enhance semantic annotation and annotation projection in digital scholarly editions (DSEs), focusing on historical ego-documents. Using the TEI/XML-encoded French memoirs of Countess Luise Charlotte of Schwerin (1684–1732) and their German translation as a case study, we evaluate LLMs for Named Entity Recognition (NER), annotation transfer across aligned bilingual texts, and the extraction of interpersonal relationships. A comparative analysis with a traditional NER framework shows that LLMs significantly outperform baseline models, particularly in recognizing complex person references, such as non-rigid designators and nested entities. For annotation projection, we demonstrate that LLMs can reliably transfer entity annotations between French and German texts without intermediate alignment layers, achieving over 97% of correct projected entities using zero-shot prompting. Additionally, a pilot experiment illustrates the potential of LLMs for structured relationship modeling. The analysis of the errors puts further emphasis on the question of our intentions as editors when translating and indexing texts.

@misc{galka_annotating_2025,
	title = {Annotating, {Projecting}, and {Interpreting} {Named} {Entities} in {Digital} {Scholarly} {Editions} with {LLMs}},
	url = {https://www.researchsquare.com/article/rs-7175875/v1},
	doi = {10.21203/rs.3.rs-7175875/v1},
	abstract = {This paper explores the use of large language models (LLMs) to enhance semantic annotation and annotation projection in digital scholarly editions (DSEs), focusing on historical ego-documents. Using the TEI/XML-encoded French memoirs of Countess Luise Charlotte of Schwerin (1684--1732) and their German translation as a case study, we evaluate LLMs for Named Entity Recognition (NER), annotation transfer across aligned bilingual texts, and the extraction of interpersonal relationships. A comparative analysis with a traditional NER framework shows that LLMs significantly outperform baseline models, particularly in recognizing complex person references, such as non-rigid designators and nested entities. For annotation projection, we demonstrate that LLMs can reliably transfer entity annotations between French and German texts without intermediate alignment layers, achieving over 97\% of correct projected entities using zero-shot prompting. Additionally, a pilot experiment illustrates the potential of LLMs for structured relationship modeling. The analysis of the errors puts further emphasis on the question of our intentions as editors when translating and indexing texts.},
	urldate = {2025-12-28},
	publisher = {Research Square},
	author = {Galka, Selina and Vogeler, Georg},
	month = aug,
	year = {2025},
	note = {ISSN: 2693-5015},
}
