Cross-linguistic annotation transfer in geoparsing experiments with Classical texts. Soffiantini, L. DH Benelux Journal, 6:155–168, 2024.
The Natural History is an encyclopedic work written by the Latin author Pliny the Elder (first century CE). In this extensive text in 37 books, geography plays a pivotal role, with hundreds of mentions of ancient place names. In this paper, a geoparsing experiment is conducted on the Natural History with the aim of automatically identifying and extracting place entities. To achieve this, we take advantage of state-of-the-art NLP models to develop a multistage pipeline involving English Named Entity Recognition, English-Latin sentence alignment, and entity projection. The paper demonstrates how cross-lingual annotation transfer can be applied from a translation in a modern language back to the original text in the context of low-/medium-resource languages, such as Latin. The efficacy of the proposed pipeline is evaluated through the use of both standard metrics and a comprehensive manual error analysis. Additionally, the results are compared to those obtained by other Latin NER tools. Both analyses demonstrate that the proposed methodology achieves a superior F1 score. Finally, by projecting pre-disambiguated annotations derived from another geospatial project, the majority of place entities were automatically associated with unique identifiers that enable geolocation.
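The entity projection step named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `project_entities`, the token-level alignment dictionary, the min-to-max span heuristic, and the toy sentence pair (invented for illustration, not quoted from Pliny) are all assumptions. The idea shown is only that entity spans found in the English translation are mapped onto the Latin tokens they align with.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label) over token indices


def project_entities(en_entities: List[Span],
                     alignment: Dict[int, List[int]]) -> List[Span]:
    """Project English entity spans onto Latin token indices.

    `alignment` maps an English token index to the Latin token indices it is
    aligned with (hypothetical format; a real aligner may differ).
    """
    projected: List[Span] = []
    for start, end, label in en_entities:
        # Collect every Latin token aligned to any token of the English span.
        targets = sorted({j for i in range(start, end)
                          for j in alignment.get(i, [])})
        if not targets:
            continue  # no aligned Latin token: the entity cannot be projected
        # Simplifying assumption: take the smallest contiguous covering span.
        projected.append((targets[0], targets[-1] + 1, label))
    return projected


# Toy example (made-up sentence pair and alignment).
en_tokens = ["The", "river", "Nile", "flows", "through", "Egypt"]
la_tokens = ["Nilus", "amnis", "per", "Aegyptum", "fluit"]
en_entities = [(2, 3, "LOC"), (5, 6, "LOC")]          # "Nile", "Egypt"
alignment = {1: [1], 2: [0], 3: [4], 4: [2], 5: [3]}  # English -> Latin

la_entities = project_entities(en_entities, alignment)
la_mentions = [" ".join(la_tokens[s:e]) for s, e, _ in la_entities]
print(la_mentions)
```

Taking the minimum and maximum aligned index keeps the sketch short, but it can over-extend a span when alignments are noisy or discontinuous, which is one kind of error a manual analysis like the one the abstract mentions would need to catch.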
@article{soffiantini_cross-linguistic_2024,
title = {Cross-linguistic annotation transfer in geoparsing experiments with {Classical} texts},
volume = {6},
url = {https://journal.dhbenelux.org/wp-content/uploads/2024/11/9_Soffiantini_individual.pdf},
abstract = {The Natural History is an encyclopedic work written by the Latin author Pliny the Elder (first century CE). In this extensive text in 37 books, geography plays a pivotal role, with hundreds of mentions of ancient place names. In this paper, a geoparsing experiment is conducted on the Natural History with the scope of automatically identifying and extracting place entities. To achieve this, we take advantage of state-of-the-art NLP models to develop a multistage pipeline involving English Named Entity Recognition, English-Latin sentence alignment, and entity projection. The paper demonstrates how cross-lingual annotation transfer can be applied from a translation in a modern language back to the original text in the context of low-/medium-resource languages, such as Latin. The efficacy of the proposed pipeline is evaluated through the use of both standard metrics and a comprehensive manual error analysis. Additionally, the results are compared to those obtained by other Latin NER tools. Both analyses demonstrate that the proposed methodology achieves a superior f1-score. Finally, the majority of place entities were automatically associated with unique identifiers that enable geolocation by the projection of pre-disambiguated annotations derived from another geo-spatial project.},
urldate = {2025-01-26},
journal = {DH Benelux Journal},
author = {Soffiantini, Laura},
year = {2024},
pages = {155--168},
}