„Step away from the Computer!“: Über die linguistische Datenkategorisierung als Erkenntnisprozess und daraus folgende Herausforderungen bei der Nachnutzung von Annotationen und Annotationstools. Shadrova, A., Lüdeling, A., Klotz, M., Hartz, R. G., & Krause, T. Zeitschrift für germanistische Linguistik, 53(1):166–214, April 2025.
Abstract: Linguistic research frequently requires the categorization of language phenomena in corpus data (annotation). Since those may occur plentifully, a partial or full automation of the annotation process appears attractive. The filtering and recombination of existing annotation layers seems to further provide an elegant solution to the deduction of higher-level annotations. In this contribution, we show at the example of German split particle verbs that this approach results in a number of linguistic, technological, and epistemological challenges related to the precise definition of the various models employed and their interfaces. We argue that the manual annotation of corpus data is not merely a preprocessing task, but is itself an epistemological process central to the development of linguistic theory. We discuss why machine-based language processing can neither mimic nor replace this process; why it can generally not reach a level of precision that would be suitable for linguistic research without further integration, adaptation, and manual correction; and how its blind application systematically skews results in crucial areas of research. We close with the suggestion of several best practice approaches which help to prevent and resolve incompatibilities and delays arising from common problems of corpus-based language modeling.
@article{shadrova_step_2025,
	title = {„{Step} away from the {Computer}!“: Über die linguistische {Datenkategorisierung} als {Erkenntnisprozess} und daraus folgende {Herausforderungen} bei der {Nachnutzung} von {Annotationen} und {Annotationstools}},
	volume = {53},
	issn = {1613-0626, 0301-3294},
	shorttitle = {„{Step} away from the {Computer}!“},
	url = {https://www.degruyterbrill.com/document/doi/10.1515/zgl-2025-2005/html},
	doi = {10.1515/zgl-2025-2005},
	abstract = {Linguistic research frequently requires the categorization of language phenomena in corpus data (annotation). Since those may occur plentifully, a partial or full automation of the annotation process appears attractive. The filtering and recombination of existing annotation layers seems to further provide an elegant solution to the deduction of higher-level annotations. In this contribution, we show at the example of German split particle verbs that this approach results in a number of linguistic, technological, and epistemological challenges related to the precise definition of the various models employed and their interfaces. We argue that the manual annotation of corpus data is not merely a preprocessing task, but is itself an epistemological process central to the development of linguistic theory. We discuss why machine-based language processing can neither mimic nor replace this process; why it can generally not reach a level of precision that would be suitable for linguistic research without further integration, adaptation, and manual correction; and how its blind application systematically skews results in crucial areas of research. We close with the suggestion of several best practice approaches which help to prevent and resolve incompatibilities and delays arising from common problems of corpus-based language modeling.},
	language = {en},
	number = {1},
	urldate = {2025-04-11},
	journal = {Zeitschrift für germanistische Linguistik},
	author = {Shadrova, Anna and Lüdeling, Anke and Klotz, Martin and Hartz, Rahel Gajaneh and Krause, Thomas},
	month = apr,
	year = {2025},
	pages = {166--214},
}
