Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done. Romein, C. A., Hodel, T., Gordijn, F., Zundert, J. J. v., Chagué, A., Lange, M. v., Jensen, H. S., Stauder, A., Purcell, J., Terras, M. M., Heuvel, P. v. d., Keijzer, C., Rabus, A., Sitaram, C., Bhatia, A., Depuydt, K., Afolabi-Adeolu, M. A., Anikina, A., Bastianello, E., Benzinger, L. V., Bosse, A., Brown, D., Charlton, A., Dannevig, A. N., Gelder, K. v., Go, S. C., Goh, M. J., Gstrein, S., Hasan, S., Heide, S. v. d., Hindermann, M., Huff, D., Huysman, I., Idris, A., Keijzer, L., Kemper, S., Koenders, S., Kuijpers, E., Rønsig Larsen, L., Lepa, S., Link, T. O., Nispen, A. v., Nockels, J., Noort, L. M. v., Oosterhuis, J. J., Popken, V., Estrella Puertollano, M., Puusaag, J. J., Sheta, A., Stoop, L., Strutzenbladh, E., Sijs, N. v. d., Spek, J. P. v. d., Trouw, B. B., Van Synghel, G., Vučković, V., Wilbrink, H., Weiss, S., Wrisley, D. J., & Zweistra, R. Journal of Data Mining & Digital Humanities, 2024.
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.
@article{romeinExploringDataProvenance2022,
	title = {Exploring {Data} {Provenance} in {Handwritten} {Text} {Recognition} {Infrastructure}: {Sharing} and {Reusing} {Ground} {Truth} {Data}, {Referencing} {Models}, and {Acknowledging} {Contributions}. {Starting} the {Conversation} on {How} {We} {Could} {Get} {It} {Done}},
	issn = {2416-5999},
	shorttitle = {Exploring {Data} {Provenance} in {Handwritten} {Text} {Recognition} {Infrastructure}},
	url = {https://doi.org/10.46298/jdmdh.10403},
	doi = {10.46298/jdmdh.10403},
	abstract = {This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.},
	language = {eng},
	urldate = {2023-03-15},
	journal = {Journal of Data Mining \& Digital Humanities},
	author = {Romein, C. Annemieke and Hodel, Tobias and Gordijn, Femke and Zundert, Joris J. van and Chagué, Alix and Lange, Milan van and Jensen, Helle Strandgaard and Stauder, Andy and Purcell, Jake and Terras, Melissa M. and Heuvel, Pauline van den and Keijzer, Carlijn and Rabus, Achim and Sitaram, Chantal and Bhatia, Aakriti and Depuydt, Katrien and Afolabi-Adeolu, Mary Aderonke and Anikina, Anastasiia and Bastianello, Elisa and Benzinger, Lukas Vincent and Bosse, Arno and Brown, David and Charlton, Ash and Dannevig, André Nilsson and Gelder, Klaas van and Go, Sabine C.P.J. and Goh, Marcus J.C. and Gstrein, Silvia and Hasan, Sewa and Heide, Stefan von der and Hindermann, Maximilian and Huff, Dorothee and Huysman, Ineke and Idris, Ali and Keijzer, Liesbeth and Kemper, Simon and Koenders, Sanne and Kuijpers, Erika and Rønsig Larsen, Lisette and Lepa, Sven and Link, Tommy O. and Nispen, Annelies van and Nockels, Joe and Noort, Laura M. van and Oosterhuis, Joost Johannes and Popken, Vivien and Estrella Puertollano, María and Puusaag, Joosep J. and Sheta, Ahmed and Stoop, Lex and Strutzenbladh, Ebba and Sijs, Nicoline van der and Spek, Jan Paul van der and Trouw, Barry Benaissa and Van Synghel, Geertrui and Vučković, Vladimir and Wilbrink, Heleen and Weiss, Sonia and Wrisley, David Joseph and Zweistra, Riet},
	year = {2024},
	keywords = {Data Provenance, Ground Truth, Handwritten Text Recognition, Transkribus, eScriptorium},
}
