Impact of Image Enhancement Methods on Automatic Transcription Trainings with eScriptorium. Jacsont, P. & Leblanc, E. June 2023.
This study stems from the Desenrollando el cordel (Untangling the cordel) project, which focuses on editing 19th-century Spanish prints. It evaluates the impact of image enhancement methods on the automatic transcription of documents that are of low quality in terms of both printing and digitisation. We compare different methods (binarisation, deblurring) and present the results obtained when training models with the Kraken tool. We show that binarisation methods give better results than the others, and that combining several techniques does not significantly improve the transcription predictions. This study shows the significance of using image enhancement methods with Kraken. It paves the way for further experiments with larger and more varied corpora to help future projects design their automatic transcription workflows.
@unpublished{jacsont2023,
	title = {Impact of {Image} {Enhancement} {Methods} on {Automatic} {Transcription} {Trainings} with {eScriptorium}},
	url = {https://hal.science/hal-03831686},
	abstract = {This study stems from the Desenrollando el cordel (Untangling the cordel) project, which focuses on editing 19th-century Spanish prints. It evaluates the impact of image enhancement methods on the automatic transcription of documents that are of low quality in terms of both printing and digitisation. We compare different methods (binarisation, deblurring) and present the results obtained when training models with the Kraken tool. We show that binarisation methods give better results than the others, and that combining several techniques does not significantly improve the transcription predictions. This study shows the significance of using image enhancement methods with Kraken. It paves the way for further experiments with larger and more varied corpora to help future projects design their automatic transcription workflows.},
	urldate = {2024-05-03},
	author = {Jacsont, Pauline and Leblanc, Elina},
	month = jun,
	year = {2023},
	keywords = {Spanish literature, binarisation, image enhancement methods, printed documents},
}
