Handwritten Text Recognition Test Set: Minutes of the Swiss Federal Council (1848-1903). Hodel, T. & Schoch, D. May, 2021.
This data set is a test set for evaluating Optical Character Recognition and Handwritten Text Recognition engines. It consists of extracts from the minutes of the Swiss Federal Council. The individual lines were chosen at random from about 150,000 pages of handwritten minutes. For each line, an image file is provided by the Swiss Federal Archives/Schweizerisches Bundesarchiv [images.tar.gz]. Please cite the images as follows: Excerpts of BAR E1004.1#1000/9#1-215. The images are in the public domain. A PageXML file [page.zip] accompanies every image file and records the transcription and coordinates of the line. For PageXML, see Pletschacher, S., & Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-Truth Elements) Format Framework, pp. 257–260. https://doi.org/10.1109/ICPR.2010.72.
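As an illustration of how the PageXML files described above might be read, the following is a minimal Python sketch that extracts the transcription and coordinates of each line. It rests on assumptions that are not confirmed by the data set description: that the files use the PAGE element names TextLine, Coords, TextEquiv and Unicode, and that coordinates are stored as integers in a `points` attribute (as in the 2013 and later PAGE schema versions; older versions use Point child elements instead). The function name `read_lines` and the sample document are hypothetical and not taken from the data set.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element's tag name without its XML namespace."""
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text):
    """Collect id, polygon coordinates and transcription of every TextLine.

    Assumes PAGE-style markup with a `points` attribute on Coords
    (an assumption; check the actual files before relying on it).
    """
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        if local(el.tag) != "TextLine":
            continue
        coords, text = None, None
        # Only direct children are inspected, so word-level
        # transcriptions nested deeper are ignored.
        for child in el:
            name = local(child.tag)
            if name == "Coords":
                points = child.get("points", "")
                coords = [tuple(int(v) for v in p.split(","))
                          for p in points.split()]
            elif name == "TextEquiv":
                for sub in child:
                    if local(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": el.get("id"), "coords": coords, "text": text})
    return lines


# Invented sample document, for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="20">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="0,0 100,0 100,20 0,20"/>
        <TextEquiv><Unicode>Beispielzeile</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for line in read_lines(SAMPLE):
        print(line["id"], line["text"], line["coords"])
```

Matching on local tag names rather than a fixed namespace keeps the sketch independent of the exact PAGE schema version, which the description does not state.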
@misc{hodel_handwritten_2021,
	title = {Handwritten {Text} {Recognition} {Test} {Set}: {Minutes} of the {Swiss} {Federal} {Council} (1848-1903)},
	shorttitle = {Handwritten {Text} {Recognition} {Test} {Set}},
	url = {https://zenodo.org/record/4746342},
	doi = {10.5281/zenodo.4746342},
	abstract = {This data set is a test set for evaluating Optical Character Recognition and Handwritten Text Recognition engines. It consists of extracts from the minutes of the Swiss Federal Council. The individual lines were chosen at random from about 150,000 pages of handwritten minutes. For each line, an image file is provided by the Swiss Federal Archives/Schweizerisches Bundesarchiv [images.tar.gz]. Please cite the images as follows: Excerpts of BAR E1004.1\#1000/9\#1-215. The images are in the public domain. A PageXML file [page.zip] accompanies every image file and records the transcription and coordinates of the line. For PageXML, see Pletschacher, S., \& Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-Truth Elements) Format Framework, pp. 257–260. https://doi.org/10.1109/ICPR.2010.72.},
	language = {deu},
	urldate = {2021-06-07},
	publisher = {Zenodo},
	author = {Hodel, Tobias and Schoch, David},
	month = may,
	year = {2021},
	keywords = {Handwritten Text Recognition, Machine Learning, Test set},
}
