PoCoTo - an Open Source System for Efficient Interactive Postcorrection of OCRed Historical Texts. Vobl, T., Gotscharek, A., Reffle, U., Ringlstetter, C., & Schulz, K. U. In Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, DATeCH '14, pages 57–61, New York, NY, USA, 2014. ACM.
When applied to historical texts, OCR engines often produce a non-negligible number of OCR errors. For research in the Humanities, text mining and retrieval, it is important to be able to improve the quality of OCRed historical texts using interactive postcorrection. We describe a system for interactive postcorrection of OCRed historical documents developed in the EU project IMPACT. Various advanced features of the system help to correct texts efficiently. Language technology used in the background takes orthographic variation in historical language into account. Using this knowledge, the tool visualizes possible OCR errors and series of similar possible OCR errors in a given input document. Error series can be corrected in one shot. Practical user tests in three major European libraries have shown that the system considerably reduces the time needed by human correctors to eliminate a certain number of OCR errors. The system has been published as an open source tool on GitHub.
@inproceedings{vobl_pocoto_2014,
	address = {New York, NY, USA},
	series = {{DATeCH} '14},
	title = {{PoCoTo} - an {Open} {Source} {System} for {Efficient} {Interactive} {Postcorrection} of {OCRed} {Historical} {Texts}},
	isbn = {978-1-4503-2588-2},
	url = {http://doi.acm.org/10.1145/2595188.2595197},
	doi = {10.1145/2595188.2595197},
	abstract = {When applied to historical texts, OCR engines often produce a non-negligible number of OCR errors. For research in the Humanities, text mining and retrieval, the option is important to improve the quality of OCRed historical texts using interactive postcorrection. We describe a system for interactive postcorrection of OCRed historical documents developed in the EU project IMPACT. Various advanced features of the system help to efficiently correct texts. Language technology used in the background takes orthographic variation in historical language into account. Using this knowledge, the tool visualizes possible OCR errors and series of similar possible OCR errors in a given input document. Error series can be corrected in one shot. Practical user tests in three major European libraries have shown that the system considerably reduces the time needed by human correctors to eliminate a certain number of OCR errors. The system has been published as an open source tool under GitHub.},
	booktitle = {Proceedings of the {First} {International} {Conference} on {Digital} {Access} to {Textual} {Cultural} {Heritage}},
	publisher = {ACM},
	author = {Vobl, Thorsten and Gotscharek, Annette and Reffle, Uli and Ringlstetter, Christoph and Schulz, Klaus U.},
	year = {2014},
	keywords = {decision support, error correction, user interfaces},
	pages = {57--61},
}
