EPARCHOS - Historical Greek handwritten document dataset. Papazoglou, A., Pratikakis, I., Markou, K., & Tsochatzidis, L. October, 2020.
The dataset originates from a Greek handwritten codex that dates from around 1500-1530. This is the subset of the codex British Museum Addit. 6791, written by two hands, one by Antonius Eparchos and the other by Camillos Zanettus (ff. 104r-174v) and delivers texts by Hierocles (In Aureum carmen), Matthaeus Blastares (Collectio alphabetica) and, notably, texts by Michael Psellos (De omnifaria doctrina). The writing delivers the most important abbreviations, logograms and conjunctions, which are cited in virtually every Greek minuscule handwritten codex from the years of the manuscript transliteration and the prevalence of the minuscule script (9th century) to the post-Byzantine years. This dataset consists of 120 scanned handwritten text pages, containing 9285 lines of text, 18809 words (6787 unique words). For each page, a PageXML file is provided containing the following ground truth: text region polygon coordinates; text line polygon coordinates with the corresponding transcription text; and word polygon coordinates with the corresponding transcription text.
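The region, line and word ground truth described above could be read along the following lines. This is a minimal sketch, not the dataset authors' tooling. It assumes the files follow the common PAGE layout: `TextRegion`, `TextLine` and `Word` elements, each with a `Coords` element carrying a `points` attribute, and transcriptions in `TextEquiv/Unicode`. The embedded sample page, its namespace version and its file name are hypothetical, and the actual EPARCHOS files may use a different PAGE schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature page, used only to illustrate the assumed layout.
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="hypothetical.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="l1">
        <Coords points="5,5 95,5 95,20 5,20"/>
        <Word id="w1">
          <Coords points="5,5 40,5 40,20 5,20"/>
          <TextEquiv><Unicode>καὶ</Unicode></TextEquiv>
        </Word>
        <TextEquiv><Unicode>καὶ τὰ</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn a 'points' string such as '10,20 30,40' into [(10, 20), (30, 40)]."""
    return [tuple(int(float(v)) for v in pair.split(",")) for pair in points.split()]


def read_page(xml_text):
    """Return a list of regions, each with its polygon, lines and words."""
    root = ET.fromstring(xml_text)
    # Take the namespace from the root element so the schema version is not hard-coded.
    ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

    def q(tag):
        return f"{{{ns}}}{tag}" if ns else tag

    def polygon(elem):
        coords = elem.find(q("Coords"))
        return parse_points(coords.get("points", "")) if coords is not None else []

    def text(elem):
        # Only the element's own TextEquiv child is consulted, not those of nested words.
        node = elem.find(f"{q('TextEquiv')}/{q('Unicode')}")
        return (node.text or "") if node is not None else ""

    regions = []
    for region in root.iter(q("TextRegion")):
        lines = []
        for line in region.findall(q("TextLine")):
            words = [
                {"polygon": polygon(w), "text": text(w)}
                for w in line.findall(q("Word"))
            ]
            lines.append({"polygon": polygon(line), "text": text(line), "words": words})
        regions.append({"polygon": polygon(region), "lines": lines})
    return regions


if __name__ == "__main__":
    for region in read_page(SAMPLE_PAGE):
        for line in region["lines"]:
            print(line["text"], [w["text"] for w in line["words"]])
```

For a real file, the same function could be called on the file's contents, for example `read_page(open(path, encoding="utf-8").read())`, where `path` points at one of the per-page XML files.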
@misc{papazoglou_eparchos_2020,
	title = {{EPARCHOS} - {Historical} {Greek} handwritten document dataset},
	url = {https://zenodo.org/record/4095301},
	doi = {10.5281/zenodo.4095301},
	abstract = {The dataset originates from a Greek handwritten codex that dates from around 1500-1530. This is the subset of the codex British Museum Addit. 6791, written by two hands, one by Antonius Eparchos and the other by Camillos Zanettus (ff. 104r-174v) and delivers texts by Hierocles (In Aureum carmen), Matthaeus Blastares (Collectio alphabetica) and, notably, texts by Michael Psellos (De omnifaria doctrina). The writing delivers the most important abbreviations, logograms and conjunctions, which are cited in virtually every Greek minuscule handwritten codex from the years of the manuscript transliteration and the prevalence of the minuscule script (9th century) to the post-Byzantine years. This dataset consists of 120 scanned handwritten text pages, containing 9285 lines of text, 18809 words (6787 unique words). For each page, a PageXML file is provided containing the following ground truth: text region polygon coordinates; text line polygon coordinates with the corresponding transcription text; and word polygon coordinates with the corresponding transcription text.},
	language = {grc},
	urldate = {2022-12-09},
	publisher = {Zenodo},
	author = {Papazoglou, Aleksandros and Pratikakis, Ioannis and Markou, Kleopatra and Tsochatzidis, Lazaros},
	month = oct,
	year = {2020},
	keywords = {greek, handwritten, minuscule, transcription},
}
