You Actually Look Twice At it (YALTAi): Using an object detection approach instead of region segmentation within the Kraken engine. Clérice, T. arXiv preprint arXiv:2207.11230, 2022.
Abstract: Layout Analysis (the identification of zones and their classification) is, together with line segmentation, the first step in Optical Character Recognition and similar tasks. The ability to distinguish the main body of text from marginal text or running titles makes the difference between extracting the full text of a digitized work and noisy output. We show that most segmenters focus on pixel classification and that polygonization of this output has not been used as a target in the latest competitions on historical documents (ICDAR 2017 and onwards), despite being the focus in the early 2010s. We propose to shift, for efficiency, the task from pixel-classification-based polygonization to object detection using isothetic rectangles. We compare the output of Kraken and YOLOv5 in terms of segmentation and show that the latter severely outperforms the former on small datasets (1110 samples and below). We release two datasets for training and evaluation on historical documents, as well as a new package, YALTAi, which injects YOLOv5 into the segmentation pipeline of Kraken 4.1.
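The core move the abstract describes is replacing pixel-level polygonization with detected isothetic (axis-aligned) rectangles. As a minimal sketch of what that geometry looks like, the function below converts a normalized YOLO-format detection (center x, center y, width, height, all in [0, 1]) into the four corner points of an axis-aligned rectangle in pixel coordinates, the shape a region polygon in a Kraken-style segmentation record would take. The function name and interface are illustrative assumptions, not YALTAi's actual API.

```python
def yolo_box_to_polygon(xc, yc, w, h, img_w, img_h):
    """Map a normalized YOLO box (xc, yc, w, h in [0, 1]) to an
    isothetic rectangle, returned as four clockwise corner points
    (pixels, starting at the top-left).

    Note: this is a hypothetical helper for illustration only.
    """
    x0 = int((xc - w / 2) * img_w)  # left edge
    y0 = int((yc - h / 2) * img_h)  # top edge
    x1 = int((xc + w / 2) * img_w)  # right edge
    y1 = int((yc + h / 2) * img_h)  # bottom edge
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```

For example, a detection centered on a 1000x800 page with half-page width and height maps to `yolo_box_to_polygon(0.5, 0.5, 0.5, 0.5, 1000, 800)`, i.e. the rectangle with corners (250, 200) and (750, 600).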
@article{clerice_you_2022,
title = {You {Actually} {Look} {Twice} {At} it ({YALTAi}): {Using} an object detection approach instead of region segmentation within the {Kraken} engine},
shorttitle = {You {Actually} {Look} {Twice} {At} it ({YALTAi})},
url = {https://arxiv.org/pdf/2207.11230.pdf},
abstract = {Layout Analysis (the identification of zones and their classification) is the first step along line segmentation in Optical Character Recognition and similar tasks. The ability of identifying main body of text from marginal text or running titles makes the difference between extracting the work full text of a digitized book and noisy outputs. We show that most segmenters focus on pixel classification and that polygonization of this output has not been used as a target for the latest competition on historical document (ICDAR 2017 and onwards), despite being the focus in the early 2010s. We propose to shift, for efficiency, the task from a pixel classification-based polygonization to an object detection using isothetic rectangles. We compare the output of Kraken and YOLOv5 in terms of segmentation and show that the later severely outperforms the first on small datasets (1110 samples and below). We release two datasets for training and evaluation on historical documents as well as a new package, YALTAi, which injects YOLOv5 in the segmentation pipeline of Kraken 4.1.},
urldate = {2023-10-27},
journal = {arXiv preprint arXiv:2207.11230},
author = {Clérice, Thibault},
year = {2022},
}