Segmentation for document layout analysis: not dead yet

Segmentation for document layout analysis: not dead yet. Markewich, L., Zhang, H., Xing, Y., Lambert-Shirzad, N., Jiang, Z., Lee, R. K., Li, Z., & Ko, S. International Journal on Document Analysis and Recognition (IJDAR), 25(2):67–77, June 2022.


Document layout analysis is often the first task in document understanding systems, where a document is broken down into identifiable sections. One of the most common approaches to this task is image segmentation, where each pixel in a document image is classified. However, this task is challenging because as the number of classes increases, small and infrequent objects often get missed. In this paper, we propose a weighted bounding box regression loss methodology to improve accuracy for segmentation of document layouts, while demonstrating our results on our dense article dataset (DAD) and the existing PubLayNet dataset. First, we collect and annotate 43 document object classes across 450 open access research articles, constructing DAD. After benchmarking several segmentation networks, we achieve an F1 score of 96.26% on DAD and 97.11% on PubLayNet with DeeplabV3+, while also showing a bounding box regression method for segmentation results that improves the F1 by +1.99 points on DAD. Finally, we demonstrate the networks trained on DAD can be used as a bootstrapped annotation tool for the existing document layout datasets, decreasing annotation time by 38% with DeeplabV3+.
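The abstract reports pixel-level F1 scores and notes that small, infrequent classes tend to be missed. The snippet below is a minimal sketch of how a per-class pixel-wise F1 might be computed, to show why a rare class can score poorly even when overall pixel accuracy looks high. It is not the authors' evaluation code: the function names `per_class_f1` and `macro_f1`, the nested-list mask format, and the macro-averaging choice are all assumptions made for illustration.

```python
from collections import Counter


def per_class_f1(pred, truth):
    """Pixel-wise F1 per class for two equally sized 2-D label grids.

    Hypothetical helper: masks are nested lists of integer class ids.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                tp[t] += 1
            else:
                fp[p] += 1  # class p predicted where it is absent
                fn[t] += 1  # a pixel of class t was missed
    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy 2x4 page: class 0 is frequent, class 1 is a small object.
truth = [[0, 0, 0, 0],
         [0, 0, 1, 1]]
pred = [[0, 0, 0, 0],
        [0, 0, 0, 1]]  # one pixel of the small object is missed

scores = per_class_f1(pred, truth)
print(scores, macro_f1(scores))
```

In this toy case 7 of 8 pixels are correct, yet the small class has an F1 of only 2/3 while the frequent class scores 12/13. A single missed pixel costs the rare class far more, which is the imbalance the abstract describes.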

@article{markewich_segmentation_2022,
	title = {Segmentation for document layout analysis: not dead yet},
	volume = {25},
	issn = {1433-2825},
	shorttitle = {Segmentation for document layout analysis},
	url = {https://doi.org/10.1007/s10032-021-00391-3},
	doi = {10.1007/s10032-021-00391-3},
	abstract = {Document layout analysis is often the first task in document understanding systems, where a document is broken down into identifiable sections. One of the most common approaches to this task is image segmentation, where each pixel in a document image is classified. However, this task is challenging because as the number of classes increases, small and infrequent objects often get missed. In this paper, we propose a weighted bounding box regression loss methodology to improve accuracy for segmentation of document layouts, while demonstrating our results on our dense article dataset (DAD) and the existing PubLayNet dataset. First, we collect and annotate 43 document object classes across 450 open access research articles, constructing DAD. After benchmarking several segmentation networks, we achieve an F1 score of 96.26\% on DAD and 97.11\% on PubLayNet with DeeplabV3+, while also showing a bounding box regression method for segmentation results that improves the F1 by +1.99 points on DAD. Finally, we demonstrate the networks trained on DAD can be used as a bootstrapped annotation tool for the existing document layout datasets, decreasing annotation time by 38\% with DeeplabV3+.},
	language = {en},
	number = {2},
	urldate = {2023-05-11},
	journal = {International Journal on Document Analysis and Recognition (IJDAR)},
	author = {Markewich, Logan and Zhang, Hao and Xing, Yubin and Lambert-Shirzad, Navid and Jiang, Zhexin and Lee, Roy Ka-Wei and Li, Zhi and Ko, Seok-Bum},
	month = jun,
	year = {2022},
	keywords = {\#nosource, Annotation, Computer vision, Document layout analysis, Semantic segmentation},
	pages = {67--77},
}

