Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers. Barman, R., Ehrmann, M., Clematide, S., Oliveira, S. A., & Kaplan, F. arXiv:2002.06144 [cs], December, 2020. arXiv: 2002.06144

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

@article{barman_combining_2020,
title = {Combining {Visual} and {Textual} {Features} for {Semantic} {Segmentation} of {Historical} {Newspapers}},
url = {http://arxiv.org/abs/2002.06144},
abstract = {The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.},
urldate = {2021-06-08},
journal = {arXiv:2002.06144 [cs]},
author = {Barman, Raphaël and Ehrmann, Maud and Clematide, Simon and Oliveira, Sofia Ares and Kaplan, Frédéric},
month = dec,
year = {2020},
note = {arXiv: 2002.06144},
keywords = {Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning},
}