Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers. Barman, R., Ehrmann, M., Clematide, S., Oliveira, S. A., & Kaplan, F. arXiv:2002.06144 [cs], December, 2020.
Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.
@article{barman2020,
title = {Combining {Visual} and {Textual} {Features} for {Semantic} {Segmentation} of {Historical} {Newspapers}},
url = {http://arxiv.org/abs/2002.06144},
	abstract = {The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.},
language = {en},
urldate = {2021-06-08},
journal = {arXiv:2002.06144 [cs]},
author = {Barman, Raphaël and Ehrmann, Maud and Clematide, Simon and Oliveira, Sofia Ares and Kaplan, Frédéric},
month = dec,
year = {2020},
	keywords = {Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning},
}
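The abstract's central idea is to inject textual signal into a pixel-wise segmentation model alongside the image. A minimal way to picture this is channel-wise fusion: build a per-pixel map of OCR-token embeddings aligned with the page image, then concatenate it onto the image channels before feeding a segmentation network. The sketch below is purely illustrative (function name, shapes, and the random toy data are mine, not the paper's actual architecture or implementation):

```python
import numpy as np

def fuse_visual_textual(image, text_embedding_map):
    """Concatenate per-pixel text-embedding channels onto image channels.

    image: (H, W, 3) float array, the scanned page crop.
    text_embedding_map: (H, W, D) float array giving, for each pixel, an
    embedding of the OCR token printed there (zeros where no text lies).
    Returns an (H, W, 3 + D) array to feed a segmentation network.
    """
    assert image.shape[:2] == text_embedding_map.shape[:2]
    return np.concatenate([image, text_embedding_map], axis=-1)

# Toy example: a 4x4 page crop with 2-dimensional text embeddings.
img = np.random.rand(4, 4, 3)
txt = np.zeros((4, 4, 2))
txt[1, 1] = [0.5, -0.3]  # one pixel covered by an OCR token
fused = fuse_visual_textual(img, txt)
print(fused.shape)  # (4, 4, 5)
```

In practice the embedding map would come from OCR output (token positions plus pretrained word embeddings), and the fused tensor would replace the plain RGB input of the visual baseline; the concatenation step itself is the only part sketched here.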