Barrere, K., Soullard, Y., Lemaitre, A., & Coüasnon, B. (2022). A Light Transformer-Based Architecture for Handwritten Text Recognition. In Uchida, S., Barney, E., & Eglin, V., editors, Document Analysis Systems, Lecture Notes in Computer Science, pages 275–290, Cham. Springer International Publishing. doi: 10.1007/978-3-031-06555-2_19
@inproceedings{barrere_light_2022,
address = {Cham},
series = {Lecture {Notes} in {Computer} {Science}},
title = {A {Light} {Transformer}-{Based} {Architecture} for {Handwritten} {Text} {Recognition}},
isbn = {978-3-031-06555-2},
doi = {10.1007/978-3-031-06555-2_19},
abstract = {Transformer models have been showing ground-breaking results in the domain of natural language processing. More recently, they have started to gain interest in many other fields, such as computer vision. Traditional Transformer models typically require a significant amount of training data to achieve satisfactory results. However, in the domain of handwritten text recognition, annotated data acquisition remains costly, resulting in small datasets compared to those commonly used to train a Transformer-based model. Hence, training Transformer models able to transcribe handwritten text from images remains challenging. We propose a light encoder-decoder Transformer-based architecture for handwritten text recognition, containing a small number of parameters compared to traditional Transformer architectures. We trained our architecture using a hybrid loss, combining the well-known connectionist temporal classification with the cross-entropy. Experiments are conducted on the well-known IAM dataset with and without the use of additional synthetic data. We show that our network reaches state-of-the-art results in both cases, compared with other larger Transformer-based models.},
language = {en},
booktitle = {Document {Analysis} {Systems}},
publisher = {Springer International Publishing},
author = {Barrere, Killian and Soullard, Yann and Lemaitre, Aurélie and Coüasnon, Bertrand},
editor = {Uchida, Seiichi and Barney, Elisa and Eglin, Véronique},
year = {2022},
keywords = {Handwritten text recognition, Hybrid loss, Light network, Neural networks, Transformer},
pages = {275--290},
}