A Light Transformer-Based Architecture for Handwritten Text Recognition. Barrere, K., Soullard, Y., Lemaitre, A., & Coüasnon, B. In Uchida, S., Barney Smith, E., & Eglin, V., editors, Document Analysis Systems, Lecture Notes in Computer Science, pages 275–290, Cham, 2022. Springer International Publishing.
Transformer models have shown ground-breaking results in the domain of natural language processing. More recently, they have gained interest in many other fields, such as computer vision. Traditional Transformer models typically require a significant amount of training data to achieve satisfactory results. However, in the domain of handwritten text recognition, acquiring annotated data remains costly, resulting in small datasets compared to those commonly used to train Transformer-based models. Hence, training Transformer models able to transcribe handwritten text from images remains challenging. We propose a light encoder-decoder Transformer-based architecture for handwritten text recognition, containing a small number of parameters compared to traditional Transformer architectures. We train our architecture using a hybrid loss, combining the well-known connectionist temporal classification (CTC) loss with cross-entropy. Experiments are conducted on the well-known IAM dataset, with and without additional synthetic data. We show that our network reaches state-of-the-art results in both cases, compared with larger Transformer-based models.
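The hybrid loss described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' code: it is a minimal PyTorch illustration of weighting a CTC loss on encoder outputs against a cross-entropy loss on decoder outputs; the module name, tensor shapes, and the mixing weight `lambda_ctc` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridCTCCrossEntropyLoss(nn.Module):
    """Sketch of a hybrid loss: lambda * CTC + (1 - lambda) * cross-entropy."""

    def __init__(self, blank_idx: int, pad_idx: int, lambda_ctc: float = 0.5):
        super().__init__()
        # CTC branch, typically applied to the encoder's character predictions.
        self.ctc = nn.CTCLoss(blank=blank_idx, zero_infinity=True)
        # Cross-entropy branch, typically applied to the decoder's predictions.
        self.ce = nn.CrossEntropyLoss(ignore_index=pad_idx)
        self.lambda_ctc = lambda_ctc  # assumed mixing weight, not from the paper

    def forward(self, enc_logits, dec_logits, targets, enc_lengths, target_lengths):
        # enc_logits: (T, N, C) encoder outputs over T frames (CTC branch)
        # dec_logits: (N, L, C) decoder outputs over L target positions (CE branch)
        # targets:    (N, L) character indices, padded with pad_idx
        ctc_loss = self.ctc(enc_logits.log_softmax(-1), targets,
                            enc_lengths, target_lengths)
        # CrossEntropyLoss expects (N, C, L) logits against (N, L) targets.
        ce_loss = self.ce(dec_logits.transpose(1, 2), targets)
        return self.lambda_ctc * ctc_loss + (1.0 - self.lambda_ctc) * ce_loss
```

Combining the two objectives this way lets the CTC branch supervise the encoder with alignment-free frame-level targets while cross-entropy trains the autoregressive decoder, which is one common way to stabilize training of encoder-decoder recognizers on small datasets.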
@inproceedings{barrere_light_2022,
	address = {Cham},
	series = {Lecture {Notes} in {Computer} {Science}},
	title = {A {Light} {Transformer}-{Based} {Architecture} for {Handwritten} {Text} {Recognition}},
	isbn = {978-3-031-06555-2},
	doi = {10.1007/978-3-031-06555-2_19},
	abstract = {Transformer models have shown ground-breaking results in the domain of natural language processing. More recently, they have gained interest in many other fields, such as computer vision. Traditional Transformer models typically require a significant amount of training data to achieve satisfactory results. However, in the domain of handwritten text recognition, acquiring annotated data remains costly, resulting in small datasets compared to those commonly used to train Transformer-based models. Hence, training Transformer models able to transcribe handwritten text from images remains challenging. We propose a light encoder-decoder Transformer-based architecture for handwritten text recognition, containing a small number of parameters compared to traditional Transformer architectures. We train our architecture using a hybrid loss, combining the well-known connectionist temporal classification (CTC) loss with cross-entropy. Experiments are conducted on the well-known IAM dataset, with and without additional synthetic data. We show that our network reaches state-of-the-art results in both cases, compared with larger Transformer-based models.},
	language = {en},
	booktitle = {Document {Analysis} {Systems}},
	publisher = {Springer International Publishing},
	author = {Barrere, Killian and Soullard, Yann and Lemaitre, Aurélie and Coüasnon, Bertrand},
	editor = {Uchida, Seiichi and Barney Smith, Elisa and Eglin, Véronique},
	year = {2022},
	keywords = {Handwritten text recognition, Hybrid loss, Light network, Neural networks, Transformer},
	pages = {275--290},
}
