Abstract
Modern OCR systems rely on recurrent deep neural networks for text recognition. Despite their known capacity to reach high accuracies, these networks suffer from several drawbacks: they are highly dependent on the training dataset and, in particular, not robust to shift and position variability. A data augmentation policy is proposed to remedy this problem; it generates realistic variations from the original data. A novel, fast, and efficient fully convolutional neural network (FCNN) with invariance properties is also proposed. Its structure is mainly based on a multi-resolution pyramid of dilated convolutions, and it can be seen as an end-to-end 2D-signal-to-sequence analyzer that needs no recurrent layers. Extensive experiments have been conducted to study the stability of recurrent networks, showing that data augmentation significantly improves their stability. The experiments also confirm the advantage of the PyraD-DCNN-based system, not only in terms of performance but also in terms of stability and resilience to position and shift variability. A private dataset of more than 600k images has been used in this work.
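To make the abstract's two ideas concrete, the sketch below (in PyTorch, which is an assumption; the paper does not state its framework) illustrates, first, a shift/translation augmentation of the kind the proposed policy relies on and, second, a small multi-resolution pyramid of dilated 1D convolutions used as a recurrence-free sequence analyzer on top of a convolutional feature extractor. All layer sizes, dilation rates, and the alphabet size are illustrative choices, not the published PyraD-DCNN architecture.

# Minimal sketch, not the authors' implementation: layer sizes, dilation
# rates, pooling scheme, and alphabet size are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.transforms as T

# Stand-in for the augmentation policy: small random shifts/translations so
# that training data covers position variability.
shift_augment = T.RandomAffine(degrees=0, translate=(0.05, 0.10))  # up to 5% horiz., 10% vert. shift

class DilatedPyramidRecognizer(nn.Module):
    """Fully convolutional text recognizer: a pyramid of dilated convolutions
    replaces recurrent (e.g. BLSTM) layers for sequence modelling."""
    def __init__(self, num_classes=80, channels=64):
        super().__init__()
        # 2D feature extractor that collapses the image height.
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                      # reduce height, keep width (time axis)
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),           # height -> 1, width -> sequence length
        )
        # Multi-resolution pyramid: stacked 1D convolutions with growing
        # dilation enlarge the receptive field along the text direction
        # without any recurrence.
        self.pyramid = nn.Sequential(*[
            nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            )
            for d in (1, 2, 4, 8)
        ])
        self.classifier = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):                 # x: (batch, 1, height, width)
        f = self.features(x).squeeze(2)   # (batch, channels, width)
        f = self.pyramid(f)               # same length, larger temporal context
        return self.classifier(f)         # (batch, num_classes, width) per-step scores

# Usage: the per-step class scores keep the length of the feature sequence.
logits = DilatedPyramidRecognizer()(torch.randn(2, 1, 32, 200))
print(logits.shape)   # torch.Size([2, 80, 200])

Because the output is a per-step score sequence of the same length as the feature map, it can be trained with a CTC-style objective in the same way as a recurrent recognizer, which is what makes a dilated pyramid a drop-in replacement for recurrent layers.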
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Awal, A.M., Neitthoffer, T., Ghanmi, N. (2021). Data Augmentation vs. PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition. In: Barney Smith, E.H., Pal, U. (eds.) Document Analysis and Recognition – ICDAR 2021 Workshops. Lecture Notes in Computer Science, vol. 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86158-2
Online ISBN: 978-3-030-86159-9