Abstract
Modern OCR systems rely on recurrent deep neural networks for text recognition. Despite their known capacity to reach high accuracies, these networks suffer from several drawbacks: they are highly dependent on the training dataset and, in particular, not robust to shift and position variability. A data augmentation policy is proposed to remedy this problem; it generates realistic variations from the original data. A novel, fast, and efficient fully convolutional neural network (FCNN) with invariance properties is also proposed. Its structure is mainly based on a multi-resolution pyramid of dilated convolutions, and it can be seen as an end-to-end 2D-signal-to-sequence analyzer that needs no recurrent layers. Extensive experiments have been conducted to study the stability of recurrent networks, showing that data augmentation significantly improves their stability. The experiments also confirm the advantage of the PyraD-DCNN-based system, not only in terms of performance but also in terms of stability and resilience to position and shift variability. A private dataset of more than 600k images has been used in this work.
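To make the abstract's two ideas concrete, the sketch below (in PyTorch, which is an assumption; the paper does not state its framework) illustrates, first, a shift/translation augmentation of the kind the proposed policy relies on and, second, a small multi-resolution pyramid of dilated 1D convolutions used as a recurrence-free sequence analyzer on top of a convolutional feature extractor. All layer sizes, dilation rates, and the alphabet size are illustrative choices, not the published PyraD-DCNN architecture.

# Minimal sketch, not the authors' implementation: layer sizes, dilation
# rates, pooling scheme, and alphabet size are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.transforms as T

# Stand-in for the augmentation policy: small random shifts/translations so
# that training data covers position variability.
shift_augment = T.RandomAffine(degrees=0, translate=(0.05, 0.10))  # up to 5% horiz., 10% vert. shift

class DilatedPyramidRecognizer(nn.Module):
    """Fully convolutional text recognizer: a pyramid of dilated convolutions
    replaces recurrent (e.g. BLSTM) layers for sequence modelling."""
    def __init__(self, num_classes=80, channels=64):
        super().__init__()
        # 2D feature extractor that collapses the image height.
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                      # reduce height, keep width (time axis)
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),           # height -> 1, width -> sequence length
        )
        # Multi-resolution pyramid: stacked 1D convolutions with growing
        # dilation enlarge the receptive field along the text direction
        # without any recurrence.
        self.pyramid = nn.Sequential(*[
            nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            )
            for d in (1, 2, 4, 8)
        ])
        self.classifier = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):                 # x: (batch, 1, height, width)
        f = self.features(x).squeeze(2)   # (batch, channels, width)
        f = self.pyramid(f)               # same length, larger temporal context
        return self.classifier(f)         # (batch, num_classes, width) per-step scores

# Usage: the per-step class scores keep the length of the feature sequence.
logits = DilatedPyramidRecognizer()(torch.randn(2, 1, 32, 200))
print(logits.shape)   # torch.Size([2, 80, 200])

Because the output is a per-step score sequence of the same length as the feature map, it can be trained with a CTC-style objective in the same way as a recurrent recognizer, which is what makes a dilated pyramid a drop-in replacement for recurrent layers.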
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Awal, A.M., Neitthoffer, T., Ghanmi, N. (2021). Data Augmentation vs. PyraD-DCNN: A Fast, Light, and Shift Invariant FCNN for Text Recognition. In: Barney Smith, E.H., Pal, U. (eds.) Document Analysis and Recognition – ICDAR 2021 Workshops. Lecture Notes in Computer Science, vol. 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86158-2
Online ISBN: 978-3-030-86159-9