Improvising the CNN Feature Maps Through Integration of Channel Attention for Handwritten Text Recognition

Shashank, B. N.; Nagesh Bhattu, S.; Sri Phani Krishna, K.

doi:10.1007/978-3-031-31417-9_37

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1777)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

Architectures that pair a Convolutional Neural Network (CNN) encoder with a Recurrent Neural Network (RNN) decoder are widely used in Handwritten Text Recognition (HTR) systems, and the quality of the encoder representation plays a vital role in their performance. Squeeze-and-Excitation Networks (SE-Net), used in image classification, object detection and scene classification, capture global inter-channel dependencies. ECA-Net, in contrast, learns channel attention through local Cross-Channel Interaction (CCI). This work proposes an encoder-decoder architecture for HTR that combines the benefits of local and global cross-channel attention to obtain an effective encoder representation. Experimental results on the IAM dataset show that applying the proposed module to the state-of-the-art HTR-Flor and Puigcerver models reduces the Character Error Rate (CER) by 8.98% and 3.24%, and the Word Error Rate (WER) by 8.98% and 3.45%, respectively. The work also presents a detailed character-level error analysis on the IAM dataset.
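To make the two kinds of channel attention concrete, here is a minimal NumPy sketch of an SE block (global attention via a bottleneck of fully connected layers) and an ECA block (local attention via a 1D convolution over the pooled channel descriptor). The sketch also includes ECA-Net's adaptive kernel-size rule. The abstract does not state how the chapter combines the two modules. The `combined_attention` function therefore applies ECA and then SE in sequence, and that ordering is an assumption made only for illustration. All function names, weight shapes and the reduction ratio `r` are hypothetical, and the weights are random rather than learned.

```python
import math

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_attention(x, w1, w2):
    """Squeeze-and-Excitation (global channel attention).

    x: feature map (C, H, W); w1: (C // r, C); w2: (C, C // r).
    """
    z = x.mean(axis=(1, 2))        # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z, 0.0)    # bottleneck FC layer + ReLU -> (C // r,)
    a = sigmoid(w2 @ s)            # FC layer + sigmoid -> channel weights (C,)
    return x * a[:, None, None]    # rescale every channel of x


def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-Net's adaptive kernel size, rounded up to the nearest odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_attention(x, kernel):
    """ECA (local cross-channel interaction via a 1D convolution)."""
    z = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    k = len(kernel)                          # assumed odd
    zp = np.pad(z, k // 2)                   # zero padding keeps length C
    # Cross-correlation: each channel weight depends only on its k neighbours.
    y = np.array([zp[i:i + k] @ kernel for i in range(z.shape[0])])
    a = sigmoid(y)
    return x * a[:, None, None]


def combined_attention(x, w1, w2, kernel):
    # Hypothetical ordering: local (ECA) attention first, then global (SE).
    return se_attention(eca_attention(x, kernel), w1, w2)


# Usage on a random feature map with C=8 channels and reduction ratio r=2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 16, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
kernel = rng.standard_normal(eca_kernel_size(C))
out = combined_attention(x, w1, w2, kernel)
print(out.shape)  # (8, 4, 16)
```

Both blocks leave the feature-map shape unchanged and only rescale channels by factors in (0, 1). This is why they can be inserted into an existing CNN encoder, such as those of the HTR-Flor or Puigcerver models, without altering the downstream RNN decoder. In practice `w1`, `w2` and `kernel` would be trained jointly with the rest of the network.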
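CER and WER are both normalised edit distances. This links them to the Wagner-Fischer string-to-string correction algorithm in the reference list. The sketch below computes them at corpus level, as total edits divided by the total reference length. That normalisation convention is an assumption, and the chapter's exact evaluation script may differ. The helper names `edit_distance`, `cer` and `wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via Wagner-Fischer DP, keeping one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(refs, hyps):
    """Character Error Rate: character edits / reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def wer(refs, hyps):
    """Word Error Rate: word edits / reference words (whitespace tokens)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


print(edit_distance("kitten", "sitting"))  # 3
print(wer(["the cat"], ["the bat"]))        # 0.5
```

One wrong character in "the cat" costs 1/7 in CER but 1/2 in WER. A single character error can therefore spoil a whole word, which is one reason the two metrics can respond differently to the same model change.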

References

de Sousa Neto, A.F., et al.: HTR-Flor: a deep learning system for offline handwritten text recognition. In: 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE (2020)
Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1. IEEE (2017)
Plötz, T., Fink, G.A.: Markov models for offline handwriting recognition: a survey. Int. J. Doc. Anal. Recogn. (IJDAR) 12(4), 269–298 (2009)
Frinken, V., Peter, T., Fischer, A., Bunke, H., Do, T.-M.-T., Artieres, T.: Improved handwriting recognition by combining two forms of hidden Markov models and a recurrent neural network. In: Jiang, X., Petkov, N. (eds.) CAIP 2009. LNCS, vol. 5702, pp. 189–196. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03767-2_23
Bluche, T., Ney, H., Kermorvant, C.: Tandem HMM with convolutional neural network for handwritten word recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2013)
Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. Adv. Neural Inf. Process. Syst. 21 (2008)
Bluche, T., Messina, R.: Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1. IEEE (2017)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Wang, Q., et al.: Supplementary material for ‘ECA-Net: efficient channel attention for deep convolutional neural networks’. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA (2020)
Marti, U.-V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5, 39–46 (2002)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM (JACM) 21(1), 168–173 (1974)
Arica, N., Yarman-Vural, F.T.: An overview of character recognition focused on off-line handwriting. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 31, 216–233 (2001). https://doi.org/10.1109/5326.941845
Marti, U.-V., Bunke, H.: Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. Int. J. Pattern Recogn. Artif. Intell. (IJPRAI) 15, 65–90 (2001). https://doi.org/10.1142/S0218001401000848
Sauvola, J., Seppänen, T., Haapakoski, S., Pietikäinen, M.: Adaptive document binarization. In: Proceedings of the Fourth International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 147–152 (1997). https://doi.org/10.1109/ICDAR.1997.619831
de Zeeuw, F.: Slant correction using histograms. Bachelor’s thesis in Artificial Intelligence (2006)
Marti, U.-V., Bunke, H.: Handwritten sentence recognition. In: Proceedings of the 15th International Conference on Pattern Recognition (ICPR), vol. 3, pp. 463–466 (2000). https://doi.org/10.1109/ICPR.2000.903584
Voigtlaender, P., Doetsch, P., Ney, H.: Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 228–233 (2016). https://doi.org/10.1109/ICFHR.2016.0052
Pham, V., Bluche, T., Kermorvant, C., Louradour, J.: Dropout improves recurrent neural networks for handwriting recognition. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition (ICFHR) (2014). https://doi.org/10.1109/ICFHR.2014.55
Krishnan, P., Dutta, K., Jawahar, C.V.: Word spotting and recognition using deep embedding. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 1–6 (2018). https://doi.org/10.1109/DAS.2018.70
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006). https://doi.org/10.1145/1143844.1143891
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2017). https://doi.org/10.1109/TPAMI.2016.2646371


Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Andhra Pradesh, Tadepalligudem, India
B. N. Shashank & S. Nagesh Bhattu
Department of Electrical Engineering, National Institute of Technology Andhra Pradesh, Tadepalligudem, India
K. Sri Phani Krishna


Corresponding author

Correspondence to S. Nagesh Bhattu.

Editor information

Editors and Affiliations

Visvesvaraya National Institute of Technology Nagpur, Nagpur, India
Deep Gupta
Visvesvaraya National Institute of Technology Nagpur, Nagpur, India
Kishor Bhurchandi
Indian Institute of Technology Ropar, Rupnagar, India
Subrahmanyam Murala
Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Technology Roorkee, Roorkee, India
Sanjeev Kumar

About this paper

Cite this paper

Shashank, B.N., Nagesh Bhattu, S., Sri Phani Krishna, K. (2023). Improvising the CNN Feature Maps Through Integration of Channel Attention for Handwritten Text Recognition. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_37

DOI: https://doi.org/10.1007/978-3-031-31417-9_37
Published: 07 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31416-2
Online ISBN: 978-3-031-31417-9
eBook Packages: Computer Science, Computer Science (R0)
