An Hour-Glass CNN for Language Identification of Indic Texts in Digital Images

Chakraborty, Neelotpal; Mollah, Ayatullah Faruk; Basu, Subhadip; Sarkar, Ram

doi:10.1007/978-3-031-11349-9_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1568)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

Understanding multi-lingual text in a digital image requires identifying the language of each localized text region. India's multi-lingual environment calls for an efficient model that is robust to varied image complexities and reliably identifies the language of Indic texts. This paper presents a deep-learning-based Convolutional Neural Network (CNN) with an hour-glass-like structure for classifying texts in three popular Indic languages: Bangla, English and Hindi. A new dataset, Indic Texts in Digital Images (ITDI), is also presented; it is a collection of text images, both scene and born-digital, written in Bangla, English and Hindi. The hour-glass CNN is evaluated on the standard Indic dataset AUTNT, where it achieves an accuracy of 90.93%, higher than most state-of-the-art models. The proposed model is also used to benchmark performance on the ITDI dataset, where it reaches a reasonable accuracy of 85.18%. Sample instances of the proposed ITDI dataset can be found at: https://github.com/NCJUCSE/ITDI
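
The abstract does not specify the layers of the hour-glass CNN. As a rough illustration of what an hour-glass-like structure generally means, the Python sketch below tabulates feature-map sizes for a symmetric contract-then-expand layout. Everything in it is an assumption made for illustration and not the authors' architecture: the function name hourglass_shapes, the 64-pixel input, the depth of 3, the base width of 16 channels, and the halve/double rule.

```python
from typing import List, Tuple

# The three target classes named in the abstract.
LANGUAGES = ("Bangla", "English", "Hindi")


def hourglass_shapes(input_size: int = 64, depth: int = 3,
                     base_channels: int = 16) -> List[Tuple[str, int, int]]:
    """Return (stage, spatial_size, channels) for a symmetric hour-glass.

    Hypothetical layout (an assumption, not taken from the paper): each
    contracting stage halves the spatial size and doubles the channels;
    each expanding stage undoes one contracting stage.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if input_size % (2 ** depth) != 0:
        raise ValueError("input_size must be divisible by 2**depth")

    shapes = [("input", input_size, base_channels)]
    size, channels = input_size, base_channels

    # Contracting half: narrow towards the bottleneck.
    for i in range(depth):
        size, channels = size // 2, channels * 2
        shapes.append((f"down{i + 1}", size, channels))

    # Expanding half: widen back to the input resolution.
    for i in range(depth):
        size, channels = size * 2, channels // 2
        shapes.append((f"up{i + 1}", size, channels))

    return shapes


if __name__ == "__main__":
    for stage, size, channels in hourglass_shapes():
        print(f"{stage:>6}: {size}x{size} spatial, {channels} channels")
    print("classifier outputs:", len(LANGUAGES))
```

With the default arguments the narrowest point (the waist of the hour-glass) is the stage named down3, after which the spatial size grows back to that of the input. A classifier head with one output per language would follow the last stage in a design of this kind.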

Acknowledgements

This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
Neelotpal Chakraborty, Subhadip Basu & Ram Sarkar
Department of Computer Science and Engineering, Aliah University, Kolkata, 700160, India
Ayatullah Faruk Mollah

Corresponding author

Correspondence to Neelotpal Chakraborty.

Editor information

Editors and Affiliations

Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Technology Ropar, Ropar, India
Subrahmanyam Murala
Jadavpur University, Kolkata, India
Ananda Chowdhury
Indian Institute of Technology Ropar, Ropar, India
Abhinav Dhall
Indian Institute of Technology Ropar, Ropar, India
Puneet Goyal

Cite this paper

Chakraborty, N., Mollah, A.F., Basu, S., Sarkar, R. (2022). An Hour-Glass CNN for Language Identification of Indic Texts in Digital Images. In: Raman, B., Murala, S., Chowdhury, A., Dhall, A., Goyal, P. (eds) Computer Vision and Image Processing. CVIP 2021. Communications in Computer and Information Science, vol 1568. Springer, Cham. https://doi.org/10.1007/978-3-031-11349-9_3

DOI: https://doi.org/10.1007/978-3-031-11349-9_3
Published: 24 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11348-2
Online ISBN: 978-3-031-11349-9
eBook Packages: Computer Science, Computer Science (R0)
