Language identification from multi-lingual scene text images: a CNN based classifier ensemble approach

Chakraborty, Neelotpal; Kundu, Soumyadeep; Paul, Sayantan; Mollah, Ayatullah Faruk; Basu, Subhadip; Sarkar, Ram

doi:10.1007/s12652-020-02528-4

Original Research
Published: 19 September 2020

Volume 12, pages 7997–8008 (2021)

Journal of Ambient Intelligence and Humanized Computing

Neelotpal Chakraborty¹,
Soumyadeep Kundu¹,
Sayantan Paul¹,
Ayatullah Faruk Mollah²,
Subhadip Basu¹ &
…
Ram Sarkar¹

Abstract

Over the past two decades, detecting text regions in complex natural images has become a problem of great interest to the research community, because these regions serve as a source of information that can be used for various purposes. However, such regions may contain text in multiple languages, so identifying the language of a detected scene text becomes important for further processing. Language identification of text captured in the wild is an extremely challenging problem in the domain of scene text recognition. In this paper, a deep learning-based classifier combination approach is proposed to solve the problem of language identification from multi-lingual scene text images. A minimalist Convolutional Neural Network architecture is used as the base model. Five variants of an input image are passed through the base model separately: the three individual channels of the RGB color model (R for red, G for green and B for blue), the full RGB image, and the grayscale image. The outputs of these five models are combined using classifier combination approaches based on the sum rule and the product rule. The performance of the proposed model has been evaluated on standard datasets such as KAIST and MLe2e as well as on an in-house multi-lingual scene text dataset. The experimental results show that the proposed model outperforms some state-of-the-art methods considered here for comparison.
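The pipeline described in the abstract (five image variants, one base model per variant, scores fused by the sum rule or the product rule) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the BT.601 grayscale weights, and the example probabilities are assumptions made for illustration, and the CNN itself is replaced by hypothetical softmax outputs.

```python
import numpy as np


def image_variants(rgb):
    """Return the five input variants named in the abstract.

    rgb: array of shape (height, width, 3).
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    # Assumption: ITU-R BT.601 luma weights; the abstract does not say
    # how the grayscale image is derived.
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    return {
        "R": rgb[..., 0],
        "G": rgb[..., 1],
        "B": rgb[..., 2],
        "RGB": rgb,
        "gray": gray,
    }


def sum_rule(probs):
    """Average the per-model class probabilities.

    probs: array of shape (n_models, n_classes).
    """
    probs = np.asarray(probs, dtype=np.float64)
    return probs.mean(axis=0)


def product_rule(probs, eps=1e-12):
    """Multiply the per-model class probabilities, then renormalize.

    The product is computed in log space for numerical stability;
    eps (an assumption) guards against log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    log_scores = np.log(probs + eps).sum(axis=0)
    scores = np.exp(log_scores - log_scores.max())
    return scores / scores.sum()


# Hypothetical softmax outputs of the five models (rows: R, G, B, RGB,
# gray) over three language classes. These numbers are made up.
model_probs = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
])

sum_scores = sum_rule(model_probs)
prod_scores = product_rule(model_probs)
sum_label = int(np.argmax(sum_scores))
prod_label = int(np.argmax(prod_scores))
```

In this made-up example one model (the third row) disagrees with the other four, yet both rules still select class 0. In general the product rule is more sensitive to a single model assigning a near-zero probability to a class, whereas the sum rule averages such outliers out.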

Acknowledgement

This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the PURSE-II and UPE-II projects.

Funding

This work is partially funded by DBT Grant (BT/PR16356/BID/7/596/2016), UGC Research Award (F.30–31/2016(SAII)) and DST Grant (EMR/2016/007213).

Author information

Authors and Affiliations

Computer Science and Engineering Department, Jadavpur University, Kolkata, India
Neelotpal Chakraborty, Soumyadeep Kundu, Sayantan Paul, Subhadip Basu & Ram Sarkar
Computer Science and Engineering Department, Aliah University, Kolkata, India
Ayatullah Faruk Mollah

Corresponding author

Correspondence to Neelotpal Chakraborty.

Ethics declarations

Conflict of Interest

There is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chakraborty, N., Kundu, S., Paul, S. et al. Language identification from multi-lingual scene text images: a CNN based classifier ensemble approach. J Ambient Intell Human Comput 12, 7997–8008 (2021). https://doi.org/10.1007/s12652-020-02528-4

Received: 19 February 2020
Accepted: 05 September 2020
Published: 19 September 2020
Issue Date: July 2021
DOI: https://doi.org/10.1007/s12652-020-02528-4
