Abstract
Over the past two decades, detecting text regions in complex natural images has attracted considerable interest from the research community, because these regions carry information that can be used for various purposes. However, such regions may contain text in multiple languages, so identifying the language of a detected scene text becomes important for further information processing. Language identification of text captured in the wild is an extremely challenging problem in the domain of scene text recognition. In this paper, a deep learning-based classifier combination approach is proposed to solve the problem of language identification from multi-lingual scene text images. A minimalist Convolutional Neural Network architecture is used as the base model. Five variants of an input image (the three individual channels of the RGB color model, i.e. R for red, G for green and B for blue, the full RGB image, and the grayscale image) are passed through the base model separately. The outputs of these five models are combined using classifier combination approaches based on the sum rule and the product rule. The performance of the proposed model has been evaluated on standard datasets such as KAIST and MLe2e as well as on an in-house multi-lingual scene text dataset. The experimental results show that the proposed model outperforms some state-of-the-art methods considered here for comparison.
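The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base CNN is omitted, the function names (`make_variants`, `sum_rule`, `product_rule`) are hypothetical, the grayscale conversion uses BT.601 luma weights as an assumption, and the probability values are invented for illustration. Equal class priors are assumed, in which case the sum rule reduces to adding the per-model class probabilities and the product rule to multiplying them.

```python
import numpy as np


def make_variants(rgb):
    """Build the five input variants for one H x W x 3 RGB image."""
    rgb = rgb.astype(np.float32)
    # Assumption: BT.601 luma weights; the abstract does not name a conversion.
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return {
        "R": rgb[..., 0],
        "G": rgb[..., 1],
        "B": rgb[..., 2],
        "RGB": rgb,
        "gray": gray,
    }


def sum_rule(probs):
    """probs: (n_models, n_classes) array of per-model class probabilities."""
    return int(np.argmax(probs.sum(axis=0)))


def product_rule(probs, eps=1e-12):
    """Multiply probabilities across models, computed in log space."""
    # eps guards against log(0) when a model assigns zero probability.
    return int(np.argmax(np.log(probs + eps).sum(axis=0)))


# Invented softmax outputs of five models (rows) over three languages (columns).
probs = np.array([
    [0.900, 0.050, 0.050],
    [0.900, 0.050, 0.050],
    [0.001, 0.600, 0.399],  # one model almost rules out class 0
    [0.400, 0.500, 0.100],
    [0.400, 0.500, 0.100],
])

print("sum rule picks class", sum_rule(probs))
print("product rule picks class", product_rule(probs))
```

With these invented numbers the two rules disagree: the sum rule selects class 0 because two models are very confident in it, while the product rule selects class 1 because a single near-zero probability acts almost as a veto on class 0. This sensitivity to one dissenting model is the main practical difference between the two rules.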
Acknowledgement
This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the PURSE-II and UPE-II projects.
Funding
This work is partially funded by DBT Grant (BT/PR16356/BID/7/596/2016), UGC Research Award (F.30–31/2016(SAII)) and DST Grant (EMR/2016/007213).
Ethics declarations
Conflict of Interest
There is no conflict of interest.
About this article
Cite this article
Chakraborty, N., Kundu, S., Paul, S. et al. Language identification from multi-lingual scene text images: a CNN based classifier ensemble approach. J Ambient Intell Human Comput 12, 7997–8008 (2021). https://doi.org/10.1007/s12652-020-02528-4
DOI: https://doi.org/10.1007/s12652-020-02528-4