Abstract
Detection and language identification of multi-lingual texts in natural scene images (NSI) and born-digital images (BDI) are popular research problems in the domain of information retrieval. Several methods addressing these problems have been evaluated over the years, mostly on NSI-based standard datasets. However, datasets featuring bi/tri-lingual Indic texts in a single image are scarce, and datasets of BDIs with multi-lingual texts are hardly available. To this end, a new dataset called Mixed-lingual Indic Texts in Digital Images (MITDI), comprising 500 NSIs and 500 BDIs, is introduced. Each image contains texts written in at least two of English, Bangla and Hindi, languages that are commonly used in India. The NSI pool contains 360 images with bi-lingual texts and 140 with tri-lingual texts, whereas the BDI pool contains 489 images with bi-lingual texts and 11 with tri-lingual texts. To benchmark performance on MITDI, a deep learning based Connectionist-DenseNet framework is built and evaluated separately on the NSI pool, the BDI pool and the combined set. The proposed dataset can serve as an important resource for evaluating state-of-the-art methods in this domain. The dataset is publicly available at: https://github.com/NCJUCSE/MITDI
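The composition figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the image counts are taken from the abstract, while the dictionary layout and the names `POOLS` and `summarize` are assumptions made for illustration and are not part of the MITDI release.

```python
# Image counts as stated in the abstract; the data layout is illustrative only.
POOLS = {
    "NSI": {"bi-lingual": 360, "tri-lingual": 140},
    "BDI": {"bi-lingual": 489, "tri-lingual": 11},
}


def summarize(pools):
    """Return per-pool and combined counts, totals and tri-lingual shares."""
    combined = {"bi-lingual": 0, "tri-lingual": 0}
    summary = {}
    for name, counts in pools.items():
        # Copy so the input dictionary is left unchanged.
        summary[name] = dict(counts)
        for kind, n in counts.items():
            combined[kind] += n
    summary["combined"] = combined
    for counts in summary.values():
        counts["total"] = counts["bi-lingual"] + counts["tri-lingual"]
        counts["tri_share"] = counts["tri-lingual"] / counts["total"]
    return summary


SUMMARY = summarize(POOLS)

if __name__ == "__main__":
    for name, counts in SUMMARY.items():
        print(name, counts["total"], f"{counts['tri_share']:.1%}")
```

Under these counts each pool sums to 500 images, and the combined set of 1000 images holds 849 bi-lingual and 151 tri-lingual images, so tri-lingual samples are far rarer in the BDI pool than in the NSI pool.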












Acknowledgements
This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the PURSE-II and UPE-II projects. It is partially funded by a DBT grant (BT/PR16356/BID/7/596/2016) and a DST grant (EMR/2016/007213).
Ethics declarations
Conflict of interest
The authors state that there are no conflicts of interest.
About this article
Cite this article
Chakraborty, N., Mitra, A., Choudhury, A. et al. How to handle bi/tri-lingual Indic texts in a single image? A new dataset of natural scene and born-digital images. Multimed Tools Appl 81, 15367–15394 (2022). https://doi.org/10.1007/s11042-022-12596-7
DOI: https://doi.org/10.1007/s11042-022-12596-7