Abstract
Detecting text in unconstrained scene images and identifying its language are essential steps in multimedia information retrieval. Most existing approaches treat detection and language identification as separate problems. To the best of our knowledge, no scene text dataset covering minority Indic languages is yet available. To this end, we created EMBiL, a scene image dataset containing a combination of English and Manipuri text. It comprises 720 scene images with over 28,500 text instances in total. Manipuri is one of the official languages of India. To benchmark EMBiL, we propose SceneTextYOLO-Net, a single-stage network based on YOLOv5 that performs detection and language identification simultaneously. Specifically, we incorporate shallow-layer features and apply a multi-scale detection head to improve the detection of small text instances. We also insert an attention mechanism between the neck and head structures so that the network concentrates on the essential regions of the image. We performed extensive experiments on the proposed dataset using various state-of-the-art techniques. Furthermore, we carried out an experimental analysis on ICDAR2015 using SceneTextYOLO-Net and state-of-the-art methods. EMBiL is available at: https://github.com/Naosekpam/EMBiL-Dataset.
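As a rough illustration of what "simultaneous detection and language identification" can mean for a single-stage detector, the sketch below treats each language as an object class, so that one pass yields both a box and a language label. It is not the SceneTextYOLO-Net implementation. The class-id mapping, the `Detection` structure, the `decode` helper and the thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

# Hypothetical class-id mapping; the actual EMBiL label encoding is not stated here.
LANGUAGES = {0: "English", 1: "Manipuri"}


@dataclass
class Detection:
    box: tuple     # (x1, y1, x2, y2) in pixels
    scores: tuple  # one confidence value per language class


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decode(detections, conf_thres=0.5, iou_thres=0.5):
    """Return (box, language, score) tuples after thresholding and greedy NMS.

    The thresholds are assumed values, not taken from the paper.
    """
    candidates = []
    for det in detections:
        # The highest-scoring class doubles as the language prediction.
        class_id = max(range(len(det.scores)), key=det.scores.__getitem__)
        score = det.scores[class_id]
        if score >= conf_thres:
            candidates.append((det.box, LANGUAGES[class_id], score))

    # Class-agnostic greedy NMS: keep the best box, drop heavy overlaps.
    candidates.sort(key=lambda c: c[2], reverse=True)
    kept = []
    for cand in candidates:
        if all(iou(cand[0], k[0]) < iou_thres for k in kept):
            kept.append(cand)
    return kept


raw = [
    Detection((10, 10, 110, 40), (0.90, 0.05)),
    Detection((12, 11, 112, 42), (0.60, 0.10)),     # near-duplicate of the first box
    Detection((200, 50, 300, 90), (0.10, 0.80)),
    Detection((400, 400, 420, 410), (0.20, 0.30)),  # below the confidence threshold
]
results = decode(raw)
for box, language, score in results:
    print(box, language, score)
```

Because the language is read off the class scores of each box, no separate script classifier runs after detection, which is the practical difference from the two-stage pipelines the abstract contrasts against.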
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Naosekpam, V., Islam, M., Chourasia, A., Sahu, N. (2023). EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14184. Springer, Cham. https://doi.org/10.1007/978-3-031-44237-7_7
DOI: https://doi.org/10.1007/978-3-031-44237-7_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44236-0
Online ISBN: 978-3-031-44237-7
eBook Packages: Computer Science (R0)