EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2023)

Abstract

Detecting text in unconstrained scene images and identifying its language are essential steps in multimedia information retrieval. Over the years, various approaches have treated detection and language identification as separate problems. To the best of our knowledge, scene text datasets covering minority Indic languages are not yet available. To this end, we created EMBiL, a scene image dataset containing a combination of English and Manipuri text. It contains 720 scene images with over 28,500 text instances in total. Manipuri is one of the official languages of India. To benchmark EMBiL, we propose SceneTextYOLO-Net, a single-stage network based on YOLOv5 that performs detection and language identification simultaneously. We incorporate shallow-layer features and apply a multi-scale detection head to improve the detection of small text. We also insert an attention mechanism between the neck and head structures to focus on the essential regions of the image. We performed extensive experiments on the proposed dataset using various state-of-the-art techniques. Furthermore, we evaluated SceneTextYOLO-Net and state-of-the-art methods on ICDAR2015. EMBiL is available at: https://github.com/Naosekpam/EMBiL-Dataset.
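As a rough illustration of how a single-stage, YOLO-style detector can return a text box and its language in one pass, the sketch below decodes raw prediction rows into labelled detections. It is not the SceneTextYOLO-Net implementation: the row layout `[x, y, w, h, objectness, p_english, p_manipuri]`, the class order in `LANGUAGES`, the `TextDetection` type, and the 0.25 threshold are all assumptions made for the example, and non-maximum suppression is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical class order; the actual label mapping used for EMBiL is not stated here.
LANGUAGES = ("English", "Manipuri")


@dataclass
class TextDetection:
    box: Tuple[float, float, float, float]  # (x_center, y_center, width, height)
    language: str
    score: float


def decode_predictions(
    rows: Sequence[Sequence[float]], conf_threshold: float = 0.25
) -> List[TextDetection]:
    """Convert raw YOLO-style rows into text detections with a language label.

    Each row is assumed to be [x, y, w, h, objectness, p_english, p_manipuri].
    """
    detections = []
    for row in rows:
        x, y, w, h, objectness, *class_probs = row
        # The language is simply the highest-scoring class of the box.
        best = max(range(len(class_probs)), key=class_probs.__getitem__)
        # Final confidence combines objectness with the class probability.
        score = objectness * class_probs[best]
        if score >= conf_threshold:
            detections.append(TextDetection((x, y, w, h), LANGUAGES[best], score))
    return detections


if __name__ == "__main__":
    # Made-up rows, used only to exercise the function.
    example_rows = [
        [0.50, 0.50, 0.20, 0.10, 0.9, 0.8, 0.2],
        [0.30, 0.70, 0.10, 0.05, 0.6, 0.1, 0.9],
        [0.10, 0.10, 0.05, 0.05, 0.2, 0.5, 0.5],
    ]
    for det in decode_predictions(example_rows):
        print(det.language, round(det.score, 2), det.box)
```

Treating the language as the class dimension of each box is what lets localisation and language identification share one forward pass, rather than running a separate script classifier on cropped regions.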

Author information

Corresponding author

Correspondence to Veronica Naosekpam.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Naosekpam, V., Islam, M., Chourasia, A., Sahu, N. (2023). EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14184. Springer, Cham. https://doi.org/10.1007/978-3-031-44237-7_7

  • DOI: https://doi.org/10.1007/978-3-031-44237-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44236-0

  • Online ISBN: 978-3-031-44237-7

  • eBook Packages: Computer Science, Computer Science (R0)
