EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification

Naosekpam, Veronica; Islam, Mushtaq; Chourasia, Amul; Sahu, Nilkanta

doi:10.1007/978-3-031-44237-7_7

Veronica Naosekpam (ORCID: orcid.org/0000-0002-4850-4713),
Mushtaq Islam,
Amul Chourasia &
Nilkanta Sahu (ORCID: orcid.org/0000-0002-9596-2215)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14184)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns

Abstract

Detecting text in unconstrained scene images and identifying its language are essential steps in multimedia information retrieval. Over the years, various approaches have treated detection and language identification as separate problems. To the best of our knowledge, no scene text dataset covering minority Indic languages is yet available. To this end, we created EMBiL, a scene image dataset containing a combination of English and Manipuri text; Manipuri is one of the official languages of India. The dataset contains 720 scene images with over 28,500 text instances in total. To benchmark EMBiL, we propose SceneTextYOLO-Net, a single-stage network based on YOLOv5 that performs detection and language identification simultaneously. We specifically include shallow-layer features and apply a multi-scale detection head to improve the detection of small text targets. We also insert an attention mechanism between the neck and head structures so that the network concentrates on the essential regions of the image. We performed extensive experiments on the proposed dataset using various state-of-the-art techniques, and further evaluated SceneTextYOLO-Net alongside state-of-the-art methods on ICDAR2015. EMBiL is available at: https://github.com/Naosekpam/EMBiL-Dataset.
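
Because a single-stage model of this kind emits a bounding box and a language label together, one plausible way to score its output is to count a prediction as correct only when it overlaps a ground-truth box sufficiently and carries the same language label. The sketch below illustrates that idea in plain Python. The `TextBox` type, the label names, the greedy matching, and the IoU threshold of 0.5 are assumptions made for illustration; they are not taken from the paper and may differ from the evaluation protocol the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """Axis-aligned text box with a language label (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    lang: str  # assumed label names, e.g. "english" or "manipuri"


def iou(a: TextBox, b: TextBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score(preds, gts, iou_thr=0.5):
    """Greedy one-to-one matching; returns (precision, recall, f_score).

    A prediction is a true positive only if its language label equals
    that of an unmatched ground-truth box and their IoU >= iou_thr.
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_iou = None, iou_thr
        for j, g in enumerate(gts):
            if j in matched or g.lang != p.lang:
                continue
            overlap = iou(p, g)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy data: the second prediction is well localised but mislabelled.
gts = [TextBox(0, 0, 10, 10, "english"), TextBox(20, 20, 30, 30, "manipuri")]
preds = [TextBox(0, 0, 10, 10, "english"), TextBox(20, 20, 30, 30, "english")]
result = score(preds, gts)
```

In the toy data, the mislabelled box is rejected even though its localisation is exact, which is the behaviour that distinguishes joint detection and language identification from detection alone.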

Author information

Authors and Affiliations

Indian Institute of Information Technology Guwahati, Guwahati, 781015, Assam, India
Veronica Naosekpam, Mushtaq Islam, Amul Chourasia & Nilkanta Sahu

Corresponding author

Correspondence to Veronica Naosekpam.

Editor information

Editors and Affiliations

Cyprus University of Technology, Limassol, Cyprus
Nicolas Tsapatsoulis
Cyprus University of Technology/CYENS Center of Excellence, Limassol, Cyprus
Andreas Lanitis
The University of New Mexico, Albuquerque, NM, USA
Marios Pattichis
University of Cyprus/CYENS Center of Excellence, Nicosia, Cyprus
Constantinos Pattichis
University of Cyprus/KIOS Center of Excellence, Nicosia, Cyprus
Christos Kyrkou
Cyprus University of Technology, Limassol, Cyprus
Efthyvoulos Kyriacou
Cyprus University of Technology/CYENS Center of Excellence, Limassol, Cyprus
Zenonas Theodosiou
CYENS Center of Excellence, Nicosia, Cyprus
Andreas Panayides

Cite this paper

Naosekpam, V., Islam, M., Chourasia, A., Sahu, N. (2023). EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14184. Springer, Cham. https://doi.org/10.1007/978-3-031-44237-7_7

DOI: https://doi.org/10.1007/978-3-031-44237-7_7
Published: 20 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44236-0
Online ISBN: 978-3-031-44237-7
eBook Packages: Computer Science, Computer Science (R0)