Multi-lingual Scene Text Detection Containing the Arabic Scripts Using an Optimal then Enhanced YOLO Model

  • Conference paper
Advances in Model and Data Engineering in the Digitalization Era (MEDI 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2071)


Abstract

In recent years, deep learning has driven notable progress in detecting text within natural-scene images. For Arabic in particular, however, this progress is frequently hindered by the scarcity of diverse datasets covering multiple languages and scripts; despite significant advances, this shortage remains a limiting factor. The YOLO (You Only Look Once) family of deep neural networks has gained widespread popularity for its adaptability across machine learning tasks, notably in computer vision, and for its ability to handle images captured in natural environments, noisy data, and the varied challenges of detection in the wild. Our experiments provide a concise evaluation of text detection algorithms based on convolutional neural networks (CNNs). Specifically, we compare several versions of the YOLO model, applying identical data augmentation methods to the SYPHAX dataset and the ICDAR MLT-2019 dataset, both of which contain Arabic script in natural-scene images. The objective of this article is to identify the most effective YOLO algorithm for detecting Arabic script in the wild, to improve upon the best-performing model, and to investigate research directions that could further enhance the most robust architecture in this domain.
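Comparing YOLO variants on scene-text datasets ultimately comes down to matching predicted text boxes against ground truth by Intersection-over-Union (IoU), the standard detection criterion. The following is a minimal pure-Python sketch of that matching step; the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the greedy matching strategy are illustrative assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_detections(preds, gts, thresh=0.5):
    """Greedily match each prediction to at most one ground-truth box.

    Returns (true positives, false positives, false negatives), from which
    precision, recall, and F-measure follow directly.
    """
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)  # each ground-truth box counts once
            tp += 1
    fp = len(preds) - tp
    fn = len(unmatched)
    return tp, fp, fn
```

For example, one well-overlapping prediction plus one stray prediction against two ground-truth boxes yields one true positive, one false positive, and one missed box.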



Author information


Correspondence to Mohamed Elleuch.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Turki, H., Elleuch, M., Kherallah, M. (2024). Multi-lingual Scene Text Detection Containing the Arabic Scripts Using an Optimal then Enhanced YOLO Model. In: Mosbah, M., et al. Advances in Model and Data Engineering in the Digitalization Era. MEDI 2023. Communications in Computer and Information Science, vol 2071. Springer, Cham. https://doi.org/10.1007/978-3-031-55729-3_5

  • DOI: https://doi.org/10.1007/978-3-031-55729-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-55728-6

  • Online ISBN: 978-3-031-55729-3

  • eBook Packages: Computer Science, Computer Science (R0)
