Multi-lingual Scene Text Detection Containing the Arabic Scripts Using an Optimal then Enhanced YOLO Model

Turki, Houssem; Elleuch, Mohamed; Kherallah, Monji

doi:10.1007/978-3-031-55729-3_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2071)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

In recent years, deep learning has made notable progress, particularly in detecting text in images of natural scenes. Text detection in the wild, especially for the Arabic language, is nevertheless still hindered by the scarcity of diverse datasets covering multiple languages and scripts, and this shortage remains a limiting factor despite the advances made. The YOLO (You Only Look Once) deep neural network has become widely popular for its adaptability across machine learning tasks, notably in computer vision, and is increasingly recognized for its ability to handle images taken in natural environments, cope with noisy data, and overcome the diverse challenges encountered in the wild. Our experiments provide a concise evaluation of text detection algorithms based on convolutional neural networks (CNNs). Specifically, we concentrate on different versions of the YOLO model, applying identical data augmentation methods to the SYPHAX dataset and the ICDAR MLT-2019 dataset, both of which contain Arabic script in images of natural scenes. The objective of this article is to identify the most efficient YOLO algorithm for recognizing Arabic script in the wild and then to improve upon that best-performing model. We also investigate research directions that could further enhance the most robust architecture in this domain.

Author information

Authors and Affiliations

National Engineering School of Sfax (ENIS), University of Sfax, Sfax, Tunisia
Houssem Turki
Higher Institute of Computer Science and Management of Kairouan (ISIGK), University of Kairouan, Kairouan, Tunisia
Mohamed Elleuch
Faculty of Sciences, University of Sfax, Sfax, Tunisia
Mohamed Elleuch & Monji Kherallah
Advanced Technologies for Environment and Smart Cities (ATES Unit), University of Sfax, Sfax, Tunisia
Houssem Turki, Mohamed Elleuch & Monji Kherallah

Corresponding author

Correspondence to Mohamed Elleuch.

Editor information

Editors and Affiliations

Bordeaux INP, Talence, France
Mohamed Mosbah
University College Dublin, Dublin, Ireland
Tahar Kechadi
ENSMA, Chasseneuil-du-Poitou, France
Ladjel Bellatreche
University of Sfax, Sfax, Tunisia
Faiez Gargouri
University of Lyon, Villeurbanne, France
Chirine Ghedira Guegan
ENSA-Tangier, Morocco, Tétouan, Morocco
Hassan Badir
Macquarie University, Sydney, NSW, Australia
Amin Beheshti
University of Sfax, Manouba, Tunisia
Mohamed Mohsen Gammoudi

Cite this paper

Turki, H., Elleuch, M., Kherallah, M. (2024). Multi-lingual Scene Text Detection Containing the Arabic Scripts Using an Optimal then Enhanced YOLO Model. In: Mosbah, M., et al. Advances in Model and Data Engineering in the Digitalization Era. MEDI 2023. Communications in Computer and Information Science, vol 2071. Springer, Cham. https://doi.org/10.1007/978-3-031-55729-3_5

DOI: https://doi.org/10.1007/978-3-031-55729-3_5
Published: 21 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55728-6
Online ISBN: 978-3-031-55729-3
eBook Packages: Computer Science, Computer Science (R0)
