Using an Optimal then Enhanced YOLO Model for Multi-Lingual Scene Text Detection Containing the Arabic Scripts

Turki, Houssem; Elleuch, Mohamed; Kherallah, Monji

doi:10.1007/978-981-97-0376-0_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14403)

Included in the following conference series:

Pacific-Rim Symposium on Image and Video Technology

Abstract

In recent years, advances in machine learning and artificial intelligence have driven significant progress in deep learning and in the recognition of text in natural scene images. Despite this progress, the effectiveness of deep learning for text detection in the wild is still often restricted by the limited availability of diverse datasets covering multiple languages and scripts, and the Arabic language poses an additional challenge. The YOLO (You Only Look Once) deep neural network has become widely popular because of its versatility across a wide range of machine learning tasks, particularly in computer vision. YOLO has gained increasing recognition for its ability to cope with the complex backgrounds of images captured in natural scenes, handle noisy data, and overcome other challenges encountered in real-world conditions. Our experiments offer a concise analysis of text detection algorithms based on convolutional neural networks (CNNs). In particular, we focus on several versions of the YOLO model, applying the same data augmentation techniques to both the SYPHAX dataset and the ICDAR MLT-2019 dataset, which contain Arabic script in real natural scene images. The aim of this article is to identify the most effective YOLO model for detecting text containing Arabic script in the wild, to enhance this optimal model, and to explore potential research avenues for further improving the most robust architecture in this field.
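The abstract states that the YOLO models were trained on both SYPHAX and ICDAR MLT-2019, but this preview does not describe how the annotations were prepared. As a minimal sketch, assuming the usual ICDAR MLT ground-truth line format (`x1,y1,x2,y2,x3,y3,x4,y4,script,transcription`, with `###` marking don't-care regions) and a single hypothetical "text" class (`class_id=0`), the code below converts quadrilateral annotations into the normalized `class cx cy w h` lines that YOLOv5-style trainers read. The function names `mlt_line_to_yolo` and `convert_annotations` are illustrative and not taken from the paper; SYPHAX's annotation format is not described here, so only MLT-style input is handled.

```python
from typing import Iterable, List, Optional


def mlt_line_to_yolo(line: str, img_w: int, img_h: int,
                     class_id: int = 0) -> Optional[str]:
    """Convert one ICDAR MLT-style ground-truth line to a YOLO label line.

    Assumed input: "x1,y1,x2,y2,x3,y3,x4,y4,script,transcription".
    Returns None if the box is degenerate after clipping to the image.
    """
    # Drop a UTF-8 byte-order mark and surrounding whitespace; the
    # transcription may itself contain commas, so split at most 9 times.
    parts = line.lstrip("\ufeff").strip().split(",", 9)
    coords = [float(v) for v in parts[:8]]
    xs, ys = coords[0::2], coords[1::2]

    # Axis-aligned rectangle enclosing the quadrilateral, clipped to the image.
    x_min, x_max = max(0.0, min(xs)), min(float(img_w), max(xs))
    y_min, y_max = max(0.0, min(ys)), min(float(img_h), max(ys))
    if x_max <= x_min or y_max <= y_min:
        return None

    # YOLO expects centre x, centre y, width, height, normalised to [0, 1].
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def convert_annotations(lines: Iterable[str], img_w: int, img_h: int,
                        class_id: int = 0,
                        skip_dont_care: bool = True) -> List[str]:
    """Convert all annotation lines of one image, skipping blanks and '###'."""
    labels: List[str] = []
    for raw in lines:
        line = raw.lstrip("\ufeff").strip()
        if not line:
            continue
        parts = line.split(",", 9)
        # "###" is the ICDAR convention for illegible / don't-care regions.
        if skip_dont_care and len(parts) == 10 and parts[9] == "###":
            continue
        label = mlt_line_to_yolo(line, img_w, img_h, class_id)
        if label is not None:
            labels.append(label)
    return labels


if __name__ == "__main__":
    gt = ["10,20,110,20,110,60,10,60,Arabic,مرحبا",
          "1,1,2,1,2,2,1,2,Latin,###"]
    for label in convert_annotations(gt, img_w=200, img_h=100):
        print(label)  # 0 0.300000 0.400000 0.500000 0.400000
```

Taking the axis-aligned rectangle that encloses each quadrilateral discards the orientation of slanted text lines. This is a simplification imposed by standard horizontal-box YOLO heads, not necessarily the choice made by the authors.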

Author information

Authors and Affiliations

National Engineering School of Sfax (ENIS), University of Sfax, Sfax, Tunisia
Houssem Turki
National School of Computer Science (ENSI), University of Manouba, Manouba, Tunisia
Mohamed Elleuch
Faculty of Sciences, University of Sfax, Sfax, Tunisia
Monji Kherallah

Corresponding author

Correspondence to Mohamed Elleuch.

Editor information

Editors and Affiliations

Auckland University of Technology, Auckland, New Zealand
Wei Qi Yan, Minh Nguyen, Parma Nand, Xuejun Li

About this paper

Cite this paper

Turki, H., Elleuch, M., Kherallah, M. (2024). Using an Optimal then Enhanced YOLO Model for Multi-Lingual Scene Text Detection Containing the Arabic Scripts. In: Yan, W.Q., Nguyen, M., Nand, P., Li, X. (eds) Image and Video Technology. PSIVT 2023. Lecture Notes in Computer Science, vol 14403. Springer, Singapore. https://doi.org/10.1007/978-981-97-0376-0_34

DOI: https://doi.org/10.1007/978-981-97-0376-0_34
Published: 12 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0375-3
Online ISBN: 978-981-97-0376-0
eBook Packages: Computer Science, Computer Science (R0)
