DOI: 10.1145/3607720.3607765
research-article

An Innovative Ground Truth Dataset for Automated Validation of Arabic Handwritten Character Segmentation Algorithms

Published: 13 November 2023

Abstract

Character segmentation is one of the most critical phases of Arabic handwriting recognition systems. Validating Arabic Handwritten Character Segmentation (AHCS) algorithms requires Ground Truthed Datasets (GTDs) that specify how characters should be segmented. Existing Arabic handwritten datasets provide Ground Truth Files (GTFs) that describe many entries; to date, however, they lack character-level information. Very few attempts have been made in the literature to establish character-level GTFs, and those that exist provide either character boundaries or unique Segmentation Points (SPs). The concept of a single fixed SP is at odds with the features of Arabic script, in which many valid SPs may exist between two successive characters. Consequently, existing GTDs cannot reliably validate all varieties of AHCS algorithms. In this paper, we propose a new ground-truthing concept that captures Segmentation Areas (SAs) instead of SPs. This concept can accommodate all varieties of AHCS algorithms and provides information about overlapping/touching characters and vertical ligatures. Our proposed GTD consists of 1400 GTFs describing character-level information for a subset of 1400 images from the IFN/ENIT database. The proposed dataset is publicly available at (https://www.kaggle.com/datasets/mohsineelkhayati/character-level-ground-truth-dataset-for-ifnenit).
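
The difference between validating against a single segmentation point and validating against a segmentation area can be illustrated with a short Python sketch. The abstract does not describe the GTF layout or any scoring rule, so everything below is an assumption made for illustration: the `SegmentationArea` class, the choice to model an area as an inclusive range of pixel columns, the `validate_cuts` function, and the three counts it returns are all hypothetical and are not the dataset's actual format or the paper's metric.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SegmentationArea:
    """Hypothetical SA: an inclusive range of pixel columns in which
    any cut between two successive characters is considered valid."""
    x_start: int
    x_end: int

    def contains(self, x: int) -> bool:
        return self.x_start <= x <= self.x_end


def validate_cuts(areas: List[SegmentationArea],
                  cuts: List[int]) -> Tuple[int, int, int]:
    """Return (correct, missed, extra) under an assumed scoring rule.

    correct: areas that received at least one cut
    missed:  areas that received no cut
    extra:   cuts outside every area, plus surplus cuts inside an
             area that was already hit
    """
    hits = [0] * len(areas)
    stray = 0
    for x in cuts:
        for i, area in enumerate(areas):
            if area.contains(x):
                hits[i] += 1
                break
        else:
            # The cut fell in no segmentation area.
            stray += 1
    correct = sum(1 for h in hits if h >= 1)
    missed = sum(1 for h in hits if h == 0)
    extra = stray + sum(h - 1 for h in hits if h > 1)
    return correct, missed, extra


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    areas = [SegmentationArea(10, 14),
             SegmentationArea(30, 33),
             SegmentationArea(52, 60)]
    # Two algorithms that cut at different columns are both accepted,
    # because each cut lies inside the corresponding area.
    print(validate_cuts(areas, [12, 31, 55]))
    print(validate_cuts(areas, [14, 30, 60]))
    # A cut at column 20 lies in no area, and the second area is missed.
    print(validate_cuts(areas, [12, 20, 58]))
```

With a single reference SP per character pair, the first two sets of cuts could only be compared after choosing an arbitrary distance tolerance; with areas, the acceptable range is part of the ground truth itself.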

Published In

NISS '23: Proceedings of the 6th International Conference on Networking, Intelligent Systems & Security
May 2023
451 pages
ISBN: 9798400700194
DOI: 10.1145/3607720

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Arabic handwriting recognition
2. Arabic handwritten character segmentation
3. Automatic validation
4. Ground truth dataset

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

NISS 2023
