Arabic character recognition using a Haar cascade classifier approach (HCC)

Theoretical Advances
Published: 07 April 2015

Volume 19, pages 411–426, (2016)
Pattern Analysis and Applications

Ashraf AbdelRaouf¹,
Colin A. Higgins²,
Tony Pridmore² &
…
Mahmoud I. Khalil³


Abstract

Optical character recognition (OCR) shows great potential for rapid data entry, but has had limited success when applied to the Arabic language. Traditional OCR problems are compounded by the nature of the Arabic language and by its heavily connected script. Viola and Jones (Rapid object detection using a boosted cascade of simple features. Kauai, Hawaii, 2001) introduced a machine learning approach, the Haar cascade classifier (HCC), to achieve rapid object detection based on a boosted cascade of simple Haar-like features. Here, that approach is applied for the first time to Arabic glyph recognition. The HCC approach eliminates problematic steps in the pre-processing and recognition phases and, most importantly, the character segmentation stage. A classifier was produced for each of the 61 Arabic glyphs that exist after the removal of diacritical marks (dots). Each classifier was trained and tested on some 2,000 images. The system was tested with real text images and produced a recognition rate for Arabic glyphs of 87%. The technique gives good results relative to those achieved using a commercial Arabic OCR application and an existing state-of-the-art research application.
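The "simple Haar-like features" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the toy 4x4 patch are assumptions made purely for illustration. The sketch shows how a summed-area table lets a rectangle sum be read in four lookups, and how a two-rectangle edge feature is the difference of two such sums.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


def two_rect_feature(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Vertical edge feature: sum of the left half minus sum of the right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy patch (hypothetical data): ones on the left, zeros on the right.
patch = np.array(
    [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 0, 0]],
    dtype=np.uint8,
)
ii = integral_image(patch)
feature_value = two_rect_feature(ii, top=0, left=0, height=4, width=4)
print(feature_value)
```

In a boosted cascade, many such feature values are thresholded by weak classifiers and combined stage by stage, so that most non-matching windows are rejected early. Training and cascade evaluation are outside the scope of this sketch.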


References

Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: IEEE conference on computer vision and pattern recognition (CVPR01). Kauai, Hawaii, pp 511–518
Abdelazim HY (2006) Recent trends in Arabic character recognition. In: The sixth conference on language engineering. Cairo, Egypt, pp 212–249
Adolf F (2003) How-to build a cascade of boosted classifiers based on Haar-like features. OpenCV’s Rapid Object Detection. http://lab.cntl.kyutech.ac.jp/~kobalab/nishida/opencv/OpenCV_ObjectDetection_HowTo.pdf. Accessed 3 Nov 2012
Lienhart R, Maydt J (2002) An extended set of Haar-like features for rapid object detection. In: IEEE international conference of image processing (ICIP 2002). New York, USA, pp 900–903
Lienhart R, Kuranov A, Pisarevsky V (2002) Empirical analysis of detection cascades of boosted classifiers for rapid object detection. In: 25th pattern recognition symposium (DAGM03). Magdeburg, Germany, pp 297–304
Kanoun S, Moalla I, Ennaji A, Alimi AM (2000) Script identification for Arabic and Latin, printed and handwritten documents. Presented at the 4th IAPR-international workshop on document analysis systems: DAS. Rio de Janeiro, Brazil
Kanoun S, Ennaji A, Lecourtier Y, Alimi AM (2002) Linguistic integration information in the AABATAS Arabic text analysis system. In: 8th international workshop on frontiers in handwriting recognition (IWFHR’02). Ontario, Canada, pp 389–394
Kanoun S, Alimi AM, Lecourtier Y (2005) Affixal approach for arabic decomposable vocabulary recognition: a validation on printed word in only one font. In: The 8th international conference on document analysis and recognition (ICDAR’05). Seoul, Korea, pp 1025–1029
Kanoun S, Slimane F, Guesmi H, Ingold R, Alimi AM, Hennebert J (2007) Affixal approach versus analytical approach for off-line arabic decomposable vocabulary recognition. In: 10th international conference on document analysis and recognition (ICDAR’09). Barcelona, Spain, pp 661–665
Slimane F, Ingold R, Kanoun S, Alimi AM, Hennebert J (2009) A new arabic printed text image database and evaluation protocols. In: 10th international conference on document analysis and recognition. Barcelona, Spain, pp 946–950
Benjelil M, Kanoun S, Mullot R, Alimi AM (2010) Complex documents images segmentation based on steerable pyramid features. Int J Doc Anal Recogn 13:209–228
Moussa SB, Zahour A, Benabdelhafid A, Alimi AM (2010) New features using fractal multi-dimensions for generalized Arabic font recognition. Pattern Recogn Lett 31:361–371
Slimane F, Kanoun S, Hennebert J, Alimi AM, Ingold R (2013) A study on font-family and font-size recognition applied to Arabic word images at ultra-low resolution. Pattern Recogn Lett 34:209–218
The Unicode Consortium (2014) The Unicode Standard, Version 7.0.0, The Unicode Consortium, Mountain View, CA. ISBN 978-1-936213-09-2. http://www.unicode.org/versions/Unicode7.0.0/
Jaiem FK, Kanoun S, Khemakhem M, El Abed H, Kardoun J (2013) Database for Arabic printed text recognition research. In: ICIAP 2013, Part I, LNCS 8156, pp 251–259
AbdelRaouf A, Higgins C, Pridmore T, Khalil M (2010) Building a multi-modal Arabic corpus (MMAC). Int J Doc Anal Recogn 13:285–302
The Unicode Consortium (2011) The Unicode Standard, Version 6.0.0, Chapter 8. The Unicode Consortium, Mountain View, CA. ISBN 978-1-936213-01-6. http://unicode.org/Public/UNIDATA/ArabicShaping.txt. Accessed 11 Apr 2014
Ahmed I, Mahmoud SA, Parvez MT (2012) Printed Arabic text recognition. In: Guide to OCR for arabic scripts, Springer, London, pp 147–168
Lorigo LM, Govindaraju V (2006) Offline Arabic handwriting recognition: a survey. IEEE Trans Pattern Anal Mach Intell 28:712–724
Amin A (1997) Off line Arabic character recognition—a survey. In: The 4th international conference on document analysis and recognition. Ulm, Germany, pp 596–599
Khayyat M, Lam L, Suen CY (2014) Learning-based word spotting system for Arabic handwritten documents. Pattern Recogn 47:1021–1030
AbdelRaouf A, Higgins C, Khalil M (2008) A database for Arabic printed character recognition. In: The international conference on image analysis and recognition-ICIAR2008. Póvoa de Varzim, Portugal, pp 567–578
Alginahi YM (2013) A survey on Arabic character segmentation. Int J Doc Anal Recogn 16:105–126
Harty R, Ghaddar C (2004) Arabic text recognition. Int Arab J Inf Technol 1:156–163
Kasinski A, Schmidt A (2010) The architecture and performance of the face and eyes detection system based on the Haar cascade classifiers. Pattern Anal Appl 13:197–211
Crow FC (1984) Summed-area tables for texture mapping. SIGGRAPH Comput Graph 18:207–212
Messom C, Barczak A (2006) Fast and efficient rotated haar-like features using rotated integral images. In: Australian conference on robotics and automation (ACRA2006), pp 1–6
AbdelRaouf A, Higgins CA, Pridmore T, Khalil MI (2014) Fast Arabic glyph recognizer based on haar cascade classifiers. Presented at the international conference on pattern recognition applications and methods (ICPRAM 2014). Angers, France
Schapire RE (2002) The boosting approach to machine learning, an overview. In: MSRI workshop on nonlinear estimation and classification. Berkeley, CA, USA, pp 149–172
Khorsheed MS (2002) Off-line arabic character recognition—a review. Pattern Anal Appl 5:31–45
Senior A (1992) Off-line handwriting recognition: a review and experiments. Cambridge University, Engineering Department, Cambridge
Cheriet M, Kharma N, Liu C-L, Suen C (2007) Character recognition systems: a guide for students and practitioners. Wiley, New York
Souza A, Cheriet M, Naoi S, Suen CY (2003) Automatic filter selection using image quality assessment. In: The 7th international conference on document analysis and recognition (ICDAR’03). Edinburgh, Scotland
Ahmad I (2013) A technique for skew detection of printed Arabic documents. In: Computer graphics, imaging and visualization (CGIV), 2013 10th international conference, pp 62–67
Breuel TM (2002) Robust least square baseline finding using a branch and bound algorithm. In: Document recognition and retrieval VIII, SPIE
Broumandnia A, Shanbehzadeh J (2007) Fast Zernike wavelet moments for Farsi character recognition. Image Vis Comput 25:717–726
Touj S, Amara NEB, Amiri H (2003) Generalized hough transform for Arabic optical character recognition. In: 7th international conference on document analysis and recognition (ICDAR 2003). Edinburgh, Scotland, pp 1242–1246
Noor SM, Mohammed IA, George LE (2011) Handwritten Arabic (Indian) numerals recognition using Fourier descriptor and structure base classifier. J Al-Nahrain Univ 14:215–224
Gonzalez RC, Woods RE (2007) Digital image processing, 3rd edn. Prentice Hall, New Jersey, USA
Zidouri A (2007) PCA-based Arabic character feature extraction. In: 9th international symposium on signal processing and its applications (ISSPA 2007). Sharjah, United Arab Emirates, pp 1–4
Kurt Z, Turkmen HI, Karsligil ME (2009) Linear discriminant analysis in ottoman alphabet character recognition. In: The European computing conference. Tbilisi, Georgia, pp 601–607
Trenkle J, Gillies A, Erlandson E, Schlosser S, Cavin S (2001) Advances in Arabic text recognition. In: Symposium on document image understanding technology. Maryland, USA
Yalniz IZ, Altingovde IS, Güdükbay U, Ulusoy Ö (2009) Integrated segmentation and recognition of connected Ottoman script. Opt Eng 48(11):117205
Sabbour N, Shafait F (2013) A segmentation-free approach to Arabic and Urdu OCR. In: IS&T/SPIE electronic imaging, SPIE digital library, USA, pp 86580N-86580N-12
Abandah GA, Younis KS, Khedher MZ (2008) Handwritten Arabic character recognition using multiple classifiers based on letter form. In: The 5th iasted international conference on signal processing, pattern recognition and applications (SPPRA 2008). Innsbruck, Austria, pp 128–133
Alma’adeed S, Higgins C, Elliman D (2002) Recognition of off-line handwritten Arabic words using hidden Markov model approach. In: The 16th international conference on pattern recognition (ICPR’02). Quebec, Canada, pp 481–484
Bushofa B, Spann M (1997) Segmentation and recognition of Arabic characters by structural classification. Image Vis Comput 15:167–179
Mehran R, Pirsiavash H, Razzazi F (2005) A front-end OCR for Omni-font Persian/Arabic cursive printed documents. In: Digital image computing: techniques and applications (DICTA’05). Cairns, Australia, pp 56–64
Rahman AFR, Fairhurst MC (2003) Multiple classifier decision combination strategies for character recognition: a review. Int J Doc Anal Recogn 5:166–194
OpenCV (2002) Rapid object detection with a cascade of boosted classifiers based on Haar-like features. OpenCV haartraining tutorial
Sonka M, Hlavac V, Boyle R (1998) Image processing: analysis and machine vision, 2nd edn. Thomson Learning Vocational, Cengage Learning, New Delhi, India
Box GEP, Muller ME (1958) A note on the generation of random normal deviates. Ann Math Stat 29:610–611
Seo N (2008) Tutorial: OpenCV haartraining (rapid object detection with a cascade of boosted classifiers based on Haar-like features)
IRIS (2011) Readiris 12 pro. http://www.irislink.com/c2-1684-225/Readiris-12-for-Windows.aspx. Accessed 27 Jul 2011
Kohavi R, Provost F (1998) Glossary of terms. Special issue on applications of machine learning and the knowledge discovery process. Mach Learn 30:271–274
IRIS (2004) Readiris pro 10, 10th edn
Schulz KU, Mihov S (2002) Fast string correction with Levenshtein automata. Int J Doc Anal Recogn 5:67–85
García S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Webb GI (2000) Multiboosting: a technique for combining boosting and wagging. Mach Learn 40:159–196
Zaiontz C (2013–2015) The data analysis for this paper was generated using the real statistics resource pack software (release 3.5). http://www.real-statistics.com. Accessed 26 Feb 2015


Author information

Authors and Affiliations

Faculty of Computer Science, Misr International University, Cairo, Egypt
Ashraf AbdelRaouf
School of Computer Science, The University of Nottingham, Nottingham, UK
Colin A. Higgins & Tony Pridmore
Computer and Systems Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt
Mahmoud I. Khalil


Corresponding author

Correspondence to Ashraf AbdelRaouf.


About this article

Cite this article

AbdelRaouf, A., Higgins, C.A., Pridmore, T. et al. Arabic character recognition using a Haar cascade classifier approach (HCC). Pattern Anal Applic 19, 411–426 (2016). https://doi.org/10.1007/s10044-015-0466-2


Received: 22 September 2012
Accepted: 15 March 2015
Published: 07 April 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s10044-015-0466-2
