
A texture-based approach for word script and nature identification

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

In this work, we propose a texture-based approach to separating handwritten from machine-printed words written in Arabic and Latin scripts. The idea is to exploit differences in writing orientation and in stroke length to tell these word types apart. To that end, we designed a k-nearest neighbors (k-NN) classifier trained on a set of texture features. These features are extracted from black run-length (BRL) histograms, which appear well suited to capturing structural characteristics of word images. Four feature extraction scenarios were chosen to demonstrate the potential of this texture-based approach for script identification: (1) BRL, (2) restricted BRL, (3) BRL statistics, and (4) restricted BRL combined with their statistics. Using these features, we obtained very promising results: the correct identification rate exceeds 98.92 % in our experiments.
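The abstract does not spell out how the BRL features are computed, so the following Python sketch only illustrates the general idea of run-length texture features fed to a k-NN classifier. Several choices are our assumptions, not the authors': a word image is a binary list of lists with 1 for black (ink) pixels; runs are measured along four directions (0°, 45°, 90°, 135°); a "restricted" histogram is taken to mean one truncated to `max_len` bins, with longer runs pooled in the last bin; the statistics are mean, standard deviation and maximum run length; and classification is a plain Euclidean majority vote. The function names (`black_runs`, `brl_histogram`, `brl_statistics`, `word_features`, `knn_predict`) are hypothetical.

```python
import math
from collections import Counter

# Assumed direction set as (row step, col step); the paper's exact choice is not given here.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}


def black_runs(img, dr, dc):
    """Yield lengths of maximal runs of black (1) pixels along direction (dr, dc)."""
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1:
                continue
            pr, pc = r - dr, c - dc
            # Skip pixels that continue a run already counted from its first pixel.
            if 0 <= pr < h and 0 <= pc < w and img[pr][pc] == 1:
                continue
            length, rr, cc = 0, r, c
            while 0 <= rr < h and 0 <= cc < w and img[rr][cc] == 1:
                length += 1
                rr, cc = rr + dr, cc + dc
            yield length


def brl_histogram(img, angle, max_len):
    """Normalized BRL histogram over run lengths 1..max_len.

    Assumption: "restricted" = truncated to max_len bins, longer runs pooled in the last bin.
    """
    dr, dc = DIRECTIONS[angle]
    counts = [0] * max_len
    for length in black_runs(img, dr, dc):
        counts[min(length, max_len) - 1] += 1
    total = sum(counts)
    return [x / total for x in counts] if total else [0.0] * max_len


def brl_statistics(img, angle):
    """Assumed statistics: mean, standard deviation and maximum black run length."""
    dr, dc = DIRECTIONS[angle]
    runs = list(black_runs(img, dr, dc))
    if not runs:
        return [0.0, 0.0, 0.0]
    mean = sum(runs) / len(runs)
    std = math.sqrt(sum((x - mean) ** 2 for x in runs) / len(runs))
    return [mean, std, float(max(runs))]


def word_features(img, max_len=8):
    """Scenario (4)-style vector: restricted BRL histogram plus statistics, per direction."""
    feats = []
    for angle in sorted(DIRECTIONS):
        feats += brl_histogram(img, angle, max_len)
        feats += brl_statistics(img, angle)
    return feats


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy usage with tiny images and hypothetical labels.
    horizontal = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
    vertical = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
    train = [(word_features(horizontal), "class_a"), (word_features(vertical), "class_b")]
    print(knn_predict(train, word_features(horizontal), k=1))  # prints "class_a"
```

In this sketch the statistics are measured in pixels while the histogram bins are normalized frequencies, so in practice standardizing each feature before computing distances would likely matter for a distance-based classifier such as k-NN.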


Figs. 1–8



Author information


Corresponding author

Correspondence to Afef Kacem.


About this article


Cite this article

Kacem, A., Saïdani, A. A texture-based approach for word script and nature identification. Pattern Anal Applic 20, 1157–1167 (2017). https://doi.org/10.1007/s10044-016-0555-x


  • DOI: https://doi.org/10.1007/s10044-016-0555-x
