Benchmarking Post-processing Techniques for Offline Arabic Text Recognition System

  • Conference paper
Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 552)

Abstract

Automatic recognition of offline Arabic text remains challenging because of the nature of the Arabic script. Researchers' attention to the problem has grown recently, and a variety of methods have been applied in this area. This paper presents a comparative study of four OCR (Optical Character Recognition) post-processing error-correction techniques. We evaluate their impact using two recognition approaches: a lexicon-driven approach, with and without OOV (Out Of Vocabulary) words, and a lexicon-free approach. An AOCR (Arabic Optical Character Recognition) system is developed for this purpose, based on a segmentation-free HMM (Hidden Markov Model) approach. A sliding window is moved across the line image from right to left to extract histogram of oriented gradients (HOG) features. Experiments carried out on the KAFD database under different scenarios show a significant improvement in the OCR error correction rate.
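The right-to-left sliding-window step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window width, the step, and the number of orientation bins are all assumptions chosen for illustration, and the per-window descriptor is a simplified magnitude-weighted orientation histogram rather than a full HOG descriptor with cells and block normalization.

```python
import math

def sliding_windows_rtl(width, win, step):
    """Yield (start, end) column ranges, moving from the right edge to the left."""
    end = width
    while end - win >= 0:
        yield end - win, end
        end -= step

def orientation_histogram(image, x0, x1, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations
    for the columns [x0, x1) of a grayscale image (list of rows)."""
    hist = [0.0] * bins
    height, width = len(image), len(image[0])
    for y in range(1, height - 1):
        # Skip the outermost columns so central differences stay in bounds.
        for x in range(max(x0, 1), min(x1, width - 1)):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            b = min(int(angle / math.pi * bins), bins - 1)
            hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def extract_features(image, win=8, step=4, bins=8):
    """One feature vector per window, ordered right to left (Arabic reading order)."""
    width = len(image[0])
    return [orientation_histogram(image, x0, x1, bins)
            for x0, x1 in sliding_windows_rtl(width, win, step)]

# Hypothetical 4x16 line image with a single vertical edge in the middle.
image = [[0] * 8 + [1] * 8 for _ in range(4)]
features = extract_features(image)
```

The resulting sequence of vectors, one per window position, is the kind of observation sequence a segmentation-free HMM recognizer consumes; emitting windows from the right edge first keeps the sequence aligned with Arabic reading order.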




Corresponding author

Correspondence to Sana Khamekhem Jemni.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jemni, S.K., Kesentini, Y., Kanoun, S. (2017). Benchmarking Post-processing Techniques for Offline Arabic Text Recognition System. In: Abraham, A., Haqiq, A., Alimi, A., Mezzour, G., Rokbani, N., Muda, A. (eds) Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS 2016). HIS 2016. Advances in Intelligent Systems and Computing, vol 552. Springer, Cham. https://doi.org/10.1007/978-3-319-52941-7_27


  • DOI: https://doi.org/10.1007/978-3-319-52941-7_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52940-0

  • Online ISBN: 978-3-319-52941-7

  • eBook Packages: Engineering (R0)
