Machine-Printed Character Recognition

  • Reference work entry
Handbook of Document Image Processing and Recognition

Abstract

This chapter reviews salient advances in techniques for machine-printed character recognition. Section “Overview” provides a historical perspective on how OCR techniques have evolved from the earliest mechanical devices to special-purpose reading machines and then to personal computer software. (The description of the historical evolution of OCR is based upon the Wikipedia entry for this topic, http://en.wikipedia.org/wiki/Optical_character_recognition, to which the reader is referred for a more detailed review.) Section “Summary of the State-of-the-Art” summarizes the state of the art in machine-printed character recognition. Sections “Segmentation and Preprocessing”, “Isolated Character Recognition”, and “Word Recognition” describe the core technologies developed for modern character recognition systems, including binarization, document image preprocessing, page segmentation, feature extraction, character classification, and language modeling. Section “Systems and Applications” introduces available machine-printed OCR systems and applications.

Author information

Corresponding author

Correspondence to Huaigu Cao.

Copyright information

© 2014 Springer-Verlag London

About this entry

Cite this entry

Cao, H., Natarajan, P. (2014). Machine-Printed Character Recognition. In: Doermann, D., Tombre, K. (eds) Handbook of Document Image Processing and Recognition. Springer, London. https://doi.org/10.1007/978-0-85729-859-1_44
