Abstract
This chapter reviews salient advances in techniques for machine-printed character recognition. Section “Overview” provides a historical perspective on how OCR techniques have evolved from the earliest mechanical devices to special-purpose reading machines and then to personal computer software. (The description of the historical evolution of OCR is based on the Wikipedia entry for this topic, http://en.wikipedia.org/wiki/Optical_character_recognition, to which the reader is referred for a more detailed review.) Section “Summary of the State-of-the-Art” summarizes the state of the art in machine-printed character recognition. Sections “Segmentation and Preprocessing”, “Isolated Character Recognition”, and “Word Recognition” describe the core technologies developed for modern character recognition systems, including binarization, document image preprocessing, page segmentation, feature extraction, character classification, and language modeling. Section “Systems and Applications” introduces available machine-printed OCR systems and applications.
Further Reading
Natarajan P, Lu Z, Schwartz R, Bazzi I, Makhoul J (2001) Multilingual machine printed OCR. Int J Pattern Recognit Artif Intell 15(1):43–63
Natarajan P, Subramanian K, Bhardwaj A, Prasad R (2009) Stochastic segment modeling for offline handwriting recognition. In: Proceedings of the international conference on document analysis and recognition, Barcelona, pp 971–975
Copyright information
© 2014 Springer-Verlag London
About this entry
Cite this entry
Cao, H., Natarajan, P. (2014). Machine-Printed Character Recognition. In: Doermann, D., Tombre, K. (eds) Handbook of Document Image Processing and Recognition. Springer, London. https://doi.org/10.1007/978-0-85729-859-1_44
DOI: https://doi.org/10.1007/978-0-85729-859-1_44
Publisher Name: Springer, London
Print ISBN: 978-0-85729-858-4
Online ISBN: 978-0-85729-859-1
eBook Packages: Computer Science, Reference Module Computer Science and Engineering