Abstract
This paper proposes a natural scene character recognition method that combines a convolutional neural network (CNN) with bimodal image enhancement. A CNN-based grayscale character recognizer is highly tolerant of the degradations found in natural scene images. Because a character image is in essence a bimodal pattern, bimodal image enhancement is adopted to improve the performance of the CNN classifier. First, a color-to-gray conversion based on maximum separability strengthens the discriminative power of the grayscale image space. Second, the grayscale distribution is normalized by histogram alignment, which increases the consistency between grayscale training and test samples and thereby yields a better CNN classifier. Third, a shape-preserving normalization of the grayscale character image is applied. Together, these measures produce a high-performance natural scene character recognizer. Its recognition rate of 85.96% on the ICDAR 2003 robust OCR dataset is higher than those reported in existing work, which verifies the effectiveness of the proposed method.
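The abstract does not spell out how the histogram alignment step works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes an 8-bit grayscale image given as a flat list of integers, splits it into a dark and a bright class with Otsu's threshold, and then stretches the gray levels linearly so that the two class means land on fixed target levels. The function names `otsu_threshold` and `align_bimodal` and the target levels 64 and 192 are hypothetical choices made for illustration.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0..255) maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split at this level
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def align_bimodal(pixels: List[int],
                  dark_target: int = 64,
                  bright_target: int = 192) -> List[int]:
    """Linearly map gray levels so both class means hit the target levels.

    The targets are assumed values, not taken from the paper.
    """
    t = otsu_threshold(pixels)
    dark = [p for p in pixels if p <= t]
    bright = [p for p in pixels if p > t]
    if not dark or not bright:
        return list(pixels)  # flat image: nothing to align
    m_dark = sum(dark) / len(dark)
    m_bright = sum(bright) / len(bright)
    scale = (bright_target - dark_target) / (m_bright - m_dark)
    aligned = []
    for p in pixels:
        v = dark_target + (p - m_dark) * scale
        aligned.append(max(0, min(255, round(v))))  # clamp to 8-bit range
    return aligned


if __name__ == "__main__":
    low_contrast = [50, 52, 54, 120, 122, 124]
    high_contrast = [10, 12, 14, 200, 202, 204]
    print(align_bimodal(low_contrast))
    print(align_bimodal(high_contrast))
```

Under these assumptions, a low-contrast sample and a high-contrast sample end up with their dark and bright modes at the same gray levels, which is the kind of consistency between training and test samples that the abstract describes.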
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, Y., Sun, J., Naoi, S. (2012). Recognizing Natural Scene Characters by Convolutional Neural Network and Bimodal Image Enhancement. In: Iwamura, M., Shafait, F. (eds) Camera-Based Document Analysis and Recognition. CBDAR 2011. Lecture Notes in Computer Science, vol 7139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29364-1_6
DOI: https://doi.org/10.1007/978-3-642-29364-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29363-4
Online ISBN: 978-3-642-29364-1
eBook Packages: Computer Science, Computer Science (R0)