Recognizing Natural Scene Characters by Convolutional Neural Network and Bimodal Image Enhancement

Zhu, Yuanping; Sun, Jun; Naoi, Satoshi

doi:10.1007/978-3-642-29364-1_6

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7139)

Included in the following conference series:

International Workshop on Camera-Based Document Analysis and Recognition

Abstract

This paper proposes a natural scene character recognition method that combines a convolutional neural network (CNN) with bimodal image enhancement. A CNN-based grayscale character recognizer is highly tolerant of the degradations found in natural scene images. Because a character image is in essence a bimodal pattern, bimodal image enhancement is adopted to improve the performance of the CNN classifier. First, a color-to-gray conversion based on maximum separability strengthens the discriminative power of the grayscale image space. Second, the grayscale distribution is normalized by histogram alignment, which increases the consistency between grayscale training and test samples and so yields a better CNN classifier. Third, a shape-preserving normalization of the grayscale character image is applied. Together, these measures produce a high-performance natural scene character recognizer. Its recognition rate of 85.96% on the ICDAR 2003 robust OCR dataset is higher than that of existing work, which verifies the effectiveness of the proposed method.
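The abstract does not give formulas for the first two enhancement steps, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "maximum separability" can be approximated by Otsu's between-class variance ratio evaluated over a small set of candidate color axes, and that "histogram alignment" can be approximated by linearly mapping the two class means to fixed gray levels. The function names, the candidate axes, and the target levels 64 and 192 are all hypothetical choices made for illustration.

```python
import numpy as np

def otsu_threshold(gray):
    """Return (threshold, separability) for a uint8 image via Otsu's criterion.

    Separability is between-class variance divided by total variance (0 to 1).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(p)            # probability of the class "value <= t"
    m = np.cumsum(p * levels)    # cumulative first moment
    mg = m[-1]                   # global mean
    w1 = 1.0 - w0
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros(256)
    between[valid] = (mg * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    t = int(np.argmax(between))
    total_var = np.sum(p * (levels - mg) ** 2)
    sep = between[t] / total_var if total_var > 0 else 0.0
    return t, sep

def project_to_gray(rgb, axis):
    """Project RGB pixels onto a color axis and rescale the result to 0..255."""
    proj = rgb.reshape(-1, 3).astype(np.float64) @ axis
    lo, hi = proj.min(), proj.max()
    if hi == lo:  # the axis does not distinguish any pixels
        return np.zeros(rgb.shape[:2], dtype=np.uint8)
    scaled = (proj - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8).reshape(rgb.shape[:2])

def max_separability_gray(rgb):
    """Pick the candidate axis whose projection is most bimodal (assumed criterion)."""
    flat = rgb.reshape(-1, 3).astype(np.float64)
    _, vecs = np.linalg.eigh(np.cov(flat, rowvar=False))
    candidates = [
        np.array([1.0, 0.0, 0.0]),        # red channel
        np.array([0.0, 1.0, 0.0]),        # green channel
        np.array([0.0, 0.0, 1.0]),        # blue channel
        np.array([0.299, 0.587, 0.114]),  # standard luminance weights
        vecs[:, -1],                      # first principal color axis
    ]
    grays = [project_to_gray(rgb, a) for a in candidates]
    return max(grays, key=lambda g: otsu_threshold(g)[1])

def align_bimodal(gray, lo_target=64.0, hi_target=192.0):
    """Linearly map the dark and bright class means to fixed target levels."""
    t, _ = otsu_threshold(gray)
    dark, bright = gray[gray <= t], gray[gray > t]
    if dark.size == 0 or bright.size == 0:
        return gray.copy()  # not bimodal; leave unchanged
    m0, m1 = dark.mean(), bright.mean()
    out = (gray.astype(np.float64) - m0) * (hi_target - lo_target) / (m1 - m0) + lo_target
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

Calling `align_bimodal(max_separability_gray(img))` on an `H x W x 3` `uint8` array gives a grayscale image whose two modes sit near the same gray levels for every sample, which is the kind of train/test consistency the abstract attributes to histogram alignment. The sketch does not normalize polarity (dark text on a light background versus the reverse), and it does not cover the third, shape-preserving normalization step.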

Author information

Authors and Affiliations

Department of Computer Science, Tianjin Normal University, Tianjin, China
Yuanping Zhu
Fujitsu R&D Center Co. Ltd., Beijing, China
Jun Sun & Satoshi Naoi

Editor information

Editors and Affiliations

Graduate School of Engineering, Dept. of Computer Science and Intelligent Systems, Osaka Prefecture University, 1-1 Gakuencho, Naka Sakai, 599-8531, Osaka, Japan
Masakazu Iwamura
German Research Center for Artificial Intelligence, Multimedia Analysis and Data Mining Competence Center, Trippstadter Str. 122, 67663, Kaiserslautern, Germany
Faisal Shafait

Cite this paper

Zhu, Y., Sun, J., Naoi, S. (2012). Recognizing Natural Scene Characters by Convolutional Neural Network and Bimodal Image Enhancement. In: Iwamura, M., Shafait, F. (eds) Camera-Based Document Analysis and Recognition. CBDAR 2011. Lecture Notes in Computer Science, vol 7139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29364-1_6

DOI: https://doi.org/10.1007/978-3-642-29364-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29363-4
Online ISBN: 978-3-642-29364-1
eBook Packages: Computer Science, Computer Science (R0)
