research-article

Scene Text Analysis using Deep Belief Networks

Authors:

Santanu Chaudhury

ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing

Article No.: 71, Pages 1 - 8

https://doi.org/10.1145/2683483.2683554

Published: 14 December 2014

Abstract

This paper focuses on the recognition and analysis of text embedded in scene images using deep learning. The proposed approach uses deep learning architectures to extract higher-order features automatically, improving classification accuracy over the handcrafted features traditionally used. Exhaustive experiments have been performed with Deep Belief Networks and Convolutional Deep Neural Networks, using training algorithms such as Contrastive Divergence and Denoising Score Matching together with supervised learning algorithms such as logistic regression and the multi-layer perceptron. These algorithms have been validated on four standard datasets: Chars74K English, Chars74K Kannada, the ICDAR 2003 Robust OCR dataset and the SVT-CHAR dataset. The proposed network achieves better recognition results than state-of-the-art algorithms on the Chars74K English, Chars74K Kannada and SVT-CHAR datasets. On the ICDAR 2003 dataset, it performs marginally worse than deep convolutional networks. Although deep belief networks have been widely used in several applications, to the best of the authors' knowledge this is the first paper to report scene text recognition using deep belief networks.
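The abstract names Contrastive Divergence as one of the training algorithms but gives no layer sizes, learning rates or input preprocessing. As a rough illustration only, the sketch below assumes binary (for example, binarized) character-image pixels and shows one-step contrastive divergence (CD-1) for a Bernoulli-Bernoulli restricted Boltzmann machine, followed by greedy layer-wise stacking of such RBMs into a deep belief network in the manner of Hinton, Osindero and Teh (2006). The names `BernoulliRBM` and `pretrain_dbn`, the initialization scale, the learning rate and the layer sizes are hypothetical choices, not the authors' implementation; grayscale inputs would more naturally use Gaussian visible units, which this sketch does not cover.

```python
# Hypothetical sketch of CD-1 and greedy DBN pre-training; not the paper's code.
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """Binary-binary RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        # Small random weights break the symmetry between hidden units.
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v: np.ndarray) -> np.ndarray:
        """p(h = 1 | v) for each hidden unit."""
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h: np.ndarray) -> np.ndarray:
        """p(v = 1 | h) for each visible unit."""
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0: np.ndarray, lr: float = 0.1) -> float:
        """One CD-1 update on a mini-batch v0 of shape (batch, n_visible).

        Returns the mean squared reconstruction error, a rough progress
        signal rather than the quantity CD actually optimizes.
        """
        # Positive phase: hidden probabilities driven by the data,
        # then a binary sample of the hidden states.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step v0 -> h0 -> v1 -> h1
        # (probabilities are used for v1 and h1 to reduce noise).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        # CD-1 approximation of the log-likelihood gradient:
        # <v h>_data - <v h>_reconstruction.
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))


def pretrain_dbn(data: np.ndarray, layer_sizes: list[int],
                 epochs: int = 50, lr: float = 0.1):
    """Greedy layer-wise pre-training: each RBM is trained with CD-1 on the
    hidden activations produced by the (already trained) RBM below it."""
    rbms, x = [], data
    for i, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(x.shape[1], n_hidden, seed=i)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # features passed up to the next layer
    return rbms, x
```

For example, `pretrain_dbn(x, [256, 128])` on an `(n_samples, n_pixels)` array of binarized character crops would return the two trained RBMs and 128-dimensional top-layer features. In the pipeline the abstract describes, such features would then be passed to a supervised classifier such as logistic regression or a multi-layer perceptron; that stage, and any supervised fine-tuning, is omitted here.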

Information

Published In

ACM Other conferences

ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing

December 2014

692 pages

ISBN:9781450330619

DOI:10.1145/2683483

General Chairs:
A. G. Ramakrishnan, IISc, Bangalore
Jitendra Malik, University of California, Berkeley

Program Chairs:
Alex Efros, UC Berkeley
C. V. Jawahar, IIIT Hyderabad
Manik Varma, Microsoft Research

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 December 2014

Qualifiers

Research-article
Research
Refereed limited

Conference

ICVGIP '14

ICVGIP '14: Indian Conference on Computer Vision Graphics and Image Processing

December 14 - 18, 2014

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Bibliometrics

Total citations: 7
Total downloads: 203
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Download counts reflect downloads up to 01 Mar 2025.

Citations

Cited By

Lin L, Tang X, Zhu X, Yu X, Bi T (2024). An acoustic emission identification model for train axle fatigue cracks based on deep belief network. Measurement Science and Technology, 35(7):076125. Online publication date: 22-Apr-2024. https://doi.org/10.1088/1361-6501/ad3b30
Shikha N, Pranav R, Singh N, Umadevi V, Hussain M (2023). Kannada Word Detection in Heterogeneous Scene Images. 2023 10th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 379-383. Online publication date: 23-Mar-2023. https://doi.org/10.1109/SPIN57001.2023.10117096
Sil R, Alpana, Roy A, Dasmahapatra M, Dhali D (2021). An intelligent approach for automated argument based legal text recognition and summarization using machine learning. Journal of Intelligent & Fuzzy Systems, pp. 1-10. Online publication date: 27-Mar-2021. https://doi.org/10.3233/JIFS-189867
Tounsi M, Moalla I, Pal U, Alimi A (2021). Arabic and Latin Scene Text Recognition by Combining Handcrafted and Deep-Learned Features. Arabian Journal for Science and Engineering, 47(8):9727-9740. Online publication date: 29-Nov-2021. https://doi.org/10.1007/s13369-021-06311-1
Ahmed S, Naz S, Razzak M, Yusof R (2019). A Novel Dataset for English-Arabic Scene Text Recognition (EASTR)-42K and Its Evaluation Using Invariant Feature Extraction on Detected Extremal Regions. IEEE Access, 7:19801-19820. Online publication date: 2019. https://doi.org/10.1109/ACCESS.2019.2895876
Ray S, Banerjee A, Datta A, Ghosh S (2016). A memory efficient DNA sequence alignment technique using pointing matrix. 2016 IEEE Region 10 Conference (TENCON), pp. 3559-3562. Online publication date: Nov-2016. https://doi.org/10.1109/TENCON.2016.7848720
Ray S, Srivastava N, Ghosh S (2016). A hardware-based high-throughput DNA sequence alignment scheme. 2016 IEEE Annual India Conference (INDICON), pp. 1-6. Online publication date: Dec-2016. https://doi.org/10.1109/INDICON.2016.7838990
