DOI: 10.1145/2683483.2683554
research-article

Scene Text Analysis using Deep Belief Networks

Published: 14 December 2014

Abstract

This paper focuses on the recognition and analysis of text embedded in scene images using deep learning. The proposed approach uses deep learning architectures for automated higher-order feature extraction, thereby improving classification accuracy over the handcrafted features traditionally used. Exhaustive experiments have been performed with Deep Belief Networks and Convolutional Deep Neural Networks, with varied training algorithms such as Contrastive Divergence and Denoising Score Matching, and with supervised learning algorithms such as logistic regression and the multi-layer perceptron. These algorithms have been validated on four standard datasets: Chars74K English, Chars74K Kannada, the ICDAR 2003 Robust OCR dataset, and the SVT-CHAR dataset. The proposed network achieves improved recognition results on the Chars74K English, Chars74K Kannada, and SVT-CHAR datasets in comparison to state-of-the-art algorithms. On the ICDAR 2003 dataset, the proposed network is marginally worse than deep convolutional networks. Although deep belief networks have been widely used in other applications, to the best of the authors' knowledge this is the first paper to report scene text recognition using deep belief networks.
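
The abstract names Contrastive Divergence as one of the algorithms used to train the Deep Belief Networks. As a rough illustration only, and not the authors' implementation (their architecture, input preprocessing, and hyperparameters are not given on this page), the following sketch shows a single-step Contrastive Divergence (CD-1) update for one small binary restricted Boltzmann machine, the building block that is stacked to form a deep belief network. The class name `RBM`, the helper `train`, the toy 6-bit patterns, the layer sizes, and the learning rate are all assumptions chosen for readability.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Tiny Bernoulli-Bernoulli RBM (hypothetical, for illustration only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # W[i][j] connects visible unit i to hidden unit j.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit."""
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j]
                                        for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit."""
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j]
                                        for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def sample(self, probs):
        """Draw a binary vector from independent Bernoulli probabilities."""
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """Apply one CD-1 step for one binary vector.

        Returns the squared reconstruction error for that vector.
        """
        ph0 = self.hidden_probs(v0)    # positive phase, driven by the data
        h0 = self.sample(ph0)
        pv1 = self.visible_probs(h0)   # one Gibbs step back to the visibles
        ph1 = self.hidden_probs(pv1)   # negative phase, driven by the model
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        for i in range(self.n_visible):
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(self.n_hidden):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((a - p) ** 2 for a, p in zip(v0, pv1))


def train(rbm, data, epochs=500, lr=0.1):
    """Run CD-1 over the data; returns the mean reconstruction error per epoch."""
    history = []
    for _ in range(epochs):
        total = sum(rbm.cd1_update(v, lr) for v in data)
        history.append(total / len(data))
    return history


if __name__ == "__main__":
    # Two toy binary patterns standing in for binarised image patches.
    patterns = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
    rbm = RBM(n_visible=6, n_hidden=2)
    errors = train(rbm, patterns)
    print(f"first epoch error {errors[0]:.3f}, last epoch error {errors[-1]:.3f}")
```

In a deep belief network, several such layers would typically be trained greedily, each on the hidden activations of the layer below, before a supervised classifier is fitted on top. Real image data would usually also call for a vectorised implementation and mini-batches rather than the per-sample loops used here.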


Published In

ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing
December 2014
692 pages
ISBN: 9781450330619
DOI: 10.1145/2683483

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep Belief Networks
  2. Deep Convolutional Neural Network
  3. Restricted Boltzmann Machines
  4. Scene Text Recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICVGIP '14

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Mar 2025.

Cited By

  • (2024) An acoustic emission identification model for train axle fatigue cracks based on deep belief network. Measurement Science and Technology 35:7 (076125). DOI: 10.1088/1361-6501/ad3b30. Online publication date: 22-Apr-2024
  • (2023) Kannada Word Detection in Heterogeneous Scene Images. 2023 10th International Conference on Signal Processing and Integrated Networks (SPIN) (379-383). DOI: 10.1109/SPIN57001.2023.10117096. Online publication date: 23-Mar-2023
  • (2021) An intelligent approach for automated argument based legal text recognition and summarization using machine learning. Journal of Intelligent & Fuzzy Systems (1-10). DOI: 10.3233/JIFS-189867. Online publication date: 27-Mar-2021
  • (2021) Arabic and Latin Scene Text Recognition by Combining Handcrafted and Deep-Learned Features. Arabian Journal for Science and Engineering 47:8 (9727-9740). DOI: 10.1007/s13369-021-06311-1. Online publication date: 29-Nov-2021
  • (2019) A Novel Dataset for English-Arabic Scene Text Recognition (EASTR)-42K and Its Evaluation Using Invariant Feature Extraction on Detected Extremal Regions. IEEE Access 7 (19801-19820). DOI: 10.1109/ACCESS.2019.2895876. Online publication date: 2019
  • (2016) A memory efficient DNA sequence alignment technique using pointing matrix. 2016 IEEE Region 10 Conference (TENCON) (3559-3562). DOI: 10.1109/TENCON.2016.7848720. Online publication date: Nov-2016
  • (2016) A hardware-based high-throughput DNA sequence alignment scheme. 2016 IEEE Annual India Conference (INDICON) (1-6). DOI: 10.1109/INDICON.2016.7838990. Online publication date: Dec-2016
