DOI: 10.1145/3095713.3095724
research-article

A Robust Ensemble of ResNets for Character Level End-to-end Text Detection in Natural Scene Images

Published: 19 June 2017

ABSTRACT

Detecting text in natural scene images is a challenging task. In this paper, we propose a character-level, end-to-end text detection algorithm for natural scene images. Text detection tasks are generally divided into three parts: text localization, text segmentation, and text recognition. The proposed method aims not only to localize text but also to recognize it. To accomplish this, the method consists of four steps: character candidate patch extraction, patch classification using an ensemble of ResNets, non-character region elimination, and character region grouping via self-tuning spectral clustering. In the candidate extraction step, character candidate patches are extracted from the image using both edge information from multi-scale images and Maximally Stable Extremal Regions (MSERs). Each patch is then classified as either a character or a non-character patch by a deep network composed of three ResNets with different hyper-parameters. Text regions are determined by filtering out the non-character patches. To further reduce classification errors, character characteristics are used to compensate for errors in the ensemble's classification results. Finally, to evaluate text detection performance, character regions are grouped via self-tuning spectral clustering. The proposed method shows competitive performance on the ICDAR 2013 dataset.
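The abstract says candidates come from two sources (multi-scale edge information and MSERs) but does not say how the two sets are combined. The following is a minimal sketch, assuming each candidate is an axis-aligned box `(x, y, width, height)` and that near-duplicate boxes from the two sources are merged by intersection-over-union (IoU). The names `iou` and `merge_candidates` and the 0.7 threshold are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# Assumed box convention: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def merge_candidates(edge_boxes: List[Box], mser_boxes: List[Box],
                     iou_thresh: float = 0.7) -> List[Box]:
    """Union of both candidate sets with near-duplicates removed.

    Boxes are visited in order (edge boxes first); a box is kept only if its
    IoU with every previously kept box is below iou_thresh (hypothetical value).
    """
    merged: List[Box] = []
    for box in edge_boxes + mser_boxes:
        if all(iou(box, kept) < iou_thresh for kept in merged):
            merged.append(box)
    return merged


if __name__ == "__main__":
    edges = [(10, 10, 20, 30)]
    msers = [(11, 10, 20, 30), (50, 50, 15, 25)]
    # The shifted MSER box overlaps the edge box (IoU about 0.9) and is dropped.
    print(merge_candidates(edges, msers))  # prints [(10, 10, 20, 30), (50, 50, 15, 25)]
```

Keeping the first box of each overlapping group is the simplest rule; a real pipeline might instead keep the larger box or average the coordinates.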
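The classification step combines three ResNets trained with different hyper-parameters, but the abstract does not state how their outputs are combined. The sketch below assumes simple soft voting (averaging each model's character probability) followed by a fixed threshold to filter out non-character patches. The models here are hypothetical stand-in callables rather than trained ResNets, and `ensemble_probability`, `keep_character_patches`, and the 0.5 threshold are illustrative names and values.

```python
from typing import Callable, List

# Hypothetical stand-in for a trained ResNet: maps a patch (flattened pixel
# values) to the probability that it contains a character.
Model = Callable[[List[float]], float]


def ensemble_probability(models: List[Model], patch: List[float]) -> float:
    """Soft voting: average the character probability over all models."""
    if not models:
        raise ValueError("the ensemble needs at least one model")
    return sum(model(patch) for model in models) / len(models)


def keep_character_patches(models: List[Model], patches: List[List[float]],
                           threshold: float = 0.5) -> List[List[float]]:
    """Drop patches whose averaged probability falls below the threshold."""
    return [p for p in patches if ensemble_probability(models, p) >= threshold]


if __name__ == "__main__":
    # Three toy "models" that disagree slightly, standing in for ResNets
    # trained with different hyper-parameters.
    models = [
        lambda p: p[0],
        lambda p: min(1.0, p[0] + 0.1),
        lambda p: max(0.0, p[0] - 0.1),
    ]
    patches = [[0.8], [0.2], [0.55]]
    print(keep_character_patches(models, patches))  # prints [[0.8], [0.55]]
```

Averaging probabilities lets a confident model outweigh an uncertain one; majority voting over hard labels is an equally plausible alternative that the abstract does not rule out.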
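Self-tuning spectral clustering (Zelnik-Manor and Perona, 2005) replaces the single global scale of a Gaussian affinity with a per-point local scale σ_i, the distance from point i to its k-th nearest neighbour, giving A_ij = exp(-d(i, j)^2 / (σ_i σ_j)). The sketch below computes only that affinity matrix in pure Python. It assumes each character region is represented by its 2-D centroid (the abstract does not say which features are used for grouping), treats k = 7 as an assumed default, and omits the later steps of the method (normalizing the affinity, eigen-decomposition, and automatic selection of the number of groups).

```python
import math
from typing import List, Tuple

# Assumption: each character region is summarized by its 2-D centroid.
Point = Tuple[float, float]


def local_scales(points: List[Point], k: int = 7) -> List[float]:
    """sigma_i = distance from point i to its k-th nearest neighbour."""
    if len(points) < 2 or k < 1:
        raise ValueError("need at least two points and k >= 1")
    k = min(k, len(points) - 1)  # cannot look past the farthest neighbour
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(dists[k - 1])
    return scales


def local_scaling_affinity(points: List[Point], k: int = 7) -> List[List[float]]:
    """A_ij = exp(-d(i, j)^2 / (sigma_i * sigma_j)) with a zero diagonal."""
    sigma = local_scales(points, k)
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(points[i], points[j]) ** 2
            denom = sigma[i] * sigma[j]
            if denom > 0:
                a = math.exp(-d2 / denom)
            else:
                # Degenerate case (duplicate points): full affinity only if coincident.
                a = 1.0 if d2 == 0 else 0.0
            affinity[i][j] = affinity[j][i] = a  # the matrix is symmetric
    return affinity


if __name__ == "__main__":
    # Two "words": centroids (0,0),(1,0) and (10,0),(11,0).
    centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    for row in local_scaling_affinity(centroids, k=1):
        print([round(v, 4) for v in row])
```

Because σ_i adapts to local density, tightly spaced characters and widely spaced characters can both receive high within-word affinity, which is the property that motivates using the self-tuning variant for grouping.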

Published in

CBMI '17: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing
ACM Other conferences
June 2017
237 pages
ISBN: 9781450353335
DOI: 10.1145/3095713

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited