Scene text detection using adaptive color reduction, adjacent character model and hybrid verification strategy

  • Original Article
The Visual Computer

Abstract

Text detection is a primary task in text recognition and understanding, and it is used in many image analysis techniques. In this paper, we propose an effective scene text detection method with three major steps: connected component (CC) extraction, character linking, and text/non-text classification. First, for CC extraction, we design an adaptive color reduction scheme that analyzes the image color histogram to select color centers and generates a variable number of color layers according to each image's color complexity. Then, for character linking, we build an adjacent character model by training an extreme learning machine (ELM), rather than setting the various thresholds that previous approaches rely on. Finally, we adopt a hybrid text verification strategy that combines a convolutional neural network (CNN) with an ELM for text/non-text classification and performs better than either classifier alone. Experimental results on several publicly available datasets illustrate the effectiveness of our method, and comparisons with state-of-the-art algorithms demonstrate its competitiveness.
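To make the character-linking idea concrete, the following is a minimal sketch of a basic ELM (random hidden layer, output weights solved by regularized least squares) trained to decide whether two connected components are adjacent characters. It is an illustration under stated assumptions, not the model from the paper: the three pair features (height ratio, normalized vertical offset, normalized horizontal gap), the synthetic data, the hidden layer size, and the names `ELM` and `make_toy_pairs` are all hypothetical.

```python
import numpy as np


class ELM:
    """Basic extreme learning machine for binary classification.

    Hidden weights are random and never trained; only the output
    weights are fitted, in closed form, by ridge-regularized least squares.
    """

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random nonlinear projection of the inputs.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        t = np.where(y == 1, 1.0, -1.0)  # targets in {-1, +1}
        # beta = (H^T H + reg * I)^{-1} H^T t
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ t)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


def make_toy_pairs(n_per_class, seed=0):
    """Synthetic CC-pair features (hypothetical, for illustration only).

    Columns: height ratio, vertical offset / height, horizontal gap / height.
    Label 1 = adjacent characters, 0 = unrelated components.
    """
    rng = np.random.default_rng(seed)
    pos = rng.normal(loc=[1.0, 0.0, 0.5], scale=0.1, size=(n_per_class, 3))
    neg = rng.normal(loc=[2.5, 1.0, 3.0], scale=0.3, size=(n_per_class, 3))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(n_per_class, int), np.zeros(n_per_class, int)])
    return X, y


if __name__ == "__main__":
    X, y = make_toy_pairs(200, seed=1)
    model = ELM(n_hidden=50, seed=0).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

The point of the sketch is the design choice the abstract describes: a learned decision function over pair features replaces a set of hand-tuned thresholds on each feature, and because only the output weights are solved for, training reduces to a single linear system.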

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable suggestions.

Author information

Corresponding authors

Correspondence to Beiji Zou or Yu-qian Zhao.

Additional information

This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61172184, 61379107, 61402539, and 61573380), Program for New Century Excellent Talents in University of Education Ministry in China (Grant No. NCET-13-0603), Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130162110016) and the Fundamental Research Funds for the Central Universities of Central South University (Grant No. 2015zzts052).

About this article

Cite this article

Wu, H., Zou, B., Zhao, Yq. et al. Scene text detection using adaptive color reduction, adjacent character model and hybrid verification strategy. Vis Comput 33, 113–126 (2017). https://doi.org/10.1007/s00371-015-1156-1

  • DOI: https://doi.org/10.1007/s00371-015-1156-1