Scene text extraction based on edges and support vector regression

  • Special Issue Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper presents a scene text extraction technique that automatically detects and segments text in scene images. Three text-specific features are designed over image edges and used to detect a set of candidate text boundaries. For each detected candidate text boundary, one or more candidate characters are then extracted using a local threshold estimated from the surrounding image pixels. The real characters and words are finally identified by a support vector regression model trained on a bag-of-words representation. The proposed technique has been evaluated on the latest ICDAR-2013 Robust Reading Competition dataset. Experiments show that it obtains superior F-measures of 78.19% and 75.24% (at atom level) for the scene text detection and segmentation tasks, respectively.
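
The local-threshold step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the threshold is estimated, so everything here is an assumption for illustration only: the function name `segment_candidate`, the `margin` parameter, the use of the mean intensity of the surrounding pixels as the threshold, and the dark-text-on-light-background polarity are all hypothetical and are not taken from the paper.

```python
import numpy as np

def segment_candidate(gray, box, margin=2):
    """Binarize one candidate region with a locally estimated threshold.

    gray   : 2-D array of grayscale intensities
    box    : (top, left, bottom, right), with bottom/right exclusive
    margin : pixels of surrounding context to include (hypothetical parameter)
    """
    top, left, bottom, right = box
    height, width = gray.shape

    # Grow the box by `margin` pixels, clipped to the image bounds,
    # so the threshold reflects the pixels around the candidate.
    t, l = max(top - margin, 0), max(left - margin, 0)
    b, r = min(bottom + margin, height), min(right + margin, width)
    context = gray[t:b, l:r].astype(float)

    # Assumption: the mean of the surrounding pixels stands in for the
    # paper's threshold estimator, which the abstract does not specify.
    threshold = float(context.mean())

    # Assumption: text is darker than its background.
    mask = gray[top:bottom, left:right] < threshold
    return mask, threshold
```

Called on a grayscale array and one candidate box, the function returns a boolean mask of the pixels treated as character strokes together with the threshold used. A real system would also need to handle light text on dark backgrounds, for example by testing both polarities.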

Notes

  1. http://dag.cvc.uab.es/icdar2013competition/.

Author information

Corresponding author

Correspondence to Shijian Lu.

About this article

Cite this article

Lu, S., Chen, T., Tian, S. et al. Scene text extraction based on edges and support vector regression. IJDAR 18, 125–135 (2015). https://doi.org/10.1007/s10032-015-0237-z

  • DOI: https://doi.org/10.1007/s10032-015-0237-z
