Abstract
This paper presents a scene text extraction technique that automatically detects and segments text in scene images. Three text-specific features are designed over image edges and used to detect a set of candidate text boundaries. For each candidate text boundary, one or more candidate characters are then extracted using a local threshold estimated from the surrounding image pixels. True characters and words are finally identified by a support vector regression model trained on a bag-of-words representation. The proposed technique has been evaluated on the latest ICDAR-2013 Robust Reading Competition dataset. Experiments show that it obtains superior F-measures of 78.19% and 75.24% (at atom level) for the scene text detection and segmentation tasks, respectively.
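The local-threshold step can be illustrated with a minimal sketch. The abstract does not state how the threshold is estimated, so the rule used here (the mean intensity of the candidate box expanded by a small margin) is an assumption for illustration only, as is the choice of dark text on a lighter background. The names `local_threshold`, `segment_candidate`, and `margin` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows, values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def local_threshold(image: Image, box: Box, margin: int = 2) -> float:
    """Mean intensity of the box expanded by `margin` pixels, clipped to the image.

    Assumption: a plain mean stands in for the paper's unspecified estimator.
    """
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    r0, r1 = max(0, top - margin), min(height, bottom + margin)
    c0, c1 = max(0, left - margin), min(width, right + margin)
    pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(pixels) / len(pixels)


def segment_candidate(image: Image, box: Box, margin: int = 2) -> List[List[int]]:
    """Binary mask over the box: 1 where a pixel is darker than the local threshold.

    Assumption: text is darker than its surroundings; invert the test otherwise.
    """
    threshold = local_threshold(image, box, margin)
    top, left, bottom, right = box
    return [
        [1 if image[r][c] < threshold else 0 for c in range(left, right)]
        for r in range(top, bottom)
    ]


if __name__ == "__main__":
    # A 5x5 light patch with a dark vertical stroke in the middle column.
    image = [[200] * 5 for _ in range(5)]
    for row in (1, 2, 3):
        image[row][2] = 20
    for mask_row in segment_candidate(image, (1, 1, 4, 4), margin=1):
        print(mask_row)
```

Because the threshold is computed per candidate from its own neighbourhood, each candidate is binarized relative to its local background rather than a single global level, which is the property the abstract relies on.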
Cite this article
Lu, S., Chen, T., Tian, S. et al. Scene text extraction based on edges and support vector regression. IJDAR 18, 125–135 (2015). https://doi.org/10.1007/s10032-015-0237-z
DOI: https://doi.org/10.1007/s10032-015-0237-z