Abstract
We propose a text detection algorithm that accurately and reliably determines the bounding regions of text in natural scenes. Our system cascades two convolutional neural networks to achieve high precision, recall and F-score (PRF) in text detection. The first, a fully convolutional network, acts as a coarse detector that detects and segments text-like regions. The second network filters out non-text segment blocks and accurately localizes each text line within the remaining blocks. To make the best use of the advantages of both networks, we propose an intermediate processing mechanism. The whole system is capable of detecting tightly squeezed lines with very tiny words as well as text of different sizes, especially small text. Our experimental system runs on a Titan X GPU and achieves a precision of 0.92, a recall of 0.83 and an F-score of 0.87, which ranks 22nd among all published results on the ICDAR 2013 Focused Scene Text dataset benchmark.
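As a quick consistency check on the reported figures, the following minimal sketch assumes the F-score is the usual harmonic mean of precision and recall. The abstract does not state the formula, and the function name `f_score` is our own illustration rather than anything from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of the F-score)."""
    if precision + recall == 0:
        # Avoid division by zero when both values are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(f"{f_score(0.92, 0.83):.2f}")  # prints 0.87
```

Under this assumption, the reported precision of 0.92 and recall of 0.83 give 2 × 0.92 × 0.83 / (0.92 + 0.83) ≈ 0.873, which agrees with the stated F-score of 0.87.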
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Li, J., Wang, C., Luo, Z., Tang, Z., Li, H. (2018). Accurate Detection for Scene Texts with a Cascaded CNN Networks. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_4
DOI: https://doi.org/10.1007/978-3-319-73600-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer Science (R0)