Bottom-Up Scene Text Detection with Markov Clustering Networks

International Journal of Computer Vision

Abstract

We propose the Markov Clustering Network (MCN), a detection framework for fast and robust scene text detection. Unlike traditional top-down scene text detectors, which are inherited from classic object detection, MCN detects scene text objects in a bottom-up manner. MCN predicts instance-level bounding boxes by first converting an image into a stochastic flow graph and then performing Markov Clustering on the predicted stochastic flows. The flows encode the local correlation and semantic information of scene text objects. An object is modeled as a set of nodes strongly connected by flows, which allows flexible, bottom-up detection of scale-varying and rotated text objects without prior knowledge of object size. Flow prediction is supported by advanced Convolutional Neural Network architectures and a position-aware spatial attention mechanism, which enhances flow prediction by adaptively fusing spatial representations. Experimental evaluation on public benchmarks shows that MCN achieves state-of-the-art performance, especially in retrieving long and oriented texts.
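To make the clustering step concrete, the following is a minimal sketch of generic Markov Clustering on a small hand-written flow graph. It is not the MCN pipeline: in MCN the flows are predicted by a network, whereas here the weight matrix `W`, the function names, the inflation value and the stopping tolerance are all illustrative assumptions. The sketch alternates expansion (squaring the column-stochastic flow matrix) with inflation (element-wise powering followed by column renormalization), then assigns each node to the row that holds most of its flow.

```python
def normalize_columns(M):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(weights, inflation=2.0, max_iters=100, tol=1e-9):
    """Generic Markov Clustering; returns one attractor index per node.

    `weights` is a square matrix of non-negative edge weights with self loops.
    All parameter values are illustrative, not taken from the paper.
    """
    n = len(weights)
    M = normalize_columns(weights)
    for _ in range(max_iters):
        # Expansion: take two random-walk steps (matrix product M * M).
        expanded = [[sum(M[i][k] * M[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones, renormalize.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - M[i][j])
                    for i in range(n) for j in range(n))
        M = inflated
        if delta < tol:
            break
    # Node j belongs to the cluster of the row that receives most of its flow.
    return [max(range(n), key=lambda i: M[i][j]) for j in range(n)]


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge 2-3.
W = [
    [1, 1, 1,   0,   0, 0],
    [1, 1, 1,   0,   0, 0],
    [1, 1, 1,   0.1, 0, 0],
    [0, 0, 0.1, 1,   1, 1],
    [0, 0, 0,   1,   1, 1],
    [0, 0, 0,   1,   1, 1],
]
labels = markov_cluster(W)
```

Nodes that share an attractor index form one cluster. In the toy graph the weak bridge between the two triangles is suppressed by inflation, so the attractors of nodes 0-2 stay inside the first triangle and those of nodes 3-5 inside the second.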

Notes

  1. We use 1D and 2D notation interchangeably to index a node. The 1D index m and the 2D index \((i_m,j_m)\) are related by \(m = i_m + \frac{H}{U}\cdot j_m\).
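The index conversion in this note can be sketched as follows. The excerpt does not define H and U, so the code assumes that H is the image height, U is the downsampling stride, and H is divisible by U, so that H/U is the number of grid positions along the i axis. The function names are hypothetical.

```python
def to_1d(i, j, H, U):
    """Map a 2D node index (i, j) to the 1D index m = i + (H / U) * j."""
    size_i = H // U  # assumed: number of grid positions along the i axis
    return i + size_i * j


def to_2d(m, H, U):
    """Invert to_1d, assuming 0 <= i < H / U."""
    size_i = H // U
    return m % size_i, m // size_i


# Example with an assumed 64-pixel height and stride 8 (H / U = 8).
m = to_1d(3, 2, H=64, U=8)   # 3 + 8 * 2 = 19
i, j = to_2d(m, H=64, U=8)   # (3, 2)
```

The inverse is unique only when i is restricted to the range 0 to H/U - 1, which is why the sketch states that bound as an assumption.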

Acknowledgements

This research is supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-RP-2018-003) and the MOE Tier-1 research Grants: RG126/17 (S) and RG28/18 (S).

Author information

Corresponding author

Correspondence to Guosheng Lin.

Additional information

Communicated by Florent Perronnin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Z., Lin, G. & Goh, W.L. Bottom-Up Scene Text Detection with Markov Clustering Networks. Int J Comput Vis 128, 1786–1809 (2020). https://doi.org/10.1007/s11263-020-01298-y

  • DOI: https://doi.org/10.1007/s11263-020-01298-y
