
Single shot multi-oriented text detection based on local and non-local features

  • Original Paper
  • Published:
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

To improve the robustness of text detectors on scene text of various scales, this paper proposes a single-shot text detector that combines local and non-local features. A dilated inception module for local feature extraction and a text self-attention module for non-local feature extraction are presented, and both modules are integrated into the single shot detector (SSD) framework from generic object detection to perform multi-oriented text detection in natural scenes. The proposed modules enlarge and enrich the receptive field and enhance feature representation, thereby improving detection performance. In addition, unlike previous SSD-based text detectors that classify positive and negative samples with respect to default boxes, we use pixels as the reference for more accurate matching with ground truth, which avoids complex anchor design. To evaluate the effectiveness of the proposed method, we conduct comparative experiments on public standard benchmarks and analyze the results in detail. The experiments show that the proposed text detector is competitive with state-of-the-art methods.
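The text self-attention module described above is a non-local operation: every spatial position attends to every other position, so each pixel's feature aggregates context from the whole image rather than a fixed local window. Since the paper's exact architecture is not reproduced here, the following is a minimal NumPy sketch of a generic non-local (self-attention) block of this kind; the function names, projection sizes, and scaled-dot-product scoring are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_attention(feat, w_theta, w_phi, w_g):
    """Simplified non-local (self-attention) operation on a feature map.

    feat: (N, C) feature map flattened over spatial positions (N = H*W).
    w_theta, w_phi, w_g: (C, C') projection matrices for queries/keys/values.
    Returns (N, C') features where each position is a weighted sum over
    ALL positions, i.e. a global (non-local) receptive field.
    """
    theta = feat @ w_theta                                    # queries (N, C')
    phi = feat @ w_phi                                        # keys    (N, C')
    g = feat @ w_g                                            # values  (N, C')
    attn = softmax(theta @ phi.T / np.sqrt(theta.shape[1]))   # (N, N), rows sum to 1
    return attn @ g                                           # context-enhanced features

# Toy example: a 4x4 feature map with 8 channels, flattened to (16, 8).
rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8))
w_theta, w_phi, w_g = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out = non_local_attention(feat, w_theta, w_phi, w_g)
print(out.shape)  # (16, 8)
```

In a real detector this output would typically be added back to the input features as a residual, so the attention acts as a refinement on top of the local (dilated-convolution) branch rather than replacing it.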



Acknowledgements

This work is supported by the National Key R&D Program of China under Grant 2017YFB1401000, the China Postdoctoral Science Foundation under Grant 2018M641524, and the Science and Technology Program of Beijing under Grant Z201100001820002. This work is part of the research achievements of the Key Laboratory of Digital Rights Services, one of the National Science and Standardization Key Labs for the Press and Publication Industry.

Author information


Corresponding author

Correspondence to Yang Zheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, X., Liu, J., Zhang, S. et al. Single shot multi-oriented text detection based on local and non-local features. IJDAR 23, 241–252 (2020). https://doi.org/10.1007/s10032-020-00356-y

