TextFocus: Efficient Multi-scale Detection for Arbitrary Scene Text

  • Conference paper
Computational Data and Social Networks (CSoNet 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14479)

Abstract

In the age of deep learning, the emergence of high-resolution datasets containing small text presents a growing challenge for scene text detection. Downscaling entire images to address this issue often distorts the text and degrades performance. In this research, we introduce TextFocus, an algorithm that uses a multi-scale training strategy designed for efficiency. Instead of analyzing every pixel of an image pyramid, TextFocus identifies context regions surrounding ground-truth instances, called “chips,” and processes them to find all text regions in the image. The text detections from every chip are then combined through a careful post-processing procedure to obtain the final detection results. Because TextFocus resamples very large images (4000x4000 pixels) into low-resolution chips (640x640 pixels), our model trains twice as fast as with normal scaling and can handle batches as large as 50 on a single GPU. Whereas the basic tactic assumes that a larger training size gives a better result, our method demonstrates that training at a high-resolution scale might not be ideal. Our implementation, using a ResNet-18 backbone with a segment-like head, achieves an F1 score of 0.828 on the SCUT-CTW1500 [1] dataset and 0.611 on the Large CTW [2] dataset, at a frame rate acceptable for real-time use.
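To make the chip idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract gives only the chip size (640x640) and the overall flow (select chips around ground-truth text, detect inside each chip, merge the results). Everything else here is an assumption: the greedy covering rule in the hypothetical `make_chips`, the per-scale handling of oversized boxes, the coordinate mapping in `chip_to_image`, and the use of plain greedy NMS in `merge_detections` as a stand-in for the paper's unspecified post-processing. The sketch also uses axis-aligned boxes, whereas TextFocus targets arbitrarily shaped text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x0, y0) is the top-left corner, (x1, y1) the bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def _contains(outer: Box, inner: Box) -> bool:
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and inner.x1 <= outer.x1 and inner.y1 <= outer.y1)


def make_chips(gt: List[Box], img_w: int, img_h: int,
               scale: float, chip: int = 640) -> List[Box]:
    """Hypothetical chip selector (assumption): place chip x chip windows on the
    image rescaled by `scale` until every ground-truth box that fits in one chip
    at this scale is covered (assuming boxes lie inside the image). Boxes that
    are too large are left for a smaller scale. Windows use scaled coordinates."""
    w, h = img_w * scale, img_h * scale
    scaled = [Box(b.x0 * scale, b.y0 * scale, b.x1 * scale, b.y1 * scale) for b in gt]
    todo = [b for b in scaled if b.x1 - b.x0 <= chip and b.y1 - b.y0 <= chip]
    chips: List[Box] = []
    while todo:
        b = todo[0]
        # Centre a window on the first uncovered box, clamped inside the image.
        cx, cy = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        x0 = min(max(cx - chip / 2, 0.0), max(w - chip, 0.0))
        y0 = min(max(cy - chip / 2, 0.0), max(h - chip, 0.0))
        win = Box(x0, y0, x0 + chip, y0 + chip)
        chips.append(win)
        # Drop the seed box and every other box this window fully covers.
        todo = [r for r in todo[1:] if not _contains(win, r)]
    return chips


def chip_to_image(det: Box, win: Box, scale: float) -> Box:
    """Map a detection from chip-local coordinates back to original-image pixels."""
    return Box((det.x0 + win.x0) / scale, (det.y0 + win.y0) / scale,
               (det.x1 + win.x0) / scale, (det.y1 + win.y0) / scale)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    ih = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = iw * ih
    union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(dets: List[Tuple[Box, float]],
                     thr: float = 0.5) -> List[Tuple[Box, float]]:
    """Stand-in post-processing (assumption): greedy NMS over (box, score) pairs
    pooled from all chips and scales, already in original-image coordinates."""
    keep: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < thr for k, _ in keep):
            keep.append((box, score))
    return keep
```

For example, two small words at opposite corners of a 4000x4000 image produce two 640x640 chips at scale 1.0, while at scale 0.125 the image shrinks to 500x500 and a single chip covers both. A text line 1000 pixels wide gets no chip at scale 1.0 but is covered at scale 0.5. This illustrates why chip-based training keeps each input small regardless of the original image size.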

References

  1. Liu, Y., Jin, L., Zhang, S., Zhang, S.: Detecting curve text in the wild: new dataset and new solution (2017). arXiv:1712.02170

  2. Yuan, T.-L., Zhu, Z., Xu, K., Li, C.-J., Mu, T.-J., Hu, S.-M.: A large Chinese text dataset in the wild. J. Comput. Sci. Technol. 34, 509–521 (2019)

  3. Mukhiddinov, M.: Scene text detection and localization using fully convolutional network. In: Proceedings of the International Conference on Information Science and Communications Technologies (ICISCT), pp. 1–5. IEEE (2019)

  4. Zhang, S.-X., Yang, C., Zhu, X., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. IEEE Trans. Multimedia (2023). https://doi.org/10.1109/TMM.2023.3286657

  5. Ye, M., Zhang, J., Zhao, S., et al.: DeepSolo: let transformer decoder with explicit points solo for text spotting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 348–357 (2023)

  6. Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4651–4659 (2015)

  7. Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)

  8. Sun, L., Huo, Q., Jia, W., Chen, K.: A robust approach for text detection from natural scene images. Pattern Recogn. 48(9), 2906–2920 (2015)

  9. Yin, X.-C., Yin, X., Huang, K., Hao, H.-W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 970–983 (2013)

  10. Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 20(11), 3111–3122 (2018)

  11. Liao, M., Shi, B., Bai, X.: TextBoxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27(8), 3676–3690 (2018)

  12. Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)

  13. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)

  14. Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8440–8449 (2019)

  15. Liu, Y., Chen, H., Shen, C., He, T., Jin, L., Wang, L.: ABCNet: real-time scene text spotting with adaptive Bezier-curve network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9809–9818 (2020)

  16. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3123–3131 (2021)

  17. Dai, P., Zhang, S., Zhang, H., Cao, X.: Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7393–7402 (2021)

  18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  19. Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: Proceedings of 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015)

  20. Ch’ng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 935–942. IEEE (2017)

  21. Najibi, M., Singh, B., Davis, L.S.: AutoFocus: efficient multi-scale inference. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9745–9755 (2019)

  22. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)

  23. Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. Lecture Notes in Computer Science, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4

  24. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2550–2558 (2017)

  25. Nabati, R., Qi, H.: RRPN: radar region proposal network for object detection in autonomous vehicles. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3093–3097. IEEE (2019)

  26. Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: TextSnake: a flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11206, pp. 20–36. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_2

  27. Hung, P.D., Loan, B.T.: Automatic Vietnamese Passport recognition on android phones. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds.) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol. 1306, pp. 476–485. Springer, Singapore (2020). https://doi.org/10.1007/978-981-33-4370-2_36

  28. Duy, L.D., Hung, P.D.: Adaptive graph attention network in person re-identification. Pattern Recogn. Image Anal. 32, 384–392 (2022)

  29. Su, N.T., Hung, P.D., Vinh, B.T., Diep, V.T.: Rice leaf disease classification using deep learning and target for mobile devices. In: Al-Emran, M., Al-Sharafi, M.A., Al-Kabi, M.N., Shaalan, K. (eds.) Proceedings of International Conference on Emerging Technologies and Intelligent Systems. ICETIS 2021. Lecture Notes in Networks and Systems, vol. 299. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-82616-1_13

  30. Hung, L.Q., Tuan, T.D., Hieu, N.T., Hung, P.D.: Cervical spine fracture detection via computed tomography scan. In: Nguyen, N.T., et al. (eds.) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol. 1863. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-42430-4_38

Author information

Correspondence to Phan Duy Hung.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Manh, D.Q., Khoi, T.M., Hieu, D.M., Hung, P.D. (2024). TextFocus: Efficient Multi-scale Detection for Arbitrary Scene Text. In: Hà, M.H., Zhu, X., Thai, M.T. (eds) Computational Data and Social Networks. CSoNet 2023. Lecture Notes in Computer Science, vol 14479. Springer, Singapore. https://doi.org/10.1007/978-981-97-0669-3_4

  • DOI: https://doi.org/10.1007/978-981-97-0669-3_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0668-6

  • Online ISBN: 978-981-97-0669-3

  • eBook Packages: Computer Science, Computer Science (R0)
