TextFocus: Efficient Multi-scale Detection for Arbitrary Scene Text

Manh, Do Quang; Khoi, Tran Minh; Hieu, Duong Minh; Hung, Phan Duy

doi:10.1007/978-981-97-0669-3_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14479)

Included in the following conference series:

International Conference on Computational Data and Social Networks

Abstract

In the age of deep learning, the emergence of high-resolution datasets containing small text presents a growing challenge in scene text detection. Scaling down entire images to address this issue often leads to text distortion and performance degradation. In this research, we introduce TextFocus, an algorithm that uses a multi-scale training strategy with a focus on efficiency. Instead of analyzing every pixel in an image pyramid, TextFocus identifies context regions surrounding ground-truth instances, or “chips,” and processes only those chips to find all text regions in the image sample. The text detections from every chip are then combined through careful post-processing to obtain the final detection results. Because TextFocus can resample very large image samples (4000×4000 pixels) into low-resolution chips (640×640 pixels), our model can train twice as quickly as with normal scaling and handle batches as large as 50 on a single GPU. Although “the larger the training size, the better the result” is a common tactic, our method demonstrates that training at high resolution might not be ideal. Our implementation, using a ResNet-18 backbone with a segment-like head, achieves an F1 score of 0.828 on the SCUT-CTW1500 [1] dataset and 0.611 on the Large CTW [2] dataset, with acceptable FPS for real-time use.
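
The chip idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how chips are placed or merged, so the greedy covering rule, the axis-aligned box format, and the helper names `chip_around`, `generate_chips`, and `to_image_coords` are all assumptions made for illustration. Only the 640-pixel chip size and the 4000×4000 image size come from the text.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def chip_around(box: Box, img_w: int, img_h: int, chip: int = 640) -> Box:
    """Return a chip-sized window centred on `box`, shifted to stay in the image."""
    cx = (box[0] + box[2]) // 2
    cy = (box[1] + box[3]) // 2
    x0 = min(max(cx - chip // 2, 0), max(img_w - chip, 0))
    y0 = min(max(cy - chip // 2, 0), max(img_h - chip, 0))
    return (x0, y0, x0 + chip, y0 + chip)


def contains(chip: Box, box: Box) -> bool:
    """True if `box` lies entirely inside `chip`."""
    return (chip[0] <= box[0] and chip[1] <= box[1]
            and chip[2] >= box[2] and chip[3] >= box[3])


def generate_chips(boxes: List[Box], img_w: int, img_h: int,
                   chip: int = 640) -> List[Box]:
    """Greedy cover (an assumption): open a new chip only for a ground-truth
    box that no existing chip fully contains. Boxes larger than a chip are
    not handled here; they would need a coarser scale."""
    chips: List[Box] = []
    for box in sorted(boxes):
        if not any(contains(c, box) for c in chips):
            chips.append(chip_around(box, img_w, img_h, chip))
    return chips


def to_image_coords(chip: Box, local_box: Box) -> Box:
    """Shift a detection from chip coordinates back to full-image coordinates."""
    dx, dy = chip[0], chip[1]
    return (local_box[0] + dx, local_box[1] + dy,
            local_box[2] + dx, local_box[3] + dy)


if __name__ == "__main__":
    # Hypothetical ground-truth boxes in a 4000x4000 image.
    gt = [(100, 100, 200, 140), (150, 300, 260, 330), (3500, 3600, 3700, 3650)]
    chips = generate_chips(gt, 4000, 4000)
    covered = len(chips) * 640 * 640 / (4000 * 4000)
    print(chips, f"{covered:.1%} of image pixels processed")
```

For these three hypothetical boxes the sketch produces two chips, which together cover about 5% of the pixels in the full image. A saving of that kind is consistent with the abstract's point about larger batches on a single GPU, although the real figure depends on how text is distributed in each image.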

References

1. Yuliang, L., Lianwen, J., Shuaitao, Z., Sheng, Z.: Detecting curve text in the wild: new dataset and new solution (2017). arXiv:1712.02170
2. Yuan, T.-L., Zhu, Z., Xu, K., Li, C.-J., Mu, T.-J., Hu, S.-M.: A large Chinese text dataset in the wild. J. Comput. Sci. Technol. 34, 509–521 (2019)
3. Mukhiddinov, M.: Scene text detection and localization using fully convolutional network. In: Proceedings of the International Conference on Information Science and Communications Technologies (ICISCT), pp. 1–5. IEEE (2019)
4. Zhang, S.-X., Yang, C., Zhu, X., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. IEEE Trans. Multimedia (2023). https://doi.org/10.1109/TMM.2023.3286657
5. Ye, M., Zhang, J., Zhao, S., et al.: Deepsolo: let transformer decoder with explicit points solo for text spotting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 348–357 (2023)
6. Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4651–4659 (2015)
7. Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)
8. Sun, L., Huo, Q., Jia, W., Chen, K.: A robust approach for text detection from natural scene images. Pattern Recogn. 48(9), 2906–2920 (2015)
9. Yin, X.-C., Yin, X., Huang, K., Hao, H.-W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 970–983 (2013)
10. Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 20(11), 3111–3122 (2018)
11. Liao, M., Shi, B., Bai, X.: Textboxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27(8), 3676–3690 (2018)
12. Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)
13. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
14. Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8440–8449 (2019)
15. Liu, Y., Chen, H., Shen, C., He, T., Jin, L., Wang, L.: ABCNet: real-time scene text spotting with adaptive Bezier-curve network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9809–9818 (2020)
16. Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3123–3131 (2021)
17. Dai, P., Zhang, S., Zhang, H., Cao, X.: Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7393–7402 (2021)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
19. Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: Proceedings of 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015)
20. Ch’ng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 935–942. IEEE (2017)
21. Najibi, M., Singh, B., Davis, L.S.: Autofocus: efficient multi-scale inference. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9745–9755 (2019)
22. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)
23. Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. Lecture Notes in Computer Science, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4
24. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2550–2558 (2017)
25. Nabati, R., Qi, H.: RRPN: radar region proposal network for object detection in autonomous vehicles. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3093–3097. IEEE (2019)
26. Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11206, pp. 20–36. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_2
27. Hung, P.D., Loan, B.T.: Automatic Vietnamese Passport recognition on android phones. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds.) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol. 1306, pp. 476–485. Springer, Singapore (2020). https://doi.org/10.1007/978-981-33-4370-2_36
28. Duy, L.D., Hung, P.D.: Adaptive graph attention network in person re-identification. Pattern Recogn. Image Anal. 32, 384–392 (2022)
29. Su, N.T., Hung, P.D., Vinh, B.T., Diep, V.T.: Rice leaf disease classification using deep learning and target for mobile devices. In: Al-Emran, M., Al-Sharafi, M.A., Al-Kabi, M.N., Shaalan, K. (eds.) Proceedings of International Conference on Emerging Technologies and Intelligent Systems. ICETIS 2021. Lecture Notes in Networks and Systems, vol. 299. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-82616-1_13
30. Hung, L.Q., Tuan, T.D., Hieu, N.T., Hung, P.D.: Cervical spine fracture detection via computed tomography scan. In: Nguyen, N.T., et al. (eds.) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol. 1863. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-42430-4_38

Author information

Authors and Affiliations

FPT University, Hanoi, Vietnam
Do Quang Manh, Tran Minh Khoi, Duong Minh Hieu & Phan Duy Hung

Corresponding author

Correspondence to Phan Duy Hung.

Editor information

Editors and Affiliations

National Economics University, Hanoi, Vietnam
Minh Hoàng Hà
Florida Atlantic University, Boca Raton, FL, USA
Xingquan Zhu
University of Florida, Gainesville, FL, USA
My T. Thai

About this paper

Cite this paper

Manh, D.Q., Khoi, T.M., Hieu, D.M., Hung, P.D. (2024). TextFocus: Efficient Multi-scale Detection for Arbitrary Scene Text. In: Hà, M.H., Zhu, X., Thai, M.T. (eds) Computational Data and Social Networks. CSoNet 2023. Lecture Notes in Computer Science, vol 14479. Springer, Singapore. https://doi.org/10.1007/978-981-97-0669-3_4

DOI: https://doi.org/10.1007/978-981-97-0669-3_4
Published: 29 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0668-6
Online ISBN: 978-981-97-0669-3
eBook Packages: Computer Science, Computer Science (R0)
