Abstract
In the age of deep learning, the emergence of high-resolution datasets containing small text presents a growing challenge in scene text detection. Scaling down entire images to address this issue often leads to text distortion and performance degradation. In this research, we introduce TextFocus, an algorithm that uses a multi-scale training strategy with a focus on efficiency. Instead of analyzing every pixel in an image pyramid, TextFocus identifies context regions surrounding ground-truth instances, called “chips,” and processes only these regions to find the text in the image sample. The text detections from all chips are then merged in a careful post-processing step to obtain the final detection result. Because TextFocus resamples very large image samples (4000x4000 pixels) into low-resolution chips (640x640 pixels), our model can train twice as quickly and handle batches as large as 50 on a single GPU when scaled normally. Although “the larger the training size, the better the result” is a common tactic, our method demonstrates that training at high resolution might not be ideal. Our implementation, which uses a ResNet-18 backbone with a segment-like head, achieves an F1 score of 0.828 on the SCUT-CTW1500 [1] dataset and 0.611 on the Large CTW [2] dataset, with an acceptable frame rate for real-time use.
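The chip idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy placement heuristic, the function names `make_chips` and `to_image_coords`, and the axis-aligned box format are all assumptions made for illustration, and the abstract does not specify how chips are actually selected or merged. The sketch only shows how fixed-size windows around ground-truth boxes could replace full-resolution processing, and how a detection found inside a chip might be shifted back to image coordinates before post-processing.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max) in pixels (assumed format).
Box = Tuple[int, int, int, int]


def contains(chip: Box, box: Box) -> bool:
    """Return True if `box` lies entirely inside `chip`."""
    return (chip[0] <= box[0] and chip[1] <= box[1]
            and box[2] <= chip[2] and box[3] <= chip[3])


def make_chips(image_w: int, image_h: int, boxes: List[Box],
               chip_size: int = 640) -> List[Box]:
    """Place one chip_size x chip_size window per box that no earlier chip
    already contains. Hypothetical greedy heuristic, not the paper's."""
    chips: List[Box] = []
    for box in sorted(boxes):
        if any(contains(c, box) for c in chips):
            continue  # already covered by an earlier chip
        # Centre the chip on the box, then clamp it to the image bounds.
        cx, cy = (box[0] + box[2]) // 2, (box[1] + box[3]) // 2
        x0 = min(max(cx - chip_size // 2, 0), max(image_w - chip_size, 0))
        y0 = min(max(cy - chip_size // 2, 0), max(image_h - chip_size, 0))
        chips.append((x0, y0, x0 + chip_size, y0 + chip_size))
    return chips


def to_image_coords(chip: Box, local_box: Box) -> Box:
    """Shift a detection from chip-local to full-image coordinates."""
    dx, dy = chip[0], chip[1]
    return (local_box[0] + dx, local_box[1] + dy,
            local_box[2] + dx, local_box[3] + dy)


# Three ground-truth boxes in a 4000x4000 image; the first two share a chip.
gt_boxes = [(100, 100, 200, 150), (300, 300, 400, 350),
            (3000, 3000, 3100, 3050)]
chips = make_chips(4000, 4000, gt_boxes)
print(chips)
print(to_image_coords(chips[1], (10, 20, 110, 70)))
```

In this toy case, two 640x640 chips cover all three boxes, so roughly 5% of the 4000x4000 pixels would be processed. A box larger than the chip size is not handled here and would need rescaling first.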
References
Yuliang, L., Lianwen, J., Shuaitao, Z., Sheng, Z.: Detecting curve text in the wild: new dataset and new solution (2017). arXiv:1712.02170
Yuan, T.-L., Zhu, Z., Xu, K., Li, C.-J., Mu, T.-J., Hu, S.-M.: A large Chinese text dataset in the wild. J. Comput. Sci. Technol. 34, 509–521 (2019)
Mukhiddinov, M.: Scene text detection and localization using fully convolutional network. In: Proceedings of the International Conference on Information Science and Communications Technologies (ICISCT), pp. 1–5. IEEE (2019)
Zhang, S.-X., Yang, C., Zhu, X., Yin, X.-C.: Arbitrary shape text detection via boundary transformer. IEEE Trans. Multimedia (2023). https://doi.org/10.1109/TMM.2023.3286657
Ye, M., Zhang, J., Zhao, S., et al.: Deepsolo: Let transformer decoder with explicit points solo for text spotting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 348–357 (2023)
Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4651–4659 (2015)
Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)
Sun, L., Huo, Q., Jia, W., Chen, K.: A robust approach for text detection from natural scene images. Pattern Recogn. 48(9), 2906–2920 (2015)
Yin, X.-C., Yin, X., Huang, K., Hao, H.-W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 970–983 (2013)
Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 20(11), 3111–3122 (2018)
Liao, M., Shi, B., Bai, X.: Textboxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27(8), 3676–3690 (2018)
Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8440–8449 (2019)
Liu, Y., Chen, H., Shen, C., He, T., Jin, L., Wang, L.: ABCNet: real-time scene text spotting with adaptive Bezier-curve network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9809–9818 (2020)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3123–3131 (2021)
Dai, P., Zhang, S., Zhang, H., Cao, X.: Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7393–7402 (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: Proceedings of 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015)
Ch’ng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 935–942. IEEE (2017)
Najibi, M., Singh, B., Davis, L.S.: Autofocus: Efficient multi-scale inference. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9745–9755 (2019)
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)
Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. Lecture Notes in Computer Science, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2550–2558 (2017)
Nabati, R., Qi, H.: RRPN: radar region proposal network for object detection in autonomous vehicles. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3093–3097. IEEE (2019)
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: A flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11206, pp. 20–36. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_2
Hung, P.D., Loan, B.T.: Automatic Vietnamese Passport recognition on android phones. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds.) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol. 1306, pp. 476–485. Springer, Singapore (2020). https://doi.org/10.1007/978-981-33-4370-2_36
Duy, L.D., Hung, P.D.: Adaptive graph attention network in person re-identification. Pattern Recogn. Image Anal. 32, 384–392 (2022)
Su, N.T., Hung, P.D., Vinh, B.T., Diep, V.T.: Rice leaf disease classification using deep learning and target for mobile devices. In: Al-Emran, M., Al-Sharafi, M.A., Al-Kabi, M.N., Shaalan, K. (eds.) Proceedings of International Conference on Emerging Technologies and Intelligent Systems. ICETIS 2021. Lecture Notes in Networks and Systems, vol. 299. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-82616-1_13
Hung, L.Q., Tuan, T.D., Hieu, N.T., Hung, P.D.: Cervical spine fracture detection via computed tomography scan. In: Nguyen, N.T., et al. (eds.) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol. 1863. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-42430-4_38
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Manh, D.Q., Khoi, T.M., Hieu, D.M., Hung, P.D. (2024). TextFocus: Efficient Multi-scale Detection for Arbitrary Scene Text. In: Hà, M.H., Zhu, X., Thai, M.T. (eds) Computational Data and Social Networks. CSoNet 2023. Lecture Notes in Computer Science, vol 14479. Springer, Singapore. https://doi.org/10.1007/978-981-97-0669-3_4
DOI: https://doi.org/10.1007/978-981-97-0669-3_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0668-6
Online ISBN: 978-981-97-0669-3
eBook Packages: Computer Science, Computer Science (R0)