Abstract
Recently, the Detection Transformer (DETR) has become a popular paradigm in object detection by virtue of eliminating complicated post-processing procedures. Previous work has already explored DETR for scene text detection. However, arbitrary-shaped texts in the wild vary greatly in scale, so directly predicting the control points of text instances may lead to sub-optimal training efficiency and performance. To solve this problem, this paper proposes the Scalable Text Detection Transformer (SText-DETR), a concise DETR framework that uses a scalable query and a content prior to improve detection performance and accelerate training. The whole pipeline is built upon the two-stage variant of Deformable-DETR. In particular, we present a Scalable Query Module in the decoder stage that modulates the position query with the text’s width and height, making the query for each text instance more sensitive to its scale. Moreover, a Content Prior is introduced as auxiliary information to offer a better prior and speed up the training process. We conduct extensive experiments on three curved-text benchmarks: Total-Text, CTW1500, and ICDAR19 ArT. Results show that the proposed SText-DETR surpasses most existing methods and achieves performance comparable to the state-of-the-art method.
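The abstract does not spell out how the position query is modulated, so the following is only a minimal sketch of one plausible reading, borrowed from the width/height modulation used in DAB-DETR-style decoders. It is not the authors' implementation. The function names, the embedding dimension, and the `ref_size` input (a reference width and height that a real decoder would presumably predict from the content query) are all assumptions made for illustration.

```python
import numpy as np

def sine_embed(coord, dim=8, temperature=10000.0):
    """Sinusoidal embedding of normalized coordinates; returns shape (n, dim)."""
    coord = np.asarray(coord, dtype=np.float64)
    # Pairs of channels share one frequency (sin on even, cos on odd indices).
    freqs = temperature ** (2 * (np.arange(dim) // 2) / dim)
    angles = 2 * np.pi * coord[:, None] / freqs[None, :]
    out = np.empty_like(angles)
    out[:, 0::2] = np.sin(angles[:, 0::2])
    out[:, 1::2] = np.cos(angles[:, 1::2])
    return out

def scale_modulated_query(boxes, ref_size, dim=8):
    """Hypothetical scale-aware position query.

    boxes:    (n, 4) array of normalized (cx, cy, w, h) per text instance.
    ref_size: (n, 2) array of reference (w, h); an assumption standing in
              for a value that would be predicted from the content query.
    The x half of the embedding is scaled by ref_w / w and the y half by
    ref_h / h, so the query changes with the instance's width and height.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    ref_size = np.asarray(ref_size, dtype=np.float64)
    pe_x = sine_embed(boxes[:, 0], dim)
    pe_y = sine_embed(boxes[:, 1], dim)
    scale_x = (ref_size[:, 0] / boxes[:, 2])[:, None]
    scale_y = (ref_size[:, 1] / boxes[:, 3])[:, None]
    return np.concatenate([pe_x * scale_x, pe_y * scale_y], axis=1)

# Two instances at the same center; the second is twice as wide.
boxes = np.array([[0.5, 0.5, 0.2, 0.1],
                  [0.5, 0.5, 0.4, 0.1]])
ref_size = np.array([[0.2, 0.1],
                     [0.2, 0.1]])
queries = scale_modulated_query(boxes, ref_size)
```

In this sketch, two instances with the same center but different widths receive different position queries, which is the kind of scale sensitivity the abstract describes; how SText-DETR actually achieves it may differ.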
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liao, P., Wang, Z. (2024). SText-DETR: End-to-End Arbitrary-Shaped Text Detection with Scalable Query in Transformer. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_39
DOI: https://doi.org/10.1007/978-981-99-8546-3_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8545-6
Online ISBN: 978-981-99-8546-3
eBook Packages: Computer Science (R0)