A Quantum-Based Attention Mechanism in Scene Text Detection

Wu, Hao; Zhou, Jun; Zhang, Qiong; Lei, Yang; Yu, Kun; An, Wenbo; Zhang, Juntao

doi:10.1007/978-981-99-8543-2_1

Hao Wu¹⁵, Jun Zhou¹⁵, Qiong Zhang¹⁵, Yang Lei¹⁵, Kun Yu¹⁵˒¹⁶, Wenbo An¹⁵ & Juntao Zhang¹⁵ (ORCID: orcid.org/0000-0001-8174-5378)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14432)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)


Abstract

Attention mechanisms have benefited many visual tasks, e.g., image classification, object detection, and semantic segmentation. However, few attention modules have been designed specifically for scene text detection. We propose an attention mechanism based on Quantum-State-based Mapping (QSM) that enhances channel and spatial attention, introduces higher-order representations, and mixes contextual information. Our approach comprises two attention modules: the Quantum-based Convolutional Attention Module (QCAM), a plug-and-play module applicable to pre-trained text detection models; and the Adaptive Channel Information Transfer Module (ACTM), which replaces the feature pyramids and complex networks of DBNet++ and reduces FLOPs by 35.9%. Among CNN-based methods, QCAM achieves state-of-the-art performance on three benchmarks. Remarkably, QCAM also remains competitive in F-measure on all benchmarks against Transformer-based methods such as FSG, while using 29.5% fewer parameters than FSG, which gives a balance between detection accuracy and efficiency. ACTM significantly improves F-measure over DBNet++ on three benchmarks, providing an alternative to feature pyramids in scene text detection. The code, models, and training logs are available at https://github.com/yws-wxs/QCAM.
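
The abstract describes QCAM as a plug-and-play module that enhances channel and spatial attention. It does not define the Quantum-State-based Mapping itself. The following is therefore only a minimal NumPy sketch of the conventional CBAM-style channel-then-spatial attention (Woo et al., listed in the references) that such a module builds on. It is not the authors' QSM or QCAM code, which is in the linked repository. The function names, the (C, H, W) layout, the reduction ratio r, and the 7 x 7 kernel are illustrative assumptions.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function mapping logits to (0, 1) weights."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Reweight each channel of a (C, H, W) feature map (CBAM-style, not QSM).

    w1 has shape (C // r, C) and w2 has shape (C, C // r): a shared two-layer
    MLP with reduction ratio r, applied to both pooled channel descriptors.
    """

    def mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    avg_desc = x.mean(axis=(1, 2))  # (C,) global average pooling
    max_desc = x.max(axis=(1, 2))   # (C,) global max pooling
    weights = sigmoid(mlp(avg_desc) + mlp(max_desc))  # one weight per channel
    return x * weights[:, None, None]


def spatial_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Reweight each spatial position using a (2, k, k) kernel with odd k."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero "same" padding
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Cross-correlation over both pooled maps, as a conv layer computes it.
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]


def attention_block(x, w1, w2, kernel):
    """Channel attention, then spatial attention; output shape equals input shape."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, h, w, r = 8, 5, 6, 2
    features = rng.standard_normal((c, h, w))
    w1 = rng.standard_normal((c // r, c)) * 0.1
    w2 = rng.standard_normal((c, c // r)) * 0.1
    kernel = rng.standard_normal((2, 7, 7)) * 0.1
    out = attention_block(features, w1, w2, kernel)
    print(out.shape)  # (8, 5, 6)
```

The output has the same shape as the input. A block like this can therefore be inserted between backbone stages of a pre-trained detector without changing the surrounding layers. That is the practical sense of "plug-and-play" here. The learned weights would still need fine-tuning.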

H. Wu and J. Zhou—Contributed equally to this work.

References

Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Chen, L., et al.: SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. In: CVPR, pp. 6298–6306 (2017)
Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., Sun, J.: You only look one-level feature. In: CVPR, pp. 13034–13043 (2021)
Ch’ng, C.K., Chan, C.S.: Total-Text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 01, pp. 935–942 (2017)
Du, B., Ye, J., Zhang, J., Liu, J., Tao, D.: I3CL: intra- and inter-instance collaborative learning for arbitrary-shaped scene text detection. IJCV 130(8), 1961–1977 (2022)
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: CVPR, pp. 2315–2324 (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160 (2015)
Kuang, Z., et al.: MMOCR: a comprehensive toolbox for text detection, recognition and understanding. In: Proceedings of the 29th ACM International Conference on Multimedia, MM ’21, pp. 3791–3794. Association for Computing Machinery, New York (2021)
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: AAAI, vol. 34, pp. 11474–11481 (2020)
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. TPAMI 45(1), 919–931 (2023)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, pp. 936–944 (2017)
Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Liu, Y., Chen, H., Shen, C., He, T., Jin, L., Wang, L.: ABCNet: real-time scene text spotting with adaptive bezier-curve network. In: CVPR, pp. 9806–9815 (2020)
Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recogn. 90, 337–345 (2019)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. TPAMI 39(6), 1137–1149 (2017)
Tang, J., et al.: Few could be better than all: feature sampling and grouping for scene text detection. In: CVPR, pp. 4553–4562 (2022)
Tolstikhin, I.O., et al.: MLP-Mixer: an all-MLP architecture for vision. In: NeurIPS, vol. 34, pp. 24261–24272 (2021)
Vaswani, A., et al.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pp. 6000–6010. Curran Associates Inc., Red Hook (2017)
Wang, F., Chen, Y., Wu, F., Li, X.: TextRay: contour-based geometric modeling for arbitrary-shaped scene text detection. In: Proceedings of the 28th ACM International Conference on Multimedia, MM ’20, pp. 111–119. Association for Computing Machinery, New York (2020)
Wang, H., et al.: All you need is boundary: toward arbitrary-shaped text spotting. In: AAAI, vol. 34, pp. 12160–12167 (2020)
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: CVPR, pp. 9328–9337 (2019)
Wang, Y., Xie, H., Zha, Z.J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection. In: CVPR, pp. 11750–11759 (2020)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Zhang, J., et al.: An application of quantum mechanics to attention methods in computer vision. In: ICASSP, pp. 1–5 (2023)
Zhang, S.X., et al.: Deep relational reasoning graph network for arbitrary shape text detection. In: CVPR, pp. 9696–9705 (2020)
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: CVPR, pp. 3122–3130 (2021)

Author information

Authors and Affiliations

Institute of System Engineering, AMS, Beijing, China
Hao Wu, Jun Zhou, Qiong Zhang, Yang Lei, Kun Yu, Wenbo An & Juntao Zhang
University of Electronic Science and Technology of China, Chengdu, China
Kun Yu

Corresponding author

Correspondence to Juntao Zhang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Wu, H. et al. (2024). A Quantum-Based Attention Mechanism in Scene Text Detection. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_1

DOI: https://doi.org/10.1007/978-981-99-8543-2_1
Published: 29 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)
