Abstract
In object tracking, motion blur is a common challenge caused by rapid movement of the target object or long exposure times of the camera, and it degrades tracking performance. Traditional solutions perform image restoration before tracking; however, most restoration methods are computationally expensive, which reduces tracking speed. To address these problems, we propose a deblurring Transformer-based tracking method that embeds conditional cross-attention. The proposed method integrates three key modules: (1) an image quality assessment (IQA) module that estimates image quality; (2) an image deblurring module based on a lightweight adversarial network that improves image quality; and (3) a Transformer-based tracking module with conditional cross-attention that enhances object localization. Experimental results on two UAV object tracking benchmarks show that the proposed trackers achieve competitive results compared with several state-of-the-art trackers.
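To make the pipeline described in the abstract concrete, the following sketch (ours, not the authors' released code) wires the three modules together in PyTorch. The QualityHead class, the quality_threshold of 0.5, the stand-in deblurrer and tracker, and all tensor shapes are illustrative assumptions; only the control flow mirrors what the abstract describes.

```python
# Minimal sketch of the three-stage pipeline from the abstract:
# quality assessment -> deblur only if the frame is blurred -> track.
# Module names, threshold, and shapes are assumptions for illustration.
import torch
import torch.nn as nn


class QualityHead(nn.Module):
    """Hypothetical IQA module: predicts a scalar quality score per frame."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, 1),
            nn.Sigmoid(),  # score in (0, 1); higher means sharper
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.net(frame)


def track_frame(frame, template, iqa, deblurrer, tracker, quality_threshold=0.5):
    """Process one frame: restore it only when the IQA score indicates blur."""
    score = iqa(frame)
    if score.item() < quality_threshold:  # low score: treat frame as blurred
        frame = deblurrer(frame)
    return tracker(template, frame)       # predict the target bounding box


if __name__ == "__main__":
    iqa = QualityHead()
    deblurrer = nn.Identity()                 # stand-in for the lightweight deblurring GAN
    tracker = lambda z, x: torch.zeros(1, 4)  # stand-in for the Transformer tracker
    frame = torch.rand(1, 3, 256, 256)
    template = torch.rand(1, 3, 128, 128)
    box = track_frame(frame, template, iqa, deblurrer, tracker)
    print(box.shape)  # torch.Size([1, 4]) -> (x, y, w, h)
```

The point of gating the deblurring step on the IQA score is that the costly restoration network runs only on frames that need it, which is how a deblurring stage can be added without sacrificing overall tracking speed.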









Data availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China under Grants 61976042 and 61972068, the Innovative Talents Program for Liaoning Universities under Grant LR2019020, and the Liaoning Revitalization Talents Program under Grant XLYC2007023, and partly by the Applied Basic Research Project of Liaoning Province under Grant 2022JH2/101300279.
Author information
Contributions
F. Sun and B. Zhu conceived this study. T. Zhao and F. Wang conducted the experiment and wrote the initial manuscript. X. Jia and F. Wang reviewed and edited it.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, F., Zhao, T., Zhu, B. et al. Deblurring transformer tracking with conditional cross-attention. Multimedia Systems 29, 1131–1144 (2023). https://doi.org/10.1007/s00530-022-01043-0