
Deblurring transformer tracking with conditional cross-attention

  • Regular Paper
  • Published in: Multimedia Systems (2023)

Abstract

In object tracking, motion blur is a common challenge caused by rapid movement of the target object or long exposure time of the camera, and it leads to poor tracking performance. Traditional solutions usually perform image recovery before tracking. However, most image recovery methods have a high computational cost, which reduces tracking speed. To address these problems, we propose a deblurring Transformer-based tracking method that embeds conditional cross-attention. The proposed method integrates three modules: (1) an image quality assessment (IQA) module to estimate image quality; (2) an image deblurring module based on a lightweight adversarial network to improve image quality; and (3) a tracking module based on a Transformer with conditional cross-attention to enhance object localization. Experimental results on two UAV object tracking benchmarks show that the proposed trackers achieve competitive results compared to several state-of-the-art trackers.
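To make the architecture described above concrete, the following is a minimal PyTorch sketch of how the three modules could fit together: an image quality assessment score gates an optional deblurring step, and a Transformer head whose cross-attention query is conditioned on a learned positional embedding localizes the target. All class names, dimensions, the tiny backbone, and the threshold tau are illustrative assumptions, not the paper's actual networks.

# A minimal, hypothetical sketch (PyTorch) of the three-module pipeline described
# above. Class names, the quality threshold tau, and the tiny backbone are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class IQAModule(nn.Module):
    """Scores frame quality in [0, 1]; low scores are taken to indicate motion blur."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, frame):
        return self.net(frame)

class Deblurrer(nn.Module):
    """Stand-in for the lightweight adversarial deblurring generator (residual restoration)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, frame):
        return frame + self.net(frame)

class TinyBackbone(nn.Module):
    """Stand-in feature extractor that turns a frame into a token sequence."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=8, stride=8)

    def forward(self, frame):
        feat = self.proj(frame)                  # (B, dim, H/8, W/8)
        return feat.flatten(2).transpose(1, 2)   # (B, N, dim)

class ConditionalCrossAttentionTracker(nn.Module):
    """Search tokens attend to template tokens; the query is conditioned on a
    learned positional embedding before cross-attention (a simplification of
    conditional DETR-style attention)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.pos_query = nn.Parameter(torch.randn(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.box_head = nn.Linear(dim, 4)

    def forward(self, search_tokens, template_tokens):
        query = search_tokens + self.pos_query
        fused, _ = self.cross_attn(query, template_tokens, template_tokens)
        return self.box_head(fused.mean(dim=1))  # predicted box (cx, cy, w, h)

def track_frame(frame, template_tokens, iqa, deblur, backbone, tracker, tau=0.5):
    """Deblur only when the estimated quality falls below tau, then localize the target."""
    if iqa(frame).item() < tau:
        frame = deblur(frame)
    return tracker(backbone(frame), template_tokens)

# Illustrative usage with random tensors standing in for real video frames.
iqa, deblur = IQAModule(), Deblurrer()
backbone, tracker = TinyBackbone(), ConditionalCrossAttentionTracker()
template_tokens = backbone(torch.randn(1, 3, 128, 128))
box = track_frame(torch.randn(1, 3, 256, 256), template_tokens, iqa, deblur, backbone, tracker)

Gating the deblurring step on the estimated quality score is the design choice that lets such a pipeline avoid paying the restoration cost on frames that are already sharp.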



Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was partly supported by the National Natural Science Foundation of China under Grants 61976042 and 61972068, the Innovative Talents Program for Liaoning Universities under Grant LR2019020, the Liaoning Revitalization Talents Program under Grant XLYC2007023, and the Applied Basic Research Project of Liaoning Province under Grant 2022JH2/101300279.

Author information


Contributions

F. Sun and B. Zhu conceived this study. T. Zhao and F. Wang conducted the experiment and wrote the initial manuscript. X. Jia and F. Wang reviewed and edited it.

Corresponding author

Correspondence to Fasheng Wang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, F., Zhao, T., Zhu, B. et al. Deblurring transformer tracking with conditional cross-attention. Multimedia Systems 29, 1131–1144 (2023). https://doi.org/10.1007/s00530-022-01043-0


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-022-01043-0
