VTAAN: Visual Tracking with Attentive Adversarial Network

Abstract

Existing tracking methods may suffer from performance degradation due to insufficient positive samples. A typical network structure has been proposed to enrich positive samples by generating masks during tracking. Although this structure achieves good results, it ignores the drift problem that arises when the tracked object is very similar to surrounding objects, a problem that is particularly severe when background interference and similar objects are present. To handle this problem, in this paper we propose a novel attentive adversarial network for visual tracking. Inspired by the human visual cognitive system, we employ an attention mechanism that focuses on the regions distinguishing the target object from the background. At the same time, we use a variant of the cross-entropy (CE) loss to deal with the class imbalance problem. Our network shows favorable performance compared with state-of-the-art methods on existing tracking benchmark datasets. We conclude that our attentive adversarial network not only enriches positive samples in the feature space but also prevents the similarity drift problem.
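
The exact CE variant is not specified in this excerpt; as a minimal sketch only, the PyTorch snippet below illustrates one common choice, a focal-loss-style reweighting of the per-sample cross entropy that down-weights the abundant, easily classified background samples so that the scarce positive samples contribute more to the gradient. The function name and the hyperparameter gamma are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn.functional as F

    def reweighted_cross_entropy(logits, targets, gamma=2.0):
        # Illustrative focal-loss-style CE variant (not the authors' exact loss).
        # logits: (N, 2) classifier scores; targets: (N,) integer labels, 0 = background, 1 = target.
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample cross entropy
        pt = torch.exp(-ce)                                      # predicted probability of the true class
        # Easy samples (pt close to 1) are strongly down-weighted; hard samples keep their weight.
        return ((1.0 - pt) ** gamma * ce).mean()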

Funding

This work was partly supported by the National Natural Science Foundation of China (Grants No. 61602006, 61702002, and 61976003) and the Anhui Provincial Natural Science Foundation (Grant No. 1908085MF206).

Author information

Corresponding author

Correspondence to Chenglong Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, F., Wang, X., Tang, J. et al. VTAAN: Visual Tracking with Attentive Adversarial Network. Cogn Comput 13, 646–656 (2021). https://doi.org/10.1007/s12559-020-09727-3
