Abstract
Existing tracking methods can suffer performance degradation when positive samples are insufficient. A network structure has been proposed that enriches positive samples by generating masks during tracking. Although this structure achieves good results, it ignores the drift that occurs when the tracked object closely resembles surrounding objects, a problem that is especially pronounced under background interference and in the presence of similar objects. To handle this problem, in this paper we propose a novel attentive adversarial network for visual tracking. Inspired by the human visual cognitive system, we employ an attention mechanism to focus on the regions that distinguish the target object from the background. At the same time, we use a variant of the cross-entropy (CE) loss to deal with the class-imbalance problem. Our network performs favorably against state-of-the-art methods on existing tracking benchmark datasets. We conclude that our attentive adversarial network not only enriches positive samples in the feature space but also prevents the similarity-drift problem.
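The abstract does not specify which CE variant is used; a common way to counter class imbalance in tracking (where background samples vastly outnumber target samples) is to reweight and modulate the standard cross entropy, as in focal-style losses. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual loss; the `alpha` and `gamma` parameters are assumptions.

```python
import numpy as np

def modulated_ce(probs, labels, alpha=0.25, gamma=2.0):
    """Class-balanced, modulated cross entropy (focal-style sketch).

    probs:  predicted probability of the positive (target) class, in (0, 1)
    labels: 1 for target samples, 0 for background samples
    alpha:  weight on the rare positive class (1 - alpha on background)
    gamma:  down-weights easy, well-classified samples
    """
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)        # numerical safety
    pt = np.where(labels == 1, probs, 1.0 - probs)  # prob. of the true class
    w = np.where(labels == 1, alpha, 1.0 - alpha)   # class-balance weight
    # (1 - pt)^gamma shrinks the loss of confident, easy samples,
    # so training focuses on hard target/background distinctions.
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt)))
```

Compared with plain CE (`mean(-log(pt))`), the modulating factor suppresses the contribution of the many easy background samples, which is the usual motivation for such variants in imbalanced tracking data.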
Funding
This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61602006, 61702002, and 61976003) and the Anhui Provincial Natural Science Foundation (Grant No. 1908085MF206).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, F., Wang, X., Tang, J. et al. VTAAN: Visual Tracking with Attentive Adversarial Network. Cogn Comput 13, 646–656 (2021). https://doi.org/10.1007/s12559-020-09727-3