Abstract
Existing tracking methods can suffer performance degradation when positive samples are insufficient. A network structure has been proposed that enriches positive samples by generating masks during tracking. Although this structure achieves good results, it ignores the drift that occurs when the tracked object closely resembles surrounding objects, a problem that is especially pronounced under background interference and in the presence of similar objects. To handle this problem, in this paper we propose a novel attentive adversarial network for visual tracking. Inspired by the human visual cognitive system, we employ an attention mechanism to focus on the regions that distinguish the target object from the background. At the same time, we use a variant of the cross-entropy (CE) loss to deal with the class-imbalance problem. Our network performs favorably against state-of-the-art methods on existing tracking benchmark datasets. We conclude that our attentive adversarial network not only enriches positive samples in the feature space but also prevents the similarity-drift problem.
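The abstract does not specify which CE variant is used; a common way to counter class imbalance in tracking (where background samples vastly outnumber target samples) is to reweight and modulate the standard cross entropy, as in focal-style losses. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual loss; the `alpha` and `gamma` parameters are assumptions.

```python
import numpy as np

def modulated_ce(probs, labels, alpha=0.25, gamma=2.0):
    """Class-balanced, modulated cross entropy (focal-style sketch).

    probs:  predicted probability of the positive (target) class, in (0, 1)
    labels: 1 for target samples, 0 for background samples
    alpha:  weight on the rare positive class (1 - alpha on background)
    gamma:  down-weights easy, well-classified samples
    """
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)        # numerical safety
    pt = np.where(labels == 1, probs, 1.0 - probs)  # prob. of the true class
    w = np.where(labels == 1, alpha, 1.0 - alpha)   # class-balance weight
    # (1 - pt)^gamma shrinks the loss of confident, easy samples,
    # so training focuses on hard target/background distinctions.
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt)))
```

Compared with plain CE (`mean(-log(pt))`), the modulating factor suppresses the contribution of the many easy background samples, which is the usual motivation for such variants in imbalanced tracking data.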
Funding
This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61602006, 61702002, and 61976003) and the Anhui Provincial Natural Science Foundation (Grant No. 1908085MF206).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, F., Wang, X., Tang, J. et al. VTAAN: Visual Tracking with Attentive Adversarial Network. Cogn Comput 13, 646–656 (2021). https://doi.org/10.1007/s12559-020-09727-3