Abstract
Most existing visual object trackers are built on deep convolutional feature maps, but few works have explored new features for tracking. This paper proposes a novel tracking framework based on a fully convolutional auto-encoder appearance model trained with the Wasserstein distance and maximum mean discrepancy. Compared with previous works, the proposed framework improves three aspects: the appearance model, the update scheme, and state estimation. To address the shortcomings of the original update scheme, namely poor discriminative performance under limited supervisory information, sample pollution caused by long-term object occlusion, and sample-importance imbalance, this paper proposes a novel latent-space importance-weighting algorithm, a novel sample-space management algorithm, and a novel IoU-based label-smoothing algorithm, respectively. In addition, an improved weighted loss function is adopted to address the sample-imbalance issue. Finally, to improve state-estimation accuracy, a combination of Kullback-Leibler divergence and generalized intersection over union (GIoU) is introduced. Extensive experiments are performed on three widely used benchmarks, and the results demonstrate the state-of-the-art performance of the proposed method. Code and models are available at https://github.com/wahahamyt/CAT.git.
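The abstract's state-estimation component relies on generalized intersection over union (GIoU), which extends IoU so that it remains informative even for non-overlapping boxes by penalizing the empty area of the smallest enclosing box. As a minimal sketch (not the authors' implementation; the function name `giou` and the `(x1, y1, x2, y2)` box convention are assumptions), the metric can be computed as:

```python
def giou(box_a, box_b):
    """Generalized IoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (clamped to zero when the boxes are disjoint)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box C enclosing both inputs
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    # GIoU = IoU - |C \ (A ∪ B)| / |C|, ranging over (-1, 1]
    return iou - (c_area - union) / c_area
```

GIoU equals IoU when the enclosing box is fully covered by the union, and approaches -1 as two boxes move far apart, which is why it is usable as a regression loss (as in Rezatofighi et al., cited in the references) rather than only as an evaluation metric.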
Acknowledgements
This work is supported by the National Natural Science Foundation of China (grant No. 61871106), the Key R&D Projects of Liaoning Province, China (grant No. 2020JH2/10100029), and the Open Project Program Foundation of the Key Laboratory of Opto-Electronics Information Processing, Chinese Academy of Sciences (OEIP-O-202002).
Cite this article
Xu, L., Wei, Y., Dong, C. et al. Wasserstein Distance-Based Auto-Encoder Tracking. Neural Process Lett 53, 2305–2329 (2021). https://doi.org/10.1007/s11063-021-10507-9