
Wasserstein Distance-Based Auto-Encoder Tracking

  • Published in: Neural Processing Letters

Abstract

Most existing visual object trackers rely on deep convolutional feature maps, and comparatively little work has explored new features for tracking. This paper proposes a novel tracking framework based on a fully convolutional auto-encoder appearance model, which is trained using the Wasserstein distance and maximum mean discrepancy (MMD). Compared with previous works, the proposed framework improves performance in three aspects: the appearance model, the update scheme, and state estimation. To address the shortcomings of the original update scheme, namely poor discriminative performance under limited supervisory information, sample pollution caused by long-term object occlusion, and sample importance imbalance, this paper proposes a novel latent-space importance weighting algorithm, a novel sample-space management algorithm, and a novel IoU-based label smoothing algorithm, respectively. In addition, an improved weighted loss function is adopted to address the sample imbalance issue. Finally, to improve state estimation accuracy, a combination of the Kullback-Leibler divergence and generalized intersection over union (GIoU) is introduced. Extensive experiments on three widely used benchmarks demonstrate the state-of-the-art performance of the proposed method. Code and models are available at https://github.com/wahahamyt/CAT.git.
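Two of the building blocks named in the abstract, the maximum mean discrepancy used alongside the Wasserstein distance to train the auto-encoder, and the generalized intersection over union used in state estimation, have standard textbook definitions. The sketch below is an illustration of those standard definitions only, not the authors' implementation; the RBF kernel bandwidth `sigma` and the box format `(x1, y1, x2, y2)` are assumptions.

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Biased MMD^2 estimator with an RBF kernel.

    x, y: (n, d) and (m, d) sample arrays. MMD^2 compares the two
    empirical distributions; it is 0 when the samples coincide.
    """
    def k(a, b):
        # Pairwise squared distances via broadcasting, then RBF kernel.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Equals plain IoU for overlapping boxes, and decreases toward -1
    as disjoint boxes move apart (unlike IoU, which saturates at 0).
    """
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # Intersection and union areas.
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    area_a = (xa2 - xa1) * (ya2 - ya1)
    area_b = (xb2 - xb1) * (yb2 - yb1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    c_area = (max(xa2, xb2) - min(xa1, xb1)) * (max(ya2, yb2) - min(ya1, yb1))
    return iou - (c_area - union) / c_area
```

For identical boxes `giou` returns 1.0; for two disjoint unit boxes separated by one unit it returns -1/3, which is what makes GIoU usable as a regression loss even when predicted and ground-truth boxes do not overlap.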




Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 61871106), the Key R&D Projects of Liaoning Province, China (Grant No. 2020JH2/10100029), and the Open Project Program Foundation of the Key Laboratory of Opto-Electronics Information Processing, Chinese Academy of Sciences (OEIP-O-202002).

Author information


Corresponding author

Correspondence to Ying Wei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xu, L., Wei, Y., Dong, C. et al. Wasserstein Distance-Based Auto-Encoder Tracking. Neural Process Lett 53, 2305–2329 (2021). https://doi.org/10.1007/s11063-021-10507-9
