End-to-end deep metric network for visual tracking

Tian, Shengjing; Shen, Shuwei; Tian, Guoqiang; Liu, Xiuping; Yin, Baocai

doi:10.1007/s00371-019-01730-6

Original Article
Published: 24 July 2019

Volume 36, pages 1219–1232, (2020)
The Visual Computer

Shengjing Tian ORCID: orcid.org/0000-0002-8109-3581¹,
Shuwei Shen¹,
Guoqiang Tian¹,
Xiuping Liu¹ &
Baocai Yin²

431 Accesses
8 Citations

Abstract

In this paper, we propose an end-to-end deep metric network (DMN) for visual tracking, in which any target can be accurately tracked given only its bounding box in the first frame. Our main motivation is to make the network learn to learn a deep distance metric, following the philosophy of one-shot learning. Instead of using a hand-crafted distance metric such as Euclidean distance, our DMN provides a learnable metric that is more robust to appearance variations. Furthermore, we are the first to properly combine mean square error and contrastive loss into a joint loss function for back-propagation. During online tracking, DMN first applies our instance initialization to obtain sequence-specific information and then tracks the target directly, without box refinement, occlusion detection or online updating. The final tracking score considers both the DMN scalar output and a motion smoothness constraint. Ablation analyses validate the effectiveness of the proposed method, and experiments on prevalent benchmarks show that it achieves competitive performance compared with representative trackers, especially existing metric learning-based algorithms.
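
The abstract names two ingredients without giving formulas: a joint loss built from mean square error and contrastive loss, and a final score that combines the network output with a motion smoothness term. The sketch below is only an illustration of how such pieces are commonly written, not the formulation used in the paper. The function names, the weight between the two losses, the margin, and the Gaussian motion penalty with its sigma are all assumptions introduced here.

```python
import math


def contrastive_loss(distance, is_same, margin=1.0):
    """Textbook contrastive loss for one pair (assumed form, not from the paper).

    Matching pairs are pulled together; non-matching pairs are pushed
    apart until their distance exceeds the margin.
    """
    if is_same:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def mse_loss(score, label):
    """Squared error between a predicted similarity score and its 0/1 label."""
    return (score - label) ** 2


def joint_loss(pairs, margin=1.0, weight=1.0):
    """Average of MSE plus weighted contrastive loss over a batch of pairs.

    Each pair is (distance, score, is_same). The additive combination and
    the `weight` parameter are hypothetical choices for illustration.
    """
    total = 0.0
    for distance, score, is_same in pairs:
        label = 1.0 if is_same else 0.0
        total += mse_loss(score, label)
        total += weight * contrastive_loss(distance, is_same, margin)
    return total / len(pairs)


def smoothed_score(score, candidate_xy, previous_xy, sigma=20.0):
    """Down-weight candidates far from the previous target position.

    A Gaussian penalty on displacement is one common way to express a
    motion smoothness constraint; it is an assumption, not the paper's rule.
    """
    dx = candidate_xy[0] - previous_xy[0]
    dy = candidate_xy[1] - previous_xy[1]
    return score * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    batch = [(0.2, 0.9, True), (0.4, 0.3, False)]
    print("joint loss:", joint_loss(batch))
    print("near candidate:", smoothed_score(0.8, (105, 100), (100, 100)))
    print("far candidate:", smoothed_score(0.8, (160, 100), (100, 100)))
```

Under these assumptions, a candidate with the same network score but a larger displacement from the previous position receives a lower final score, which is the qualitative behaviour a motion smoothness term is meant to provide.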

Notes

SINT is a version without optical flow, and its results were obtained on our own PC using the pre-trained Caffe model.

References

Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision (ECCV), pp. 850–865 (2016)
Briechle, K., Hanebeck, U.D.: Template matching using fast normalized cross correlation. In: Proceedings of SPIE on Optical Pattern Recognition XII (2001)
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 539–546 (2005)
Comaniciu, D., Ramesh, V., Meer, P.: Real-time tracking of non-rigid objects using mean shift. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2000)
Danelljan, M., Häger, G., Khan, F.S., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: British Machine Vision Conference (BMVC) (2014)
Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: International Conference on Machine Learning (ICML) (2007)
Elgammal, A., Duraiswami, R., Davis, L.S.: Probabilistic tracking in joint feature-spatial spaces. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2003)
Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)
Han, Z., Wang, P., Ye, Q.: Adaptive discriminative deep correlation filter for visual object tracking. IEEE Transactions on Circuits and Systems for Video Technology (Early Access) (2018)
Hare, S., Golodetz, S., Saffari, A., Vineet, V., Cheng, M.M., Hicks, S.L., Torr, P.H.S.: Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2016)
He, A., Luo, C., Tian, X., Zeng, W.: A twofold siamese network for real-time object tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Held, D., Thrun, S., Savarese, S.: Learning to track at 100 fps with deep regression networks. In: European Conference on Computer Vision (ECCV) (2016)
Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
Henriques, J.F., Caseiro, R., Martins, P., Batista, J.P.: Exploiting the circulant structure of tracking-by-detection with kernels. In: European Conference on Computer Vision (ECCV) (2012)
Hu, J., Lu, J., Tan, Y.P.: Deep metric learning for visual tracking. IEEE Trans. Circuits Syst. Video Technol. 26(11), 2056–2068 (2016)
Jiang, N., Liu, W., Wu, Y.: Order determination and sparsity-regularized metric learning adaptive visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1956–1963 (2012)
Kehl, R., Bray, M., Gool, L.V.: Full body tracking from multiple views using stochastic sampling. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
Kim, K., Lepetit, V., Woo, W.: Scalable real-time planar targets tracking for digilog books. Vis. Comput. 26(6–8), 1145–1154 (2010)
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., et al.: The visual object tracking vot2017 challenge results. In: IEEE International Conference on Computer Vision Workshop (ICCVW) (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)
Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with siamese region proposal network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Li, X., Shen, C., Shi, Q., Dick, A.R., van den Hengel, A.: Non-sparse linear representations for visual tracking with online reservoir metric learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1760–1767 (2012)
Lu, J., Hu, J., Tan, Y.P.: Nonlinear metric learning for visual tracking. In: IEEE International Conference on Multimedia and Expo (ICME) (2016)
Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Robust visual tracking via hierarchical convolutional features. IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access) (2018)
Ma, Z., Wu, E.: Real-time and robust hand tracking with a single depth camera. Vis. Comput. 30(10), 1133–1144 (2014)
Mei, X., Ling, H.: Robust visual tracking using l1 minimization. In: IEEE International Conference on Computer Vision (ICCV), pp. 1436–1443 (2009)
Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293–4302 (2016)
Rechy-Ramirez, E.J., Marin-Hernandez, A., Rios-Figueroa, H.V.: A human–computer interface for wrist rehabilitation: a pilot study using commercial sensors to detect wrist movements. Vis. Comput. 35(1), 41–55 (2019)
Ross, D.A., Lim, J., Lin, R.S., Yang, M.H.: Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77(1–3), 125–141 (2008)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint: arXiv:1409.1556 (2014)
Smeulders, A.W.M., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1442–1468 (2014)
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: relation network for few-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Tao, R., Gavves, E., Smeulders, A.W.M.: Siamese instance search for tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1420–1429 (2016)
Tsagkatakis, G., Savakis, A.E.: Online distance metric learning for object tracking. IEEE Trans. Circuits Syst. Video Technol. 21(12), 1810–1821 (2011)
Valmadre, J., Bertinetto, L., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: End-to-end representation learning for correlation filter based tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5000–5008 (2017)
Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 29(10), 983–1009 (2013)
Wang, D., Lu, H., Yang, M.H.: Least soft-threshold squares tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2371–2378 (2013)
Wang, N., Li, S., Gupta, A., Yeung, D.Y.: Transferring rich feature hierarchies for robust visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Wang, N., Yeung, D.Y.: Learning a deep compact image representation for visual tracking. In: Advances in Neural Information Processing Systems (NIPS), pp. 809–817 (2013)
Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., Maybank, S.: Learning attentions: residual attentional siamese network for high performance online visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Wang, S., Lu, H., Yang, F., Yang, M.H.: Superpixel tracking. In: IEEE International Conference on Computer Vision (ICCV), pp. 1323–1330 (2011)
Wang, X., Hua, G., Han, T.X.: Discriminative tracking by metric learning. In: European Conference on Computer Vision (ECCV) (2010)
Wang, X., Li, C., Luo, B., Tang, J.: Sint++: Robust visual tracking via adversarial positive instance generation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
Wu, Y., Ma, B., Yang, M., Zhang, J., Jia, Y.: Metric learning based structural appearance model for robust visual tracking. IEEE Trans. Circuits Syst. Video Technol. 24(5), 865–877 (2014)
Zhang, K., Zhang, L., Yang, M.H.: Real-time compressive tracking. In: European Conference on Computer Vision (ECCV) (2012)
Zhang, T., Ghanem, B., Liu, S., Ahuja, N.: Robust visual tracking via structured multi-task sparse learning. Int. J. Comput. Vis. 101(2), 367–383 (2013)
Zhang, T., Xu, C., Yang, M.H.: Robust structural sparse tracking. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 473–486 (2019)

Funding

This work was funded by the National Natural Science Foundation of China (Grant Number U1811463).

Author information

Authors and Affiliations

School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, People’s Republic of China
Shengjing Tian, Shuwei Shen, Guoqiang Tian & Xiuping Liu
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, 116024, People’s Republic of China
Baocai Yin

Corresponding author

Correspondence to Shengjing Tian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tian, S., Shen, S., Tian, G. et al. End-to-end deep metric network for visual tracking. Vis Comput 36, 1219–1232 (2020). https://doi.org/10.1007/s00371-019-01730-6

Published: 24 July 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00371-019-01730-6
