Long-term object tracking based on joint tracking and detection strategy with Siamese network

  • Regular Paper
  • Multimedia Systems

Abstract

Siamese network-based visual object trackers search for the target only in the vicinity of the previous tracking result, which avoids redundant computation. This local search strategy can fail in long-term tracking, where the target often leaves the search region and reappears after a prolonged absence, causing tracking failure or drift. To address this problem, we propose a visual object tracking (VOT) algorithm that uses a joint tracking and detection (JTD) strategy to handle target disappearance. A discriminator in the algorithm judges whether the target has disappeared and switches the tracker between local and global search. We evaluated the proposed algorithm on the long-term tracking datasets OxUvA, UAV20L, and LaSOT; the results show that it solves the tracking failures caused by target disappearance and improves the robustness and precision of the tracker.
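
The switching logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not specify how the discriminator works, so the sketch assumes a plain confidence threshold. The names `TrackState`, `jtd_step`, `local_search`, `global_search`, and `threshold` are hypothetical, and the two search functions stand in for the Siamese local tracker and the global detector.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical box layout: (x, y, width, height).
Box = Tuple[float, float, float, float]
# A search function takes (frame, previous box) and returns (box, confidence).
Search = Callable[[object, Box], Tuple[Box, float]]


@dataclass(frozen=True)
class TrackState:
    box: Box              # last box believed to contain the target
    visible: bool = True  # discriminator's latest verdict


def jtd_step(state: TrackState, frame: object,
             local_search: Search, global_search: Search,
             threshold: float = 0.5) -> TrackState:
    """Advance the tracker by one frame.

    Assumption: the discriminator is a simple confidence threshold;
    the paper's actual discriminator may differ.
    """
    # Search near the previous result while the target is judged visible;
    # otherwise fall back to a search over the whole frame.
    search = local_search if state.visible else global_search
    box, score = search(frame, state.box)

    visible = score >= threshold
    # Keep the last trusted box when the target is judged absent,
    # so a low-confidence response cannot drag the tracker away.
    return TrackState(box if visible else state.box, visible)
```

Calling `jtd_step` once per frame keeps the tracker in the cheaper local mode until the confidence drops, then stays in global mode until the target is found again with sufficient confidence.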

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

The research was funded by the National Natural Science Foundation of China (No. 62271193); the Aeronautical Science Foundation of China (No. 20185142003); the Natural Science Foundation of Henan Province, China (No. 222300420433); the Science and Technology Innovative Talents in Universities of Henan Province, China (No. 21HASTIT030); the Young Backbone Teachers in Universities of Henan Province, China (No. 2020GGJS073); and the Major Science and Technology Projects of Longmen Laboratory, China (No. 231100220300).

Author information

Contributions

Lifan Sun conceptualized and designed the algorithm, implemented the initial codebase, and prepared the original manuscript draft. Jiayi Zhang contributed to the development and fine-tuning of the algorithm, performed substantial debugging and code optimization, and assisted with manuscript writing and revisions. Zhe Yang designed and executed the performance tests, analyzed the computational results, and contributed to the interpretation of these results for the manuscript. Dan Gao and Bo Fan provided essential theoretical insights, contributed to algorithm improvements, and critically revised the manuscript for important intellectual content. All authors reviewed the manuscript.

Corresponding author

Correspondence to Lifan Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.

Additional information

Communicated by J. Gao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, L., Zhang, J., Yang, Z. et al. Long-term object tracking based on joint tracking and detection strategy with Siamese network. Multimedia Systems 30, 162 (2024). https://doi.org/10.1007/s00530-024-01366-0

  • DOI: https://doi.org/10.1007/s00530-024-01366-0
