Abstract
In computer vision, Siamese network-based visual object tracking algorithms detect the target in the vicinity of the previous tracking result, which avoids redundant computation. However, this strategy can break down in long-term tracking, where the target may reappear after a prolonged absence from the search region, leading to tracking failure or drift. Target disappearance is common in long-term tracking tasks and can degrade performance. To address this problem, we propose a novel visual object tracking (VOT) algorithm that uses a joint tracking and detection (JTD) strategy to handle target disappearance. We design a discriminator that judges whether the target has disappeared and switches between global and local search accordingly. The proposed algorithm was evaluated on the long-term tracking datasets OxUvA, UAV20L, and LaSOT, and the results show that it can handle tracking failure caused by target disappearance and improves the robustness and precision of the tracker.
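The switching behaviour described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the discriminator works, so a plain confidence threshold stands in for it, and the names `jtd_step`, `local_search`, `global_search`, `present_threshold`, and `run_demo` are hypothetical. The searches are passed in as callables so that the control flow can be read without any particular network.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Candidate = Tuple[Box, float]            # box plus a confidence score in [0, 1]


def jtd_step(
    frame: object,
    last_box: Optional[Box],
    local_search: Callable[[object, Box], Candidate],
    global_search: Callable[[object], Candidate],
    present_threshold: float = 0.5,
) -> Tuple[Optional[Box], str]:
    """Process one frame of a hypothetical joint tracking-and-detection loop.

    Returns the new box (None if the target is judged absent) and the
    search mode that was used for this frame.
    """
    if last_box is not None:
        # The target was found in the previous frame: search only near it.
        mode = "local"
        box, score = local_search(frame, last_box)
    else:
        # The target is currently lost: search the whole frame.
        mode = "global"
        box, score = global_search(frame)

    # Stand-in discriminator (assumption): a fixed confidence threshold.
    if score >= present_threshold:
        return box, mode
    return None, mode


def run_demo() -> List[Tuple[Optional[Box], str]]:
    """Drive jtd_step with scripted scores instead of real networks."""
    local_scores = {0: 0.9, 1: 0.2, 4: 0.7}   # frame index -> confidence
    global_scores = {2: 0.1, 3: 0.8}
    redetected: Box = (50.0, 50.0, 10.0, 10.0)

    def local_search(frame: object, box: Box) -> Candidate:
        return box, local_scores.get(frame, 0.0)

    def global_search(frame: object) -> Candidate:
        return redetected, global_scores.get(frame, 0.0)

    last_box: Optional[Box] = (10.0, 10.0, 10.0, 10.0)
    history: List[Tuple[Optional[Box], str]] = []
    for frame in range(5):
        last_box, mode = jtd_step(frame, last_box, local_search, global_search)
        history.append((last_box, mode))
    return history
```

In the scripted run, the low local score at frame 1 marks the target as absent, frames 2 and 3 fall back to global search, and local search resumes at frame 4 once the target has been re-detected. In a real tracker the threshold test would be replaced by whatever discriminator the method actually uses.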













Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
The research was funded by the National Natural Science Foundation of China (No. 62271193); the Aeronautical Science Foundation of China (No. 20185142003); the Natural Science Foundation of Henan Province, China (No. 222300420433); the Science and Technology Innovative Talents in Universities of Henan Province, China (No. 21HASTIT030); the Young Backbone Teachers in Universities of Henan Province, China (No. 2020GGJS073); and the Major Science and Technology Projects of Longmen Laboratory, China (No. 231100220300).
Author information
Contributions
Lifan Sun conceptualized and designed the algorithm, implemented the initial codebase, and prepared the original manuscript draft. Jiayi Zhang contributed to the development and fine-tuning of the algorithm, performed substantial debugging and code optimization, and assisted with manuscript writing and revisions. Zhe Yang designed and executed the performance tests, analyzed the computational results, and contributed to the interpretation of these results for the manuscript. Dan Gao and Bo Fan provided essential theoretical insights, contributed to algorithm improvements, and critically revised the manuscript for important intellectual content. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.
Additional information
Communicated by J. Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sun, L., Zhang, J., Yang, Z. et al. Long-term object tracking based on joint tracking and detection strategy with Siamese network. Multimedia Systems 30, 162 (2024). https://doi.org/10.1007/s00530-024-01366-0
DOI: https://doi.org/10.1007/s00530-024-01366-0