
Visual tracking based on depth cross-correlation and feature alignment

Journal of Signal Processing Systems

Abstract

Visual trackers based on Siamese networks have achieved excellent performance on many tracking datasets. However, these trackers cannot provide desirable results in unconstrained environments, such as fast motion and extensive scale variations. To address these challenges, this paper proposes an Adaptive Dilated Fusion module, a Depth Pixel-Wise Correlation module, and a Feature Alignment module. The Adaptive Dilated Fusion module copes with extensive scale variations by adding a receptive-field pyramid on the last layer of the Siamese network; the Depth Pixel-Wise Correlation module extracts pixel-level features through average pooling and maximum pooling, reducing the influence of background noise; and the Feature Alignment module alleviates the mismatch between the classification and regression tasks. Experiments are performed on several public datasets, including VOT2017, OTB100, and LaSOT, and tracking performance is evaluated on complex scenes such as fast motion, various resolutions, and extensive scale variations. On the OTB100 dataset, the proposed tracker (named SiamAPA) gains 2.4% AUC over the reference network on fast-motion scenes, 4.9% on various-resolution scenes, and 1.3% on extensive-scale-variation scenes. On the VOT2017 dataset, SiamAPA gains 3.7% EAO over the reference network. On the LaSOT dataset, accuracy improves by 1% and robustness by 1.9% compared with the reference network. Thanks to the coordination of these three innovations, the proposed algorithm is superior to classical algorithms such as the SPM tracker on many datasets while maintaining real-time tracking.
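The Depth Pixel-Wise Correlation module builds on the depth-wise (per-channel) cross-correlation operation common to Siamese trackers: each channel of the template feature map is correlated only with the matching channel of the search feature map, so per-channel semantics are preserved. The following NumPy sketch illustrates that underlying operation only; it is not the paper's exact module (which additionally applies average and maximum pooling), and the names `depthwise_xcorr`, `search`, and `template` are illustrative.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Cross-correlate each channel of the template with the matching
    channel of the search feature map (depth-wise correlation)."""
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    assert c == ct and ht <= hs and wt <= ws
    ho, wo = hs - ht + 1, ws - wt + 1
    out = np.empty((c, ho, wo))
    for ch in range(c):
        for i in range(ho):
            for j in range(wo):
                # Sliding-window dot product within a single channel.
                out[ch, i, j] = np.sum(search[ch, i:i + ht, j:j + wt] * template[ch])
    return out

# Toy example: the response map peaks where the template matches the search region.
search = np.zeros((2, 5, 5))
search[:, 2:4, 2:4] = 1.0         # a 2x2 "object" at position (2, 2)
template = np.ones((2, 2, 2))     # template of the same object
response = depthwise_xcorr(search, template)
print(response.shape)             # (2, 4, 4)
print(np.unravel_index(response[0].argmax(), response[0].shape))  # (2, 2)
```

In practice this loop is expressed as a grouped convolution (one group per channel) on GPU; the toy example shows why the response peak localizes the target.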


(Figures 1–7: see the full article.)


Data Availability

All data generated during this study are included in these published articles:

1) OTB100: https://doi.org/10.1109/TPAMI.2014.2388226

2) VOT2017: https://doi.org/10.1109/ICCVW.2017.230

3) LaSOT: https://doi.org/10.1109/CVPR.2019.00552

4) TrackingNet: https://doi.org/10.1007/978-3-030-01246-5_19

5) GOT-10k: https://doi.org/10.1109/tpami.2019.2957464

References

  1. Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. S. (2016). Fully-convolutional siamese networks for object tracking. European conference on computer vision, 850–865. https://doi.org/10.1007/978-3-319-48881-3_56

  2. Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. L. (2018). High performance visual tracking with Siamese region proposal network. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8971–8980. https://doi.org/10.1109/CVPR.2018.00935

  3. Ren, S. Q., He, K. M., Girshick, R., & Sun, J. (2017). Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031


  4. Guo, D. Y., Wang, J., Cui, Y., Wang, Z. H., & Chen, S. Y. (2020). SiamCAR: siamese fully convolutional classification and regression for visual tracking, Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6268–6276 https://doi.org/10.1109/CVPR42600.2020.00630

  5. Tian, Z., Shen, C., Chen, H., & He, T. (2019). FCOS: Fully Convolutional One-Stage Object Detection, Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 9626–9635. https://doi.org/10.1109/ICCV.2019.00972

  6. Wang, G., Luo, C., Xiong, Z., & Zeng, W. (2019). SPM-Tracker: Series-parallel matching for real-time visual object tracking. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3638–3647. https://doi.org/10.1109/CVPR.2019.00376

  7. Voigtlaender, P., Luiten, J., Torr, P. H., & Leibe, B. (2020). Siam r-cnn: Visual tracking by re-detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 6578–6588. https://doi.org/10.1109/cvpr42600.2020.00661

  8. Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 936–944. https://doi.org/10.1109/CVPR.2017.106

  9. Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement.arXiv preprint arXiv:1804.02767

  10. Liu, S., & Huang, D. (2018). Receptive field block net for accurate and fast object detection. In Proceedings of the European conference on computer vision, 385–400. https://doi.org/10.1007/978-3-030-01252-6_24

  11. Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection, Proceedings of the 15th European Conference on Computer Vision, 784–799.https://doi.org/10.1007/978-3-030-01264-9_48

  12. Yang, Z., Liu, S., Hu, H., Wang, L., & Lin, S. (2019). RepPoints: Point Set Representation for Object Detection, Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 9656–9665. https://doi.org/10.1109/ICCV.2019.00975

  13. Wu, Y., Chen, Y., Yuan, L., Liu, Z., Wang, L., Li, H., … (2020). Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10183–10192. https://doi.org/10.1109/CVPR42600.2020.01020

  14. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). SiamRPN++: Evolution of Siamese visual tracking with very deep networks. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4282–4291. https://doi.org/10.1109/CVPR.2019.00441

  15. Yan, B., Zhang, X., Wang, D., Lu, H., & Yang, X. (2021). Alpha-refine: Boosting tracking performance by precise bounding box estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5289–5298. https://doi.org/10.1109/CVPR46437. 2021.00525

  16. Fan, H., & Ling, H. (2020). CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking. arXiv preprint arXiv:2011.12483

  17. Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., & Sun, J. (2021). You only look one-level feature. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13039–13048. https://doi.org/10.1109/CVPR46437.2021.01284

  18. Wu, Y., Lim, J., & Yang, M. H. (2015). Object Tracking Benchmark. IEEE Transactions on Pattern Analysis & Machine Intelligence, 37(9), 1834–1848. https://doi.org/10.1109/TPAMI.2014.2388226


  19. Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin Zajc, L., & Fernandez, G. (2017). The visual object tracking VOT2017 challenge results. Proceedings of the IEEE International Conference on Computer Vision Workshops, 1949–1972. https://doi.org/10.1109/ICCVW.2017.230

  20. Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., & Yu, S. (2019). Lasot: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5374–5383. https://doi.org/10.1109/CVPR.2019.00552

  21. Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., & Ghanem, B. (2018). Trackingnet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European Conference on Computer Vision (ECCV) pp. 300–317 https://doi.org/10.1007/978-3-030-01246-5_19

  22. Huang, L., Zhao, X., & Huang, K. (2019). Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5), 1562–1577. https://doi.org/10.1109/tpami.2019.2957464


  23. Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., & Torr, P. H. (2017). End-to-end representation learning for correlation filter based tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2805–2813. https://doi.org/10.1109/CVPR.2017.531

  24. Zhang, Z. P., & Peng, H. W. (2019). Deeper and wider Siamese networks for real-time visual tracking. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4586–4595. https://doi.org/10.1109/CVPR.2019.00472

  25. Danelljan, M., Bhat, G., Khan, F. S., & Felsberg, M. (2017). ECO: Efficient convolution operators for tracking. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 6931–6939. https://doi.org/10.1109/CVPR.2017.733

  26. Li, P., Chen, B., Ouyang, W., Wang, D., Yang, X., & Lu, H. (2019). GradNet: Gradient-guided network for visual object tracking. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 6161–6170. https://doi.org/10.1109/ICCV.2019.00626

  27. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J. J., & Hu, W. M. (2018). Distractor-aware Siamese networks for visual object tracking, Proceedings of the 15th European Conference on Computer Vision, 103–119 https://doi.org/10.1007/978-3-030-01240-3_7

  28. Wang, Q., Zhang, L., Bertinetto, L., Hu, W. M., Torr, P. H. S. Fast online object tracking and segmentation: a unifying approach, Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern, & Recognition (2019). 1328–1338. https://doi.org/10.1109/CVPR.2019.00142

  29. Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R. W. H., & Yang, M. (2017). CREST: Convolutional Residual Learning for Visual Tracking, Proceedings of 2017 IEEE International Conference on Computer Vision, 2574–2583. https://doi.org/10.1109/ICCV.2017.279

  30. Zhang, Z., Xie, Y., Xing, F., McGough, M., & Yang, L. (2017). Mdnet: A semantically and visually interpretable medical image diagnosis network. Proceedings of the IEEE conference on computer vision and pattern recognition, 6428–6436. https://doi.org/10.1109/CVPR.2017.378

  31. Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., & Torr, P. H. S. (2016). Staple: Complementary learners for real-time tracking. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 1401–1409. https://doi.org/10.1109/CVPR.2016.156

  32. Bhat, G., Johnander, J., Danelljan, M., Khan, F. S., & Felsberg, M. (2018). Unveiling the power of deep tracking. In Proceedings of the European Conference on Computer Vision 483–498 https://doi.org/10.1007/978-3-030-01216-8_30

  33. Ma, C., Huang, J. B., Yang, X., & Yang, M. H. (2015). Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE international conference on computer vision pp. 3074–3082 https://doi.org/10.1109/iccv.2015.352

  34. Danelljan, M., Robinson, A., Shahbaz Khan, F., & Felsberg, M. (2016). Beyond correlation filters: Learning continuous convolution operators for visual tracking. In European conference on computer vision pp. 472–488 https://doi.org/10.1007/978-3-319-46454-1_29

  35. Held, D., Thrun, S., & Savarese, S. (2016). Learning to track at 100 fps with deep regression networks. In European conference on computer vision pp. 749–765 https://doi.org/10.1007/978-3-319-46448-0_45


Funding

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61871445 and 61302156, and by the Key R&D Foundation Project of Jiangsu Province under Grant BE2016001-4.

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yao Xiao, Fuxiang Wang, Xuhui Liu. The first draft of the manuscript was written by Guang Han and Yao Xiao and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guang Han.

Ethics declarations

Ethics approval

This study did not involve human subjects, so no ethics approval was required.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Han, G., Xiao, Y., Wang, F. et al. Visual tracking based on depth cross-correlation and feature alignment. J Sign Process Syst 95, 37–47 (2023). https://doi.org/10.1007/s11265-022-01791-2

