Abstract
Visual tracking based on Siamese networks has achieved excellent performance on many tracking datasets. However, these trackers still fail to deliver desirable results in unconstrained environments, such as fast motion and extensive scale variation. To address this problem, this paper proposes an Adaptive Dilated Fusion module, a Depth Pixel-Wise Correlation module, and a Feature Alignment module. The Adaptive Dilated Fusion module handles extensive scale variation by adding a receptive-field pyramid on the last layer of the Siamese network; the Depth Pixel-Wise Correlation module extracts pixel-level features through average pooling and maximum pooling and reduces the influence of background noise; the Feature Alignment module alleviates the mismatch between the classification and regression tasks. Experiments are performed on several public datasets, including VOT2017, OTB100, and LaSOT, and tracking performance is evaluated on complex scenes such as fast motion, various resolutions, and extensive scale variation. On the OTB100 dataset, the proposed tracker (named SiamAPA) improves AUC over the reference network by 2.4% on fast-motion scenes, 4.9% on various-resolution scenes, and 1.3% on scenes with extensive scale variation. On the VOT2017 dataset, SiamAPA improves EAO by 3.7% over the reference network. On the LaSOT dataset, accuracy is improved by 1% and robustness by 1.9% compared with the reference network. Thanks to the coordination of these three innovations, the proposed algorithm outperforms classical algorithms such as SPM-Tracker on many datasets while tracking in real time.
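The pixel-wise correlation idea described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function name, tensor shapes, and omission of the average/max pooling step are assumptions. The core operation is that each template location's channel vector acts as a 1×1 kernel correlated with every search-region location, yielding one response channel per template pixel.

```python
import numpy as np

def pixelwise_correlation(template, search):
    """Correlate every template pixel (a C-dim vector, i.e. a 1x1 kernel)
    with every search-region pixel.

    template: (C, Ht, Wt) feature map of the target template
    search:   (C, Hs, Ws) feature map of the search region
    returns:  (Ht*Wt, Hs, Ws) response, one channel per template pixel
    """
    c, ht, wt = template.shape
    _, hs, ws = search.shape
    kernels = template.reshape(c, ht * wt)   # one column per template pixel
    feats = search.reshape(c, hs * ws)       # one column per search pixel
    resp = kernels.T @ feats                 # all pairwise inner products
    return resp.reshape(ht * wt, hs, ws)

# Illustrative shapes only: a 2-channel 2x2 template against a 3x3 search region.
t = np.arange(2 * 2 * 2, dtype=float).reshape(2, 2, 2)
s = np.ones((2, 3, 3))
r = pixelwise_correlation(t, s)
```

Unlike depth-wise correlation, which slides the whole template as one kernel, this keeps the spatial granularity of the template, which is what allows pixel-level matching.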
Data Availability
All data generated during this study are included in these published articles:
1) OTB100.
https://doi.org/10.1109/TPAMI.2014.2388226.
2) VOT2017.
https://doi.org/10.1109/ICCVW.2017.230.
3) LaSOT.
https://doi.org/10.1109/CVPR.2019.00552.
4) Trackingnet.
https://doi.org/10.1007/978-3-030-01246-5_19.
5) GOT-10k.
https://doi.org/10.1109/TPAMI.2019.2957464.
References
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. S. (2016). Fully-convolutional siamese networks for object tracking. European conference on computer vision, 850–865. https://doi.org/10.1007/978-3-319-48881-3_56
Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. L. (2018). High performance visual tracking with Siamese region proposal network. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8971–8980. https://doi.org/10.1109/CVPR.2018.00935
Ren, S. Q., He, K. M., Girshick, R., & Sun, J. (2017). Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Guo, D. Y., Wang, J., Cui, Y., Wang, Z. H., & Chen, S. Y. (2020). SiamCAR: Siamese fully convolutional classification and regression for visual tracking. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6268–6276. https://doi.org/10.1109/CVPR42600.2020.00630
Tian, Z., Shen, C., Chen, H., & He, T. (2019). FCOS: Fully Convolutional One-Stage Object Detection, Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 9626–9635. https://doi.org/10.1109/ICCV.2019.00972
Wang, G., Luo, C., Xiong, Z., & Zeng, W. (2019). SPM-Tracker: Series-parallel matching for real-time visual object tracking. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3638–3647. https://doi.org/10.1109/CVPR.2019.00376
Voigtlaender, P., Luiten, J., Torr, P. H., & Leibe, B. (2020). Siam r-cnn: Visual tracking by re-detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 6578–6588. https://doi.org/10.1109/cvpr42600.2020.00661
Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 936–944. https://doi.org/10.1109/CVPR.2017.106
Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement.arXiv preprint arXiv:1804.02767
Liu, S., & Huang, D. (2018). Receptive field block net for accurate and fast object detection. In Proceedings of the European conference on computer vision, 385–400. https://doi.org/10.1007/978-3-030-01252-6_24
Jiang, B., Luo, R., Mao, J., Xiao, T., & Jiang, Y. (2018). Acquisition of localization confidence for accurate object detection. Proceedings of the 15th European Conference on Computer Vision, 784–799. https://doi.org/10.1007/978-3-030-01264-9_48
Yang, Z., Liu, S., Hu, H., Wang, L., & Lin, S. (2019). RepPoints: Point Set Representation for Object Detection, Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, 9656–9665. https://doi.org/10.1109/ICCV.2019.00975
Wu, Y., Chen, Y., Yuan, L., Liu, Z., Wang, L., Li, H., … (2020). Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10183–10192. https://doi.org/10.1109/CVPR42600.2020.01020
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). SiamRPN++: Evolution of Siamese visual tracking with very deep networks. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4282–4291. https://doi.org/10.1109/CVPR.2019.00441
Yan, B., Zhang, X., Wang, D., Lu, H., & Yang, X. (2021). Alpha-Refine: Boosting tracking performance by precise bounding box estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5289–5298. https://doi.org/10.1109/CVPR46437.2021.00525
Fan, H., & Ling, H. (2020). CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking. arXiv preprint arXiv:2011.12483
Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., & Sun, J. (2021). You only look one-level feature. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13039–13048. https://doi.org/10.1109/CVPR46437.2021.01284
Wu, Y., Lim, J., & Yang, M. H. (2015). Object Tracking Benchmark. IEEE Transactions on Pattern Analysis & Machine Intelligence, 37(9), 1834–1848. https://doi.org/10.1109/TPAMI.2014.2388226
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin Zajc, L., & Fernandez, G. (2017). The visual object tracking VOT2017 challenge results. Proceedings of the IEEE International Conference on Computer Vision Workshops, 1949–1972. https://doi.org/10.1109/ICCVW.2017.230
Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., & Yu, S. (2019). Lasot: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5374–5383. https://doi.org/10.1109/CVPR.2019.00552
Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., & Ghanem, B. (2018). Trackingnet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European Conference on Computer Vision (ECCV) pp. 300–317 https://doi.org/10.1007/978-3-030-01246-5_19
Huang, L., Zhao, X., & Huang, K. (2019). Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5), 1562–1577. https://doi.org/10.1109/tpami.2019.2957464
Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., & Torr, P. H. (2017). End-to-end representation learning for correlation filter based tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2805–2813. https://doi.org/10.1109/CVPR.2017.531
Zhang, Z. P., & Peng, H. W. (2019). Deeper and wider Siamese networks for real-time visual tracking. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4586–4595. https://doi.org/10.1109/CVPR.2019.00472
Danelljan, M., Bhat, G., Khan, F. S., & Felsberg, M. (2017). ECO: Efficient convolution operators for tracking. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 6931–6939. https://doi.org/10.1109/CVPR.2017.733
Li, P., Chen, B., Ouyang, W., Wang, D., Yang, X., & Lu, H. (2019). GradNet: Gradient-guided network for visual object tracking. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 6161–6170. https://doi.org/10.1109/ICCV.2019.00626
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J. J., & Hu, W. M. (2018). Distractor-aware Siamese networks for visual object tracking, Proceedings of the 15th European Conference on Computer Vision, 103–119 https://doi.org/10.1007/978-3-030-01240-3_7
Wang, Q., Zhang, L., Bertinetto, L., Hu, W. M., & Torr, P. H. S. (2019). Fast online object tracking and segmentation: A unifying approach. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1328–1338. https://doi.org/10.1109/CVPR.2019.00142
Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R. W. H., & Yang, M. (2017). CREST: Convolutional Residual Learning for Visual Tracking, Proceedings of 2017 IEEE International Conference on Computer Vision, 2574–2583. https://doi.org/10.1109/ICCV.2017.279
Zhang, Z., Xie, Y., Xing, F., McGough, M., & Yang, L. (2017). Mdnet: A semantically and visually interpretable medical image diagnosis network. Proceedings of the IEEE conference on computer vision and pattern recognition, 6428–6436. https://doi.org/10.1109/CVPR.2017.378
Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., & Torr, P. H. S. (2016). Staple: Complementary learners for real-time tracking. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 1401–1409. https://doi.org/10.1109/CVPR.2016.156
Bhat, G., Johnander, J., Danelljan, M., Khan, F. S., & Felsberg, M. (2018). Unveiling the power of deep tracking. In Proceedings of the European Conference on Computer Vision 483–498 https://doi.org/10.1007/978-3-030-01216-8_30
Ma, C., Huang, J. B., Yang, X., & Yang, M. H. (2015). Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE international conference on computer vision pp. 3074–3082 https://doi.org/10.1109/iccv.2015.352
Danelljan, M., Robinson, A., Shahbaz Khan, F., & Felsberg, M. (2016). Beyond correlation filters: Learning continuous convolution operators for visual tracking. In European conference on computer vision pp. 472–488 https://doi.org/10.1007/978-3-319-46454-1_29
Held, D., Thrun, S., & Savarese, S. (2016). Learning to track at 100 fps with deep regression networks. In European conference on computer vision pp. 749–765 https://doi.org/10.1007/978-3-319-46448-0_45
Funding
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61871445 and 61302156, and by the Key R&D Foundation Project of Jiangsu Province under Grant BE2016001-4.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yao Xiao, Fuxiang Wang, Xuhui Liu. The first draft of the manuscript was written by Guang Han and Yao Xiao and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
This paper is not a study with human subjects, so no ethics approval is required.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Han, G., Xiao, Y., Wang, F. et al. Visual tracking based on depth cross-correlation and feature alignment. J Sign Process Syst 95, 37–47 (2023). https://doi.org/10.1007/s11265-022-01791-2