SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking

  • Regular Paper
  • Multimedia Systems

Abstract

Siamese-based tracking frameworks locate and classify the target object by evaluating the similarity between the feature maps of the template and search branches. Although state-of-the-art (SOTA) trackers achieve promising performance, their robustness and accuracy decline significantly in complex scenes, such as those with appearance deformation and background interference. To address these weaknesses, many recent works have proposed sophisticated template-updating mechanisms and refinement modules, but these methods do not consider higher-order correlations between features, which play an important role in obtaining accurate localization and classification information. We therefore propose a high-order spatial constraint (HOSC) module, which captures high-order correlations between features by recursively executing interactions in the feature maps. In addition, to learn better and more discriminative feature representations, we propose an efficient channel attention with mutual compensation (CAMC) module. In CAMC, channel attention from the template branch strengthens the channel constraint of the search branch, which improves the learning of discriminative features; in turn, the template branch can encode more contextual information from the search image. Extensive experiments on four datasets (OTB100, VOT2018, LaSOT and GOT-10K) show that the proposed method achieves competitive performance compared with SOTA CNN-based trackers.
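The matching step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names (channel_weights, depthwise_xcorr, track_step) are hypothetical, the attention is assumed to be plain global average pooling followed by a sigmoid with no learned layers, and depthwise cross-correlation stands in for the similarity evaluation. The sketch shows only the general idea of template-derived channel weights modulating the search features before the two branches are compared; the HOSC module and the compensation path back to the template are not modeled.

```python
import numpy as np


def channel_weights(feat):
    """Map a (C, H, W) feature map to per-channel weights in (0, 1).

    Assumption: global average pooling + sigmoid, no learned parameters.
    """
    pooled = feat.mean(axis=(1, 2))        # one scalar per channel, shape (C,)
    return 1.0 / (1.0 + np.exp(-pooled))   # sigmoid


def depthwise_xcorr(search, template):
    """Slide the template over the search map, one channel at a time.

    search: (C, Hs, Ws), template: (C, Ht, Wt)
    returns: (C, Hs - Ht + 1, Ws - Wt + 1)
    """
    c, hs, ws = search.shape
    _, ht, wt = template.shape
    out = np.zeros((c, hs - ht + 1, ws - wt + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            window = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out


def track_step(search, template):
    """Reweight search channels with template attention, then match."""
    w = channel_weights(template)                  # (C,)
    search_mod = search * w[:, None, None]         # channel constraint on search
    response = depthwise_xcorr(search_mod, template).sum(axis=0)
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, peak


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.random((3, 4, 4)) + 0.5         # toy template features
    search = np.zeros((3, 12, 12))
    search[:, 3:7, 5:9] = template                 # hide the target at row 3, col 5
    _, peak = track_step(search, template)
    print("response peak (row, col):", peak)
```

In this toy setting the response map peaks where the template was embedded, which is the sense in which similarity on the feature maps localizes the target. A real tracker would compute the features with a CNN backbone and learn the attention weights.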

Data availability

The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgements

We want to thank Xun Che for help with experiments and equipment, and Yong Liu for guidance on method design and implementation.

Funding

This work was supported in part by the Jiangsu Provincial Key Research and Development Program (Grant Nos. BE2017301 and BE2022363), the Project of Jiangsu Modern Agricultural Machinery Equipment Technology Demonstration and Promotion (Grant No. NJ2022-03), the National Natural Science Foundation of China (Grant No. 61473155), and the Six Talent Peaks Project in Jiangsu Province of China (No. GDZB-039).

Author information

Contributions

Y.Z. completed manuscript writing, framework design, and evaluation. X.C. performed experiment configuration, model training, and manuscript quality improvement. Y.L. controlled the direction and design of this study. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by Junyu Gao.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 6833 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, Y., Liu, Y. & Che, X. SiamRCSC: Robust siamese network with channel and spatial constraints for visual object tracking. Multimedia Systems 30, 323 (2024). https://doi.org/10.1007/s00530-024-01524-4

  • DOI: https://doi.org/10.1007/s00530-024-01524-4
