
Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking

  • Conference paper

Pattern Recognition and Computer Vision (PRCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15043)


Abstract

Event-based bionic cameras asynchronously capture dynamic scenes with high temporal resolution and high dynamic range, offering the potential to integrate events with RGB under illumination degradation and fast motion. Existing RGB-E tracking methods model event characteristics with the Transformer attention mechanism before integrating the two modalities. However, these methods aggregate the event stream into a single event frame, failing to exploit the temporal information inherent in the stream. Moreover, while the traditional attention mechanism is well suited to dense semantic features, the attention mechanism for sparse event features requires a redesign. In this paper, we propose a dynamic event subframe splitting strategy that splits the event stream into finer-grained event clusters, aiming to capture spatio-temporal features that contain motion cues. On this basis, we design an event-based sparse attention mechanism to enhance the interaction of event features along the temporal and spatial dimensions. Experimental results show that our method outperforms existing state-of-the-art methods on the FE240 and COESOT datasets, providing an effective way of processing event data.
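To make the subframe-splitting idea concrete, the following is a minimal sketch in Python, assuming events arrive as (t, x, y, polarity) rows sorted by timestamp. The equal-count split and the polarity-count rasterization are illustrative assumptions rather than the paper's exact dynamic strategy; splitting by event count already adapts to motion, since dense bursts of events yield shorter time slices.

    import numpy as np

    def split_events(events, num_subframes):
        """Split an (N, 4) array of (t, x, y, p) rows, sorted by t, into
        chunks of roughly equal event count, so that intervals with fast
        motion (dense events) receive finer temporal slices."""
        return np.array_split(events, num_subframes)

    def rasterize(chunk, height, width):
        """Accumulate one event chunk into a (2, H, W) polarity count map."""
        frame = np.zeros((2, height, width), dtype=np.float32)
        x = chunk[:, 1].astype(int)
        y = chunk[:, 2].astype(int)
        p = chunk[:, 3].astype(int)  # polarity in {0, 1}
        np.add.at(frame, (p, y, x), 1.0)
        return frame

    # Usage with a dummy stream at a DAVIS346-like resolution (346 x 260).
    rng = np.random.default_rng(0)
    events = rng.random((10000, 4))
    events = events[events[:, 0].argsort()]   # timestamps ascending
    events[:, 1] *= 346                       # x coordinates
    events[:, 2] *= 260                       # y coordinates
    events[:, 3] = events[:, 3] > 0.5         # random polarity
    stack = np.stack([rasterize(c, 260, 346)
                      for c in split_events(events, 8)])
    print(stack.shape)  # (8, 2, 260, 346)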
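In the same spirit, a generic top-k sparse attention sketch in PyTorch shows the kind of sparsification that suits event tokens, where most spatial positions carry no events: each query attends only to its k highest-scoring keys. The function and the fixed top_k value are hypothetical, and the paper's spatio-temporal motion entangled attention is more elaborate than this generic form.

    import torch
    import torch.nn.functional as F

    def topk_sparse_attention(q, k, v, top_k=32):
        """q, k, v: (B, heads, N, d). Mask all but the top_k scores per
        query before the softmax, so each query attends sparsely."""
        scale = q.shape[-1] ** -0.5
        scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # (B, h, N, N)
        kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest
        scores = scores.masked_fill(scores < kth, float('-inf'))
        return torch.matmul(F.softmax(scores, dim=-1), v)

    # Usage on dummy event tokens: batch 2, 8 heads, 256 tokens, dim 64.
    q = torch.randn(2, 8, 256, 64)
    out = topk_sparse_attention(q, q, q, top_k=32)
    print(out.shape)  # torch.Size([2, 8, 256, 64])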



Author information


Corresponding author

Correspondence to Xiao-Jun Wu.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Shao, P., Xu, T., Zhu, XF., Wu, XJ., Kittler, J. (2025). Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15043. Springer, Singapore. https://doi.org/10.1007/978-981-97-8493-6_8


  • DOI: https://doi.org/10.1007/978-981-97-8493-6_8


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8492-9

  • Online ISBN: 978-981-97-8493-6

  • eBook Packages: Computer Science, Computer Science (R0)
