Abstract
The high temporal variation of point clouds is the key challenge in 3D single-object tracking (3D SOT). Existing approaches rely on the assumption that the shape variation of the point clouds and the motion of the objects across neighboring frames are smooth, and thus fail to cope with data exhibiting high temporal variation. In this paper, we present HVTrack, a novel framework for 3D SOT in point clouds with high temporal variation. HVTrack introduces three novel components to tackle the challenges of the high-temporal-variation scenario: 1) a Relative-Pose-Aware Memory module to handle temporal point cloud shape variations; 2) a Base-Expansion Feature Cross-Attention module to deal with distractions from similar objects in the expanded search area; 3) a Contextual Point Guided Self-Attention module to suppress heavy background noise. We construct a high-temporal-variation dataset (KITTI-HV) by sampling the KITTI dataset at different frame intervals. On KITTI-HV with a frame interval of 5, our HVTrack surpasses the state-of-the-art tracker CXTrack by 11.3%/15.7% in Success/Precision.
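The KITTI-HV construction described above amounts to subsampling each KITTI tracklet at a fixed frame interval, so that consecutive "frames" seen by the tracker are several real frames apart. A minimal sketch of this sampling step, assuming a tracklet is simply a list of frames (the function name and data layout are ours, not from the paper):

```python
def subsample_tracklet(frames, interval):
    """Keep every `interval`-th frame of a tracklet, starting from the
    first frame (which typically provides the template initialization).
    With interval=1 this reduces to the original KITTI sequence."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[::interval]

# Example: a tracklet of 11 frame indices sampled at interval 5
frames = list(range(11))
print(subsample_tracklet(frames, 5))  # -> [0, 5, 10]
print(subsample_tracklet(frames, 1))  # -> the original 11 frames
```

Larger intervals increase both the apparent motion and the shape change between neighboring frames, which is precisely the regime the smoothness assumption of prior trackers breaks down in.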
References
Chen, X., et al.: TrajectoryFormer: 3D object tracking transformer with predictive trajectory hypotheses. arXiv preprint arXiv:2306.05888 (2023)
Cheng, R., Wang, X., Sohel, F., Lei, H.: Topology-aware universal adversarial attack on 3D object tracking. Vis. Intell. 1(1), 31 (2023)
Chiu, H.K., Prioletti, A., Li, J., Bohg, J.: Probabilistic 3D multi-object tracking for autonomous driving. arXiv preprint arXiv:2001.05673 (2020)
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
Cui, Y., Fang, Z., Shan, J., Gu, Z., Zhou, S.: 3D object tracking with transformer. arXiv preprint arXiv:2110.14921 (2021)
Ding, S., Rehder, E., Schneider, L., Cordts, M., Gall, J.: 3DMOTFormer: graph transformer for online 3D multi-object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9784–9794 (2023)
Fang, Z., Zhou, S., Cui, Y., Scherer, S.: 3D-SiamRPN: an end-to-end learning method for real-time 3D single object tracking using raw point cloud. IEEE Sens. J. 21(4), 4995–5011 (2020)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)
Giancola, S., Zarzar, J., Ghanem, B.: Leveraging shape completion for 3D Siamese tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1359–1368 (2019)
Guo, Z., Mao, Y., Zhou, W., Wang, M., Li, H.: CMT: context-matching-guided transformer for 3D tracking in point clouds. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13682, pp. 95–111. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2_6
Hui, L., Wang, L., Cheng, M., Xie, J., Yang, J.: 3D Siamese voxel-to-BEV tracker for sparse point clouds. In: Advances in Neural Information Processing Systems, vol. 34, pp. 28714–28727 (2021)
Hui, L., Wang, L., Tang, L., Lan, K., Xie, J., Yang, J.: 3D Siamese transformer network for single object tracking on point clouds. arXiv preprint arXiv:2207.11995 (2022)
Jiao, L., Wang, D., Bai, Y., Chen, P., Liu, F.: Deep learning in visual tracking: a review. IEEE Trans. Neural Netw. Learn. Syst. 34(9), 5497–5516 (2021)
Jiayao, S., Zhou, S., Cui, Y., Fang, Z.: Real-time 3D single object tracking with transformer. IEEE Trans. Multimedia 25, 2339–2353 (2022)
Kapania, S., Saini, D., Goyal, S., Thakur, N., Jain, R., Nagrath, P.: Multi object tracking with UAVs using deep SORT and YOLOv3 RetinaNet detection framework. In: Proceedings of the 1st ACM Workshop on Autonomous and Intelligent Mobile Systems, pp. 1–6 (2020)
Kart, U., Lukezic, A., Kristan, M., Kamarainen, J.K., Matas, J.: Object tracking by reconstruction with view-specific discriminative correlation filters. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1339–1348 (2019)
Lan, K., Jiang, H., Xie, J.: Temporal-aware Siamese tracker: integrate temporal context for 3D object tracking. In: Proceedings of the Asian Conference on Computer Vision, pp. 399–414 (2022)
Luo, C., Yang, X., Yuille, A.: Exploring simple 3D multi-object tracking for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10488–10497 (2021)
Machida, E., Cao, M., Murao, T., Hashimoto, H.: Human motion tracking of mobile robot with Kinect 3D sensor. In: Proceedings of SICE Annual Conference (SICE), pp. 2207–2211. IEEE (2012)
Nishimura, H., Komorita, S., Kawanishi, Y., Murase, H.: SDOF-tracker: fast and accurate multiple human tracking by skipped-detection and optical-flow. IEICE Trans. Inf. Syst. 105(11), 1938–1946 (2022)
Pang, Z., Li, Z., Wang, N.: Model-free vehicle tracking and state estimation in point cloud sequences. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8075–8082. IEEE (2021)
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9277–9286 (2019)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Qi, H., Feng, C., Cao, Z., Zhao, F., Xiao, Y.: P2B: point-to-box network for 3D object tracking in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6329–6338 (2020)
Ren, C., Xu, Q., Zhang, S., Yang, J.: Hierarchical prior mining for non-local multi-view stereo. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3611–3620 (2023)
Ren, S., Yang, X., Liu, S., Wang, X.: SG-Former: self-guided transformer with evolving token reallocation. arXiv preprint arXiv:2308.12216 (2023)
Sadjadpour, T., Li, J., Ambrus, R., Bohg, J.: ShaSTA: modeling shape and spatio-temporal affinities for 3D multi-object tracking. IEEE Robot. Autom. Lett. (2023)
Shan, J., Zhou, S., Fang, Z., Cui, Y.: PTT: point-track-transformer module for 3D single object tracking in point clouds. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1310–1316 (2021)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Sun, P., et al.: Scalability in perception for autonomous driving: waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454 (2020)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Wang, Q., Chen, Y., Pang, Z., Wang, N., Zhang, Z.: Immortal tracker: tracklet never dies. arXiv preprint arXiv:2111.13672 (2021)
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) 38(5), 1–12 (2019)
Wang, Z., Xie, Q., Lai, Y.K., Wu, J., Long, K., Wang, J.: MLVSNet: multi-level voting Siamese network for 3D visual tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3101–3110 (2021)
Weng, X., Wang, J., Held, D., Kitani, K.: 3D multi-object tracking: a baseline and new evaluation metrics. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10359–10366. IEEE (2020)
Weng, X., Wang, Y., Man, Y., Kitani, K.M.: GNN3DMOT: graph neural network for 3D multi-object tracking with 2D-3D multi-feature learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6499–6508 (2020)
Wu, Q., Yang, J., Sun, K., Zhang, C., Zhang, Y., Salzmann, M.: MixCycle: mixup assisted semi-supervised 3D single object tracking with cycle consistency. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13956–13966 (2023)
Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411–2418 (2013)
Xu, T.X., Guo, Y.C., Lai, Y.K., Zhang, S.H.: CXTrack: improving 3D point cloud tracking with contextual information. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1084–1093 (2023)
Yin, T., Zhou, X., Krahenbuhl, P.: Center-based 3D object detection and tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11784–11793 (2021)
Yoo, J.S., Lee, H., Jung, S.W.: Video object segmentation-aware video frame interpolation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12322–12333 (2023)
Zarzar, J., Giancola, S., Ghanem, B.: Efficient bird eye view proposals for 3D Siamese tracking. arXiv preprint arXiv:1903.10168 (2019)
Zhang, X., Yang, J., Zhang, S., Zhang, Y.: 3D registration with maximal cliques. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 17745–17754 (2023)
Zheng, C., et al.: Box-aware feature enhancement for single object tracking on point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13199–13208 (2021)
Zheng, C., et al.: Beyond 3D Siamese tracking: a motion-centric paradigm for 3D single object tracking in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8111–8120 (2022)
Zhou, C., et al.: PTTR: relational 3D point cloud object tracking with transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8531–8540 (2022)
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_28
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grants 62372377 and 62176242.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wu, Q., Sun, K., An, P., Salzmann, M., Zhang, Y., Yang, J. (2025). 3D Single-Object Tracking in Point Clouds with High Temporal Variation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15065. Springer, Cham. https://doi.org/10.1007/978-3-031-72667-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72666-8
Online ISBN: 978-3-031-72667-5