Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking with Multimodal ReID

Mao, Chen; Tan, Chong; Liu, Hong; Hu, Jingqi; Zheng, Min

doi:10.1007/978-981-99-8555-5_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14436)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

3D Multi-Object Tracking (MOT) is a key component in numerous applications, such as autonomous driving and intelligent robotics, playing a crucial role in the perception and decision-making processes of intelligent systems. In this paper, we propose a 3D MOT system based on a cost-effective stereo camera pair, which includes a 3D multimodal re-identification (ReID) model capable of multi-task learning. The ReID model obtains the multimodal features of objects, including RGB and point cloud information. We design data association and trajectory management algorithms. The data association computes an affinity matrix for the object feature embeddings and motion information, while the trajectory management controls the lifecycle of the trajectories. In addition, we create a ReID dataset based on the KITTI Tracking dataset, used for training and validating ReID models. Results demonstrate that our method can achieve accurate object tracking solely with a stereo camera pair, maintaining high reliability even in cases of occlusion and missed detections. Experimental evaluation shows that our approach outperforms competing methods on the KITTI MOT leaderboard. Our code, dataset, and model are available at https://github.com/maomao279/Stereo3DMOT.
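
The abstract does not say how the affinity matrix is formed or when trajectories are created and dropped. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes cosine similarity for the appearance term, an exponential decay of 3D center distance for the motion term, a greedy assignment (swapped in here for simplicity; the paper's solver is not stated in the abstract), and hit/miss counters for the trajectory lifecycle. Every name, weight, and threshold below is hypothetical.

```python
import math
from dataclasses import dataclass


def cosine_similarity(a, b):
    """Appearance term: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def motion_affinity(p, q, scale=2.0):
    """Motion term (assumed form): decays with 3D center distance."""
    return math.exp(-math.dist(p, q) / scale)


@dataclass
class Track:
    track_id: int
    embedding: list
    center: tuple
    hits: int = 1
    misses: int = 0
    state: str = "tentative"  # tentative -> confirmed -> deleted

    def mark_hit(self, det, confirm_after=3):
        self.embedding, self.center = det["embedding"], det["center"]
        self.hits += 1
        self.misses = 0
        if self.hits >= confirm_after:
            self.state = "confirmed"

    def mark_miss(self, max_misses=2):
        # Tolerate a few missed frames (e.g. occlusion) before deleting.
        self.misses += 1
        if self.misses > max_misses:
            self.state = "deleted"


def affinity_matrix(tracks, detections, w_app=0.5):
    """Rows are tracks, columns are detections; higher means more similar."""
    return [
        [
            w_app * cosine_similarity(t.embedding, d["embedding"])
            + (1.0 - w_app) * motion_affinity(t.center, d["center"])
            for d in detections
        ]
        for t in tracks
    ]


def greedy_match(aff, min_affinity=0.5):
    """Greedy one-to-one assignment by descending affinity."""
    pairs = sorted(
        ((aff[i][j], i, j) for i in range(len(aff)) for j in range(len(aff[i]))),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < min_affinity:
            break
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return matches


def step(tracks, detections, next_id):
    """Process one frame: associate, update lifecycles, spawn new tracks."""
    matches = greedy_match(affinity_matrix(tracks, detections))
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    for i, j in matches:
        tracks[i].mark_hit(detections[j])
    for i, t in enumerate(tracks):
        if i not in matched_t:
            t.mark_miss()
    for j, d in enumerate(detections):
        if j not in matched_d:
            tracks.append(Track(next_id, d["embedding"], d["center"]))
            next_id += 1
    tracks[:] = [t for t in tracks if t.state != "deleted"]
    return next_id
```

Calling `step(tracks, detections, next_id)` once per frame, with each detection given as a dict holding an `"embedding"` list and a 3D `"center"` tuple, keeps a track alive through a short run of missed detections and deletes it only after the miss budget is exceeded. A globally optimal solver such as the Hungarian algorithm could replace the greedy pass without changing the rest of the sketch.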

Author information

Authors and Affiliations

Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
Chen Mao, Chong Tan, Hong Liu, Jingqi Hu & Min Zheng
University of Chinese Academy of Sciences, Beijing, 101408, China
Chen Mao & Jingqi Hu

Corresponding author

Correspondence to Min Zheng.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Mao, C., Tan, C., Liu, H., Hu, J., Zheng, M. (2024). Stereo3DMOT: Stereo Vision Based 3D Multi-object Tracking with Multimodal ReID. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14436. Springer, Singapore. https://doi.org/10.1007/978-981-99-8555-5_39

DOI: https://doi.org/10.1007/978-981-99-8555-5_39
Published: 28 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8554-8
Online ISBN: 978-981-99-8555-5
eBook Packages: Computer Science, Computer Science (R0)
