Abstract
Acquiring the 3D trajectories of on-road vehicles is an essential visual task for autonomous driving systems. Existing 3D vehicle tracking methods either rely on point cloud data or need to be trained on visual tracking datasets. In contrast, this paper proposes a decoupled monocular 3D vehicle tracking framework. Because our framework is the first of its kind, we take a previous decoupled LiDAR-based method as the baseline and substitute its detector with a monocular one. On this foundation, we further employ global coordinates to cancel out ego motion and introduce the angular rate into the 3D Kalman filter. To tackle the problem of long-term association, we propose a trajectory management scheme built on a novel hibernation mechanism. Furthermore, we point out that current monocular 3D tracking methods have not been tailored to the depth estimation uncertainty of monocular 3D detectors. We therefore propose a depth-aware association strategy that gives more distant vehicles larger matching regions in the data association stage. As another contribution, we discuss the defects of current metrics for evaluating 3D tracking performance and devise a nonuniform metric dedicated to monocular vision. Extensive experiments on the KITTI tracking benchmark demonstrate the superiority of the proposed monocular 3D vehicle tracking framework and metric, both quantitatively and qualitatively.
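To make the depth-aware association idea concrete, the following minimal sketch shows one way a matching region could grow with distance. It is an illustration under stated assumptions, not the formulation used in the paper: the linear growth rule, the constants `base_radius_m` and `growth_per_m`, and the function names `matching_radius` and `can_match` are all hypothetical.

```python
import math


def matching_radius(depth_m, base_radius_m=1.0, growth_per_m=0.05):
    """Hypothetical gating radius (metres) that grows linearly with depth.

    The linear form and both constants are assumptions for illustration only.
    """
    return base_radius_m + growth_per_m * depth_m


def can_match(track_xz, det_xz, base_radius_m=1.0, growth_per_m=0.05):
    """Return True if a detection falls inside the track's matching region.

    Positions are (x, z) pairs on the ground plane, with z taken as the
    forward distance from the camera (an assumed convention).
    """
    # Euclidean distance between predicted track position and detection.
    distance = math.hypot(det_xz[0] - track_xz[0], det_xz[1] - track_xz[1])
    # The allowed distance depends on how far away the track is.
    radius = matching_radius(track_xz[1], base_radius_m, growth_per_m)
    return distance <= radius


# The same 2 m offset is rejected for a near vehicle and accepted for a far one.
near_ok = can_match((0.0, 10.0), (0.0, 12.0))
far_ok = can_match((0.0, 50.0), (0.0, 52.0))
```

With these assumed constants, a vehicle at 10 m gets a 1.5 m matching radius and a vehicle at 50 m gets 3.5 m, so a 2 m localization offset only breaks the association for the nearby vehicle.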






Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant U1964201.
About this article
Cite this article
Gao, T., Jia, Z., Lin, W. et al. Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric. Appl Intell 53, 746–756 (2023). https://doi.org/10.1007/s10489-022-03432-4
DOI: https://doi.org/10.1007/s10489-022-03432-4