Abstract
3D object detection is a key task in autonomous driving. Because 3D structure information is lost during perspective projection, localizing an object in 3D from a monocular image is challenging. We present a monocular 3D object detection method that formulates 3D object localization as a paired-keypoint regression problem. For each object, our method exploits 2D bounding box priors to predict the projections of paired 3D keypoints on the image plane, and the object location is then recovered via inverse projection. A fast keypoint regression network predicts the keypoint projections and generates the initial 3D bounding box. To obtain more accurate 3D detection results, a lightweight cascaded refinement module then rectifies the initial 3D box, taking as input the instance point cloud converted from the monocular depth prediction. Experiments on the KITTI dataset demonstrate that our method achieves state-of-the-art performance using monocular images alone. On the KITTI test set, it achieves 3D AP of 15.97, 10.42, and 7.91 on the three difficulty levels, respectively.
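The two geometric steps named in the abstract, inverse projection of paired keypoints and conversion of a depth prediction into an instance point cloud, can be illustrated with a pinhole camera model. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the paired keypoints are the projections of the top and bottom ends of a vertical segment of known metric height lying at a single depth, and all function names and parameters (`backproject`, `locate_from_paired_keypoints`, `depth_to_points`, `fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
import numpy as np


def backproject(u, v, z, fx, fy, cx, cy):
    """Invert the pinhole projection for one pixel (u, v) with known depth z."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z], dtype=float)


def locate_from_paired_keypoints(top_uv, bottom_uv, height_m, fx, fy, cx, cy):
    """Recover a 3D location from one pair of projected keypoints.

    Assumption (not from the paper): the pair is the top and bottom of a
    vertical segment of known metric height at a single depth, so depth
    follows from similar triangles: z = fy * height / pixel_height.
    """
    (u_t, v_t), (u_b, v_b) = top_uv, bottom_uv
    pixel_height = v_b - v_t  # image v axis points down
    if pixel_height <= 0:
        raise ValueError("bottom keypoint must lie below the top keypoint")
    z = fy * height_m / pixel_height
    top = backproject(u_t, v_t, z, fx, fy, cx, cy)
    bottom = backproject(u_b, v_b, z, fx, fy, cx, cy)
    return 0.5 * (top + bottom)  # midpoint of the segment


def depth_to_points(depth, mask, fx, fy, cx, cy):
    """Convert the masked pixels of a depth map into an (N, 3) point cloud."""
    v, u = np.nonzero(mask)  # row index is v, column index is u
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

In this sketch the depth comes purely from the ratio between the assumed metric height and the pixel distance between the two keypoints, so small keypoint errors on distant objects translate into large depth errors. That sensitivity is one plausible motivation for a point-cloud-based refinement stage such as the one the abstract describes.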
Ethics declarations
Competing Interests
We have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ji, C., Liu, G. & Zhao, D. Monocular 3D object detection via estimation of paired keypoints for autonomous driving. Multimed Tools Appl 81, 5973–5988 (2022). https://doi.org/10.1007/s11042-021-11801-3
DOI: https://doi.org/10.1007/s11042-021-11801-3