
Monocular 3D object detection via estimation of paired keypoints for autonomous driving

Multimedia Tools and Applications

Abstract

3D object detection is a key task in autonomous driving. Because 3D structure information is lost during perspective projection, localizing an object in 3D from monocular images is challenging. We present a monocular 3D object detection method that formulates 3D object localization as a paired-keypoint regression problem. Our method exploits 2D bounding box priors to predict the projections of paired 3D keypoints on the image plane for each object, and the object location is then recovered via an inverse projection. A fast keypoint regression network is proposed to predict the keypoint projections and to generate the initial 3D bounding box. To obtain more accurate 3D detection results, we further use a lightweight cascaded refinement module to rectify the initial 3D box; this module takes as input the instance point cloud converted from the monocular depth prediction. Experiments on the KITTI dataset demonstrate that our method achieves state-of-the-art performance using monocular images alone, with 15.97, 10.42, and 7.91 3D AP on the three difficulty levels of the KITTI test set, respectively.
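The inverse projection and the depth-to-point-cloud conversion mentioned in the abstract can be illustrated with a minimal pinhole-camera sketch. This is not the authors' implementation: it assumes, purely for illustration, that the paired keypoints are the projections of the top-face and bottom-face centres of the 3D box, that both lie at the same depth, and that the object's physical height is known. All function and parameter names (`backproject`, `locate_from_paired_keypoints`, `depth_to_points`, `fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Pinhole inverse projection: pixel (u, v) at depth z -> camera-frame (x, y, z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def locate_from_paired_keypoints(top: Tuple[float, float],
                                 bottom: Tuple[float, float],
                                 height_m: float,
                                 fx: float, fy: float,
                                 cx: float, cy: float) -> Point3D:
    """Recover the 3D box centre from two projected keypoints.

    Assumption (illustrative only): `top` and `bottom` are the image
    projections of the top-face and bottom-face centres of the box, which
    are `height_m` metres apart along the camera's vertical axis and share
    one depth. Image v grows downwards, so bottom[1] > top[1].
    """
    pixel_height = bottom[1] - top[1]
    if pixel_height <= 0:
        raise ValueError("bottom keypoint must lie below the top keypoint")
    # Similar triangles: pixel_height = fy * height_m / z
    z = fy * height_m / pixel_height
    # The midpoint of the two projections is the projection of the box centre.
    u_mid = (top[0] + bottom[0]) / 2.0
    v_mid = (top[1] + bottom[1]) / 2.0
    return backproject(u_mid, v_mid, z, fx, fy, cx, cy)


def depth_to_points(depth: Sequence[Sequence[float]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Convert a depth map (rows of metres) to 3D points, skipping invalid depths."""
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # non-positive depth is treated as "no measurement"
                points.append(backproject(u, v, z, fx, fy, cx, cy))
    return points
```

As a check on the geometry, with `fx = fy = 700`, `cx = 600`, `cy = 180`, keypoints at `(670, 188.75)` and `(670, 241.25)`, and a height of 1.5 m, `locate_from_paired_keypoints` returns a centre of approximately `(2.0, 1.0, 20.0)` in camera coordinates. Cropping a depth map to one object's 2D box before calling `depth_to_points` would give an instance point cloud of the kind the refinement stage consumes.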

Author information

Corresponding author

Correspondence to Guizhong Liu.

Ethics declarations

Competing Interests

We have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ji, C., Liu, G. & Zhao, D. Monocular 3D object detection via estimation of paired keypoints for autonomous driving. Multimed Tools Appl 81, 5973–5988 (2022). https://doi.org/10.1007/s11042-021-11801-3

  • DOI: https://doi.org/10.1007/s11042-021-11801-3