A survey of 3D object detection

Multimedia Tools and Applications

Abstract

Driven by rapid advances in science and technology, object detection has become a prominent research direction in computer vision. Most object detection frameworks proposed in recent years operate in 2D. However, 2D object detection does not account for three-dimensional space, which limits its usefulness for many real-world problems. Hence, we conduct this survey of 3D object detection in the hope that 3D object detection methods can be better applied to intelligent video surveillance, robot navigation and autonomous driving. Although various 3D object detection methods exist, in this paper we focus only on the popular deep learning based methods. We divide these approaches into four categories according to the type of input data. In addition, we discuss the innovations of these frameworks and compare their experimental results in terms of accuracy. Finally, we identify the technical difficulties associated with current 3D object detection and discuss future research directions.

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China under grant Nos. 61973250, 61802306, 61973249, 61702415, 61902318 and 61876145; by the scientific research plan for servicing local area of the Shaanxi province education department (Nos. 19JC038, 19JC041); and by the Key Research and Development Program of Shaanxi (Nos. 2019GY-012, 2021GY-077).

Author information

Corresponding author

Correspondence to Ling Guo.

About this article

Cite this article

Liang, W., Xu, P., Guo, L. et al. A survey of 3D object detection. Multimed Tools Appl 80, 29617–29641 (2021). https://doi.org/10.1007/s11042-021-11137-y

  • DOI: https://doi.org/10.1007/s11042-021-11137-y
