Abstract
Driven by rapid advances in science and technology, object detection has become an active research direction in computer vision. Most object detection frameworks proposed in recent years operate in 2D. However, 2D object detection does not account for three-dimensional space, which limits its ability to solve real-world problems. We therefore survey 3D object detection in the hope that these methods can be better applied to intelligent video surveillance, robot navigation, and autonomous driving. Although many 3D object detection methods exist, this paper focuses only on the popular deep-learning-based methods. We divide these approaches into four categories according to the type of input data. We also discuss the innovations of these frameworks and compare their experimental results in terms of accuracy. Finally, we identify the technical difficulties facing current 3D object detection and discuss future research directions.






Acknowledgements
This research was supported in part by the National Natural Science Foundation of China under grant Nos. 61973250, 61802306, 61973249, 61702415, 61902318, and 61876145; by the Scientific Research Plan for Servicing Local Area of the Shaanxi Province Education Department (Nos. 19JC038, 19JC041); and by the Key Research and Development Program of Shaanxi (Nos. 2019GY-012, 2021GY-077).
Cite this article
Liang, W., Xu, P., Guo, L. et al. A survey of 3D object detection. Multimed Tools Appl 80, 29617–29641 (2021). https://doi.org/10.1007/s11042-021-11137-y