A survey of 3D object detection

Liang, Wei; Xu, Pengfei; Guo, Ling; Bai, Heng; Zhou, Yang; Chen, Feng

doi:10.1007/s11042-021-11137-y

Multimedia Tools and Applications, Volume 80, pages 29617–29641 (2021)

Published: 03 July 2021

Abstract

With the rapid development of science and technology, object detection has become a promising research direction in computer vision. In recent years, most object detection frameworks proposed in the literature have been 2D. However, 2D object detection does not take three-dimensional space into account, which limits its use for solving real-world problems. We therefore conduct this survey of 3D object detection in the hope that 3D object detection methods can be better applied to intelligent video surveillance, robot navigation and autonomous driving. Although many 3D object detection methods exist, this paper focuses only on the popular deep-learning-based methods, which we divide into four categories according to the type of input data. We also discuss the innovations of these frameworks and compare their experimental results in terms of accuracy. Finally, we identify the technical difficulties of current 3D object detection and discuss future research directions.
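The abstract compares methods "in terms of accuracy". In 3D detection benchmarks, accuracy is usually derived from the overlap (intersection over union, IoU) between a predicted 3D box and a ground-truth 3D box. The following is a minimal sketch that is not taken from the paper. It assumes axis-aligned boxes described by a center (x, y, z) and a size (l, w, h) in meters. Real benchmarks such as KITTI use boxes rotated by a yaw angle, and this simplification ignores that rotation. The names `Box3D` and `iou_3d` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box: center (x, y, z) and size (l, w, h).

    Yaw is deliberately omitted; this is a simplification of the rotated
    boxes used by real 3D detection benchmarks.
    """
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float

    def bounds(self):
        # (min, max) extent along the x, y and z axes.
        return (
            (self.x - self.l / 2, self.x + self.l / 2),
            (self.y - self.w / 2, self.y + self.w / 2),
            (self.z - self.h / 2, self.z + self.h / 2),
        )

    def volume(self) -> float:
        return self.l * self.w * self.h


def iou_3d(a: Box3D, b: Box3D) -> float:
    """IoU of two axis-aligned 3D boxes (intersection volume / union volume)."""
    inter = 1.0
    for (amin, amax), (bmin, bmax) in zip(a.bounds(), b.bounds()):
        overlap = min(amax, bmax) - max(amin, bmin)
        if overlap <= 0:
            return 0.0  # the boxes are disjoint along this axis
        inter *= overlap
    union = a.volume() + b.volume() - inter
    return inter / union


gt = Box3D(0.0, 0.0, 0.0, 4.0, 2.0, 1.5)    # car-sized ground-truth box
pred = Box3D(0.5, 0.0, 0.0, 4.0, 2.0, 1.5)  # prediction shifted 0.5 m along x
print(round(iou_3d(gt, pred), 4))  # → 0.7778
```

In this example, shifting the prediction by 0.5 m along x leaves an intersection of 10.5 m³ and a union of 13.5 m³, so IoU ≈ 0.78. That value would still pass an IoU threshold of 0.7, the threshold KITTI applies to cars. Per-detection IoU like this is then thresholded and aggregated into average precision when methods are compared.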

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 61973250, 61802306, 61973249, 61702415, 61902318 and 61876145, by the Scientific Research Plan for Servicing Local Area of the Shaanxi Province Education Department (Nos. 19JC038, 19JC041), and by the Key Research and Development Program of Shaanxi (Nos. 2019GY-012, 2021GY-077).

Author information

Authors and Affiliations

Northwest University, Xi’an, Shaanxi Province, 710127, China
Wei Liang, Pengfei Xu, Ling Guo, Heng Bai, Yang Zhou & Feng Chen

Corresponding author

Correspondence to Ling Guo (ORCID: orcid.org/0000-0003-2518-3416).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liang, W., Xu, P., Guo, L. et al. A survey of 3D object detection. Multimed Tools Appl 80, 29617–29641 (2021). https://doi.org/10.1007/s11042-021-11137-y

Received: 27 September 2020
Revised: 11 January 2021
Accepted: 03 June 2021
Published: 03 July 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11042-021-11137-y
