Abstract
Visual perception is a key technology in the intelligent visual Internet of Things. Research on object detection methods is of great significance for improving the safety and efficiency of unmanned driving and of the intelligent visual Internet of Things. Deep-learning-based object detection on 3D point clouds not only uses deep networks to automatically learn multi-layer abstract features, which improves the computational efficiency and detection accuracy of the model, but also exploits the high-dimensional information in point clouds to cope better with object occlusion, missing data, and data sparsity. However, reviews of deep-learning-based object detection methods for 3D point clouds are currently scarce. To provide a more comprehensive understanding of how driverless technology is developing in terms of safety and efficiency, this paper divides the methods into those based on monocular cameras, RGB-D images, and LiDAR point clouds, according to the main input data of the network model, further subdivides them according to how the model uses that data, and analyzes the performance of the various detection methods. This article also summarizes commonly used 3D point cloud object detection datasets, organizes and describes the commonly used detection metrics, and discusses research challenges and development trends. The real-time performance of 3D point cloud object detection in the intelligent visual Internet of Things still needs to be improved.
Acknowledgments
This research was funded by the National Natural Science Foundation of China (No. 61702295), the Shandong Province Natural Science Foundation (No. ZR2020QF003, ZR2017BF023), the Opening Foundation of the Key Laboratory of Opto-Technology and Intelligent Control (Lanzhou Jiaotong University), Ministry of Education (No. KFKT2020-09), the Shandong Province Postdoctoral Innovation Project (No. 201703032), the Shandong Province Colleges and Universities Young Talents Initiation Program (No. 2019KJN047), and the Doctoral Fund of QUST (No. 1203043003480, 010029029).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, H., Wang, J., Xu, L. et al. Efficient and accurate object detection for 3D point clouds in intelligent visual internet of things. Multimed Tools Appl 80, 31297–31334 (2021). https://doi.org/10.1007/s11042-020-10475-7
DOI: https://doi.org/10.1007/s11042-020-10475-7