Abstract
Frustum-based 3D detection methods suffer from the ignorance of a 2D detector for that the object will never be detected in point cloud if it is omitted by a 2D image proposal. In this work, we propose a novel method named semantic frustum-based sparsely embedded convolutional detection (SFB-SECOND) for 3D object detection, which is devoted to solving the limitation of frustum-based methods, i.e., heavily relying on the accurate 2D detector. Specifically, for the image and LIDAR describing the same scene, we initially use developed methods of semantic segmentation and object detection to generate the object mask, selecting all potential targets within two confidence-related regions. Through this object mask, we quickly locate the objects of interest in LIDAR and dig them up as semantic frustum. This selected frustum not only rules out more background and irrelevant objects in LIDAR but also maximizes the use of rich 3D information. Then, to accurate the orientation estimation, we introduce a refined form of region-aware loss regression to cooperate with the region-aware frustum. Besides, a new data augmentation strategy is proposed to further make haste the convergence speed and improve detection performance. In addition, the proposed SFB-SECOND achieves state-of-the-art performances on the 3D object detection benchmark KITTI with real-time speed, showing superiority over previous methods.
Similar content being viewed by others
References
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. The International Journal of Robotics Research 32(11), 1231–1237 (2013)
X. Chen, H. Ma, J. Wan, B. Li, T. Xia, Multi-view 3d object detection network for autonomous driving, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1907–1915
B. Yang, W. Luo, R. Urtasun, Pixor: Real-time 3d object detection from point clouds, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7652–7660
Poullis, C.: A framework for automatic modeling from point cloud data. IEEE Transactions on Pattern Analysis & Machine Intelligence 35(11), 2563–2575 (2013)
Yan, Y., Mao, Y., Li, B.: Second: Sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)
C. R. Qi, W. Liu, C. Wu, H. Su, L. J. Guibas, Frustum pointnets for 3d object detection from rgb-d data, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 918–927
Z. Wang, K. Jia, Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection, arXiv preprint arXiv:1903.01864, 2019
J. Ren, X. Chen, J. Liu, W. Sun, J. Pang, Q. Yan, Y.-W. Tai, and L. Xu, Accurate single stage detector using recurrent rolling convolution, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5420–5428
Gomaa, A., Abdelwahab, M.M., Abo-Zahhad, M. Real-time algorithm for simultaneous vehicle detection, and tracking in aerial view videos, In, : IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE 2018, 222–225 (2018)
Gomaa, A., Abdelwahab, M.M., Abo-Zahhad, M., Minematsu, T., Taniguchi, R.-I.: Robust vehicle detection and counting algorithm employing a convolution neural network and optical flow. Sensors 19(20), 4588 (2019)
A. Gomaa, M. M. Abdelwahab, M. Abo-Zahhad, Efficient vehicle detection and tracking strategy in aerial videos by employing morphological operations and feature points motion analysis, Multimedia Tools and Applications, vol. 79, no. 35, pp. 26 023–26 043, 2020
M. Simon, K. Amende, A. Kraus, J. Honer, T. Samann, H. Kaulbersch, S. Milz, H. Michael Gross, Complexer-yolo: Real-time 3d object detection and tracking on semantic point clouds, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0
A. Ošep, P. Voigtlaender, J. Luiten, S. Breuers, B. Leibe, Large-scale object mining for object discovery from unlabeled video, In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 5502–5508
J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, H. Lu, Dual attention network for scene segmentation, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3146–3154
Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence 4, 640–651 (2017)
J. Dai, Y. Li, K. He, J. Sun, R-fcn: Object detection via region-based fully convolutional networks, In: Advances in Neural Information Processing Systems, 2016, pp. 379–387
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4), 834–848 (2017)
L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587, 2017
Y. Liu, K. Chen, C. Liu, Z. Qin, Z. Luo, J. Wang, Structured knowledge distillation for semantic segmentation, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2604–2613
Hu, Z., Tang, J., Wang, Z., Zhang, K., Zhang, L., Sun, Q.: Deep learning for image-based cancer detection and diagnosis- a survey. Pattern Recognition 83, 134–149 (2018)
Y. Zhou, O. Tuzel, Voxelnet: End-to-end learning for point cloud based 3d object detection, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490–4499
A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, O. Beijbom, Pointpillars: Fast encoders for object detection from point clouds, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12 697–12 705
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, In: European Conference on Computer Vision, 2016, pp. 21–37
T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, In: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988
Z. Yang, Y. Sun, S. Liu, X. Shen, J. Jia, Ipod: Intensive point-based object detector for point cloud, arXiv preprint arXiv:1812.05276, 2018
Ku, J., Mozifian, M., Lee, J., Harakeh, A., Waslander, S.L.: Joint 3d proposal generation and object detection from view aggregation, In. IEEE/RSJ International Conference on Intelligent Robots and Systems 2018, 1–8 (2018)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Feng, Y., Yu, J., Xu, J. et al. Semantic frustum-based sparsely embedded convolutional detection. SIViP 15, 1239–1246 (2021). https://doi.org/10.1007/s11760-021-01854-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11760-021-01854-0