ABSTRACT
An accurate and efficient 3D object detection system is crucial for autonomous vehicles. However, due to the complexity of the environment, a single sensor, such as LIDAR or a camera, cannot meet the safety requirements of autonomous driving. In this paper, a two-stage 3D detection network using Guidance-Point-Based feature fusion is proposed. In the first stage, the features in image space are first converted to the bird's-eye view (BEV) through the Guidance-Point-Based feature mapping module designed in this paper. The LIDAR and camera features in BEV are then fused by the adaptive fusion module, and finally a Center-Based strategy is used for detection. In the second stage, keypoint features are used to further refine the objects output by the first-stage network. Evaluation on the nuScenes dataset shows that the proposed network achieves higher accuracy with less additional time.
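The abstract does not describe how the Center-Based strategy is implemented. As a rough illustration only, the sketch below shows the general idea behind center-based detection: object centers are read off as local maxima of a class heatmap over the BEV grid. The function name `find_center_peaks`, the 3x3 neighbourhood, and the `threshold` value are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def find_center_peaks(
    heatmap: List[List[float]], threshold: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for cells that are local maxima of a BEV heatmap.

    Hypothetical helper: a cell counts as a center candidate when its score is
    at least `threshold` and no cell in its 3x3 neighbourhood scores higher.
    Cells tied with a neighbour are all kept.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the 3x3 neighbourhood, clipped at the grid borders.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    # Highest-scoring candidates first.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks


# Toy 4x4 BEV heatmap with two clear peaks.
bev_heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
centers = find_center_peaks(bev_heatmap)
print(centers)
```

In a full detector, each retained cell would additionally be paired with regressed box attributes (size, height, orientation) before any second-stage refinement; those parts are omitted here.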