Abstract
Multimodal 3D object detection, which fuses complementary information from LiDAR point clouds and RGB images, has recently become an active research topic. However, fusing images and point clouds is non-trivial because the two modalities have fundamentally different representations, and inadequate feature fusion further degrades detection performance. To address these problems, we convert images into pseudo point clouds via depth completion and adopt a more effective feature fusion strategy. In this paper, we propose a dual-way multimodal feature fusion network (DMFF) for 3D object detection. Specifically, we first use a dual-stream feature extraction module (DSFE) to generate homogeneous LiDAR and pseudo region-of-interest (RoI) features. We then propose a dual-way feature interaction method (DWFI) that enables both intermodal and intramodal interaction between the two features. Next, we design a local attention feature fusion module (LAFF) that selects the input features most likely to contribute to the desired output. The proposed DMFF achieves state-of-the-art performance on the KITTI dataset.
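The conversion of an image into a pseudo point cloud rests on back-projecting a completed depth map through the camera intrinsics. The following is a minimal NumPy sketch of that back-projection step only; the function name `depth_to_pseudo_points` and the toy intrinsics are illustrative assumptions, and the paper's actual pipeline additionally relies on a learned depth completion network to produce the dense depth map.

```python
import numpy as np

def depth_to_pseudo_points(depth, K):
    """Back-project a dense depth map (H, W) into a pseudo point cloud
    of camera-frame 3D points (N, 3) using camera intrinsics K (3, 3).
    Pixels with no valid depth (<= 0) are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))            # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    z = depth.reshape(-1)
    valid = z > 0
    # X = z * K^{-1} [u, v, 1]^T for each pixel with valid depth
    rays = pix[valid] @ np.linalg.inv(K).T
    return rays * z[valid, None]

# toy example: 2x2 depth map with one missing pixel, simple pinhole intrinsics
K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.array([[5.0, 0.0],
                  [2.0, 4.0]])
pts = depth_to_pseudo_points(depth, K)
print(pts.shape)  # (3, 3): one 3D point per valid-depth pixel
```

The resulting points live in the camera frame; in practice they would still need the camera-to-LiDAR extrinsics applied before being processed alongside the real point cloud.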
Data and materials availability
Not applicable.
Funding
This work was supported in part by the Natural Science Foundation of Heilongjiang Province of China (No. LH2021F026), the Fundamental Research Funds for the Central Universities (No. HIT.NSRIF202243), and the Aeronautical Science Foundation of China (No. 2022Z071077002).
Author information
Authors and Affiliations
Contributions
XD and XD designed the research. XD drafted the manuscript. XD helped organize the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethical approval
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dong, X., Di, X. & Wang, W. DMFF: dual-way multimodal feature fusion for 3D object detection. SIViP 18, 455–463 (2024). https://doi.org/10.1007/s11760-023-02772-z