DMFF: dual-way multimodal feature fusion for 3D object detection

  • Original Paper
Signal, Image and Video Processing

Abstract

Multimodal 3D object detection, which fuses complementary information from LiDAR data and RGB images, has recently become an active research topic. However, fusing images and point clouds is not trivial because the two modalities have different representations, and inadequate feature fusion degrades detection performance. To address these problems, we convert images into pseudo point clouds using depth completion and adopt a more efficient feature fusion method. In this paper, we propose a dual-way multimodal feature fusion network (DMFF) for 3D object detection. Specifically, we first use a dual-stream feature extraction module (DSFE) to generate homogeneous LiDAR and pseudo region of interest (RoI) features. Then, we propose a dual-way feature interaction method (DWFI) that enables intermodal and intramodal interaction between the two features. Next, we design a local attention feature fusion module (LAFF) to select the input features that are more likely to contribute to the desired output. The proposed DMFF achieves state-of-the-art performance on the KITTI dataset.
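To make two of the ideas in the abstract concrete, the following minimal sketch illustrates (a) how a dense depth map, such as one produced by depth completion, can be back-projected into a pseudo point cloud with a standard pinhole camera model, and (b) how two feature vectors of equal length might be combined with attention-style weights. It is not the DMFF implementation: the function names `depth_to_pseudo_points` and `attention_fuse`, the intrinsics `fx, fy, cx, cy`, and the parameter-free softmax weighting are all assumptions made for illustration, and the actual DSFE, DWFI, and LAFF modules are learned networks whose details are not given here.

```python
import math


def depth_to_pseudo_points(depth, fx, fy, cx, cy):
    """Back-project a dense depth map into camera-frame 3D points.

    depth[v][u] is the depth (e.g. in metres) at pixel column u, row v.
    fx, fy, cx, cy are pinhole intrinsics, assumed known from calibration.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx  # horizontal offset scaled by depth
            y = (v - cy) * z / fy  # vertical offset scaled by depth
            points.append((x, y, z))
    return points


def attention_fuse(lidar_feat, pseudo_feat):
    """Fuse two equal-length feature vectors with per-channel softmax weights.

    Hypothetical stand-in for an attention-based fusion step: for each
    channel, a two-way softmax over the modality activations gives the
    mixing weights, so the stronger response dominates the fused value.
    No learned parameters are involved in this sketch.
    """
    fused = []
    for a, b in zip(lidar_feat, pseudo_feat):
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        wa = ea / (ea + eb)
        fused.append(wa * a + (1.0 - wa) * b)
    return fused


if __name__ == "__main__":
    toy_depth = [[2.0, 0.0], [4.0, 2.0]]  # one invalid pixel
    print(depth_to_pseudo_points(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0))
    print(attention_fuse([3.0, 0.5], [0.0, 0.5]))
```

In a real pipeline the pseudo points would additionally be transformed into the LiDAR coordinate frame using the sensor calibration, and the fusion weights would be predicted by a trained attention module rather than computed directly from the activations.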

Data and materials availability

Not applicable.

Funding

This work was supported in part by the Natural Science Foundation of Heilongjiang Province of China (No. LH2021F026), the Fundamental Research Funds for the Central Universities (No. HIT.NSRIF202243), and the Aeronautical Science Foundation of China (No. 2022Z071077002).

Author information

Contributions

XD and XD designed the research. XD drafted the manuscript. XD helped organize the manuscript.

Corresponding author

Correspondence to Xiaoguang Di.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dong, X., Di, X. & Wang, W. DMFF: dual-way multimodal feature fusion for 3D object detection. SIViP 18, 455–463 (2024). https://doi.org/10.1007/s11760-023-02772-z

  • DOI: https://doi.org/10.1007/s11760-023-02772-z
