Abstract
This paper introduces BFVTModel, an efficient 3D semantic segmentation model. Multi-modal segmentation models, which take LiDAR and camera sensors as input, have gained popularity because they can exploit the semantic information in image data and the complementary geometric detail in point cloud data. However, multi-modal segmentation models face problems such as cross-modal consistency and model complexity. In this paper, we propose an occupancy-based multi-view feature combination method that obtains the 3D information of the features of the two modalities independently. We also design a projection structure that extracts three-axis features from each modality, comprising main-view, side-view, and top-view features. We construct a module called Feature View Transform (FVT) that combines the three axis-aligned plane features with a bias term. A CNN replaces the attention mechanism and reduces the dimensionality of the original three-dimensional feature volume, thereby lowering the parameter count and increasing the model's speed. A bilinear mapping structure fuses the LiDAR and camera features to complete the 3D semantic segmentation task. We validate the model on the NuScenes dataset, obtaining competitive accuracy while leading on efficiency metrics.
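The tri-view idea described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: it assumes a dense 3D feature volume of shape (B, C, X, Y, Z), collapses it along each axis by mean pooling to obtain top-, main-, and side-view plane features, refines each plane with a small 2D CNN in place of attention, and recombines the planes with a learnable bias. The class and parameter names are invented for the example.

```python
import torch
import torch.nn as nn


class FeatureViewTransform(nn.Module):
    """Hypothetical sketch of an FVT-style module: project a 3D feature
    volume onto three axis-aligned planes, refine each plane with a
    lightweight 2D CNN, and broadcast-sum the planes back to 3D with a bias."""

    def __init__(self, channels: int):
        super().__init__()
        # One small 2D convolution per view (top, main, side) instead of attention.
        self.view_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(3)
        )
        self.bias = nn.Parameter(torch.zeros(channels))

    def forward(self, vol: torch.Tensor) -> torch.Tensor:
        # vol: (B, C, X, Y, Z), e.g. voxelized LiDAR or lifted camera features.
        top = vol.mean(dim=4)   # (B, C, X, Y): collapse height -> top view
        main = vol.mean(dim=3)  # (B, C, X, Z): collapse Y -> main view
        side = vol.mean(dim=2)  # (B, C, Y, Z): collapse X -> side view
        top, main, side = (
            conv(f) for conv, f in zip(self.view_convs, (top, main, side))
        )
        # Broadcast each plane back to 3D and sum with a per-channel bias.
        b, c, x, y, z = vol.shape
        out = (
            top.unsqueeze(-1)                 # (B, C, X, Y, 1)
            + main.unsqueeze(3)               # (B, C, X, 1, Z)
            + side.unsqueeze(2)               # (B, C, 1, Y, Z)
            + self.bias.view(1, c, 1, 1, 1)
        )
        return out  # (B, C, X, Y, Z)


fvt = FeatureViewTransform(channels=16)
feats = torch.randn(2, 16, 8, 8, 8)
combined = fvt(feats)
```

Working on three O(N^2) planes rather than one O(N^3) volume is what keeps the parameter count and runtime low relative to full 3D attention.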
Supported by National Natural Science Foundation of China (Grant No. 62076193).
References
Caesar, H., et al.: nuscenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)
Cen, J., et al.: Cmdfusion: bidirectional fusion network with cross-modality knowledge distillation for lidar semantic segmentation. IEEE Robot. Autom. Lett. 9(1), 771–778 (2023)
Cheng, R., Razani, R., Taghavi, E., Li, E., Liu, B.: 2-s3net: attentive feature fusion with adaptive feature selection for sparse semantic segmentation network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12547–12556 (2021)
Huang, Y., Zheng, W., Zhang, Y., Zhou, J., Lu, J.: Tri-perspective view for vision-based 3d semantic occupancy prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9223–9232 (2023)
Jaritz, M., Vu, T.H., De Charette, R., Wirbel, É., Pérez, P.: Cross-modal learning for domain adaptation in 3d semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1533–1544 (2022)
Kim, Y., Park, K., Kim, M., Kum, D., Choi, J.W.: 3d dual-fusion: Dual-domain dual-query camera-lidar fusion for 3d object detection. arXiv preprint arXiv:2211.13529 (2022)
Li, J., Dai, H., Han, H., Ding, Y.: Mseg3d: multi-modal 3d semantic segmentation for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21694–21704 (2023)
Liu, Z., et al.: Bevfusion: multi-task multi-sensor fusion with unified bird’s-eye view representation. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2774–2781. IEEE (2023)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: Learning 3d reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470 (2019)
Tang, H., et al.: Searching efficient 3D architectures with sparse point-voxel convolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 685–702. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1_41
Xiao, A., et al.: 3d semantic segmentation in the wild: learning generalized models for adverse-condition point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9382–9392 (2023)
Yan, J., et al.: Cross modal transformer via coordinates encoding for 3d object detection. arXiv preprint arXiv:2301.01283 (2023)
Yan, X., Gao, J., Zheng, C., Zheng, C., Zhang, R., Cui, S., Li, Z.: 2dpass: 2d priors assisted semantic segmentation on lidar point clouds. In: European Conference on Computer Vision, pp. 677–695. Springer (2022). https://doi.org/10.1007/978-3-031-19815-1_39
Zhang, Y., et al.: Polarnet: an improved grid representation for online lidar point clouds semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9601–9610 (2020)
Zhang, Y., Zhu, Z., Du, D.: Occformer: dual-path transformer for vision-based 3d semantic occupancy prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9433–9443 (2023)
Zhao, J., Mei, K.: Cascaded bilinear mapping collaborative hybrid attention modality fusion model. In: Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pp. 287–298. Springer (2023). https://doi.org/10.1007/978-981-99-8435-0_2
Zhu, X., Zhou, H., Wang, T., Hong, F., Ma, Y., Li, W., Li, H., Lin, D.: Cylindrical and asymmetrical 3d convolution networks for lidar segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9939–9948 (2021)
Zhuang, Z., Li, R., Jia, K., Wang, Q., Li, Y., Tan, M.: Perception-aware multi-sensor fusion for 3d lidar semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16280–16290 (2021)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, J., Wei, FF., Liang, A., Mei, K. (2025). Efficient Matrix-Based Multi-view Projection Features Combined for Multi-modal 3D Semantic Segmentation. In: Hadfi, R., Anthony, P., Sharma, A., Ito, T., Bai, Q. (eds) PRICAI 2024: Trends in Artificial Intelligence. PRICAI 2024. Lecture Notes in Computer Science(), vol 15283. Springer, Singapore. https://doi.org/10.1007/978-981-96-0122-6_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0121-9
Online ISBN: 978-981-96-0122-6
eBook Packages: Computer Science (R0)