DOI: 10.1145/3581783.3612013

Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation

Published: 27 October 2023

Abstract

Cross-modal Unsupervised Domain Adaptation (UDA) has become a research hotspot because it reduces the laborious annotation of target-domain samples. Existing methods only make the two modalities mutually mimic each other's outputs within each domain, which encourages the class probability distributions to agree across domains. However, these methods ignore the complementarity offered by the fused modality representation in cross-modal learning. In this paper, we propose a cross-modal UDA method for 3D semantic segmentation via Bidirectional Fusion-then-Distillation, named BFtD-xMUDA, which exploits cross-modal fusion in UDA and enforces distribution consistency between the outputs of the two domains, not only between the 2D image and the 3D point cloud but also between each modality and their fusion. Our method comprises three key components: a Model-agnostic Feature Fusion Module (MFFM), Bidirectional Distillation (B-Distill), and Cross-modal Debiased Pseudo-Labeling (xDPL). MFFM generates cross-modal fusion features that establish a latent space enforcing maximum correlation and complementarity between the two heterogeneous modalities. B-Distill performs bidirectional knowledge distillation, comprising cross-modality and cross-domain fusion distillation, to achieve domain-modality alignment. xDPL models the uncertainty of pseudo-labels within a self-training scheme. Extensive experiments demonstrate that our method outperforms state-of-the-art competitors in several adaptation scenarios.
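To make the fusion-then-distillation idea concrete, here is a minimal PyTorch sketch of the loss structure the abstract describes. It is an illustration, not the authors' released code: the gated fusion module (FeatureFusion), the KL-based mimicry losses, the confidence-thresholded pseudo-labels (a crude stand-in for xDPL's debiased, uncertainty-aware selection), and all dimensions and thresholds are assumptions made for this example.

```python
# Illustrative sketch only (assumed design, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFusion(nn.Module):
    """Fuse per-point 2D and 3D features with a learned channel gate (assumed)."""

    def __init__(self, dim_2d: int, dim_3d: int, num_classes: int):
        super().__init__()
        fused_dim = dim_2d + dim_3d
        self.gate = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())
        self.head = nn.Linear(fused_dim, num_classes)

    def forward(self, feat_2d: torch.Tensor, feat_3d: torch.Tensor) -> torch.Tensor:
        concat = torch.cat([feat_2d, feat_3d], dim=-1)  # (N, dim_2d + dim_3d)
        return self.head(self.gate(concat) * concat)    # fused logits, (N, C)


def kl_mimicry(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence pulling the student distribution toward a detached teacher."""
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits.detach(), dim=-1),
                    reduction="batchmean")


def bidirectional_distill(logits_2d, logits_3d, logits_fused) -> torch.Tensor:
    """2D <-> 3D mutual mimicry plus fusion-to-modality distillation terms."""
    return (kl_mimicry(logits_2d, logits_3d) + kl_mimicry(logits_3d, logits_2d)
            + kl_mimicry(logits_2d, logits_fused) + kl_mimicry(logits_3d, logits_fused))


def confident_pseudo_labels(logits_fused: torch.Tensor, threshold: float = 0.9):
    """Keep only high-confidence fused predictions as target-domain pseudo-labels
    (a simple proxy for xDPL's uncertainty modeling, which is more elaborate)."""
    conf, labels = F.softmax(logits_fused, dim=-1).max(dim=-1)
    labels[conf < threshold] = -100  # ignore_index understood by cross_entropy
    return labels


if __name__ == "__main__":
    N, C, D2, D3 = 1024, 10, 64, 16                    # points, classes, dims (assumed)
    fusion = FeatureFusion(D2, D3, C)
    feat_2d, feat_3d = torch.randn(N, D2), torch.randn(N, D3)
    logits_2d, logits_3d = torch.randn(N, C), torch.randn(N, C)  # stand-ins for the 2D/3D heads
    logits_fused = fusion(feat_2d, feat_3d)
    loss = bidirectional_distill(logits_2d, logits_3d, logits_fused)
    print(f"distill loss: {loss.item():.4f}")
    pseudo = confident_pseudo_labels(logits_fused)
    kept = pseudo >= 0
    if kept.any():  # self-training step on the confident points only
        ce = F.cross_entropy(logits_fused, pseudo, ignore_index=-100)
        print(f"self-training CE on {kept.sum().item()} confident points: {ce.item():.4f}")
```

Note that the sketch shows only the within-batch loss structure; per the abstract, B-Distill also distills fused predictions across domains, which would amount to applying these terms over paired source and target batches.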

Supplemental Material

MP4 File
In this video, we present the background, motivation, method, and experimental performance of BFtD-xMUDA. Our work may interest you if you work on any of the following topics: 1) multi-modal learning; 2) unsupervised domain adaptation; 3) point cloud semantic segmentation; 4) autonomous driving; 5) scenario understanding. We will release the code publicly in the future. If you are interested in this direction, you are welcome to contact us to discuss.


Cited By

  • (2025) RE-GZSL: Relation Extrapolation for Generalized Zero-Shot Learning. IEEE Transactions on Circuits and Systems for Video Technology, 35(3), 1973-1986. DOI: 10.1109/TCSVT.2024.3486074
  • (2024) LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer. In Proceedings of the 32nd ACM International Conference on Multimedia, 4283-4291. DOI: 10.1145/3664647.3681038


    Information

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023, 9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. 3d semantic segmentation
    2. unsupervised domain adaptation

    Qualifiers

    • Research-article


    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada
