
Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds

  • Conference paper in Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15101)


Abstract

Unsupervised domain adaptation (UDA) for 3D segmentation presents a formidable challenge, primarily stemming from the sparse and unordered nature of point clouds. For LiDAR point clouds in particular, the domain discrepancy becomes pronounced across varying capture scenes, fluctuating weather conditions, and the diverse array of LiDAR devices in use. Inspired by the remarkable generalization capabilities exhibited by the vision foundation model SAM in image segmentation, our approach leverages the wealth of general knowledge embedded within SAM to unify feature representations across diverse 3D domains and thereby address the 3D domain adaptation problem. Specifically, we harness the images paired with point clouds to facilitate knowledge transfer, and we propose a hybrid feature augmentation methodology that strengthens the alignment between the 3D feature space and SAM's feature space at both the scene and instance levels. Our method is evaluated on several widely recognized datasets and achieves state-of-the-art performance.

X. Zhu and Y. Ma—This work was supported by NSFC (No.62206173), Natural Science Foundation of Shanghai (No.22dz1201900), Shanghai Sailing Program (No.22YF1428700), MoE Key Laboratory of Intelligent Perception and Human-Machine Collaboration (ShanghaiTech University), Shanghai Frontiers Science Center of Human-centered Artificial Intelligence (ShangHAI).
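To make the cross-modal alignment idea from the abstract concrete, the following is a minimal sketch in PyTorch written under our own assumptions: it pulls per-point 3D features toward SAM image features sampled at the points' projected pixel locations. The names (alignment_loss, point_feats, sam_feats, uv) are hypothetical, and the paper's hybrid scene- and instance-level feature augmentation is not reproduced here.

```python
# Hypothetical sketch of cross-modal feature alignment between a 3D
# backbone and SAM's image encoder; not the authors' actual code.
import torch
import torch.nn.functional as F

def alignment_loss(point_feats, sam_feats, uv):
    """Align per-point 3D features with SAM features sampled at the
    points' image projections.

    point_feats: (N, C) features from a 3D segmentation backbone
    sam_feats:   (C, H, W) feature map from SAM's image encoder
    uv:          (N, 2) point projections in normalized [-1, 1] coords
    """
    # Sample the SAM feature map at each projected point location.
    grid = uv.view(1, -1, 1, 2)                       # (1, N, 1, 2)
    sampled = F.grid_sample(
        sam_feats.unsqueeze(0), grid, align_corners=False
    )                                                  # (1, C, N, 1)
    sampled = sampled.squeeze(0).squeeze(-1).t()       # (N, C)

    # Cosine alignment: encourage 3D features to match SAM's features.
    return 1.0 - F.cosine_similarity(point_feats, sampled, dim=1).mean()

# Toy usage with random tensors (shapes only; no real data).
pts = torch.randn(1024, 256)        # per-point 3D backbone features
img = torch.randn(256, 64, 64)      # SAM encoder feature map
uv = torch.rand(1024, 2) * 2 - 1    # projections in [-1, 1]
loss = alignment_loss(pts, img, uv)
```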



Author information


Corresponding author

Correspondence to Yuexin Ma.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1372 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Peng, X. et al. (2025). Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15101. Springer, Cham. https://doi.org/10.1007/978-3-031-72775-7_4


  • DOI: https://doi.org/10.1007/978-3-031-72775-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72774-0

  • Online ISBN: 978-3-031-72775-7

  • eBook Packages: Computer Science, Computer Science (R0)
