
\(3\times 2\): 3D Object Part Segmentation by 2D Semantic Correspondences

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15096)


Abstract

3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, its 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation. We present a novel approach, termed 3-By-2, that achieves state-of-the-art (SOTA) performance on different benchmarks at various granularity levels. By using features from pretrained foundation models and exploiting semantic and geometric correspondences, we overcome the challenges of limited 3D annotations and leverage available 2D labels for effective 3D object part segmentation. 3-By-2 accommodates various part taxonomies and granularities, demonstrating part label transfer across different object categories. Project website: https://ngailapdi.github.io/projects/3by2/.

Work done as an intern at Meta AI (FAIR).
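The abstract's core idea, transferring part labels through 2D semantic correspondences and aggregating them onto 3D geometry, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors stand in for dense descriptors from a pretrained foundation model (e.g., a ViT), and the function names, the plain nearest-neighbor matching, and the majority-vote aggregation are all illustrative assumptions.

```python
import numpy as np

def transfer_labels_2d(src_feats, src_labels, tgt_feats):
    """Assign each target pixel the part label of its nearest annotated
    source pixel in feature space (cosine similarity)."""
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    nearest = np.argmax(tgt @ src.T, axis=1)  # best-matching source pixel per target pixel
    return src_labels[nearest]

def aggregate_views_to_3d(view_labels, view_pix2point, num_points, num_parts):
    """Majority-vote per-view 2D predictions onto the 3D points they project
    from; pix2point maps each pixel to its 3D point index."""
    votes = np.zeros((num_points, num_parts), dtype=np.int64)
    for labels, pix2point in zip(view_labels, view_pix2point):
        np.add.at(votes, (pix2point, labels), 1)  # accumulate one vote per pixel
    return votes.argmax(axis=1)

# Toy example: 3 annotated source pixels with one-hot features and labels 0/1/2;
# two rendered views of a 3-point shape get labels by feature matching, then
# the per-view labels are fused onto the 3D points.
src_feats, src_labels = np.eye(3), np.array([0, 1, 2])
view_a = transfer_labels_2d(src_feats, src_labels,
                            np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]))
view_b = transfer_labels_2d(src_feats, src_labels,
                            np.array([[0.1, 1.0, 0.0], [0.0, 0.1, 1.0]]))
parts_3d = aggregate_views_to_3d([view_a, view_b],
                                 [np.array([0, 1]), np.array([1, 2])], 3, 3)
print(parts_3d)  # per-point part labels: [0 1 2]
```

In the paper's setting the target views are renderings of the 3D shape, so the pixel-to-point map comes from the known camera; the vote-based fusion is one simple way to make the per-view 2D labels consistent in 3D.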




Acknowledgement

This work was partly supported by NIH R01HD104624-01A1.

Author information


Corresponding author

Correspondence to Anh Thai.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3793 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Thai, A., Wang, W., Tang, H., Stojanov, S., Rehg, J.M., Feiszli, M. (2025). \(3\times 2\): 3D Object Part Segmentation by 2D Semantic Correspondences. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15096. Springer, Cham. https://doi.org/10.1007/978-3-031-72920-1_9


  • DOI: https://doi.org/10.1007/978-3-031-72920-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72919-5

  • Online ISBN: 978-3-031-72920-1

  • eBook Packages: Computer Science (R0)
