Abstract
3D object part segmentation is essential for many computer vision applications. While substantial progress has been made on 2D object part segmentation, its 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation. We present a novel approach, termed 3-By-2, that achieves state-of-the-art performance on benchmarks spanning multiple levels of part granularity. By using features from pretrained foundation models and exploiting semantic and geometric correspondences, we overcome the challenge of limited 3D annotations. Our approach leverages available 2D labels, enabling effective 3D object part segmentation. 3-By-2 accommodates various part taxonomies and granularities, and demonstrates part label transfer across different object categories. Project website: https://ngailapdi.github.io/projects/3by2/.
Work done as an intern at Meta AI (FAIR).
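The core idea described in the abstract, transferring 2D part labels to 3D points by matching pretrained foundation-model features, can be illustrated with a minimal sketch. The snippet below is a toy nearest-neighbor label transfer in feature space and is not the authors' actual pipeline (which involves multi-view rendering and geometric aggregation); the feature dimensions, part names, and `transfer_labels` helper are all illustrative assumptions.

```python
import numpy as np

def transfer_labels(src_feats, src_labels, tgt_feats):
    """Assign each target point the label of its most similar source feature.

    Toy stand-in for 2D-to-3D label transfer: source features play the role
    of labeled 2D pixels, target features the role of 3D points rendered into
    the same pretrained feature space.
    """
    # L2-normalize so the dot product equals cosine similarity.
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    sim = tgt @ src.T                      # (n_tgt, n_src) similarity matrix
    return src_labels[sim.argmax(axis=1)]  # best-matching source label per target

# Toy example: two well-separated part clusters in a 2D feature space.
rng = np.random.default_rng(0)
seat = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(10, 2))
leg = rng.normal(loc=[0.0, 1.0], scale=0.05, size=(10, 2))
src_feats = np.vstack([seat, leg])
src_labels = np.array([0] * 10 + [1] * 10)       # 0 = "seat", 1 = "leg"
tgt_feats = np.array([[0.9, 0.1], [0.1, 0.9]])   # query "3D point" features
print(transfer_labels(src_feats, src_labels, tgt_feats))  # → [0 1]
```

In practice the source features would come from a model such as DINO applied to labeled 2D images, and the target features from renderings of the 3D shape; the nearest-neighbor step here stands in for the richer semantic and geometric correspondence machinery the paper describes.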
Acknowledgement
This work was partly supported by NIH R01HD104624-01A1.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Thai, A., Wang, W., Tang, H., Stojanov, S., Rehg, J.M., Feiszli, M. (2025). \(3\times 2\): 3D Object Part Segmentation by 2D Semantic Correspondences. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15096. Springer, Cham. https://doi.org/10.1007/978-3-031-72920-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72919-5
Online ISBN: 978-3-031-72920-1
eBook Packages: Computer Science; Computer Science (R0)