Abstract
Category-level articulated object pose estimation focuses on estimating the pose of unknown articulated objects within known categories. Despite its significance, this task remains challenging due to the varying shapes and poses of objects, expensive dataset annotation costs, and complex real-world environments. In this paper, we propose a novel self-supervised approach that leverages a single-frame point cloud to solve this task. Our model consistently generates a reconstruction with a canonical pose and joint state for the entire input object; it estimates object-level poses that reduce overall pose variance, and part-level poses that align each part of the input with its corresponding part of the reconstruction. Experimental results demonstrate that our approach significantly outperforms previous self-supervised methods and is comparable to state-of-the-art supervised methods. To assess the performance of our model in real-world scenarios, we also introduce a new real-world articulated object benchmark dataset. Code and dataset are released at https://github.com/YC-Che/OP-Align.
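The part-level alignment idea above can be illustrated with a minimal sketch: estimate a rigid transform for each input part and score how well the transformed part overlaps its reconstructed counterpart. This is not the authors' implementation; all function names are hypothetical, and for simplicity it assumes known per-part point correspondences (solved in closed form via the Kabsch algorithm) and uses a symmetric Chamfer distance as the residual.

```python
import numpy as np

def chamfer(a, b):
    # Symmetric Chamfer distance between point sets a (N,3) and b (M,3).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def kabsch(src, dst):
    # Least-squares rigid transform (R, t) mapping src onto dst,
    # assuming src[i] corresponds to dst[i].
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def part_alignment_loss(input_parts, recon_parts):
    # Align each observed part to its reconstructed counterpart and
    # accumulate the residual Chamfer distance.
    loss = 0.0
    for src, dst in zip(input_parts, recon_parts):
        R, t = kabsch(src, dst)
        loss += chamfer(src @ R.T + t, dst)
    return loss
```

In the self-supervised setting the correspondences and segmentation would themselves be predicted by the network, but the loss structure (rigid per-part transforms plus a point-set distance to the canonical reconstruction) is the same in spirit.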
Acknowledgements
We thank Ryutaro Yamauchi and Tatsushi Matsubayashi from ALBERT Inc. (now Accenture Japan Ltd.) for their insightful suggestions and support. This work was supported by JST FOREST Program, Grant Number JPMJFR206H.
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Che, Y., Furukawa, R., Kanezaki, A. (2025). OP-Align: Object-Level and Part-Level Alignment for Self-supervised Category-Level Articulated Object Pose Estimation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15133. Springer, Cham. https://doi.org/10.1007/978-3-031-73226-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73225-6
Online ISBN: 978-3-031-73226-3