research-article

LocalPose: Object Pose Estimation with Local Geometry Guidance

Authors:

Jingwei Huang

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 8176 - 8184

https://doi.org/10.1145/3581783.3612089

Published: 27 October 2023

Abstract

We present LocalPose, a novel method for 9 DoF object pose estimation from object point clouds. Existing works regress the pose directly from a global shape embedding and are limited to a fixed set of shapes. We observe that the global object pose is closely related to local geometric properties, such as surface orientations at representative regions. Our key idea is therefore to summarize local geometric properties as a pose signature at each point and to aggregate these signatures into the global pose; local pose signatures are easier for the network to learn and generalize to novel shapes. We find two types of pose signatures that benefit pose estimation. First, we train a neural network both to predict 9 DoF pose signatures as pose candidates and to vote on them to obtain the object pose. Second, we treat surface normals as direct pose regulators that help select the subset of pose candidates giving the best accuracy. Experiments show that our method outperforms the state of the art in fine-grained pose accuracy on synthetic and real datasets, and that both kinds of pose signature, candidates and regulators, contribute to this gain.
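The candidate-then-select idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract describes a learned voting process and a normal-based regulator without giving their form, so this sketch substitutes a fixed robust average for the voting and takes a caller-supplied per-point `score` in place of the regulator. The names `aggregate_pose`, `project_to_rotation`, `score`, and `keep_ratio` are all hypothetical.

```python
import numpy as np


def project_to_rotation(M):
    """Return the rotation matrix closest to M in the Frobenius norm (via SVD)."""
    U, _, Vt = np.linalg.svd(M)
    # Flip the last axis if needed so that the result has determinant +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt


def aggregate_pose(R, t, s, score, keep_ratio=0.5):
    """Fuse per-point 9 DoF pose candidates into a single pose.

    R: (N, 3, 3) candidate rotations, t: (N, 3) translations,
    s: (N, 3) per-axis sizes, score: (N,) trust per candidate
    (an assumed stand-in for a normal-based regulator).
    """
    # Keep only the most trusted fraction of candidates.
    n_keep = max(1, int(round(keep_ratio * len(score))))
    idx = np.argsort(score)[-n_keep:]

    # Rotation: average the kept matrices, then project back onto SO(3).
    R_hat = project_to_rotation(R[idx].mean(axis=0))
    # Translation and size: per-axis median, which tolerates a few outliers.
    t_hat = np.median(t[idx], axis=0)
    s_hat = np.median(s[idx], axis=0)
    return R_hat, t_hat, s_hat
```

Called with `N` candidates and a score vector, the function returns one rotation (3 DoF), one translation (3 DoF), and one per-axis size (3 DoF), which together make up a 9 DoF pose. Averaging followed by SVD projection is only one simple choice for fusing rotations; it behaves well when the kept candidates are already close to each other.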

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
