research-article

OCSKB: An Object Component Sketch Knowledge Base for Fast 6D Pose Estimation

Authors:

Jiakai Luo

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 5819 - 5827

https://doi.org/10.1145/3581783.3612063

Published: 27 October 2023

Abstract

6D pose estimation from a single RGB image is a fundamental task in computer vision. Most instance-level and category-level 6D pose estimation methods require accurate CAD models or point cloud models, yet such models of everyday objects are not easy to obtain quickly. To address this issue, we present a part-level object component sketch knowledge base consisting of 270 real-world object sketch models in 30 categories. Objects are disassembled into geometric components with spatial relationships according to their functions and structures, and these components are converted into three basic spatial structures: frustum, circular truncated cone, and sphere. We present a fast pipeline for sketch modeling with our tool; building a simple model of an everyday object with it takes about 2 minutes on average. Additionally, we leverage the geometric information and spatial relationships inherent in the multi-viewpoint projection maps of these sketch models to develop a rapid inference framework for 6D pose estimation. The interpretable steps in our framework gradually retrieve and activate valid solutions in the discrete 6D pose space. Extensive experiments in real-world environments demonstrate that our method estimates the 6D pose of objects reliably and robustly, even without access to accurate CAD or point cloud models. Furthermore, our method achieves state-of-the-art performance, running at 90 frames per second using parallel computing on a GPU.
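As a rough illustration of the representation described in the abstract, the sketch below models an object as a set of components drawn from the three primitive types and enumerates a coarse grid of candidate rotations. The abstract does not give the actual parameterization or the way the pose space is discretized, so every class, field, and function name here (`Frustum`, `TruncatedCone`, `Sphere`, `Component`, `SketchModel`, `discrete_rotations`) and the Euler-angle grid itself are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Frustum:
    """Hypothetical frustum with rectangular bottom and top faces."""
    bottom_size: Tuple[float, float]
    top_size: Tuple[float, float]
    height: float


@dataclass(frozen=True)
class TruncatedCone:
    """Hypothetical circular truncated cone; equal radii give a cylinder."""
    bottom_radius: float
    top_radius: float
    height: float


@dataclass(frozen=True)
class Sphere:
    radius: float


Primitive = Union[Frustum, TruncatedCone, Sphere]


@dataclass(frozen=True)
class Component:
    """One part of an object: a primitive plus its offset from the object origin."""
    name: str
    shape: Primitive
    offset: Tuple[float, float, float]


@dataclass
class SketchModel:
    """A part-level sketch of one object (assumed structure)."""
    category: str
    components: List[Component]


def discrete_rotations(step_deg: int) -> List[Tuple[int, int, int]]:
    """Enumerate Euler-angle triples on a regular grid (illustrative discretization)."""
    angles = range(0, 360, step_deg)
    return list(product(angles, repeat=3))


# Illustrative mug: a cylindrical body plus a box-like handle (dimensions in metres).
mug = SketchModel(
    category="mug",
    components=[
        Component("body", TruncatedCone(0.04, 0.04, 0.10), (0.0, 0.0, 0.0)),
        Component("handle", Frustum((0.01, 0.03), (0.01, 0.03), 0.06), (0.05, 0.0, 0.02)),
    ],
)
rotations = discrete_rotations(90)
print(len(mug.components), "components;", len(rotations), "candidate rotations")
```

A finer angular step enlarges the candidate set cubically, which is one reason a discrete pose search of this kind would benefit from the GPU parallelism the abstract mentions.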


Index Terms

OCSKB: An Object Component Sketch Knowledge Base for Fast 6D Pose Estimation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks

Information & Contributors

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE),
Tao Mei (HiDream.ai, China),
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy),
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia),
Pradeep K. Atrey (University at Albany, State University of New York, USA),
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Guangzhou Key Laboratory of Scene Understanding and Intelligent Interaction
Natural Science Foundation of China
Guangzhou Key Research and Development Program

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
