DOI: 10.1145/3581783.3612063
research-article

OCSKB: An Object Component Sketch Knowledge Base for Fast 6D Pose Estimation

Published: 27 October 2023

Abstract

6D pose estimation from a single RGB image is a fundamental task in computer vision. Most instance-level and category-level 6D pose estimation methods depend on accurate CAD models or point cloud models, which are not easy to obtain quickly for everyday objects. To address this issue, we present a part-level object component sketch knowledge base consisting of 270 real-world object sketch models across 30 categories. Objects are disassembled into geometry components with spatial relationships according to their functions and structures, and the components are converted into three basic spatial structures: frustum, circular truncated cone, and sphere. We present a fast pipeline for sketch modeling with our tool; building a simple model of an everyday object with this method takes about 2 minutes on average. Additionally, we leverage the geometric information and spatial relationships inherent in the multiple-viewpoint projection maps of these sketch bases to develop a rapid inference framework for 6D pose estimation. The interpretable steps in our framework gradually retrieve and activate valid solutions in the discrete 6D pose space. Extensive experiments in real-world environments demonstrate that our method can reliably and robustly estimate the 6D pose of objects, even without access to accurate CAD or point cloud models. Furthermore, our method achieves state-of-the-art performance, operating at 90 frames per second using parallel computing on a GPU.
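
The component-based representation described in the abstract can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions: the class names (`Component`, `SketchModel`), the field layout, the example "mug" object, and the 30-degree viewpoint grid are all hypothetical and are not taken from the paper or its tool. It shows one way an object might be stored as components drawn from the three basic structures, and how a coarse set of discrete viewpoints might be enumerated.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple

# The three basic spatial structures named in the abstract.
PRIMITIVES = ("frustum", "circular_truncated_cone", "sphere")


@dataclass
class Component:
    """One geometry component of an object sketch (hypothetical layout)."""
    name: str
    primitive: str                      # must be one of PRIMITIVES
    size: Tuple[float, ...]             # primitive-specific dimensions (assumed metres)
    offset: Tuple[float, float, float]  # position relative to the object origin

    def __post_init__(self) -> None:
        if self.primitive not in PRIMITIVES:
            raise ValueError(f"unknown primitive: {self.primitive}")


@dataclass
class SketchModel:
    """An object sketch: a category label plus its components (hypothetical)."""
    category: str
    components: List[Component] = field(default_factory=list)


def discrete_viewpoints(az_step: int = 30, el_step: int = 30) -> List[Tuple[int, int]]:
    """Enumerate (azimuth, elevation) pairs in degrees on a coarse grid.

    The step sizes are illustrative assumptions, not values from the paper.
    """
    azimuths = range(0, 360, az_step)
    elevations = range(-90 + el_step, 90, el_step)  # the two poles are skipped
    return list(product(azimuths, elevations))


# Illustrative object: a mug built from two components.
mug = SketchModel("mug", [
    Component("body", "circular_truncated_cone", (0.04, 0.035, 0.10), (0.0, 0.0, 0.0)),
    Component("handle", "frustum", (0.01, 0.03, 0.06), (0.05, 0.0, 0.0)),
])
views = discrete_viewpoints()
print(len(mug.components), len(views))
```

In a setup like this, each viewpoint would index one projection map of the sketch, so a pose search can start from a finite list of candidates rather than a continuous space.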

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 6d pose estimation
    2. geometry components
    3. interpretable steps
    4. knowledge base

    Qualifiers

    • Research-article

    Funding Sources

    • Guangzhou Key Laboratory of Scene Understanding and Intelligent Interaction
    • Natural Science Foundation of China
    • Guangzhou Key Research and Development Program

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
