Leaping from 2D Detection to Efficient 6DoF Object Pose Estimation

Liu, Jinhui; Zou, Zhikang; Ye, Xiaoqing; Tan, Xiao; Ding, Errui; Xu, Feng; Yu, Xin

doi:10.1007/978-3-030-66096-3_47

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12536)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Estimating 6DoF object poses from single RGB images is very challenging due to severe occlusions and the large search space of camera poses. Keypoint-voting-based methods have demonstrated their effectiveness and superiority in predicting object poses. However, those approaches are often affected by inaccurate semantic segmentation when computing the keypoint locations. To enable our model to focus on local regions without being distracted by backgrounds, we first localize object regions with a 2D object detector. In doing so, we not only reduce the search space of keypoints but also improve the robustness of the pose estimation. Moreover, since symmetric objects may suffer from ambiguity along the symmetric dimension, we propose to select keypoints at geometrically symmetric locations to resolve the ambiguity. Extensive experimental results on seven different datasets of the BOP challenge benchmark demonstrate that our method outperforms the state-of-the-art and achieves 3rd place in the BOP challenge.
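
As a rough illustration of how restricting keypoint voting to a detected 2D box can work, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical network output `directions` (one unit vector per pixel pointing toward a keypoint) and a hypothetical detector box `box`, and it swaps in a plain least-squares intersection of the voting rays for whatever voting scheme the paper actually uses. Only pixels inside the box cast votes, so background pixels cannot influence the estimate.

```python
import numpy as np


def vote_keypoint(box, directions):
    """Least-squares intersection of per-pixel voting rays inside a 2D box.

    box: (x0, y0, x1, y1) integer pixel bounds, assumed to come from a
        2D object detector (hypothetical input).
    directions: (H, W, 2) array of unit vectors (dx, dy), each pointing from
        its pixel toward the keypoint (hypothetical network output).
    Returns the estimated keypoint as an (x, y) array.
    """
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[y0:y1, x0:x1]
    # Pixel positions (x, y) of every voter inside the box.
    p = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Voting directions of the same pixels, in the same row-major order.
    d = directions[y0:y1, x0:x1].reshape(-1, 2)
    # A ray through p with direction d contains k iff (I - d d^T)(k - p) = 0.
    proj = np.eye(2)[None] - d[:, :, None] * d[:, None, :]
    # Summing over all rays gives the 2x2 normal equations A k = b.
    A = proj.sum(axis=0)
    b = np.einsum("nij,nj->i", proj, p)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Synthetic check: build a direction field that points at a known keypoint.
    true_kp = np.array([12.5, 7.25])
    ys, xs = np.mgrid[0:32, 0:32]
    v = true_kp - np.stack([xs, ys], axis=-1)
    field = v / np.linalg.norm(v, axis=-1, keepdims=True)
    print(vote_keypoint((4, 2, 20, 16), field))
```

With noisy directions, a robust variant (for example RANSAC over ray pairs) would likely be preferable to a single least-squares solve; the sketch only shows how the box limits which pixels vote.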

The first three authors contributed equally to this work.

Acknowledgement

This work was supported by Baidu Inc., China, the National Key R&D Program of China 2018YFA0704000, the NSFC (No. 61822111, 61727808, 61671268) and Beijing Natural Science Foundation (JQ19015, L182052).

Author information

Authors and Affiliations

Baidu Inc., Beijing, China
Jinhui Liu, Zhikang Zou, Xiaoqing Ye, Xiao Tan & Errui Ding
Tsinghua University, Beijing, China
Jinhui Liu & Feng Xu
University of Technology Sydney, Ultimo, Australia
Xin Yu

Corresponding author

Correspondence to Xin Yu.

Editor information

Editors and Affiliations

University of Clermont Auvergne, Clermont Ferrand, France
Adrien Bartoli
Università degli Studi di Udine, Udine, Italy
Andrea Fusiello

About this paper

Cite this paper

Liu, J. et al. (2020). Leaping from 2D Detection to Efficient 6DoF Object Pose Estimation. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_47

DOI: https://doi.org/10.1007/978-3-030-66096-3_47
Published: 03 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66095-6
Online ISBN: 978-3-030-66096-3
eBook Packages: Computer Science, Computer Science (R0)
