Abstract
Estimating 6DoF object poses from single RGB images is very challenging due to severe occlusions and large search space of camera poses. Keypoint voting based methods have demonstrated its effectiveness and superiority on predicting object poses. However, those approaches are often affected by inaccurate semantic segmentation in computing the keypoint locations. To enable our model to focus on local regions without being distracted by backgrounds, we first localize object regions by a 2D object detector. In doing so, we not only reduce the search space of keypoints but also improve the robustness of the pose estimation. Moreover, since symmetric objects may suffer ambiguity along the symmetric dimension, we propose to select keypoints on the geometrically symmetric locations to resolve the ambiguity. The extensive experimental results on seven different datasets of the BOP challenge benchmark demonstrate that our method outperforms the state-of-the-art and achieves the 3-rd place in the BOP challenge.
The first three authors contributed equally to this work.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Gu, C., Ren, X.: Discriminative mixture-of-templates for viewpoint classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 408–421. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15555-0_30
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Hinterstoisser, S., et al.: Gradient response maps for real-time detection of textureless objects. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 876–888 (2011)
Hinterstoisser, S., et al.: Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 858–865. IEEE (2011)
Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: ACCV (2012)
Hodan, T., Barath, D., Matas, J.: EPOS: estimating 6D pose of objects with symmetries. In: CVPR (2020)
Hodaň, T., et al.: BOP: benchmark for 6D object pose estimation. In: ECCV (2018)
Hu, Y., Hugonot, J., Fua, P., Salzmann, M.: Segmentation-driven 6D object pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3385–3394 (2019)
Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6D pose estimation great again. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1521–1529 (2017)
Kiru, P., Timothy, P., Markus, V.: Pix2pose: pixel-wise coordinate regression of objects for 6D pose estimation. In: ICCV (2019)
Lepetit, V., Moreno-Noguer, F., Fua, P.: EPNP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 81(2), 155 (2009)
Liu, M.Y., Tuzel, O., Veeraraghavan, A., Chellappa, R.: Fast directional chamfer matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1696–1703. IEEE (2010)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4561–4570 (2019)
Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: Proceedings of the IEEE International Conference on Computer Vision (CVPR), pp. 3828–3836 (2017)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Rios-Cabrera, R., Tuytelaars, T.: Discriminatively trained templates for 3D object detection: a real time scalable approach. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2048–2055 (2013)
Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3D model views. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2686–2694 (2015)
Sundermeyer, M., Marton, Z.C., Durner, M., Triebel, R.: Augmented autoencoders: implicit 3D orientation learning for 6D object detection. IJCV 128(3), 714–729 (2020)
Tekin, B., Sinha, S.N., Fua, P.: Real-time seamless single shot 6d object pose prediction. In: CVPR (2018)
Tian, Y., Yu, X., Fan, B., Wu, F., Heijnen, H., Balntas, V.: Sosnet: second order similarity regularization for local descriptor learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11016–11025 (2019)
Tulsiani, S., Malik, J.: Viewpoints and keypoints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1510–1519 (2015)
Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6D object pose and size estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2651 (2019)
Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: Posecnn: a convolutional neural network for 6D object pose estimation in cluttered scenes. Robotics: Science and Systems (2017)
Yu, X., et al.: Unsupervised extraction of local image descriptors via relative distance ranking loss. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 1–8 (2019)
Yu, X., Zhuang, Z., Koniusz, P., Li, H.: 6DoF object pose estimation via differentiable proxy voting loss. In: BMVC (2020)
Zakharov, S., Shugurov, I., Ilic, S.: DPOD: 6D pose object detector and refiner. In: ICCV (2019)
Zhigang, L., Gu, W., Xiangyang, J.: CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In: ICCV (2019)
Zhu, M., et al.: Single image 3D object detection and pose estimation for grasping. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 3936–3943. IEEE (2014)
Acknowledgement
This work was supported by Baidu Inc., China, the National Key R&D Program of China 2018YFA0704000, the NSFC (No. 61822111, 61727808, 61671268) and Beijing Natural Science Foundation (JQ19015, L182052).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, J. et al. (2020). Leaping from 2D Detection to Efficient 6DoF Object Pose Estimation. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science(), vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_47
Download citation
DOI: https://doi.org/10.1007/978-3-030-66096-3_47
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66095-6
Online ISBN: 978-3-030-66096-3
eBook Packages: Computer ScienceComputer Science (R0)