
6 DoF Pose Estimation of Textureless Objects from Multiple RGB Frames

  • Conference paper

Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12536)

Abstract

This paper addresses the problems of object detection and 6 DoF pose estimation from a sequence of RGB images. Our deep learning-based approach uses only synthetic, non-textured 3D CAD models for training and has no access to images from the target domain. The image sequence is used to obtain a sparse 3D reconstruction of the scene via Structure from Motion. The domain gap is closed by relying on the intuition that geometric edges are the only prominent features that can be extracted from both the 3D models and the sparse reconstructions. Based on this assumption, we developed a domain-invariant data preparation scheme and 3DKeypointNet, a neural network for detecting 3D keypoints in sparse and noisy point clouds. The final pose is estimated with RANSAC and a scale-aware point cloud alignment method. The proposed method has been tested on the T-LESS dataset and compared to methods also trained on synthetic data. The results indicate the potential of our method, despite the entire pipeline being trained solely on synthetic data.
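The scale-aware alignment step mentioned in the abstract can be illustrated with the classic Umeyama closed-form solution [57], which recovers a similarity transform (scale, rotation, translation) between two sets of corresponding 3D points. The sketch below is a minimal NumPy implementation of that general technique, not the authors' code; the function name and its use on keypoint correspondences are assumptions for illustration.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Scale-aware alignment (Umeyama, 1991): find scale s, rotation R,
    translation t minimizing ||dst - (s * R @ src + t)||^2 over
    corresponding points. src, dst: (N, 3) arrays."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # Cross-covariance of the centered point sets
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection guard keeps R a proper rotation (det(R) = +1)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Inside a RANSAC loop, such a solver would be run on minimal sets of sampled keypoint correspondences, with inliers counted under a distance threshold on the transformed points; the estimated scale also resolves the unknown global scale of the Structure-from-Motion reconstruction.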


References

  1. Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017)

  2. Avetisyan, A., Dai, A., Nießner, M.: End-to-end CAD model retrieval and 9DoF alignment in 3D scans. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2551–2560 (2019)


  3. Birdal, T., Ilic, S.: Point pair features based object detection and pose estimation revisited. In: 2015 International Conference on 3D Vision, pp. 527–535. IEEE (2015)


  4. Birdal, T., Ilic, S.: A point sampling algorithm for 3D matching of irregular geometries. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6871–6878. IEEE (2017)


  5. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3722–3731 (2017)


  6. Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., Erhan, D.: Domain separation networks. In: Advances in Neural Information Processing Systems, pp. 343–351 (2016)


  7. Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learning 6D object pose estimation using 3D object coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 536–551. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10605-2_35


  8. Brachmann, E., Michel, F., Krull, A., Ying Yang, M., Gumhold, S., et al.: Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3364–3372 (2016)


  9. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017)


  10. Coumans, E.: Bullet physics simulation. In: ACM SIGGRAPH 2015 Courses, p. 7. ACM (2015)


  11. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)


  12. Drost, B., Ilic, S.: 3D object detection and localization using multimodal point pair features. In: 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 9–16. IEEE (2012)


  13. Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., Steger, C.: Introducing MVTec ITODD-a dataset for 3D object recognition in industry. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2200–2208 (2017)


  14. Drost, B., Ulrich, M., Navab, N., Ilic, S.: Model globally, match locally: efficient and robust 3D object recognition. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 998–1005. IEEE (2010)


  15. Drummond, T., Cipolla, R.: Real-time visual tracking of complex structures. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 932–946 (2002). https://doi.org/10.1109/TPAMI.2002.1017620


  16. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)


  17. Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(1), 2096–2130 (2016)


  18. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. (IJRR) 32(11), 1231–1237 (2013)


  19. Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_42


  20. Hinterstoisser, S., Lepetit, V., Rajkumar, N., Konolige, K.: Going further with point pair features. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 834–848. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_51


  21. Hodaň, T., Haluza, P., Obdržálek, Š., Matas, J., Lourakis, M., Zabulis, X.: T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2017)


  22. Hodaň, T., Matas, J., Obdržálek, Š.: On evaluation of 6D object pose estimation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 606–619. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_52


  23. Hodan, T., Melenovsky, A.: BOP: benchmark for 6D object pose estimation (2019). https://bop.felk.cvut.cz/home/

  24. Hodaň, T., et al.: BOP: benchmark for 6D object pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 19–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_2


  25. Hofer, M., Maurer, M., Bischof, H.: Efficient 3D scene abstraction using line segments. Comput. Vis. Image Underst. 157, 167–178 (2017)


  26. Kaskman, R., Zakharov, S., Shugurov, I., Ilic, S.: HomebrewedDB: RGB-D dataset for 6D pose estimation of 3D objects. In: The IEEE International Conference on Computer Vision (ICCV) Workshops, October 2019


  27. Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6D pose estimation great again. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1521–1529 (2017)


  28. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  29. Ku, J., Mozifian, M., Lee, J., Harakeh, A., Waslander, S.L.: Joint 3D proposal generation and object detection from view aggregation. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–8. IEEE (2018)


  30. Lee, H.-Y., Tseng, H.-Y., Huang, J.-B., Singh, M., Yang, M.-H.: Diverse image-to-image translation via disentangled representations. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 36–52. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_3


  31. Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision 81(2) (2009). Article number: 155. https://doi.org/10.1007/s11263-008-0152-6

  32. Li, Y., Bu, R., Sun, M., Chen, B.: PointCNN. arXiv preprint arXiv:1801.07791 (2018)

  33. Li, Z., Wang, G., Ji, X.: CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In: The IEEE International Conference on Computer Vision (ICCV), October 2019


  34. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2


  35. Liu, X., Qi, C.R., Guibas, L.J.: FlowNet3D: learning scene flow in 3D point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 529–537 (2019)


  36. Long, M., Cao, Y., Wang, J., Jordan, M.I.: Learning transferable features with deep adaptation networks. arXiv preprint arXiv:1502.02791 (2015)

  37. Manhardt, F., et al.: Explaining the ambiguity of object detection and 6D pose from visual data. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6841–6850 (2019)


  38. Park, K., Patten, T., Vincze, M.: Pix2Pose: pixel-wise coordinate regression of objects for 6D pose estimation. In: The IEEE International Conference on Computer Vision (ICCV), October 2019


  39. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop (2017)


  40. Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4561–4570 (2019)


  41. Pitteri, G., Ilic, S., Lepetit, V.: CorNet: generic 3D corners for 6D pose estimation of new objects without retraining. In: The IEEE International Conference on Computer Vision (ICCV) Workshops, October 2019


  42. Pitteri, G., Ramamonjisoa, M., Ilic, S., Lepetit, V.: On object symmetries and 6D pose estimation from images. In: 2019 International Conference on 3D Vision (3DV), pp. 614–622. IEEE (2019)


  43. Planche, B., Zakharov, S., Wu, Z., Hutter, A., Kosch, H., Ilic, S.: Seeing beyond appearance - mapping real images into geometrical domains for unsupervised CAD-based recognition. arXiv preprint arXiv:1810.04158 (2018)

  44. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. arXiv preprint arXiv:1904.09664 (2019)

  45. Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum PointNets for 3D object detection from RGB-D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 918–927 (2018)


  46. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)


  47. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)


  48. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3828–3836 (2017)


  49. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)


  50. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016)


  51. Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2107–2116 (2017)


  52. Sundermeyer, M., Marton, Z.C., Durner, M., Brucker, M., Triebel, R.: Implicit 3D orientation learning for 6D object detection from RGB images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 699–715 (2018)


  53. Sundermeyer, M., Marton, Z.C., Durner, M., Triebel, R.: Augmented autoencoders: implicit 3D orientation learning for 6D object detection. Int. J. Comput. Vis. 128, 714–729 (2020). https://doi.org/10.1007/s11263-019-01243-8


  54. Taigman, Y., Polyak, A., Wolf, L.: Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200 (2016)

  55. Tekin, B., Sinha, S.N., Fua, P.: Real-time seamless single shot 6D object pose prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 292–301 (2018)


  56. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., Darrell, T.: Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014)

  57. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)


  58. Vidal, J., Lin, C.Y., Lladó, X., Martí, R.: A method for 6D pose estimation of free-form rigid objects using point pair features on range data. Sensors 18(8), 2678 (2018)


  59. Wang, C., et al.: DenseFusion: 6D object pose estimation by iterative dense fusion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3343–3352 (2019)


  60. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017)

  61. Yang, B., Luo, W., Urtasun, R.: PIXOR: real-time 3D object detection from point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7652–7660 (2018)


  62. Zakharov, S., Kehl, W., Ilic, S.: DeceptionNet: network-driven domain randomization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 532–541 (2019)


  63. Zakharov, S., Planche, B., Wu, Z., Hutter, A., Kosch, H., Ilic, S.: Keep it unreal: bridging the realism gap for 2.5D recognition with geometry priors only. In: 3DV (2018)


  64. Zakharov, S., Shugurov, I., Ilic, S.: DPOD: 6D pose object detector and refiner. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1941–1950 (2019)


  65. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018)



Acknowledgements

We thank the authors of AAE [52, 53], DPOD [64] and PPF [12, 14] for providing us with the poses estimated with their detectors.

Author information

Correspondence to Ivan Shugurov.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2783 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kaskman, R., Shugurov, I., Zakharov, S., Ilic, S. (2020). 6 DoF Pose Estimation of Textureless Objects from Multiple RGB Frames. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_41

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-66096-3_41


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66095-6

  • Online ISBN: 978-3-030-66096-3

  • eBook Packages: Computer Science, Computer Science (R0)
