
Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13699)


Abstract

Commercial depth sensors usually generate noisy and incomplete depths, especially on specular and transparent objects, which poses critical issues for downstream depth- or point cloud-based tasks. To mitigate this problem, we propose a powerful RGBD fusion network, SwinDRNet, for depth restoration. We further propose a Domain Randomization-Enhanced Depth Simulation (DREDS) approach that simulates an active stereo depth system using physically based rendering, and use it to generate a large-scale synthetic dataset containing 130K photorealistic RGB images along with simulated depths carrying realistic sensor noise. To evaluate depth restoration methods, we also curate a real-world dataset, namely STD, that captures 30 cluttered scenes composed of 50 objects with materials ranging from specular and transparent to diffuse. Experiments demonstrate that the proposed DREDS dataset bridges the sim-to-real domain gap: trained on DREDS, our SwinDRNet seamlessly generalizes to other real depth datasets, e.g., ClearGrasp, and outperforms competing methods on depth restoration. We further show that our depth restoration effectively boosts the performance of downstream tasks, including category-level pose estimation and grasping. Our data and code are available at https://github.com/PKU-EPIC/DREDS.
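To make the simulation idea concrete: DREDS reproduces an active stereo sensor end to end with physically based rendering (IR pattern projection, stereo matching, and so on), which is what lets the simulated depths carry realistic material-dependent noise. The sketch below is only a minimal post-hoc approximation of that idea, not the authors' pipeline: it corrupts a clean rendered depth map with randomized Gaussian noise, material-dependent dropout, and quantization. The function name, parameters, and noise ranges are all hypothetical assumptions for illustration.

    # Minimal sketch (NOT the paper's method): domain-randomized corruption
    # of a clean rendered depth map. All parameter ranges are assumptions.
    import numpy as np

    def simulate_sensor_depth(clean_depth, material_mask, rng=None):
        """clean_depth: (H, W) float32 depths in meters.
        material_mask: (H, W) bool, True on specular/transparent pixels,
        where real active stereo sensors fail most often."""
        rng = rng if rng is not None else np.random.default_rng()
        depth = clean_depth.copy()

        # Per-scene randomized Gaussian measurement noise.
        sigma = rng.uniform(0.001, 0.005)
        depth += rng.normal(0.0, sigma, size=depth.shape).astype(np.float32)

        # Drop readings on specular/transparent surfaces with high
        # probability, mimicking stereo-matching failures (0 = missing).
        drop_prob = rng.uniform(0.6, 0.95)
        depth[material_mask & (rng.random(depth.shape) < drop_prob)] = 0.0

        # Quantize to mimic the sensor's finite disparity resolution.
        step = rng.uniform(0.002, 0.01)
        return (np.round(depth / step) * step).astype(np.float32)

Redrawing sigma, drop_prob, and step per scene is the domain-randomization ingredient: a restoration network trained on such a distribution of sensor behaviors, rather than on one fixed noise model, has a better chance of covering the real sensor at test time.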

Q. Dai and J. Zhang — Equal contributions.



References

  1. Blender. https://www.blender.org/

  2. Object Capture API on macOS. https://developer.apple.com/augmented-reality/object-capture/

  3. Breyer, M., Chung, J.J., Ott, L., Siegwart, R., Nieto, J.: Volumetric grasping network: real-time 6 DOF grasp detection in clutter. In: Conference on Robot Learning (2020)

  4. Calli, B., et al.: Yale-CMU-Berkeley dataset for robotic manipulation research. Int. J. Robot. Res. 36(3), 261–268 (2017)

  5. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  6. Chen, K., Dou, Q.: SGPA: structure-guided prior adaptation for category-level 6D object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2773–2782 (2021)

  7. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems 27 (2014)

  8. Fang, H.S., Wang, C., Gou, M., Lu, C.: GraspNet-1Billion: a large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11444–11453 (2020)

  9. He, Y., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11632–11641 (2020)

  10. Hu, M., Wang, S., Li, B., Ning, S., Fan, L., Gong, X.: PENet: towards precise and efficient image guided depth completion. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13656–13662. IEEE (2021)

  11. Jiang, Z., Zhu, Y., Svetlik, M., Fang, K., Zhu, Y.: Synergies between affordance and geometry: 6-DoF grasp detection via implicit representations. In: Robotics: Science and Systems (2021)

  12. Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 53–69 (2018)

  13. Khirodkar, R., Yoo, D., Kitani, K.: Domain randomization for scene-specific car detection and pose estimation. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1932–1940. IEEE (2019)

  14. Landau, M.J., Choo, B.Y., Beling, P.A.: Simulating Kinect infrared and depth images. IEEE Trans. Cybernet. 46(12), 3018–3031 (2015)

  15. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  16. Long, X., et al.: Adaptive surface normal constraint for depth estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12849–12858 (2021)

  17. Mo, K., Guibas, L.J., Mukadam, M., Gupta, A., Tulsiani, S.: Where2Act: from pixels to actions for articulated 3D objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6813–6823 (2021)

  18. Mu, T., et al.: ManiSkill: generalizable manipulation skill benchmark with large-scale demonstrations. In: Annual Conference on Neural Information Processing Systems (NeurIPS) (2021)

  19. Park, J., Joo, K., Hu, Z., Liu, C.-K., So Kweon, I.: Non-local spatial propagation network for depth completion. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12358, pp. 120–136. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58601-0_8

  20. Peng, X.B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Sim-to-real transfer of robotic control with dynamics randomization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 3803–3810. IEEE (2018)

  21. Planche, B., Singh, R.V.: Physics-based differentiable depth sensor simulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14387–14397 (2021)

  22. Planche, B., et al.: DepthSynth: real-time realistic synthetic data generation from CAD models for 2.5D recognition. In: 2017 International Conference on 3D Vision (3DV), pp. 1–10. IEEE (2017)

  23. Prakash, A., et al.: Structured domain randomization: bridging the reality gap by context-aware synthetic data. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 7249–7255. IEEE (2019)

  24. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems 30 (2017)

  25. Qu, C., Liu, W., Taylor, C.J.: Bayesian deep basis fitting for depth completion with uncertainty. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16147–16157 (2021)

  26. Sajjan, S., et al.: ClearGrasp: 3D shape estimation of transparent objects for manipulation. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 3634–3642. IEEE (2020)

  27. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30. IEEE (2017)

  28. Tremblay, J., et al.: Training deep networks with synthetic data: bridging the reality gap by domain randomization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 969–977 (2018)

  29. Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant CNNs. In: 2017 International Conference on 3D Vision (3DV), pp. 11–20. IEEE (2017)

  30. Van Gansbeke, W., Neven, D., De Brabandere, B., Van Gool, L.: Sparse and noisy LiDAR completion with RGB guidance and uncertainty. In: 2019 16th International Conference on Machine Vision Applications (MVA), pp. 1–6. IEEE (2019)

  31. Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6D object pose and size estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2642–2651 (2019)

  32. Weng, Y., et al.: CAPTRA: category-level pose tracking for rigid and articulated objects from point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13209–13218 (2021)

  33. Xiong, X., Xiong, H., Xian, K., Zhao, C., Cao, Z., Li, X.: Sparse-to-dense depth completion revisited: sampling strategy and graph construction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 682–699. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_41

  34. Xu, H., Wang, Y.R., Eppel, S., Aspuru-Guzik, A., Shkurti, F., Garg, A.: Seeing glass: joint point-cloud and depth completion for transparent objects. In: 5th Annual Conference on Robot Learning (2021)

  35. Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., Gong, B.: Domain randomization and pyramid consistency: simulation-to-real generalization without accessing target domain data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2100–2110 (2019)

  36. Zakharov, S., Kehl, W., Ilic, S.: DeceptionNet: network-driven domain randomization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 532–541 (2019)

  37. Zhang, X., et al.: Close the visual domain gap by physics-grounded active stereovision depth sensor simulation. arXiv preprint arXiv:2201.11924 (2022)

  38. Zhu, L., et al.: RGB-D local implicit function for depth completion of transparent objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4649–4658 (2021)


Author information

Correspondence to He Wang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 15115 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Dai, Q. et al. (2022). Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_22

  • DOI: https://doi.org/10.1007/978-3-031-19842-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19841-0

  • Online ISBN: 978-3-031-19842-7

  • eBook Packages: Computer Science, Computer Science (R0)
