Abstract
Commercial depth sensors often produce noisy and incomplete depth maps, especially on specular and transparent objects, which poses critical problems for downstream tasks that consume depth maps or point clouds. To mitigate this problem, we propose SwinDRNet, a powerful RGBD fusion network for depth restoration. We further propose the Domain Randomization-Enhanced Depth Simulation (DREDS) approach, which simulates an active stereo depth system using physically based rendering, and use it to generate a large-scale synthetic dataset of 130K photorealistic RGB images paired with simulated depths that carry realistic sensor noise. To evaluate depth restoration methods, we also curate a real-world dataset, namely STD, that captures 30 cluttered scenes composed of 50 objects whose materials range from specular and transparent to diffuse. Experiments demonstrate that the proposed DREDS dataset bridges the sim-to-real domain gap: trained on DREDS, our SwinDRNet seamlessly generalizes to other real depth datasets, e.g. ClearGrasp, and outperforms competing methods on depth restoration. We further show that our depth restoration effectively boosts the performance of downstream tasks, including category-level pose estimation and grasping. Our data and code are available at https://github.com/PKU-EPIC/DREDS.
Q. Dai and J. Zhang contributed equally.
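To make the pipeline described in the abstract concrete, below is a minimal, illustrative sketch of how an RGBD depth-restoration network of this kind is typically invoked: it fuses an RGB image with a noisy raw depth map and outputs a restored depth map for downstream pose estimation or grasping. This is our hedged illustration, not the authors' released interface; the stub class, function names, and tensor shapes are placeholder assumptions (the actual implementation lives in the linked repository).

```python
# Illustrative sketch only -- NOT the authors' released API. The stub stands in
# for a SwinDRNet-style RGBD fusion network; all names and shapes here are
# hypothetical placeholders. See https://github.com/PKU-EPIC/DREDS for the real code.
import torch


class DepthRestorationStub(torch.nn.Module):
    """Placeholder mapping (RGB, raw depth) -> restored depth.

    A real SwinDRNet fuses transformer features from both inputs; here the raw
    depth is passed through unchanged just to keep the sketch runnable.
    """

    def forward(self, rgb: torch.Tensor, raw_depth: torch.Tensor) -> torch.Tensor:
        return raw_depth


model = DepthRestorationStub().eval()

rgb = torch.rand(1, 3, 224, 224)        # RGB frame, values in [0, 1]
raw_depth = torch.rand(1, 1, 224, 224)  # noisy/incomplete sensor depth (meters)

with torch.no_grad():
    # The restored depth would then feed pose estimation or grasp detection.
    restored_depth = model(rgb, raw_depth)
```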
References
Blender. https://www.blender.org/
Object Capture API on macOS. https://developer.apple.com/augmented-reality/object-capture/
Breyer, M., Chung, J.J., Ott, L., Siegwart, R., Nieto, J.: Volumetric grasping network: real-time 6 DOF grasp detection in clutter. In: Conference on Robot Learning (2020)
Calli, B., et al.: Yale-CMU-Berkeley dataset for robotic manipulation research. Int. J. Robot. Res. 36(3), 261–268 (2017)
Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
Chen, K., Dou, Q.: SGPA: structure-guided prior adaptation for category-level 6D object pose estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2773–2782 (2021)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems 27 (2014)
Fang, H.S., Wang, C., Gou, M., Lu, C.: GraspNet-1Billion: a large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11444–11453 (2020)
He, Y., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11632–11641 (2020)
Hu, M., Wang, S., Li, B., Ning, S., Fan, L., Gong, X.: PENet: towards precise and efficient image guided depth completion. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13656–13662. IEEE (2021)
Jiang, Z., Zhu, Y., Svetlik, M., Fang, K., Zhu, Y.: Synergies between affordance and geometry: 6-DoF grasp detection via implicit representations. In: Robotics: Science and Systems (2021)
Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 53–69 (2018)
Khirodkar, R., Yoo, D., Kitani, K.: Domain randomization for scene-specific car detection and pose estimation. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1932–1940. IEEE (2019)
Landau, M.J., Choo, B.Y., Beling, P.A.: Simulating Kinect infrared and depth images. IEEE Trans. Cybernet. 46(12), 3018–3031 (2015)
Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Long, X., et al.: Adaptive surface normal constraint for depth estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12849–12858 (2021)
Mo, K., Guibas, L.J., Mukadam, M., Gupta, A., Tulsiani, S.: Where2Act: from pixels to actions for articulated 3D objects. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6813–6823 (2021)
Mu, T., et al.: ManiSkill: generalizable manipulation skill benchmark with large-scale demonstrations. In: Annual Conference on Neural Information Processing Systems (NeurIPS) (2021)
Park, J., Joo, K., Hu, Z., Liu, C.-K., So Kweon, I.: Non-local spatial propagation network for depth completion. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12358, pp. 120–136. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58601-0_8
Peng, X.B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Sim-to-real transfer of robotic control with dynamics randomization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 3803–3810. IEEE (2018)
Planche, B., Singh, R.V.: Physics-based differentiable depth sensor simulation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14387–14397 (2021)
Planche, B., et al.: DepthSynth: real-time realistic synthetic data generation from CAD models for 2.5D recognition. In: 2017 International Conference on 3D Vision (3DV), pp. 1–10. IEEE (2017)
Prakash, A., et al.: Structured domain randomization: bridging the reality gap by context-aware synthetic data. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 7249–7255. IEEE (2019)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems 30 (2017)
Qu, C., Liu, W., Taylor, C.J.: Bayesian deep basis fitting for depth completion with uncertainty. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16147–16157 (2021)
Sajjan, S., et al.: ClearGrasp: 3D shape estimation of transparent objects for manipulation. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 3634–3642. IEEE (2020)
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30. IEEE (2017)
Tremblay, J., et al.: Training deep networks with synthetic data: bridging the reality gap by domain randomization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 969–977 (2018)
Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant CNNs. In: 2017 International Conference on 3D Vision (3DV), pp. 11–20. IEEE (2017)
Van Gansbeke, W., Neven, D., De Brabandere, B., Van Gool, L.: Sparse and noisy lidar completion with RGB guidance and uncertainty. In: 2019 16th International Conference on Machine Vision Applications (MVA), pp. 1–6. IEEE (2019)
Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normalized object coordinate space for category-level 6D object pose and size estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2642–2651 (2019)
Weng, Y., et al.: CAPTRA: category-level pose tracking for rigid and articulated objects from point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13209–13218 (2021)
Xiong, X., Xiong, H., Xian, K., Zhao, C., Cao, Z., Li, X.: Sparse-to-dense depth completion revisited: sampling strategy and graph construction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 682–699. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_41
Xu, H., Wang, Y.R., Eppel, S., Aspuru-Guzik, A., Shkurti, F., Garg, A.: Seeing glass: joint point-cloud and depth completion for transparent objects. In: 5th Annual Conference on Robot Learning (2021)
Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., Gong, B.: Domain randomization and pyramid consistency: simulation-to-real generalization without accessing target domain data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2100–2110 (2019)
Zakharov, S., Kehl, W., Ilic, S.: DeceptionNet: network-driven domain randomization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 532–541 (2019)
Zhang, X., et al.: Close the visual domain gap by physics-grounded active stereovision depth sensor simulation. arXiv preprint arXiv:2201.11924 (2022)
Zhu, L., et al.: RGB-D local implicit function for depth completion of transparent objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4649–4658 (2021)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dai, Q. et al. (2022). Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_22
DOI: https://doi.org/10.1007/978-3-031-19842-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7
eBook Packages: Computer Science, Computer Science (R0)