Context Dual-Branch Attention Network for Depth Completion of Transparent Object

Hu, Yutao; Wang, Zheng; Chen, Jiacheng; Wang, Wanliang

doi:10.1007/978-3-031-13841-6_54

Context Dual-Branch Attention Network for Depth Completion of Transparent Object

Yutao Hu¹⁴,
Zheng Wang¹⁵,
Jiacheng Chen¹⁴ &
…
Wanliang Wang¹⁴

Conference paper
First Online: 10 August 2022

2655 Accesses
1 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 13458))

Abstract

With the rise in popularity of RGB-D cameras, vision-based robot approaches that rely on the depth information given by RGB-D cameras are gaining favor. However, because of their reflection and refraction features, transparent objects, which are a prevalent part of our daily lives, are difficult to distinguish and locate with an RGB-D camera. To overcome this issue, we present DCTNet, a novel technique for depth completion of transparent objects, in this study. DCTNet is a dual-branch approach that uses a single RGB-D picture to complete the depth of a transparent end-to-end. We apply MSSA, a multi-scale spatial attention technique, to fuse distinct branch feature maps to improve the depth completion results even more. Experiments show that when compared to ClearGrasp, our approach produces much better performance and improves inference speed.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 99.00; Price excludes VAT (USA)

Softcover Book: USD 129.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Lysenkov, I., Eruhimov, V., Bradski, G.: Recognition and pose estimation of rigid transparent objects with a kinect sensor. Robotics 273(273–280), 2 (2013)
Google Scholar
Phillips, C.J., Lecce, M., Daniilidis, K.: Seeing glassware: from edge detection to pose estimation and shape recovery. In: Robotics: Science and Systems, vol. 3, p. 3 (2016)
Google Scholar
Han, K., Wong, K.-Y.K., Liu, M.: A fixed viewpoint approach for dense reconstruction of transparent objects. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4001–4008 (2015)
Google Scholar
Qian, Y., Gong, M., Yang, Y.H.: 3d reconstruction of transparent objects with position-normal consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4369–4377 (2016)
Google Scholar
Sajjan, S.: Clear grasp: 3d shape estimation of transparent objects for manipulation. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 3634–3642. IEEE (2020)
Google Scholar
Van Gansbeke, W., Neven, D., De Brabandere, B., Van Gool, L.: Sparse and noisy lidar completion with RGB guidance and uncertainty. In: 2019 16th International Conference on Machine Vision Applications (MVA), pp. 1–6. IEEE (2019)
Google Scholar
Hu, M., Wang, S., Li, B., Ning, S., Fan, L., Gong, X.: PENet: towards precise and efficient image guided depth completion. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13656–13662. IEEE (2021)
Google Scholar
Qiu, J., et al.: DeepLiDAR: deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3313–3322 (2019)
Google Scholar
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems 27 (2014)
Google Scholar
Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002–2011 (2018)
Google Scholar
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 239–248. IEEE (2016)
Google Scholar
Chen, W., Fu, Z., Yang, D., Deng, J.: Single-image depth perception in the wild. In: Advances in Neural Information Processing Systems 29 (2016)
Google Scholar
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
Google Scholar
Ma, F., Karaman, S.: Sparse-to-dense: depth prediction from sparse depth samples and a single image. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 4796–4803. IEEE (2018)
Google Scholar
Chen, Y., Yang, B., Liang, M., Urtasun, R.: Learning joint 2d–3d representations for depth completion. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10023–10032 (2019)
Google Scholar
Xu, Y., Zhu, X., Shi, J., Zhang, G., Bao, H., Li, H.: Depth completion from sparse lidar data with depth-normal constraints. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2811–2820 (2019)
Google Scholar
Zhang, Y., Funkhouser, T.: Deep depth completion of a single RGB-D image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 175–185 (2018)
Google Scholar
Zhu, L.: RGB-D local implicit function for depth completion of transparent objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4649–4658 (2021)
Google Scholar
Xu, H., Wang, Y.R., Eppel, S., Aspuru-Guzik, A., Shkurti, F., Garg, A.: Seeing glass: joint point cloud and depth completion for transparent objects. arXiv preprint arXiv:2110.00087 (2021)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (2017)
Google Scholar
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Google Scholar
Hu, J., Shen, L., Albanie, S., Sun, G., Vedaldi, A.: Gather-excite: exploiting feature context in convolutional neural networks. In: Advances in Neural Information Processing Systems 31 (2018)
Google Scholar
Gao, Z., Xie, J., Wang, Q., Li, P.: Global second-order pooling convolutional networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3024–3033 (2019)
Google Scholar
Dai, J.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)
Google Scholar
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Google Scholar
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Chapter Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Lin, X., Ma, L., Liu, W., Chang, S.-F.: Context-gated convolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12363, pp. 701–718. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58523-5_41
Chapter Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar

Download references

Acknowledgments

This work was partly supported by the National Natural Science Foundation of China (No. 61873240) and the Foundation of State Key Laboratory of Digital Manufacturing Equipment and Technology (Grant No. DMETKF2022024).

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310000, Zhejiang, China
Yutao Hu, Jiacheng Chen & Wanliang Wang
School of Computer and Computational Science, Zhejiang University City College, Hangzhou, 310000, Zhejiang, China
Zheng Wang

Authors

Yutao Hu
View author publications
You can also search for this author in PubMed Google Scholar
Zheng Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jiacheng Chen
View author publications
You can also search for this author in PubMed Google Scholar
Wanliang Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zheng Wang .

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Honghai Liu
Huazhong University of Science and Technology, Wuhan, China
Zhouping Yin
Shenyang Institute of Automation, Shenyang, Liaoning, China
Lianqing Liu
Harbin Institute of Technology, Harbin, China
Li Jiang
Shanghai Jiao Tong University, Shanghai, China
Guoying Gu
Shenzhen Institute of Advanced Technology, Shenzhen, China
Xinyu Wu
Harbin Institute of Technology, Shenzhen, China
Weihong Ren

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hu, Y., Wang, Z., Chen, J., Wang, W. (2022). Context Dual-Branch Attention Network for Depth Completion of Transparent Object. In: Liu, H., et al. Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science(), vol 13458. Springer, Cham. https://doi.org/10.1007/978-3-031-13841-6_54

Download citation

DOI: https://doi.org/10.1007/978-3-031-13841-6_54
Published: 10 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13840-9
Online ISBN: 978-3-031-13841-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics