Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization

Xiao, Ying; Chen, Weiting; Wang, Jiangtao

doi:10.1007/s10489-022-04006-0

Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization

Published: 15 August 2022

Volume 53, pages 10149–10161, (2023)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Ying Xiao¹,
Weiting Chen¹ &
Jiangtao Wang¹

261 Accesses
1 Altmetric
Explore all metrics

Abstract

Self-supervised monocular depth estimation (SMDE) has emerged as a promising alternative to generate a dense depth map in outdoor scenarios because of its low requirements for training data and sensors. However, training with only consecutive temporal frames without depth ground truth causes problems such as a lack of global supervision information and inaccurate depth estimation in low-texture areas. In this study, we propose a pseudo-label generation (PLG) module, a two-stream pose estimation (TSPE) structure, and a grid regularization loss function to address these issues. Here, the PLG is used to automatically generate pseudo-grid and pseudo-pose in the data preprocessing stage. The pseudo-grid provides reliable global position and direction information for supervision, while the pseudo-pose is used in TSPE to provide more geometric information. The TSPE fuses the geometry-based pseudo-pose and the network-based pose with a simple structure. The generalization of the pose estimation is improved by providing more geometric information. By handling the negligible depth error in the low-texture area, the proposed grid regularization loss function improves the depth estimation performance. Experiments show that our methods can improve the depth estimation performance, especially in the object boundary and low-texture area, with no additional training data or model parameters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

3D-PL: Domain Adaptive Depth Estimation with 3D-Aware Pseudo-Labeling

Geometry Meets Semantics for Semi-supervised Monocular Depth Estimation

Transferring knowledge from monocular completion for self-supervised monocular depth estimation

Article 24 July 2021

References

Luo X, Huang JB, Szeliski R, Matzen K, Kopf J (2020) Consistent video depth estimation. ACM Trans Graph (TOG) 39(4):71–1
Article Google Scholar
Wang Y, Chao WL, Garg D, Hariharan B, Campbell M, Weinberger KQ (2019) Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8445–8453
Zhu K, Jiang X, Fang Z, Gao Y, Fujita H, Hwang JN (2021) Photometric transfer for direct visual odometry. Knowl-Based Syst 213:106671
Article Google Scholar
Guizilini V, Ambrus R, Pillai S, Raventos A, Gaidon A (2020) 3d Packing for self-supervised monocular depth estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2485–2494
Zhao C, Sun Q, Zhang C, Tang Y, Qian F (2020) Monocular depth estimation based on deep learning: an overview. Sci China Technol Sci, pp 1–16
Xu H, Liu N (2021) Detail-preserving depth estimation from a single image based on modified fully convolutional residual network and gradient network. SN Applied Sciences 3(12):1–15
Article Google Scholar
Godard C, Mac Aodha O, Firman M, Brostow GJ (2019) Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3828–3838
GonzalezBello JL, Kim M (2020) Forget about the lidar: Self-supervised depth estimators with med probability volumes. Adv Neural Inf Process Syst 33:12626–12637
Google Scholar
Xue F, Zhuo G, Huang Z, Fu W, Wu Z, Ang MH (2020) Toward hierarchical self-supervised monocular absolute depth estimation for autonomous driving applications. In: 2020 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 2330–2337. https://doi.org/10.1109/IROS45743.2020.9340802 https://doi.org/10.1109/IROS45743.2020.9340802
Wu Z, Zhuo G, Xue F (2020) Self-supervised monocular depth estimation scale recovery using ransac outlier removal
Song X, Li W, Zhou D, Dai Y, Fang J, Li H, Zhang L (2021) Mlda-net: Multi-level dual attention-based network for self-supervised monocular depth estimation. IEEE Trans Image Process 30:4691–4705
Article Google Scholar
Chen X, Wang Y, Chen X, Zeng W (2021) S2r-depthnet: Learning a generalizable depth-specific structural representation . In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3034–3043
Kumar VR, Klingner M, Yogamani S, Milz S, Fingscheidt T, Mader P (2021) Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 61–71
Klingner M, Termöhlen JA, Mikolajczyk J, Fingscheidt T (2020) Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance. In: European conference on computer vision, pp 582–600. Springer
Zhu S, Brazil G, Liu X (2020) The edge of depth: Explicit constraints between segmentation and depth. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13116–13125
Garg R, Bg VK, Carneiro G, Reid I (2016) Unsupervised cnn for single view depth estimation: Geometry to the rescue. In: European conference on computer vision, pp 740–756. Springer
Godard C, Mac Aodha O, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 270–279
Li K, Fu Z, Wang H, Chen Z, Guo Y (2021) Adv-depth: Self-supervised monocular depth estimation with an adversarial loss. IEEE Signal Process Lett 28:638–642. https://doi.org/10.1109/LSP.2021.3065203
Zheng C, Cham TJ, Cai J (2018) T2net: Synthetic-to-realistic translation for solving single-image depth estimation tasks. In: Proceedings of the european conference on computer vision (ECCV), pp 767–783
Sattler T, Zhou Q, Pollefeys M, Leal-Taixe L (2019) Understanding the limitations of cnn-based absolute camera pose regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3302–3312
Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. Adv Neural Inf Process Syst 28:2017–2025
Google Scholar
Nguyen T, Chen SW, Shivakumar SS, Taylor CJ, Kumar V (2018) Unsupervised deep homography: a fast and robust homography estimation model. IEEE Robot Autom Lett 3(3):2346–2353
Article Google Scholar
Tao Y, Ling Z (2020) Deep features homography transformation fusion network—a universal foreground segmentation algorithm for ptz cameras and a comparative study. Sensors 20(12):3420
Article Google Scholar
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60 (2):91–110
Article Google Scholar
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (surf). Comput Vis Image Underst 110(3):346–359
Article Google Scholar
Rublee E, Rabaud V, Konolige K, Bradski G (2011) Orb: an efficient alternative to sift or surf. In: 2011 International conference on computer vision, pp 2564–2571. Ieee
Wang H, Sang X, Chen D, Wang P, Yan B, Qi S, Ye X, Yao T (2021) Self-supervised learning of monocular depth estimation based on progressive strategy. IEEE Trans Comput Imaging 7:375–383
Article Google Scholar
Li J, Hu Q, Ai M (2021) Point cloud registration based on one-point ransac and scale-annealing biweight estimation. IEEE Trans Geosci Remote Sens
Zhang YF, Thorburn PJ, Xiang W, Fitch P (2019) Ssim—a deep learning approach for recovering missing time series sensor data. IEEE Internet Things J 6(4):6618–6628
Article Google Scholar
Yin KL, Pu YF, Lu L (2020) Combination of fractional flann filters for solving the van der pol-duffing oscillator. Neurocomputing 399:183–192
Article Google Scholar
Wang N, He H (2019) Adaptive homography-based visual servo for micro unmanned surface vehicles. Int J Adv Manuf Technol 105(12):4875–4882
Article Google Scholar
Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: The kitti dataset. Int J Robot Res (IJRR)
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: Conference on computer vision and pattern recognition (CVPR)
Zhou T, Brown M, Snavely N, Lowe DG (2017) Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1851–1858
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32:8026–8037
Google Scholar
Xiao J, Li H, Qu G, Fujita H, Cao Y, Zhu J, Huang C (2021) Hope: heatmap and offset for pose estimation. J Ambient Intell Humaniz Comput, pp 1–13
Zhan H, Garg R, Weerasekera CS, Li K, Agarwal H, Reid I (2018) Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 340–349
Chen PY, Liu AH, Liu YC, Wang YCF (2019) Towards scene understanding: Unsupervised monocular depth estimation with semantic-aware representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2624–2632
Lei Z, Wang Y, Li Z, Yang J (2021) Attention based multilayer feature fusion convolutional neural network for unsupervised monocular depth estimation. Neurocomputing 423:343–352
Article Google Scholar
Zhou T, Brown M, Snavely N, Lowe DG (2017) Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

Download references

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (No. 2018YFB2101300), in part by the National Natural Science Foundation of China (Grant No. 61871186), and in part by the Dean’s Fund of Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education (East China Normal University).

Author information

Authors and Affiliations

MOE Research Center of Software/Hardware Co-Design Engineering, East China Normal University, Shanghai, China
Ying Xiao, Weiting Chen & Jiangtao Wang

Authors

Ying Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Weiting Chen
View author publications
You can also search for this author in PubMed Google Scholar
Jiangtao Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Weiting Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xiao, Y., Chen, W. & Wang, J. Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization. Appl Intell 53, 10149–10161 (2023). https://doi.org/10.1007/s10489-022-04006-0

Download citation

Accepted: 13 July 2022
Published: 15 August 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-04006-0

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization

Abstract

Access this article

Similar content being viewed by others

3D-PL: Domain Adaptive Depth Estimation with 3D-Aware Pseudo-Labeling

Geometry Meets Semantics for Semi-supervised Monocular Depth Estimation

Transferring knowledge from monocular completion for self-supervised monocular depth estimation

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization

Abstract

Access this article

Similar content being viewed by others

3D-PL: Domain Adaptive Depth Estimation with 3D-Aware Pseudo-Labeling

Geometry Meets Semantics for Semi-supervised Monocular Depth Estimation

Transferring knowledge from monocular completion for self-supervised monocular depth estimation

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation