
Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization

Published in: Applied Intelligence

Abstract

Self-supervised monocular depth estimation (SMDE) has emerged as a promising way to generate dense depth maps in outdoor scenarios because of its low requirements for training data and sensors. However, training only on consecutive temporal frames, without depth ground truth, causes problems such as a lack of global supervision and inaccurate depth estimation in low-texture areas. In this study, we propose a pseudo-label generation (PLG) module, a two-stream pose estimation (TSPE) structure, and a grid regularization loss function to address these issues. The PLG automatically generates a pseudo-grid and a pseudo-pose during data preprocessing. The pseudo-grid provides reliable global position and direction information for supervision, while the pseudo-pose supplies additional geometric information to the TSPE. The TSPE fuses the geometry-based pseudo-pose with the network-based pose using a simple structure; the extra geometric information improves the generalization of pose estimation. By handling depth errors in low-texture areas, which photometric supervision would otherwise treat as negligible, the proposed grid regularization loss function further improves depth estimation. Experiments show that our methods improve depth estimation performance, especially at object boundaries and in low-texture areas, with no additional training data or model parameters.
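The photometric self-supervision that SMDE pipelines build on can be sketched as follows. This is a minimal NumPy illustration of the standard reprojection objective (back-project target pixels with the predicted depth, transform them by the estimated pose, re-project into the source view, and penalize the photometric difference); the function names, the plain L1 photometric term, and nearest-neighbour sampling are simplifying assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def backproject(depth, K_inv):
    """Lift every pixel to a 3-D point using its predicted depth."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)  # homogeneous pixel grid
    return (K_inv @ pix) * depth.reshape(1, -1)                        # 3 x (h*w) camera-frame points

def reproject(points, K, T):
    """Apply a rigid transform T (4x4) and project back to pixel coordinates."""
    pts_h = np.vstack([points, np.ones((1, points.shape[1]))])
    cam = (T @ pts_h)[:3]
    uv = K @ cam
    return uv[:2] / np.clip(uv[2:], 1e-6, None)                        # perspective division

def photometric_loss(target, source, depth, K, T):
    """Warp `source` into the target view and compare (L1, nearest sampling)."""
    h, w = target.shape
    uv = reproject(backproject(depth, np.linalg.inv(K)), K, T)
    u = np.clip(np.round(uv[0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[1]).astype(int), 0, h - 1)
    warped = source[v, u].reshape(h, w)
    return np.abs(target - warped).mean()
```

With a correct depth map and relative pose, the warped source reproduces the target and the loss vanishes; in low-texture regions many wrong depths produce nearly the same warped intensities, which is exactly the failure mode the grid regularization loss targets.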



Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (No. 2018YFB2101300), in part by the National Natural Science Foundation of China (Grant No. 61871186), and in part by the Dean’s Fund of Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education (East China Normal University).

Author information

Correspondence to Weiting Chen.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xiao, Y., Chen, W. & Wang, J. Self-supervised monocular depth estimation based on pseudo-pose guidance and grid regularization. Appl Intell 53, 10149–10161 (2023). https://doi.org/10.1007/s10489-022-04006-0

