Skip to main content
Log in

Self-supervised monocular Depth estimation with multi-scale structure similarity loss

  • 1227: Content-based Image Retrieval
  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

The raw depth image captured by the depth sensor usually has an extensive range of missing depth values, and the incomplete depth map burdens many downstream vision tasks. In order to overcome the incorrect estimation issue of depth information with the original luminosity loss function for processing complex texture areas and distant moving objects, this paper proposes a self-supervised monocular depth estimation algorithm based on multi-scale structure similarity loss. So as to enhance the perception ability of the depth prediction network for pixel edges, this paper proposes a multi-scale structural similarity when calculating the loss. In addition, an attention mechanism is also added to the encoder stage of the deep prediction network. As a result, the network not only ignores the features with small contributions, but also strengthens the features assist judgment based on the adjustment of the feature map. Finally, the experiments on the KITTI dataset and Cityscapes are conducted, and then the results are compared and analyzed with the state-of-the-art algorithms. The experimental results demonstrate that the proposed algorithm achieves significant improvements in accuracy, especially on the KITTI dataset, whose precision is raised to 88.4%. Moreover, under the premise of outstanding accuracy, the visualization effect of depth estimation has also been significantly improved, especially in the scenes with multi-person overlap on Cityscapes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

References

  1. Ahmed SST, Thanuja K, Guptha NS, et al. (2016) Telemedicine approach for remote patient monitoring system using smart phones with an economical hardware kit[C]. 2016 international conference on computing technologies and intelligent data engineering (ICCTIDE'16), pp: 1–4

  2. Ali U, Bayramli B, Alsarhan T, Lu H (2021) A lightweight network for monocular depth estimation with decoupled body and edge supervision[J]. Image Vis Comput 113:104261

    Article  Google Scholar 

  3. Behley J, Arbade MG, Milioto A, et al. (2020) SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp:9297–9307

  4. Bian JW, Li Z, Wang N, et al. (2019) Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video[J], 32, pp:1–11.

  5. Casser V, Pirk S, Mahjourian R et al (2019) Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos[C]. Thirty-Third AAAI Conf Artificial Intellig (AAAI’19) 33(01):8001–8008

    Google Scholar 

  6. Chen L, Kou Q, Cheng D, Yao J (2020) Content-guided deep residual network for single image super-resolution[J]. Optik 202:163678

    Article  Google Scholar 

  7. Dc A, Rl A, Jl A et al (2021) Activity guided multi-scales collaboration based on scaled-CNN for saliency prediction[J]. Image Vis Comput 114:104267

    Article  Google Scholar 

  8. Eigen D, Fergus R (2014) Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture[J]. IEEE:2650–2658

  9. Garg R, Bg VK, Carneiro G et al (2016) Unsupervised CNN for single view Depth estimation: geometry to the rescue[C]. European Conf Comput Vision 9912:740–756

    Google Scholar 

  10. Godard C, Aodha OM, Brostow GJ (2017) Unsupervised Monocular Depth Estimation with Left-Right Consistency[C]. Computer Vision & Pattern Recognition, pp:6602–6611.

  11. Godard C, Aodha OM, Firman M, et al. (2019) Digging Into Self-Supervised Monocular Depth Estimation[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp:3828–3838.

  12. Gordon A, Li H, Jonschkowski R, et al. (2019) Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp: 8976–8985.

  13. He K, Zhang X, Ren S, et al. (2016) Deep Residual Learning for Image Recognition[J]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp:770–778.

  14. Jung H, Park E, Yoo S (2021) Fine-grained semantics-aware representation enhancement for self-supervised monocular depth estimation[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp: 12642–12652

  15. Khan F, Salahuddin S, Javidnia H (2020) Deep learning-based monocular Depth estimation methods-a state-of-the-art review[J]. Sensors (Basel) 20(8):2272

    Article  Google Scholar 

  16. Klingner M, Termhlen J A, Mikolajczyk J, et al. (2020) Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance[J], pp:582–600.

  17. Laina I, Rupprecht C, Belagiannis V, et al. (2016) Deeper Depth Prediction with Fully Convolutional Residual Networks[C]. Fourth International Conference on 3d Vision, pp: 239–248.

  18. Li R, Wang S, Long Z, et al. (2017) UnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning[J], pp: 7286–729.

  19. Li J, Cheng D, Liu R, Kou Q, Zhao K (2021) Unsupervised person re-identification based on measurement Axis[J]. IEEE Signal Proces Lett 28:379–383

    Article  Google Scholar 

  20. Luo C, Yang Z, Peng W et al (2018) Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding[J]. IEEE Trans Pattern Anal Mach Intell 42:2624–2641

    Article  Google Scholar 

  21. Mathew A, Mathew J (2020) Monocular depth estimation with SPN loss[J]. Image Vis Comput 100:103934

    Article  Google Scholar 

  22. Mehta I, Sakurikar P, Narayanan PJ (2018) Structured Adversarial Training for Unsupervised Monocular Depth Estimation[C]. 2018 International Conference on 3D Vision (3DV), pp: 314–323.

  23. Meng Y, Lu Y, Raj A, et al. (2020) SIGNet: Semantic Instance Aided Unsupervised 3D Geometry Perception[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 9802–9812

  24. Pillai S, Ambrus R, Gaidon A (2019) SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation[C]. 2019 International Conference on Robotics and Automation (ICRA), pp: 9250–9256.

  25. Poggi M, Tosi F, Mattoccia S (2018) Learning monocular depth estimation with unsupervised trinocular assumptions[C]. 2018 International Conference on 3D Vision (3DV), pp: 324–333.

  26. Praveena HD, Guptha NS, Kazemzadeh A, Parameshachari BD, Hemalatha KL (2022) Effective CBMIR System Using Hybrid Features-Based Independent Condensed Nearest Neighbor Model[J]. J Healthcare Engin 2022:1–9

    Article  Google Scholar 

  27. Ranftl R, Bochkovskiy A, Koltun V (2021) Vision transformers for dense prediction[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp: 12179–12188.

  28. Ranjan A, Jampani V, Balles L, et al. (2019) Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp: 12232–12241.

  29. Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation[J]. International Conference on Medical Image Computing and Computer-Assisted Intervention, pp:234–241.

  30. Rosa N, Guizilini V, Grassi V (2019) Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps[C]. 2019 19th International Conference on Advanced Robotics (ICAR), pp: 793–800.

  31. Schön M, Buchholz M, Dietmayer K (2021) Mgnet: Monocular geometric scene understanding for autonomous driving[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp: 15804–15815.

  32. Tosi F, Aleotti F, Poggi M, et al. (2019) Learning monocular depth estimation infusing traditional stereo knowledge[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp: 9799–9809.

  33. Wang Z, Simoncelli EP, Bovik AC (2003) Multiscale structural similarity for image quality assessment[C]. Proc IEEE Asilomar Conference on Signals, pp: 1398–1402.

  34. Wang C, Buenaposada JM, Rui Z, et al (2018) Learning Depth from Monocular Videos using Direct Methods[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp: 2022–2030.

  35. Wang H, Wang M, Che Z, et al. RGB-Depth Fusion GAN for Indoor Depth Completion[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp: 6209–6218.

  36. Wong A, Hong B W, Soatto S (2019) Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction[J]. IEEE, pp: 5627–5636.

  37. Zhan H, Garg R, Weerasekera C S, et al. (2018) Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction[J]. IEEE, pp: 340–349.

  38. Zhe C, Kar A, Haene C, et al. (2019) Learning Independent Object Motion from Unlabelled Stereoscopic Videos[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp: 5594–5603.

  39. Zhou J, Wang Y, Qin K, et al. (2019) Unsupervised High-Resolution Depth Learning From Videos With Dual Networks[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp: 6871–6880.

Download references

Acknowledgements

This paper is supported by the National Natural Science Foundation of China under Grant No. 51774281.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Deqiang Cheng.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Han, C., Cheng, D., Kou, Q. et al. Self-supervised monocular Depth estimation with multi-scale structure similarity loss. Multimed Tools Appl 82, 38035–38050 (2023). https://doi.org/10.1007/s11042-022-14012-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-022-14012-6

Keywords

Navigation