Transferring knowledge from monocular completion for self-supervised monocular depth estimation

  • 1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Multimedia Tools and Applications

Abstract

Monocular depth estimation is a challenging computer vision task whose goal is to predict per-pixel depth from a single RGB image. Supervised learning methods require large amounts of ground-truth depth measurements, which are time-consuming and expensive to acquire. Self-supervised methods have shown great promise: they exploit scene geometry to derive supervision signals through image warping. In addition, several works leverage other visual tasks (e.g. stereo matching and semantic segmentation) to further improve self-supervised monocular depth estimation. In this paper, we propose a novel framework that uses monocular depth completion as an auxiliary task to assist monocular depth estimation. Specifically, a knowledge transfer strategy enables monocular depth estimation to benefit from the effective feature representations learned by the monocular depth completion task, so that the correlation between the two tasks is fully exploited. The framework is trained only on unlabeled stereo images, which makes it a self-supervised learning paradigm. Experimental results on a publicly available dataset show that the proposed approach outperforms state-of-the-art self-supervised methods and achieves performance comparable to supervised methods.
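The image-warping supervision that self-supervised stereo methods rely on can be illustrated with a minimal sketch. It assumes a rectified stereo pair, grayscale images stored as nested Python lists, and a disparity map (in pixels) predicted for the left view. The left image is reconstructed by sampling the right image at x - d with linear interpolation along each row, and the photometric L1 error between this reconstruction and the real left image acts as the training signal; no depth labels are needed. The function names, the border-clamping rule and the plain L1 error are illustrative assumptions, not the authors' implementation. Practical pipelines usually perform differentiable bilinear sampling inside a deep learning framework and often combine L1 with an SSIM term.

```python
from typing import List

# Hypothetical image type for this sketch: rows of grayscale intensities.
Image = List[List[float]]


def warp_right_to_left(right: Image, disparity: Image) -> Image:
    """Reconstruct the left view by sampling the right view at x - d.

    Assumes a rectified pair, so correspondences lie on the same row.
    Uses 1-D linear interpolation; out-of-range samples are clamped to
    the image border (an assumption, other padding choices exist).
    """
    height, width = len(right), len(right[0])
    recon: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            src = x - disparity[y][x]
            src = min(max(src, 0.0), width - 1.0)  # clamp to the border
            x0 = int(src)  # floor, since src >= 0 after clamping
            x1 = min(x0 + 1, width - 1)
            w = src - x0  # interpolation weight of the right neighbour
            row.append((1.0 - w) * right[y][x0] + w * right[y][x1])
        recon.append(row)
    return recon


def photometric_l1(a: Image, b: Image) -> float:
    """Mean absolute intensity difference between two equal-size images."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def disparity_to_depth(disparity: Image, focal_px: float,
                       baseline_m: float, eps: float = 1e-6) -> Image:
    """Rectified-stereo relation depth = f * B / d (eps avoids division by 0)."""
    return [[focal_px * baseline_m / max(d, eps) for d in row]
            for row in disparity]


if __name__ == "__main__":
    right = [[0.0, 10.0, 20.0, 30.0, 40.0]]
    left = [[0.0, 0.0, 10.0, 20.0, 30.0]]  # right view shifted by 1 pixel
    good = warp_right_to_left(right, [[1.0] * 5])
    bad = warp_right_to_left(right, [[0.0] * 5])
    print(photometric_l1(good, left), photometric_l1(bad, left))
```

With the correct constant disparity of one pixel the reconstruction matches the left row exactly and the loss is zero, while a zero disparity leaves a nonzero error; minimizing this error over a predicted disparity map is what drives the network without ground-truth depth.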

Fig. 1
Fig. 2

Author information

Correspondence to Yi Li.

Additional information

This work was supported in part by the Natural Science Foundation of Tianjin (No.18ZXZNGX00110).

About this article

Cite this article

Sun, L., Li, Y., Liu, B. et al. Transferring knowledge from monocular completion for self-supervised monocular depth estimation. Multimed Tools Appl 81, 42485–42495 (2022). https://doi.org/10.1007/s11042-021-11212-4

  • DOI: https://doi.org/10.1007/s11042-021-11212-4
