
Perceptual Monocular Depth Estimation

Neural Processing Letters

Abstract

Monocular depth estimation (MDE), the task of predicting scene depths from a single image, has gained considerable interest, in large part owing to the popularity of applying deep learning methods to computer vision problems. Monocular cues provide sufficient data for humans to instantaneously extract an understanding of scene geometries and relative depths, which is evidence of both the processing power of the human visual system and the predictive power of monocular data. However, developing computational models to predict depth from monocular images remains challenging. Hand-designed MDE features do not perform particularly well, and even current “deep” models are still evolving. Here we propose a novel approach that uses perceptually relevant natural scene statistics (NSS) features to predict depths from monocular images in a simple, scale-agnostic way that is competitive with state-of-the-art systems. While the statistics of natural photographic images have been successfully used in a variety of image and video processing, analysis, and quality assessment tasks, they have never been applied in a predictive end-to-end deep-learning model for monocular depth. Correspondingly, no previous work has explicitly incorporated perceptual features in a monocular depth-prediction approach. Here we accomplish this by developing a new closed-form bivariate model of image luminances and using features extracted from this model and from other NSS models to drive a novel deep learning framework for predicting depth given a single image.
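
The abstract does not specify which NSS features are used. As a rough illustration only, one widely used spatial-domain NSS representation is the map of mean-subtracted, contrast-normalized (MSCN) luminance coefficients, with products of neighboring coefficients serving as a simple pairwise statistic. The sketch below is a pure-Python illustration under those assumptions. The function names, the 7-tap Gaussian window, and the stabilizing constant `c = 1` are choices made here for illustration, and this is not the closed-form bivariate model proposed in the article.

```python
import math


def gaussian_kernel(radius=3, sigma=7.0 / 6.0):
    """1-D Gaussian weights, normalized to sum to 1 (window size is an assumption)."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(img, kernel):
    """Separable Gaussian blur of a list-of-lists image, replicating edge pixels."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    # Horizontal pass over each row.
    rows = [[sum(kernel[k + r] * row[min(max(x + k, 0), w - 1)]
                 for k in range(-r, r + 1))
             for x in range(w)]
            for row in img]
    # Vertical pass over the horizontally blurred rows.
    return [[sum(kernel[k + r] * rows[min(max(y + k, 0), h - 1)][x]
                 for k in range(-r, r + 1))
             for x in range(w)]
            for y in range(h)]


def mscn(img, c=1.0):
    """MSCN coefficients: (I - local mean) / (local std + c)."""
    kernel = gaussian_kernel()
    mu = blur(img, kernel)
    mu_sq = blur([[v * v for v in row] for row in img], kernel)
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, v in enumerate(row):
            # Clamp tiny negative variances caused by floating-point rounding.
            var = max(mu_sq[y][x] - mu[y][x] ** 2, 0.0)
            out_row.append((v - mu[y][x]) / (math.sqrt(var) + c))
        out.append(out_row)
    return out


def horizontal_pair_products(coeffs):
    """Products of horizontally adjacent MSCN coefficients (a pairwise statistic)."""
    return [row[x] * row[x + 1] for row in coeffs for x in range(len(row) - 1)]


# Toy input: an 8x8 checkerboard luminance pattern.
checker = [[255.0 * ((x + y) % 2) for x in range(8)] for y in range(8)]
coeffs = mscn(checker)
products = horizontal_pair_products(coeffs)
```

In NSS-based pipelines, the empirical distributions of such coefficients and products are typically summarized by a small number of fitted distribution parameters, and those parameters serve as the features passed to a downstream predictor.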

Author information

Corresponding author

Correspondence to Janice Pan.

About this article

Cite this article

Pan, J., Bovik, A.C. Perceptual Monocular Depth Estimation. Neural Process Lett 53, 1205–1228 (2021). https://doi.org/10.1007/s11063-021-10437-6

  • DOI: https://doi.org/10.1007/s11063-021-10437-6
