Perceptual Monocular Depth Estimation

Pan, Janice; Bovik, Alan C.

doi:10.1007/s11063-021-10437-6

Published: 10 February 2021

Neural Processing Letters, Volume 53, pages 1205–1228 (2021)


Abstract

Monocular depth estimation (MDE), the task of predicting scene depths from a single image, has gained considerable interest, in large part owing to the popularity of applying deep learning methods to computer vision problems. Monocular cues provide sufficient information for humans to instantaneously extract an understanding of scene geometries and relative depths, which is evidence of both the processing power of the human visual system and the predictive power of monocular data. However, developing computational models that predict depth from monocular images remains challenging. Hand-designed MDE features do not perform particularly well, and even current deep models are still evolving. Here we propose a novel approach that uses perceptually relevant natural scene statistics (NSS) features to predict depths from monocular images in a simple, scale-agnostic way that is competitive with state-of-the-art systems. While the statistics of natural photographic images have been used successfully in a variety of image and video processing, analysis, and quality assessment tasks, they have never been applied in a predictive end-to-end deep learning model for monocular depth. Correspondingly, no previous work has explicitly incorporated perceptual features in a monocular depth-prediction approach. We accomplish this by developing a new closed-form bivariate model of image luminances and using features extracted from this model and from other NSS models to drive a novel deep learning framework for predicting depth from a single image.
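
The abstract does not spell out how the NSS features are computed, so the following is only an illustrative sketch of one widely used NSS operation: mean-subtracted, contrast-normalized (MSCN) luminance coefficients, together with products of horizontally adjacent coefficients as a simple pairwise statistic. It is not the closed-form bivariate model proposed in the article. The function names, the 7×7 Gaussian window, and the stabilizing constant `c` are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(radius=3, sigma=7 / 6):
    """Return a (2*radius+1)^2 Gaussian window normalized to sum to 1."""
    weights = [
        [math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def mscn(image, radius=3, sigma=7 / 6, c=1.0):
    """MSCN coefficients of a 2-D luminance grid (list of rows of floats).

    Each pixel has its Gaussian-weighted local mean subtracted and is then
    divided by the local standard deviation plus c. Borders are handled by
    clamping indices to the image (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    kernel = gaussian_kernel(radius, sigma)

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mean = 0.0
            second_moment = 0.0
            for ky, row in enumerate(kernel):
                for kx, weight in enumerate(row):
                    v = image[clamp(y + ky - radius, h)][clamp(x + kx - radius, w)]
                    mean += weight * v
                    second_moment += weight * v * v
            # Guard against tiny negative variances caused by rounding.
            std = math.sqrt(max(second_moment - mean * mean, 0.0))
            out[y][x] = (image[y][x] - mean) / (std + c)
    return out


def horizontal_pair_products(coeffs, d=1):
    """Products of coefficients that lie d pixels apart along each row."""
    return [row[x] * row[x + d] for row in coeffs for x in range(len(row) - d)]


if __name__ == "__main__":
    # Hypothetical toy input: a vertical step edge in an 8x8 luminance patch.
    patch = [[0.0] * 4 + [10.0] * 4 for _ in range(8)]
    coeffs = mscn(patch)
    products = horizontal_pair_products(coeffs)
    print(len(coeffs), len(coeffs[0]), len(products))
```

In a feature-driven pipeline of the kind the abstract describes, parameters of distributions fitted to such coefficients and pair products would typically serve as the perceptual features; which models and fits the authors actually use is not stated here.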

Author information

Authors and Affiliations

Laboratory for Image and Video Engineering (LIVE), The University of Texas at Austin, Austin, USA
Janice Pan & Alan C. Bovik

Corresponding author

Correspondence to Janice Pan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pan, J., Bovik, A.C. Perceptual Monocular Depth Estimation. Neural Process Lett 53, 1205–1228 (2021). https://doi.org/10.1007/s11063-021-10437-6

Accepted: 27 January 2021
Published: 10 February 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11063-021-10437-6
