Real-time monocular depth estimation with adaptive receptive fields

Ji, Zhenyan; Song, Xiaojun; Guo, Xiaoxuan; Wang, Fangshi; Armendáriz-Iñigo, José Enrique

doi:10.1007/s11554-020-01036-0

Special Issue Paper
Published: 24 October 2020

Volume 18, pages 1369–1381, (2021)
Journal of Real-Time Image Processing

Zhenyan Ji ORCID: orcid.org/0000-0002-6566-9464¹,
Xiaojun Song¹,
Xiaoxuan Guo¹,
Fangshi Wang¹ &
…
José Enrique Armendáriz-Iñigo²

Abstract

Monocular depth estimation is a popular research topic in autonomous driving. Many current models lead in accuracy but perform poorly in real-time scenarios. To make depth estimation more efficient, we propose a novel model that combines a multi-scale pyramid architecture with adaptive receptive fields. The pyramid architecture reduces the number of trainable parameters from dozens of millions to fewer than 10 million. Adaptive receptive fields are more sensitive to objects at different depths in an image, which leads to better accuracy. To compress the model, we use stacked convolution kernels instead of raw kernels. As a result, the proposed model performs well in both real-time performance and estimation accuracy. In our experiments, the model outperforms previously known models on the Eigen split. It also has better runtime performance for depth estimation than all of the compared models except Pyd-Net. In summary, our model is a lightweight depth estimation model with state-of-the-art accuracy.
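The compression obtained by stacking convolution kernels can be illustrated with simple parameter arithmetic. The sketch below is not the authors' implementation: it assumes the common case of replacing one 5×5 convolution with two stacked 3×3 convolutions, and the channel width of 64 and the helper names `conv_params` and `stacked_receptive_field` are hypothetical choices made for illustration only.

```python
from typing import Sequence


def conv_params(kernel_size: int, in_channels: int, out_channels: int,
                bias: bool = True) -> int:
    """Count the trainable parameters of one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def stacked_receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field of stride-1, undilated convolutions applied in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each layer widens the field by (k - 1) pixels
    return field


channels = 64  # hypothetical channel width, not taken from the paper

# One "raw" 5x5 kernel versus two stacked 3x3 kernels (assumed configuration).
single_5x5 = conv_params(5, channels, channels)
stacked_3x3 = 2 * conv_params(3, channels, channels)

print("receptive field:", stacked_receptive_field([5]),
      "vs", stacked_receptive_field([3, 3]))
print("parameters:", single_5x5, "vs", stacked_3x3)
print("ratio:", round(stacked_3x3 / single_5x5, 3))
```

Under these assumptions both variants cover the same 5×5 receptive field, while the stacked version needs roughly 72% of the parameters. The saving grows with larger kernels, for example three stacked 3×3 kernels in place of one 7×7 kernel.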

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2018YFC0831903) and the Major Project of the National Natural Science Foundation of China (No. 51935002).

Author information

Authors and Affiliations

School of Software Engineering, Beijing Jiaotong University, 100044, Beijing, China
Zhenyan Ji, Xiaojun Song, Xiaoxuan Guo & Fangshi Wang
Department of Statistics, Computer Science and Mathematics, Public University of Navarre, 31006, Pamplona, Spain
José Enrique Armendáriz-Iñigo

Corresponding author

Correspondence to Zhenyan Ji.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ji, Z., Song, X., Guo, X. et al. Real-time monocular depth estimation with adaptive receptive fields. J Real-Time Image Proc 18, 1369–1381 (2021). https://doi.org/10.1007/s11554-020-01036-0

Received: 27 January 2020
Accepted: 09 October 2020
Published: 24 October 2020
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11554-020-01036-0
