Real-time monocular depth estimation with adaptive receptive fields

  • Special Issue Paper
Journal of Real-Time Image Processing

Abstract

Monocular depth estimation is a popular research topic in autonomous driving. Many current models achieve leading accuracy but perform poorly in real-time scenarios. To improve the efficiency of depth estimation, we propose a novel model that combines a multi-scale pyramid architecture with adaptive receptive fields. The pyramid architecture reduces the number of trainable parameters from tens of millions to fewer than 10 million. Adaptive receptive fields are more sensitive to objects at different depths in an image, which leads to better accuracy. We also adopt stacked convolution kernels instead of raw kernels to compress the model. As a result, the proposed model performs well in both runtime and estimation accuracy. Experiments show that our model achieves better accuracy on the Eigen split than previously known models. Furthermore, its runtime is better than that of all compared models except Pyd-Net. In summary, our model is a lightweight depth estimation model with state-of-the-art accuracy.
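The compression claim for stacked kernels can be illustrated with a short parameter count. The sketch below is not taken from the paper: it assumes that "raw kernels" means single large kernels (5x5 or 7x7) and that "stacked kernels" means several 3x3 stride-1 convolutions covering the same receptive field. The channel width of 64 and all function names are hypothetical and chosen only for illustration.

```python
def conv_params(kernel_size: int, in_ch: int, out_ch: int, bias: bool = True) -> int:
    """Trainable parameters of one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_ch * out_ch
    return weights + (out_ch if bias else 0)


def stacked_params(kernel_size: int, depth: int, channels: int, bias: bool = True) -> int:
    """Parameters of `depth` stacked convolutions with equal channel width."""
    return depth * conv_params(kernel_size, channels, channels, bias)


def receptive_field(kernel_size: int, depth: int) -> int:
    """Receptive field of `depth` stacked stride-1, undilated convolutions."""
    return 1 + depth * (kernel_size - 1)


channels = 64  # hypothetical channel width, not a value from the paper

# Compare one large kernel with the 3x3 stack that covers the same receptive field.
for raw_kernel, depth in [(5, 2), (7, 3)]:
    assert receptive_field(3, depth) == raw_kernel
    raw = conv_params(raw_kernel, channels, channels)
    stacked = stacked_params(3, depth, channels)
    print(
        f"{raw_kernel}x{raw_kernel}: {raw} params | "
        f"{depth} stacked 3x3: {stacked} params ({stacked / raw:.0%} of raw)"
    )
```

Under these assumptions the stacked variant keeps the receptive field while using fewer weights, which is the general reason stacking small kernels is used for model compression. The kernel sizes and layer widths actually used by the authors are given only in the full article.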

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2018YFC0831903) and Major Project of National Natural Science Foundation of China (No. 51935002).

Author information

Corresponding author

Correspondence to Zhenyan Ji.

About this article

Cite this article

Ji, Z., Song, X., Guo, X. et al. Real-time monocular depth estimation with adaptive receptive fields. J Real-Time Image Proc 18, 1369–1381 (2021). https://doi.org/10.1007/s11554-020-01036-0

  • DOI: https://doi.org/10.1007/s11554-020-01036-0
