Abstract
Monocular depth estimation is a popular research topic in autonomous driving. Many current models achieve leading accuracy but perform poorly in real-time scenarios. To improve the efficiency of depth estimation, we propose a novel model that combines a multi-scale pyramid architecture with adaptive receptive fields. The pyramid architecture reduces the number of trainable parameters from tens of millions to fewer than 10 million. Adaptive receptive fields are more sensitive to objects at different depths in an image, which leads to better accuracy. To compress the model further, we use stacked convolution kernels in place of single large kernels. As a result, the proposed model performs well in both runtime and estimation accuracy. Our experiments show that the model outperforms previously published models on the Eigen split. They also show that it runs faster than all of the compared models except Pyd-Net. In summary, our model is a lightweight depth estimation model with state-of-the-art accuracy.
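The compression obtained from stacked convolution kernels can be illustrated with a short parameter count. The sketch below is not taken from the paper: the channel width of 64 and the comparison of one 5x5 kernel against two stacked 3x3 kernels are assumptions chosen only to illustrate the general idea. The helper names `conv_params` and `stacked_receptive_field` are hypothetical.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Count the trainable parameters of one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each stride-1 layer widens the field by k - 1
    return field


# Assumed channel width, for illustration only (not a value from the paper).
channels = 64

# One 5x5 kernel versus two stacked 3x3 kernels covering the same 5x5 area.
single = conv_params(5, channels, channels)
stacked = 2 * conv_params(3, channels, channels)

print("receptive field of two 3x3 layers:", stacked_receptive_field([3, 3]))
print("single 5x5 parameters:", single)
print("stacked 3x3 parameters:", stacked)
print("parameter ratio:", round(stacked / single, 3))
```

Under these assumptions the stacked pair covers the same 5x5 region with roughly 72% of the parameters, and the saving grows with larger kernels. The paper's actual layer configuration may differ.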
Acknowledgements
This work is supported by the National Key Research and Development Program of China (No. 2018YFC0831903) and the Major Project of the National Natural Science Foundation of China (No. 51935002).
About this article
Cite this article
Ji, Z., Song, X., Guo, X. et al. Real-time monocular depth estimation with adaptive receptive fields. J Real-Time Image Proc 18, 1369–1381 (2021). https://doi.org/10.1007/s11554-020-01036-0
DOI: https://doi.org/10.1007/s11554-020-01036-0