Abstract
Three-dimensional scene understanding underpins many real-world applications: autonomous driving, robotics, and continuous real-time tracking are active topics within the engineering community. An essential component of these systems is a fast and reliable algorithm for predicting depth from RGB images. A system with fewer cameras is generally easier to deploy because it requires less calibration, so our aim is to predict depth as precisely as possible from a single image taken from one point of view. Existing methods for this problem already show promising results. The goal of this paper is to advance the state of the art in single-image depth prediction using convolutional neural networks. To this end, we modified an existing deep neural network to obtain improved results. The proposed architecture contains additional side-to-side connections between the encoding and decoding branches.
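The paper's exact network is not reproduced on this page, but the idea of side-to-side (skip) connections between an encoder and a decoder can be illustrated with a minimal, hypothetical NumPy sketch: each encoder stage saves its feature map before downsampling, and the corresponding decoder stage fuses it back in after upsampling. The pooling, upsampling, and fusion choices below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pool(x):
    # 2x2 average pooling: halves the spatial resolution
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    # nearest-neighbour upsampling: doubles the spatial resolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encode(x, depth=2):
    """Downsample `depth` times, saving each pre-pooling feature map."""
    skips = []
    for _ in range(depth):
        skips.append(x)      # saved for the side-to-side connection
        x = pool(x)
    return x, skips

def decode(x, skips):
    """Upsample stage by stage, fusing each encoder feature map back in."""
    for skip in reversed(skips):
        x = upsample(x)
        x = np.concatenate([x, skip], axis=-1)  # channel-wise fusion
    return x

rgb = np.random.rand(16, 16, 3)          # toy "image": H x W x channels
bottleneck, skips = encode(rgb)           # 4 x 4 x 3 bottleneck
depth_features = decode(bottleneck, skips)
print(depth_features.shape)               # -> (16, 16, 9)
```

The skip connections let the decoder recover fine spatial detail that pooling discards, which is why U-Net-style architectures (Ronneberger et al., 2015) are a natural fit for dense prediction tasks such as per-pixel depth.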
The research was supported by the Hungarian Scientific Research Fund (No. NKFIH OTKA K-120499 and KH-126513) and the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/FM).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Harsányi, K., Kiss, A., Majdik, A., Sziranyi, T. (2019). A Hybrid CNN Approach for Single Image Depth Estimation: A Case Study. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)