Abstract
Three-dimensional scene understanding underpins many real-world applications: autonomous driving, robotics, and continuous real-time tracking are active topics within the engineering community. An essential component of these systems is a fast and reliable algorithm for predicting depth from RGB images. A system with fewer cameras is generally easier to deploy because it requires less calibration, so our aim is to predict depth as precisely as possible from a single image taken from one point of view. Existing methods for this problem already show promising results. The goal of this paper is to advance the state of the art in single-image depth prediction using convolutional neural networks. To this end, we modified an existing deep neural network to obtain improved results. The proposed architecture contains additional side-to-side connections between the encoding and decoding branches.
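The paper's exact network is not reproduced on this page, but the idea of side-to-side (skip) connections between an encoder and a decoder can be illustrated with a minimal, hypothetical NumPy sketch: each encoder stage saves its feature map before downsampling, and the corresponding decoder stage fuses it back in after upsampling. The pooling, upsampling, and fusion choices below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pool(x):
    # 2x2 average pooling: halves the spatial resolution
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    # nearest-neighbour upsampling: doubles the spatial resolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encode(x, depth=2):
    """Downsample `depth` times, saving each pre-pooling feature map."""
    skips = []
    for _ in range(depth):
        skips.append(x)      # saved for the side-to-side connection
        x = pool(x)
    return x, skips

def decode(x, skips):
    """Upsample stage by stage, fusing each encoder feature map back in."""
    for skip in reversed(skips):
        x = upsample(x)
        x = np.concatenate([x, skip], axis=-1)  # channel-wise fusion
    return x

rgb = np.random.rand(16, 16, 3)          # toy "image": H x W x channels
bottleneck, skips = encode(rgb)           # 4 x 4 x 3 bottleneck
depth_features = decode(bottleneck, skips)
print(depth_features.shape)               # -> (16, 16, 9)
```

The skip connections let the decoder recover fine spatial detail that pooling discards, which is why U-Net-style architectures (Ronneberger et al., 2015) are a natural fit for dense prediction tasks such as per-pixel depth.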
The research was supported by the Hungarian Scientific Research Fund (No. NKFIH OTKA K-120499 and KH-126513) and the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/FM).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Harsányi, K., Kiss, A., Majdik, A., Sziranyi, T. (2019). A Hybrid CNN Approach for Single Image Depth Estimation: A Case Study. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)