A Hybrid CNN Approach for Single Image Depth Estimation: A Case Study

  • Conference paper
Multimedia and Network Information Systems (MISSI 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 833)

Abstract

Three-dimensional scene understanding is an emerging field with many real-world applications. Autonomous driving, robotics, and continuous real-time tracking are hot topics within the engineering community. One essential component of these applications is the development of faster and more reliable algorithms capable of predicting depth from RGB images. Generally, a system with fewer cameras is easier to install because it requires less calibration. Our aim is therefore to develop a strategy for predicting depth as precisely as possible from a single image, that is, from one point of view. Existing methods for this problem show promising results. The goal of this paper is to advance the state of the art in single-image depth prediction using convolutional neural networks. To do so, we modified an existing deep neural network to obtain improved results. The proposed architecture contains additional side-to-side connections between the encoding and decoding branches.
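
The idea of side-to-side connections between an encoding and a decoding branch can be illustrated with a minimal NumPy sketch. This is not the network proposed in the paper: the layer sizes, the random untrained weights, the use of 1x1 channel mixing in place of real convolutions, and the choice of channel concatenation (U-Net style) for the connections are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np

def avg_pool2(x):
    """Halve the spatial resolution of a (C, H, W) array by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Double the spatial resolution by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def mix_channels(x, weights):
    """1x1 'convolution' followed by ReLU; weights has shape (C_out, C_in)."""
    return np.maximum(np.einsum("oc,chw->ohw", weights, x), 0.0)

def forward(image, rng):
    """Toy encoder-decoder mapping an RGB image (3, H, W) to a depth-like map (H, W).

    H and W must be divisible by 4. Weights are random, so the output is
    meaningless as depth; only the data flow is of interest here.
    """
    # Encoding branch: features become coarser and wider.
    e1 = mix_channels(image, rng.standard_normal((8, 3)))            # (8, H, W)
    e2 = mix_channels(avg_pool2(e1), rng.standard_normal((16, 8)))   # (16, H/2, W/2)
    bottleneck = avg_pool2(e2)                                       # (16, H/4, W/4)

    # Decoding branch: each stage receives the upsampled coarse features
    # plus the encoder features of matching resolution (assumed: concatenation).
    d2 = np.concatenate([upsample2(bottleneck), e2], axis=0)         # (32, H/2, W/2)
    d2 = mix_channels(d2, rng.standard_normal((8, 32)))              # (8, H/2, W/2)
    d1 = np.concatenate([upsample2(d2), e1], axis=0)                 # (16, H, W)
    depth = mix_channels(d1, rng.standard_normal((1, 16)))           # (1, H, W)
    return depth[0]

rng = np.random.default_rng(0)
prediction = forward(rng.standard_normal((3, 16, 24)), rng)
print(prediction.shape)
```

The concatenations are where fine spatial detail from the encoder bypasses the low-resolution bottleneck; without them the decoder would have to recover that detail from the coarse features alone.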

The research was supported by the Hungarian Scientific Research Fund (No. NKFIH OTKA K-120499 and KH-126513) and the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/FM).

Author information

Corresponding author

Correspondence to Attila Kiss.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Harsányi, K., Kiss, A., Majdik, A., Sziranyi, T. (2019). A Hybrid CNN Approach for Single Image Depth Estimation: A Case Study. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_38
