
CNNapsule: A Lightweight Network with Fusion Features for Monocular Depth Estimation

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12891))


Abstract

Depth estimation from 2D images is a fundamental task for many applications, such as robotics and 3D reconstruction. Because of their weak ability to model perspective transformations, existing CNN methods have limited generalization performance and large numbers of parameters. To solve these problems, we propose the CNNapsule network for monocular depth estimation. First, we extract CNN and Matrix Capsule features. Next, we propose a Fusion Block to combine the CNN features with the Matrix Capsule features. Skip connections are then used to transmit the extracted and fused features. Moreover, we design a loss function that accounts for the long-tailed distribution of depth values, image gradients, and structural similarity. Finally, we compare our method with existing methods on the NYU Depth V2 dataset. The experiments show that our method achieves higher accuracy than traditional methods and than similar networks without pre-training. Compared with the state of the art, the number of trainable parameters of our method decreases by 65%. Tests on images collected from the Internet and on real images captured with a mobile phone further verify the generalization ability of our method.
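The abstract names three ingredients of the loss (a point-wise depth term, an image-gradient term, and a structural-similarity term) but does not give the formula here. As a rough illustration only, such a combined loss can be sketched as follows; this is a minimal NumPy sketch under assumed weightings, the paper's long-tailed-distribution weighting is omitted, and the balancing weight `lam` is a hypothetical parameter, not the paper's:

```python
import numpy as np

def depth_loss(y_true, y_pred, lam=0.1):
    """Sketch of a combined depth loss: point-wise L1 error,
    image-gradient error, and a simplified global SSIM term."""
    # Point-wise L1 depth error
    l_depth = np.mean(np.abs(y_true - y_pred))

    # Gradient error: differences of image gradients along both axes
    dy_t, dx_t = np.gradient(y_true)
    dy_p, dx_p = np.gradient(y_pred)
    l_grad = np.mean(np.abs(dy_t - dy_p)) + np.mean(np.abs(dx_t - dx_p))

    # Simplified global SSIM (per-image means/variances, standard constants)
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    ssim = ((2 * mu_t * mu_p + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_p ** 2 + c1) * (var_t + var_p + c2)
    )
    l_ssim = (1.0 - ssim) / 2.0  # SSIM <= 1, so this term is non-negative

    return lam * l_depth + l_grad + l_ssim
```

The loss is zero for a perfect prediction and grows with point-wise, gradient, and structural discrepancies; a real implementation would typically compute SSIM over local windows rather than globally.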




Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under grant No. 61672084 and the Fundamental Research Funds for the Central Universities under grant No. XK1802-4.

Author information

Correspondence to Haijiang Zhu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Y., Zhu, H., Liu, M. (2021). CNNapsule: A Lightweight Network with Fusion Features for Monocular Depth Estimation. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol. 12891. Springer, Cham. https://doi.org/10.1007/978-3-030-86362-3_41


  • DOI: https://doi.org/10.1007/978-3-030-86362-3_41

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86361-6

  • Online ISBN: 978-3-030-86362-3

  • eBook Packages: Computer Science, Computer Science (R0)
