Abstract
Depth estimation from 2D images is a fundamental task in many applications, such as robotics and 3D reconstruction. Because of their limited ability to model perspective transformations, existing CNN methods suffer from restricted generalization performance and a large number of parameters. To address these problems, we propose the CNNapsule network for monocular depth estimation. First, we extract CNN and Matrix Capsule features. Next, we propose a Fusion Block to combine the CNN features with the Matrix Capsule features. Skip connections are then used to transmit the extracted and fused features. Moreover, we design a loss function that accounts for the long-tailed depth distribution, gradients, and structural similarity. Finally, we compare our method with existing methods on the NYU Depth V2 dataset. Experiments show that our method achieves higher accuracy than traditional methods and similar networks trained without pre-training. Compared with the state of the art, our method uses 65% fewer trainable parameters. Tests on images collected from the Internet and real images captured with a mobile phone further verify the generalization ability of our method.
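To make the loss design concrete, the following is a minimal PyTorch sketch of a combined depth loss with point-wise, gradient, and structural-similarity terms, as named in the abstract. The exact formulation, the weights w_grad and w_ssim, and the long-tailed-distribution reweighting used in the paper are not specified here, so every detail of this sketch is an assumption rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F


def gradient_xy(img):
    """Forward differences along width (x) and height (y)."""
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return dx, dy


def ssim_map(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM using a 3x3 average-pooling window."""
    mu_a = F.avg_pool2d(a, 3, 1)
    mu_b = F.avg_pool2d(b, 3, 1)
    var_a = F.avg_pool2d(a * a, 3, 1) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, 3, 1) - mu_b ** 2
    cov = F.avg_pool2d(a * b, 3, 1) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def depth_loss(pred, gt, w_grad=1.0, w_ssim=1.0):
    """Combined loss on (N, 1, H, W) depth maps; weights are illustrative only."""
    # Point-wise depth error (a long-tail reweighting could be applied here).
    l_depth = torch.mean(torch.abs(pred - gt))
    # Gradient (edge) error.
    pdx, pdy = gradient_xy(pred)
    gdx, gdy = gradient_xy(gt)
    l_grad = torch.mean(torch.abs(pdx - gdx)) + torch.mean(torch.abs(pdy - gdy))
    # Structural-similarity error.
    l_ssim = torch.mean((1.0 - ssim_map(pred, gt)) / 2.0)
    return l_depth + w_grad * l_grad + w_ssim * l_ssim
```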
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under grant No. 61672084 and the Fundamental Research Funds for the Central Universities under grant No. XK1802-4.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Zhu, H., Liu, M. (2021). CNNapsule: A Lightweight Network with Fusion Features for Monocular Depth Estimation. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12891. Springer, Cham. https://doi.org/10.1007/978-3-030-86362-3_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86361-6
Online ISBN: 978-3-030-86362-3
eBook Packages: Computer Science, Computer Science (R0)