Semi-Supervised Monocular Depth Estimation Based on Semantic Supervision

Yue, Min; Fu, Guangyuan; Wu, Ming; Wang, Hongqiao

doi:10.1007/s10846-020-01205-0

Published: 30 May 2020

Volume 100, pages 455–463 (2020)

Journal of Intelligent & Robotic Systems

Min Yue¹ (ORCID: orcid.org/0000-0003-0808-2963), Guangyuan Fu¹, Ming Wu¹ & Hongqiao Wang¹

Abstract

Unsupervised learning is a promising strategy for monocular depth estimation; it is mainly self-supervised through a view reconstruction loss computed from stereo pairs or monocular sequences. However, most existing works consider only geometric information during training and ignore semantics. We propose SE-Net, a semantic monocular depth estimation network: a neural network framework that estimates depth from semantic information and video sequences. The framework as a whole is semi-supervised because it uses labelled semantic ground-truth data. Exploiting the structural consistency between the semantically segmented image and the depth map, we first perform semantic segmentation on the image and then use the semantic labels to guide the construction of the depth estimation network. Experiments on the KITTI dataset show that learning semantic information from images effectively improves monocular depth estimation, and that SE-Net outperforms state-of-the-art methods in depth estimation accuracy.
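
To make the two training signals named in the abstract more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes an L1 view reconstruction term and a depth smoothness term that is switched off across semantic label boundaries; the function names, the weight smooth_weight, and both formulations are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def photometric_l1(target, reconstructed):
    """Mean absolute difference between the target view and its reconstruction."""
    return float(np.mean(np.abs(target - reconstructed)))


def semantic_guided_smoothness(depth, labels):
    """Penalize depth changes between neighbouring pixels that share a label.

    Assumption: depth is allowed to jump freely where the semantic label
    changes, so those neighbour pairs get weight 0.
    """
    # Absolute depth differences between horizontal and vertical neighbours.
    dx = np.abs(depth[:, 1:] - depth[:, :-1])
    dy = np.abs(depth[1:, :] - depth[:-1, :])
    # Weight is 1 inside a semantic region and 0 across a label boundary.
    wx = (labels[:, 1:] == labels[:, :-1]).astype(depth.dtype)
    wy = (labels[1:, :] == labels[:-1, :]).astype(depth.dtype)
    return float(np.mean(dx * wx) + np.mean(dy * wy))


def total_loss(target, reconstructed, depth, labels, smooth_weight=0.1):
    """Weighted sum of the two terms; smooth_weight is a hypothetical value."""
    return (photometric_l1(target, reconstructed)
            + smooth_weight * semantic_guided_smoothness(depth, labels))


# Toy 2x4 scene: the depth step coincides with the semantic boundary.
depth = np.array([[1.0, 1.0, 5.0, 5.0],
                  [1.0, 1.0, 5.0, 5.0]])
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
target = np.ones((2, 4))
reconstructed = np.zeros((2, 4))

aligned = semantic_guided_smoothness(depth, labels)
unguided = semantic_guided_smoothness(depth, np.zeros_like(labels))
loss = total_loss(target, reconstructed, depth, labels)
```

In this toy case the depth step falls exactly on the label boundary, so the guided smoothness term does not penalize it, whereas the same depth map with a single uniform label is penalized. This is one simple way the structural consistency between segmentation and depth could be exploited.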

Author information

Authors and Affiliations

Xi’an research institute of high-tech, Shaanxi, 710025, China
Min Yue, Guangyuan Fu, Ming Wu & Hongqiao Wang

Corresponding author

Correspondence to Min Yue.

About this article

Cite this article

Yue, M., Fu, G., Wu, M. et al. Semi-Supervised Monocular Depth Estimation Based on Semantic Supervision. J Intell Robot Syst 100, 455–463 (2020). https://doi.org/10.1007/s10846-020-01205-0

Received: 12 December 2019
Accepted: 16 April 2020
Published: 30 May 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10846-020-01205-0
