Abstract
Depth estimation is a fundamental task in 3D vision. Recently, much work has been devoted to self-supervised depth estimation based on geometric consistency between frames. However, these methods still struggle in ill-posed regions, such as occluded and texture-less areas. This work proposes a novel self-supervised monocular depth estimation method based on an occlusion mask and edge awareness to overcome these difficulties. The occlusion mask divides the image into two classes, making the training of the network more reasonable. The edge-awareness loss function is built on edges obtained by a traditional edge detector, which makes the method robust to varying lighting conditions. We evaluate the proposed method on the KITTI dataset and find that both the occlusion mask and edge awareness help establish correct correspondences in ill-posed regions.
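The abstract does not reproduce the exact loss formulations. As a rough sketch of the two ideas it names, the snippet below shows (a) an edge-aware smoothness term that penalizes depth gradients less where the image has strong edges, a common formulation in this literature, and (b) a binary occlusion mask obtained by comparing reprojection error against the error of the unwarped frame, in the spirit of Monodepth2-style auto-masking. All function names, the weighting factor `alpha`, and the masking criterion are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def edge_aware_smoothness_loss(depth, image, alpha=1.0):
    """Penalize depth gradients, down-weighted at image edges.

    A common edge-aware smoothness formulation (assumed here;
    the paper's exact loss and edge detector are not shown).
    depth: (H, W) predicted depth map
    image: (H, W) grayscale image providing edge weights
    """
    # First-order finite differences of depth and image
    d_dx = np.abs(depth[:, 1:] - depth[:, :-1])
    d_dy = np.abs(depth[1:, :] - depth[:-1, :])
    i_dx = np.abs(image[:, 1:] - image[:, :-1])
    i_dy = np.abs(image[1:, :] - image[:-1, :])
    # Edge-aware weights: small where the image gradient is large,
    # so depth discontinuities at image edges are not penalized
    w_x = np.exp(-alpha * i_dx)
    w_y = np.exp(-alpha * i_dy)
    return float((d_dx * w_x).mean() + (d_dy * w_y).mean())

def occlusion_mask(reproj_error, identity_error):
    """Binary mask separating usable pixels from likely occlusions.

    Heuristic sketch: trust a pixel only when warping reduces the
    photometric error relative to the unwarped frame (the paper's
    actual two-class criterion may differ).
    """
    return (reproj_error < identity_error).astype(np.float32)
```

A perfectly flat depth map incurs zero smoothness loss regardless of the image, and the mask zeroes out pixels whose reprojection error is not improved by warping, so those pixels do not drive the photometric loss during training.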
Zhou, S., Zhu, M., Li, Z. et al. Self-supervised monocular depth estimation with occlusion mask and edge awareness. Artif Life Robotics 26, 354–359 (2021). https://doi.org/10.1007/s10015-021-00685-z