Relative order constraint for monocular depth estimation

Liu, Chunpu; Zuo, Wangmeng; Yang, Guanglei; Li, Wanlong; Wen, Feng; Zhang, Hongbo; Zang, Tianyi

doi:10.1007/s10489-023-04851-7

Published: 29 July 2023

Applied Intelligence, Volume 53, pages 24804–24821 (2023)

Chunpu Liu ORCID: orcid.org/0000-0001-5463-0807¹,
Wangmeng Zuo¹,
Guanglei Yang¹,
Wanlong Li²,
Feng Wen²,
Hongbo Zhang² &
…
Tianyi Zang¹

Abstract

Monocular depth estimation plays an increasingly important role in 3D scene understanding and has attracted growing attention in computer vision in recent years. The latest deep-learning-based methods have achieved strong performance by exploring various network architectures. However, compared with designing larger and more complex architectures, leveraging scene geometry relations to improve monocular depth estimation models has been less studied. To explore further use of scene geometry relations, we propose a geometry-aware constraint that uses relative order information to improve the performance of monocular depth estimation models. Specifically, we first design a relative order descriptor (ROD) to describe the relative order at a single scene location. Then, based on the ROD, a relative order map (ROM) is built to represent the relative order information of the whole scene. Finally, we present a loss term, the relative order loss (ROL), which relies on the ROM to supervise the training of the monocular depth estimation model. Our method helps monocular depth estimation models predict more accurate depth maps. Moreover, with the geometry constraint from our method, the model produces predictions that better preserve high-quality scene structure. We conduct extensive experiments on the popular NYU Depth V2 and KITTI datasets. The experimental results demonstrate the effectiveness of our method.
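The abstract does not define the ROD, ROM, or ROL, so the following is only an illustrative sketch of how a descriptor, map, and loss of this kind could fit together. Everything in it is an assumption: the neighbour offsets, the tolerance `tau`, the three-way encoding (+1 farther, -1 closer, 0 roughly equal), and the function names are hypothetical, and the loss borrows the generic pairwise ranking form often used for ordinal depth supervision rather than the paper's actual ROL.

```python
import numpy as np

# Hypothetical neighbour offsets (dy, dx); the abstract does not specify the
# layout the authors use for their descriptor.
OFFSETS = ((0, 1), (1, 0))


def shifted(values, dy, dx):
    """Return values sampled at (y + dy, x + dx), replicating the border."""
    h, w = values.shape
    padded = np.pad(values, ((0, dy), (0, dx)), mode="edge")
    return padded[dy:dy + h, dx:dx + w]


def relative_order_map(depth, offsets=OFFSETS, tau=0.02):
    """Assumed ROM: one channel per offset, holding +1 where the neighbour is
    farther, -1 where it is closer, and 0 where the two depths agree within a
    ratio of 1 + tau. Each pixel's vector across channels plays the role of
    a per-location descriptor. Depths must be strictly positive."""
    channels = []
    for dy, dx in offsets:
        ratio = shifted(depth, dy, dx) / depth
        order = np.where(ratio > 1 + tau, 1,
                         np.where(ratio < 1 / (1 + tau), -1, 0))
        channels.append(order)
    return np.stack(channels)


def relative_order_loss(pred, gt, offsets=OFFSETS, tau=0.02):
    """Assumed loss: a pairwise ranking penalty where the ground-truth order
    is strict, and a squared log-depth difference where it is 'equal'."""
    target = relative_order_map(gt, offsets, tau)
    log_pred = np.log(pred)
    total = 0.0
    for k, (dy, dx) in enumerate(offsets):
        # Log ratio of predicted neighbour depth to predicted centre depth.
        diff = shifted(log_pred, dy, dx) - log_pred
        t = target[k]
        ranking = np.log1p(np.exp(-t * diff))  # small when sign(diff) == t
        equality = diff ** 2                   # small when depths match
        total += np.where(t != 0, ranking, equality).mean()
    return total / len(offsets)
```

In a training setup, a term like this would typically be added to a standard depth regression loss with a weighting factor. A prediction that preserves the ground-truth ordering should score lower than one that reverses it, which is the property the test cases check.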

Data Availability

We use only data from the public datasets mentioned in Section 4; no self-generated or self-collected data are used.

Author information

Authors and Affiliations

Harbin Institute of Technology (HIT), Harbin, China
Chunpu Liu, Wangmeng Zuo, Guanglei Yang & Tianyi Zang
Huawei Noah’s Ark Lab, Beijing, China
Wanlong Li, Feng Wen & Hongbo Zhang

Corresponding author

Correspondence to Tianyi Zang.

Ethics declarations

Conflicts of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Zuo, W., Yang, G. et al. Relative order constraint for monocular depth estimation. Appl Intell 53, 24804–24821 (2023). https://doi.org/10.1007/s10489-023-04851-7

Accepted: 29 June 2023
Published: 29 July 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10489-023-04851-7
