
Relative order constraint for monocular depth estimation


Abstract

Monocular depth estimation, which plays an increasingly important role in 3D scene understanding, has attracted growing attention in the computer vision field in recent years. The latest deep-learning-based monocular depth estimation methods have achieved significant performance by exploring various network architectures. However, compared with designing larger and more complex model architectures, leveraging scene geometry relations to boost the performance of monocular depth estimation models has been less studied. To further exploit scene geometry relations in monocular depth estimation, we propose a geometry-aware constraint that uses relative order information to improve the performance of monocular depth estimation models. Specifically, we first design a relative order descriptor (ROD) to construct the relative order description at a single scene location. Then, based on the ROD, a relative order map (ROM) is built to represent the relative order information of the whole scene. Finally, we present a loss term, the relative order loss (ROL), which relies on the ROM to supervise the training of the monocular depth estimation model. Our method helps monocular depth estimation models predict more accurate depth maps. Moreover, with the geometry constraint from our method, the model produces predictions in which high-quality scene structure is better preserved. We conduct extensive experiments on the popular NYU Depth V2 and KITTI datasets, and the experimental results demonstrate the effectiveness of our method.
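The full definitions of the ROD, ROM, and ROL appear in the body of the paper. As a rough, non-authoritative illustration of how a relative-order supervision term can be wired into training, the following PyTorch sketch implements a generic pairwise ordinal loss over randomly sampled pixel pairs; the function name, sampling scheme, and threshold tau are our own assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def relative_order_loss(pred, gt, num_pairs=5000, tau=0.02):
    """Pairwise ordinal loss over sampled pixel pairs.

    A generic relative-order supervision term (hypothetical names and
    thresholds; NOT the paper's exact ROD/ROM/ROL definitions).
    pred, gt: (B, H, W) predicted / ground-truth depth maps.
    """
    b, h, w = gt.shape
    pred, gt = pred.reshape(b, -1), gt.reshape(b, -1)

    # Sample random pixel-index pairs (i, j) for each image.
    idx_i = torch.randint(0, h * w, (b, num_pairs), device=gt.device)
    idx_j = torch.randint(0, h * w, (b, num_pairs), device=gt.device)

    gt_i, gt_j = gt.gather(1, idx_i), gt.gather(1, idx_j)
    pr_i, pr_j = pred.gather(1, idx_i), pred.gather(1, idx_j)

    # Ground-truth ordinal label: +1 (i deeper), -1 (i closer), 0 (~equal),
    # decided by the relative depth ratio against the threshold tau.
    ratio = gt_i / gt_j.clamp(min=1e-6)
    label = torch.zeros_like(ratio)
    label[ratio > 1.0 + tau] = 1.0
    label[ratio < 1.0 - tau] = -1.0

    diff = pr_i - pr_j
    # Ordered pairs: logistic ranking penalty when the predicted order
    # disagrees with the label. Roughly equal pairs: pull predictions together.
    ordered = F.softplus(-label * diff)
    equal = diff ** 2
    loss = torch.where(label != 0, ordered, equal)

    # Keep only pairs where both ground-truth depths are valid (> 0).
    valid = (gt_i > 0) & (gt_j > 0)
    return loss[valid].mean()
```

In practice such a term would be added to a standard depth regression objective, e.g. total_loss = base_loss + lambda_rol * relative_order_loss(pred, gt), with the weight tuned on a validation split.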




Data Availability

We use only data from the public datasets mentioned in Section 4; no self-generated or self-collected data are utilized.

Author information

Corresponding author

Correspondence to Tianyi Zang.

Ethics declarations

Conflicts of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Zuo, W., Yang, G. et al. Relative order constraint for monocular depth estimation. Appl Intell 53, 24804–24821 (2023). https://doi.org/10.1007/s10489-023-04851-7
