Monocular depth estimation via cross-spectral stereo information fusion

Multimedia Tools and Applications

Abstract

Although many works focus on monocular depth estimation, they mainly study the RGB spectrum, which performs poorly at night, in low-light environments, and even in zero-light environments. Images from other spectra provide an opportunity to obtain depth without an active projector source. In this paper, we design a three-step architecture that realizes monocular depth estimation by fusing cross-spectral stereo information. In the first step, we employ a Spectral Translation Network to tackle the large appearance differences between images of different spectra, and propose a disparity reservation loss to preserve disparity during translation. In the second step, we use a Monocular Estimation Network to predict the disparity of the principal spectrum, which is used at test time. In the third step, we retrain the Spectral Translation Network with a generative optimization loss to improve the quality of image translation. Experiments show that our method achieves preeminent performance and reaches real-time speed.
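The second step predicts disparity rather than depth. For a rectified stereo pair, the standard relation depth = focal length × baseline / disparity converts one into the other. The following is a minimal sketch of that conversion only, not the method of the paper; the function name `disparity_to_depth` and the calibration values are hypothetical and chosen purely for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Convert a 2D disparity map (pixels) to depth (meters).

    Assumes a rectified stereo pair, where depth = focal * baseline / disparity.
    Pixels with (near-)zero disparity are mapped to infinity.
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_px * baseline_m / d if d > eps else float("inf")
            for d in row
        ])
    return depth


# Hypothetical disparity map and calibration, not taken from the paper.
disparity = [[8.0, 16.0], [32.0, 0.0]]
depth = disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.1)
```

Because depth is inversely proportional to disparity, a fixed disparity error causes a larger depth error for distant points than for near ones, which is one reason disparity should be preserved carefully during spectral translation.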

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Huwei Liu.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, H. Monocular depth estimation via cross-spectral stereo information fusion. Multimed Tools Appl 83, 61065–61081 (2024). https://doi.org/10.1007/s11042-023-17966-3

  • DOI: https://doi.org/10.1007/s11042-023-17966-3
