Stable Depth Estimation Within Consecutive Video Frames

  • Conference paper
Advances in Computer Graphics (CGI 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13002)

Abstract

Deep-learning-based depth estimation methods have proven effective and promising, especially those that learn depth from monocular video. Depth-from-video is unsupervised depth estimation in the truest sense, as it requires neither ground-truth depth nor stereo image pairs as supervision. However, most existing depth-from-video methods do not consider frame-to-frame stability of the estimated depth. We found that, although recent works estimate single-image depth well, depths across temporally consecutive frames are unstable. This work aims to solve that problem. Specifically, we define a temporal smoothness term for the depth map and propose a temporal stability loss that constrains the depths of the same objects to remain stable across consecutive frames. We also propose an inconsistency check based on the differences between synthesized view frames and their original RGB frames. Building on this check, we propose a self-discovered mask to handle moving and occluded objects. Experiments show that the proposed method is effective and estimates stable depths across temporally consecutive frames, while achieving competitive performance on the KITTI dataset.
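The two ingredients named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the photometric threshold, and the assumption that the neighboring frame's depth has already been warped into the current frame are all hypothetical simplifications.

```python
import numpy as np

def inconsistency_mask(synth_frame, rgb_frame, threshold=0.2):
    """Self-discovered mask from an inconsistency check.

    Compares a synthesized view frame against its original RGB frame
    per pixel; pixels whose photometric difference exceeds the threshold
    (likely moving or occluded objects) are masked out (set to 0).
    """
    diff = np.abs(synth_frame - rgb_frame).mean(axis=-1)  # average over color channels
    return (diff < threshold).astype(np.float32)

def temporal_stability_loss(depth_t, depth_t1_warped, mask):
    """Penalize depth changes of the same scene points across consecutive frames.

    `depth_t1_warped` is assumed to be the neighboring frame's depth map
    already warped into frame t; masked (moving/occluded) pixels are ignored.
    """
    diff = np.abs(depth_t - depth_t1_warped)
    return (diff * mask).sum() / (mask.sum() + 1e-8)
```

In a training loop, this loss would be added to the usual photometric and smoothness terms, with the mask preventing moving objects from corrupting the stability constraint.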



Acknowledgments

This work was partially supported by the Key Technological Innovation Projects of Hubei Province (2018AAA062), NSFC (No. 61972298), and the Wuhan University-Huawei GeoInformatics Innovation Lab.

Author information

Correspondence to Chunxia Xiao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, F., Wei, L., Xiao, C. (2021). Stable Depth Estimation Within Consecutive Video Frames. In: Magnenat-Thalmann, N., et al. (eds.) Advances in Computer Graphics. CGI 2021. Lecture Notes in Computer Science, vol 13002. Springer, Cham. https://doi.org/10.1007/978-3-030-89029-2_4

  • DOI: https://doi.org/10.1007/978-3-030-89029-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89028-5

  • Online ISBN: 978-3-030-89029-2

  • eBook Packages: Computer Science, Computer Science (R0)
