Abstract
Deep learning based depth estimation methods have proven effective and promising, especially those that learn depth from monocular video. Depth-from-video is unsupervised in the true sense, as it requires neither depth ground truth nor stereo image pairs as supervision. However, most existing depth-from-video methods do not consider the frame-to-frame stability of the estimated depth. We find that although recent works estimate single-image depth well, the depths of temporally consecutive frames are unstable. This work aims to solve this problem. Specifically, we define a temporal smoothness term for the depth map and propose a temporal stability loss that constrains the depths of the same objects to remain stable across consecutive frames. We also propose an inconsistency check based on the differences between synthesized view frames and their original RGB frames and, building on this check, a self-discovered mask to handle moving and occluded objects. Experiments show that the proposed method is effective and estimates stable depths across temporally consecutive frames, while achieving competitive performance on the KITTI dataset.
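The abstract only sketches the temporal stability loss and the inconsistency-based self-discovered mask. The following is a minimal PyTorch-style illustration of both ideas, assuming the view synthesis and depth warping have already been performed elsewhere; all function names, tensor shapes, and the exact form of the loss are hypothetical and do not reproduce the authors' implementation. The mask keeps pixels where the synthesized view matches the target frame better than the un-warped source does, and the stability term penalizes disagreement between the target depth and the source depth aligned into the target view.

import torch

def self_discovered_mask(target_rgb, synth_rgb, source_rgb):
    # Inconsistency check: per-pixel photometric error of the synthesized view
    # against the target frame, compared with the error of the raw source frame.
    # Pixels where warping does not help (typically moving or occluded objects)
    # are masked out. Inputs: (B, 3, H, W); returns a (B, 1, H, W) binary mask.
    err_synth = (target_rgb - synth_rgb).abs().mean(dim=1, keepdim=True)
    err_source = (target_rgb - source_rgb).abs().mean(dim=1, keepdim=True)
    return (err_synth < err_source).float()

def temporal_stability_loss(depth_t, warped_depth_s, mask, eps=1e-7):
    # Temporal stability: depths of the same scene points, observed in the
    # source frame and aligned into the target view (warped_depth_s), should
    # agree with the target depth. The normalised absolute difference keeps the
    # term scale-invariant; the mask removes moving/occluded regions.
    diff = (depth_t - warped_depth_s).abs() / (depth_t + warped_depth_s + eps)
    return (diff * mask).sum() / (mask.sum() + eps)

# Toy usage with random tensors standing in for network outputs.
B, H, W = 2, 128, 416
target = torch.rand(B, 3, H, W)
source = torch.rand(B, 3, H, W)
synth = torch.rand(B, 3, H, W)              # source warped into the target view
d_t = torch.rand(B, 1, H, W) + 0.1          # predicted depth of the target frame
d_s_warped = torch.rand(B, 1, H, W) + 0.1   # source depth aligned to the target view
mask = self_discovered_mask(target, synth, source)
loss = temporal_stability_loss(d_t, d_s_warped, mask)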
Acknowledgments
This work is partially supported by the Key Technological Innovation Projects of Hubei Province (2018AAA062), NSFC (No. 61972298), and the Wuhan University-Huawei GeoInformatics Innovation Lab.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Luo, F., Wei, L., Xiao, C. (2021). Stable Depth Estimation Within Consecutive Video Frames. In: Magnenat-Thalmann, N., et al. (eds.) Advances in Computer Graphics. CGI 2021. Lecture Notes in Computer Science, vol. 13002. Springer, Cham. https://doi.org/10.1007/978-3-030-89029-2_4
DOI: https://doi.org/10.1007/978-3-030-89029-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89028-5
Online ISBN: 978-3-030-89029-2
eBook Packages: Computer Science, Computer Science (R0)