Abstract
Self-supervised depth learning from monocular images typically relies on the 2D pixel-wise photometric relation between temporally adjacent frames. However, such methods neither fully exploit 3D point-wise geometric correspondences nor effectively handle the ambiguities in photometric warping caused by occlusion or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework that incorporates 3D spatial information and exploits stronger geometric constraints among adjacent camera frustums. Instead of directly regressing a depth value for each pixel of a single image, DevNet divides the camera frustum into multiple parallel planes and predicts a point-wise occlusion probability density on each plane. The final depth map is generated by integrating the density along the corresponding rays. During training, novel regularization strategies and loss functions are introduced to mitigate photometric ambiguities and overfitting. Without noticeably enlarging the model size or running time, DevNet outperforms several representative baselines on both the KITTI-2015 outdoor dataset and the NYU-V2 indoor dataset. In particular, DevNet reduces the root-mean-square deviation by around 4% on both KITTI-2015 and NYU-V2 in the task of depth estimation.
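To make the density-to-depth step concrete, below is a minimal NumPy sketch of integrating per-plane densities along rays into an expected depth map, following standard discrete volume rendering rather than the authors' released code; the function name render_depth, the plane discretization, and the random stand-in input are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def render_depth(sigma, depths):
    """Integrate per-plane densities along each ray into a depth map.

    sigma:  (H, W, N) non-negative densities on N fronto-parallel planes.
    depths: (N,) plane depths, sorted front to back (N >= 2).
    Returns an (H, W) expected-depth map.
    """
    # Spacing between consecutive planes; pad the last interval with the
    # previous spacing so delta has length N.
    delta = np.diff(depths, append=depths[-1] + (depths[-1] - depths[-2]))
    # Per-interval opacity from density: alpha_i = 1 - exp(-sigma_i * delta_i).
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i: probability the ray reaches plane i unoccluded,
    # i.e. the running product of (1 - alpha_j) over planes in front of i.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[..., :1]), trans[..., :-1]], axis=-1)
    weights = trans * alpha                 # per-plane contribution to the ray
    return (weights * depths).sum(axis=-1)  # expected depth per pixel

# Toy usage with a stand-in for the network's density prediction:
H, W, N = 2, 3, 64
planes = np.linspace(0.5, 80.0, N)          # hypothetical outdoor depth range
sigma = np.random.rand(H, W, N)             # placeholder for predicted density
print(render_depth(sigma, planes).shape)    # (2, 3)
```

In this formulation the weights form an (approximate) probability distribution over planes per ray, so the rendered depth is the expectation of depth under the predicted occlusion density, which is what "integrating the density along corresponding rays" amounts to in the discrete setting.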
Acknowledgements
This work was partially supported by the Huawei UK AI Fellowship. We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks), and the Ascend AI Processor used for this research.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, K. et al. (2022). DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_8
DOI: https://doi.org/10.1007/978-3-031-19842-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7
eBook Packages: Computer Science, Computer Science (R0)