Abstract
Self-supervised monocular depth estimation has recently achieved encouraging performance. A common consensus is that high-resolution inputs often yield better results. However, we find that the performance gap between high and low resolutions in this task stems mainly from the inappropriate feature representation of the widely used U-Net backbone, rather than from the difference in input information. In this paper, we address the comprehensive feature representation problem for self-supervised depth estimation by attending to both local and global feature representations. Specifically, we first provide an in-depth analysis of the influence of different input resolutions and find that receptive fields play a more crucial role than the information disparity between inputs. Motivated by this, we propose a bilateral depth encoder that fully exploits both detailed and global information. It benefits from broader receptive fields and thus achieves substantial improvements. Furthermore, we propose a residual decoder that facilitates depth regression and saves computation by focusing on the information difference between layers. We name our new depth estimation model the Bilateral Residual Depth Network (BRNet). Experimental results show that BRNet achieves new state-of-the-art performance on the KITTI benchmark under three types of self-supervision. Code is available at: https://github.com/wencheng256/BRNet.
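The abstract only summarizes the two ideas by name; the PyTorch sketch below is a rough illustration of what they could look like, not the authors' implementation (see the linked repository for that). It pairs a high-resolution detail branch with an aggressively downsampled context branch to fuse local and global features (the "bilateral" encoder), and a decoder that predicts a coarse disparity plus a learned residual correction. All class names, channel widths, and layer choices here are our assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """3x3 convolution + batch norm + ReLU."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))


class BilateralEncoder(nn.Module):
    """Hypothetical two-branch encoder: a shallow detail branch keeps full
    resolution for local structure, while a strided context branch enlarges
    the receptive field for global cues; the two are fused at the end."""

    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.detail = nn.Sequential(ConvBlock(in_ch, ch), ConvBlock(ch, ch))
        self.context = nn.Sequential(
            ConvBlock(in_ch, ch, stride=2),
            ConvBlock(ch, 2 * ch, stride=2),
            ConvBlock(2 * ch, 4 * ch, stride=2),
        )

    def forward(self, x):
        d = self.detail(x)   # (B, ch, H, W): fine local detail
        c = self.context(x)  # (B, 4*ch, H/8, W/8): broad receptive field
        c = F.interpolate(c, size=d.shape[-2:], mode="bilinear", align_corners=False)
        return torch.cat([d, c], dim=1)  # fused local + global features


class ResidualDecoder(nn.Module):
    """Hypothetical residual decoder: predict a coarse disparity first, then
    model only the residual correction on top of it."""

    def __init__(self, feat_ch):
        super().__init__()
        self.coarse = nn.Conv2d(feat_ch, 1, 3, padding=1)
        self.refine = nn.Sequential(
            ConvBlock(feat_ch + 1, 32), nn.Conv2d(32, 1, 3, padding=1)
        )

    def forward(self, feats):
        coarse = self.coarse(feats)
        residual = self.refine(torch.cat([feats, coarse], dim=1))
        return torch.sigmoid(coarse + residual)  # disparity in (0, 1)


if __name__ == "__main__":
    x = torch.randn(1, 3, 192, 640)  # a typical KITTI training crop
    feats = BilateralEncoder()(x)
    disp = ResidualDecoder(feats.shape[1])(feats)
    print(disp.shape)  # torch.Size([1, 1, 192, 640])

The residual design mirrors the abstract's motivation: if each decoder stage only has to model the information difference between layers, the refinement convolutions can stay small, which is where the claimed computational saving would come from.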
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62036010, 61972344), and the Start-up Research Grant (SRG) of the University of Macau.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Han, W., Yin, J., Jin, X., Dai, X., Shen, J. (2022). BRNet: Exploring Comprehensive Features for Monocular Depth Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13698. Springer, Cham. https://doi.org/10.1007/978-3-031-19839-7_34
DOI: https://doi.org/10.1007/978-3-031-19839-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19838-0
Online ISBN: 978-3-031-19839-7
eBook Packages: Computer Science, Computer Science (R0)