Research article · MM '23: ACM International Conference on Multimedia, Conference Proceedings
DOI: 10.1145/3581783.3611751

V2Depth: Monocular Depth Estimation via Feature-Level Virtual-View Simulation and Refinement

Published: 27 October 2023

Abstract

Given only a single image, and therefore few spatial cues, many monocular depth estimation methods leverage stereo or multi-view images to learn the spatial information of a scene in a self-supervised manner. However, these methods yield limited performance gains because they cannot exploit sufficient 3D geometry cues at inference time, when only monocular images are available. In this work, we present V2Depth, a novel coarse-to-fine framework with Virtual-View feature simulation for supervised monocular Depth estimation. Specifically, we first design a virtual-view feature simulator that leverages novel view synthesis and contrastive learning to generate virtual-view feature maps. In this way, we explicitly provide representative spatial geometry for subsequent depth estimation in both the training and inference stages. We then introduce a 3DVA-Refiner that iteratively optimizes the predicted depth map. During optimization, a 3D-aware virtual attention mechanism captures global spatial-context correlations, maintaining feature consistency across views and the estimation integrity of the 3D scene, e.g., for objects with occlusion relationships. Decisive improvements over state-of-the-art approaches on three benchmark datasets across all metrics demonstrate the superiority of our method.
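The authors' implementation is not reproduced on this page. Purely as an illustrative sketch, the snippet below shows two generic ingredients the abstract names: an InfoNCE-style contrastive loss, commonly used to align simulated virtual-view features with real-view features, and a simple iterative residual-refinement loop in the spirit of coarse-to-fine depth optimization. All function names, shapes, and the toy residual predictor are hypothetical assumptions, not the paper's actual architecture.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for a single anchor feature vector (hypothetical sketch).

    anchor, positive: (d,) feature vectors; negatives: iterable of (d,).
    The loss is low when `anchor` is far more similar to `positive`
    than to any negative, which is what contrastive feature alignment
    between views aims to achieve.
    """
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    sims = np.array([cosine(anchor, positive)] +
                    [cosine(anchor, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -float(np.log(probs[0]))              # positive sits at index 0

def iterative_refine(depth_init, residual_fn, num_iters=3):
    """Coarse-to-fine refinement: add a predicted residual at each step."""
    depth = depth_init
    for _ in range(num_iters):
        depth = depth + residual_fn(depth)
    return depth

if __name__ == "__main__":
    a = np.array([1.0, 0.0])
    aligned = info_nce_loss(a, np.array([1.0, 0.0]), [np.array([0.0, 1.0])])
    misaligned = info_nce_loss(a, np.array([0.0, 1.0]), [np.array([1.0, 0.0])])
    print(aligned < misaligned)   # True: matching views give a lower loss

    # Toy "refiner": each step predicts half of the remaining error
    # toward a ground-truth depth of 10.0 (purely illustrative).
    refined = iterative_refine(0.0, lambda d: (10.0 - d) * 0.5)
    print(refined)                # 8.75 after three residual steps
```

In the actual method the refinement operates on full depth maps with a learned attention-based residual predictor; the scalar loop above only illustrates the control flow of iterative residual updates.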


Cited By

  • SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model. In Proceedings of the 32nd ACM International Conference on Multimedia (MM '24), 3469--3478. DOI: 10.1145/3664647.3681405. Published 28 October 2024.
  • ColVO: Colonoscopic Visual Odometry Considering Geometric and Photometric Consistency. In Proceedings of the 32nd ACM International Conference on Multimedia (MM '24), 8100--8109. DOI: 10.1145/3664647.3681286. Published 28 October 2024.
  • Domain Shared and Specific Prompt Learning for Incremental Monocular Depth Estimation. In Proceedings of the 32nd ACM International Conference on Multimedia (MM '24), 8306--8315. DOI: 10.1145/3664647.3681155. Published 28 October 2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. contrastive learning
    2. monocular depth estimation
    3. refinement network
    4. virtual-view feature simulation

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 -- November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

