
Monocular Depth and Ego-motion Estimation with Scale Based on Superpixel and Normal Constraints

Published: 30 October 2024

Abstract

Three-dimensional perception is critical in virtual and augmented reality (VR/AR) and autonomous vehicle (AV) applications and is attracting significant attention. Self-supervised monocular depth and ego-motion estimation is an appealing learning approach, providing the scene depth and camera pose required for 3D perception without ground-truth supervision. However, existing self-supervised methods suffer from scale ambiguity, boundary blur, and imbalanced depth distributions, which limit their practical use in VR/AR and AV. In this article, we propose a new self-supervised learning framework based on superpixel and normal constraints to address these problems. Specifically, we formulate a novel 3D edge structure consistency loss to alleviate the boundary blur of depth estimation. To resolve the scale ambiguity of the estimated depth and ego-motion, we propose a novel surface normal network for efficient camera height estimation, composed of a deep fusion module and a full-scale hierarchical feature aggregation module. Meanwhile, to achieve global smoothness and boundary discriminability in the predicted normal map, we introduce a novel fusion loss based on consistency constraints on normals in edge domains and superpixel regions. Experiments on several benchmarks show that the proposed approach outperforms state-of-the-art methods in depth, ego-motion, and surface normal estimation.
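As a rough illustration of the camera-height route to metric scale that the abstract describes: for ground pixels, the dot product of the unit surface normal with the back-projected 3D point gives the camera's distance to the local ground plane, and the ratio of the known mounted camera height to this estimate rescales depth and translation. The NumPy sketch below is a simplified illustration under assumed conventions (pinhole intrinsics, y-down camera axes, median aggregation, and all function names are our assumptions, not the paper's implementation):

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map to per-pixel 3D points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

def recover_scale(depth, normals, ground_mask, K, real_cam_height):
    """Estimate a metric scale factor for a relative depth map.

    For each ground pixel, |n . P| is the distance from the camera
    center to the locally planar ground (n: unit normal, P: 3D point).
    The median over the ground region gives the camera height in the
    network's arbitrary units; dividing the known physical camera
    height by it yields the factor that converts depth to meters.
    """
    points = backproject(depth, K)
    n = normals[ground_mask]            # (m, 3) unit normals on ground pixels
    p = points[ground_mask]             # (m, 3) back-projected ground points
    heights = np.abs(np.sum(n * p, axis=-1))
    est_height = np.median(heights)
    return real_cam_height / est_height

# Toy check: the bottom image row lies on a ground plane 1.5 units
# below the camera (y-down), so z = 1.5 * fy / (v - cy) for v = 3.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0,   0.0, 1.0]])
depth = np.full((4, 4), 10.0)
depth[3, :] = 150.0
normals = np.zeros((4, 4, 3))
normals[3, :, :] = [0.0, 1.0, 0.0]      # ground normal points "down" in camera frame
ground_mask = np.zeros((4, 4), dtype=bool)
ground_mask[3, :] = True
scale = recover_scale(depth, normals, ground_mask, K, real_cam_height=1.65)
```

With the toy plane 1.5 units below the camera and a true mounting height of 1.65, the recovered factor is 1.65 / 1.5 = 1.1, i.e., the network's depths would be multiplied by 1.1 to become metric.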


      Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 10 (October 2024), 729 pages. EISSN: 1551-6865. DOI: 10.1145/3613707. Editor: Abdulmotaleb El Saddik.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 October 2024
      Online AM: 01 July 2024
      Accepted: 22 June 2024
      Revised: 28 February 2024
      Received: 17 July 2023
      Published in TOMM Volume 20, Issue 10


      Author Tags

      1. Self-supervised
      2. monocular
      3. depth
      4. scale
      5. superpixel
      6. surface normal

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Shanghai Local Capacity Enhancement project
      • “Science and Technology Innovation Action Plan” of Shanghai Science and Technology Commission for social development project
