
Monocular Depth and Ego-motion Estimation with Scale Based on Superpixel and Normal Constraints

Published: 30 October 2024

Abstract

Three-dimensional perception is critical in virtual and augmented reality (VR/AR) and autonomous vehicle (AV) applications and is attracting significant attention. Self-supervised monocular depth and ego-motion estimation is an appealing learning approach, providing the scene depth and camera pose required for 3D perception without ground-truth supervision. However, existing self-supervised methods suffer from scale ambiguity, boundary blur, and imbalanced depth distributions, which limit their practical use in VR/AR and AV. In this article, we propose a new self-supervised learning framework based on superpixel and normal constraints to address these problems. Specifically, we formulate a novel 3D edge structure consistency loss to alleviate the boundary blur of depth estimation. To resolve the scale ambiguity of the estimated depth and ego-motion, we propose a novel surface normal network for efficient camera height estimation, composed of a deep fusion module and a full-scale hierarchical feature aggregation module. Meanwhile, to achieve global smoothness and boundary discriminability in the predicted normal map, we introduce a novel fusion loss based on consistency constraints on normals in edge domains and superpixel regions. Experiments on several benchmarks show that the proposed approach outperforms state-of-the-art methods in depth, ego-motion, and surface normal estimation.
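As a rough illustration of the camera-height route to metric scale that the abstract describes: for ground pixels, the dot product of the unit surface normal with the back-projected 3D point gives the camera's distance to the local ground plane, and the ratio of the known mounted camera height to this estimate rescales depth and translation. The NumPy sketch below is a simplified illustration under assumed conventions (pinhole intrinsics, y-down camera axes, median aggregation, and all function names are our assumptions, not the paper's implementation):

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map to per-pixel 3D points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

def recover_scale(depth, normals, ground_mask, K, real_cam_height):
    """Estimate a metric scale factor for a relative depth map.

    For each ground pixel, |n . P| is the distance from the camera
    center to the locally planar ground (n: unit normal, P: 3D point).
    The median over the ground region gives the camera height in the
    network's arbitrary units; dividing the known physical camera
    height by it yields the factor that converts depth to meters.
    """
    points = backproject(depth, K)
    n = normals[ground_mask]            # (m, 3) unit normals on ground pixels
    p = points[ground_mask]             # (m, 3) back-projected ground points
    heights = np.abs(np.sum(n * p, axis=-1))
    est_height = np.median(heights)
    return real_cam_height / est_height

# Toy check: the bottom image row lies on a ground plane 1.5 units
# below the camera (y-down), so z = 1.5 * fy / (v - cy) for v = 3.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0,   0.0, 1.0]])
depth = np.full((4, 4), 10.0)
depth[3, :] = 150.0
normals = np.zeros((4, 4, 3))
normals[3, :, :] = [0.0, 1.0, 0.0]      # ground normal points "down" in camera frame
ground_mask = np.zeros((4, 4), dtype=bool)
ground_mask[3, :] = True
scale = recover_scale(depth, normals, ground_mask, K, real_cam_height=1.65)
```

With the toy plane 1.5 units below the camera and a true mounting height of 1.65, the recovered factor is 1.65 / 1.5 = 1.1, i.e., the network's depths would be multiplied by 1.1 to become metric.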


      Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 10 (October 2024), 729 pages. EISSN: 1551-6865. DOI: 10.1145/3613707. Editor: Abdulmotaleb El Saddik.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 October 2024
      Online AM: 01 July 2024
      Accepted: 22 June 2024
      Revised: 28 February 2024
      Received: 17 July 2023
      Published in TOMM Volume 20, Issue 10


      Author Tags

      1. Self-supervised
      2. monocular
      3. depth
      4. scale
      5. superpixel
      6. surface normal

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Shanghai Local Capacity Enhancement project
      • “Science and Technology Innovation Action Plan” of Shanghai Science and Technology Commission for social development project
