DOI: 10.1145/3459066.3459073

Learning of Monocular Camera Depth Estimation Using Point-of-view Shots

Published: 26 July 2021

ABSTRACT

Depth estimation from RGB images plays an important role in many computer vision applications, such as pose tracking for navigation, computational photography, and 3D reconstruction. Depth estimation using a single camera has relatively low accuracy compared to conventional stereo camera systems and time-of-flight (ToF) sensors. Nowadays, smartphone cameras are equipped with optical image stabilization systems and can rotate to improve image quality, for example through stabilization and deblurring. In this paper, we propose a novel depth estimation technique using a single camera equipped with a tilting mechanism. Instead of motion compensation for optical image stabilization, the typical use of a tilting mechanism, we exploit it to capture multi-view images from a single viewpoint by rotating the camera module at varying angles. The captured images have inside-out views, which are more challenging for depth estimation than outside-in views. We train a network based on a structure-from-motion algorithm to estimate the depth of a scene and achieve high-accuracy depth maps. A synthetic dataset is created from 3D indoor models to train and validate the network. Evaluation results demonstrate that we achieve state-of-the-art performance in depth estimation using a single camera.
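The abstract does not spell out the training signal, but depth networks "based on structure-from-motion" are commonly trained with a photometric reprojection loss: predicted depth and relative camera pose warp one captured view into another, and the reconstruction error supervises both predictions. The sketch below illustrates that standard formulation in PyTorch; the function names, tensor shapes, and loss choice are illustrative assumptions, not details taken from this paper.

```python
# A minimal, self-contained sketch of the standard photometric reprojection
# loss used in structure-from-motion-style self-supervised depth training.
# All names and shapes here are assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

def reprojection_grid(depth, pose, K):
    """Map target-frame pixels into a source frame.

    depth: (B, 1, H, W) predicted depth for the target frame
    pose:  (B, 4, 4)    predicted target-to-source rigid transform
    K:     (B, 3, 3)    camera intrinsics
    Returns a (B, H, W, 2) sampling grid in [-1, 1] for F.grid_sample.
    """
    B, _, H, W = depth.shape
    dev = depth.device
    # Homogeneous pixel grid: rows are x, y, 1, flattened row-major.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32, device=dev),
        torch.arange(W, dtype=torch.float32, device=dev),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1)                   # (B, 3, H*W)
    # Back-project: X = depth * K^-1 p, then move points to the source frame.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)  # (B, 3, H*W)
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=dev)], 1)
    src = (pose @ cam)[:, :3]                                  # (B, 3, H*W)
    # Project with the intrinsics and normalize for grid_sample.
    uv = K @ src
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)
    u = uv[:, 0] / (W - 1) * 2 - 1
    v = uv[:, 1] / (H - 1) * 2 - 1
    return torch.stack([u, v], -1).reshape(B, H, W, 2)

def photometric_loss(target, source, depth, pose, K):
    """L1 error between the target image and the source image warped into
    the target view using the predicted depth and relative pose."""
    grid = reprojection_grid(depth, pose, K)
    warped = F.grid_sample(source, grid, padding_mode="border",
                           align_corners=True)
    return (target - warped).abs().mean()
```

In a full pipeline this reconstruction term is typically combined with a structural-similarity (SSIM) term and a depth-smoothness regularizer, and the gradients flow back into both the depth and the pose networks.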


Published in

ICMVA '21: Proceedings of the 2021 International Conference on Machine Vision and Applications
February 2021, 75 pages
ISBN: 9781450389556
DOI: 10.1145/3459066

          Copyright © 2021 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 July 2021


          Qualifiers

          • research-article
          • Research
          • Refereed limited