ABSTRACT
Depth estimation from RGB images plays an important role in many computer vision applications, such as pose tracking for navigation, computational photography, and 3D reconstruction. Depth estimation with a single camera has relatively low accuracy compared to conventional stereo camera systems and time-of-flight (ToF) sensors. Modern smartphone cameras include an optical image stabilization (OIS) system that can rotate the camera module to improve image quality, e.g., for stabilization and deblurring. In this paper, we propose a novel depth estimation technique using a single camera equipped with a tilting mechanism. Instead of using the tilting mechanism for motion compensation, its typical role in optical image stabilization, we exploit it to capture multi-view images from a single viewpoint by rotating the camera module at varying angles. The captured images have inside-out views, which are more challenging for depth estimation than outside-in views. To obtain high-accuracy depth maps, we train a network based on a structure-from-motion algorithm to estimate the depth of a scene. A synthetic dataset created from 3D indoor models is used to train and validate the network. Evaluation results demonstrate that we achieve state-of-the-art performance in depth estimation using a single camera.
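The core of structure-from-motion-based depth training is view synthesis: the depth predicted for a target view, together with the known relative pose between the tilted camera positions, is used to warp a second view into the target view, and the photometric difference supervises the network. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of that inverse-warping step, with hypothetical function names, nearest-neighbour sampling instead of the bilinear sampling a trainable pipeline would use, and an L1 photometric loss.

```python
import numpy as np

def pixel_grid(h, w):
    """Homogeneous pixel coordinates, shape (3, h*w)."""
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    return np.stack([xs, ys, ones]).reshape(3, -1).astype(np.float64)

def inverse_warp(src, depth, K, R, t):
    """Warp the source view into the target view.

    Back-projects each target pixel using the predicted depth, moves it
    into the source camera frame via the relative pose (R, t), projects
    it with the intrinsics K, and samples the source image there.
    """
    h, w = depth.shape
    p = pixel_grid(h, w)                                # (3, N)
    cam = np.linalg.inv(K) @ p * depth.reshape(1, -1)   # back-project
    cam_src = R @ cam + t.reshape(3, 1)                 # source frame
    proj = K @ cam_src                                  # reproject
    u = proj[0] / proj[2]
    v = proj[1] / proj[2]
    # Nearest-neighbour sampling; clip handles out-of-bounds pixels.
    ui = np.clip(np.round(u).astype(int), 0, w - 1)
    vi = np.clip(np.round(v).astype(int), 0, h - 1)
    return src[vi, ui].reshape(h, w)

def photometric_loss(target, warped):
    """L1 photometric error minimized during training."""
    return float(np.mean(np.abs(target - warped)))
```

With an identity rotation and zero translation the warp is the identity mapping and the loss vanishes; during training, gradients of this loss with respect to the predicted depth (and, if estimated, the pose) drive the network toward geometrically consistent depth.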
Index Terms
- Learning of Monocular Camera Depth Estimation Using Point-of-view Shots