DOI: 10.1145/3459066.3459073

Learning of Monocular Camera Depth Estimation Using Point-of-view Shots

Published: 26 July 2021


Depth estimation from RGB images plays an important role in many computer vision applications, such as pose tracking for navigation, computational photography, and 3D reconstruction. Depth estimation with a single camera has relatively low accuracy compared with a conventional stereo camera system or a time-of-flight (ToF) sensor. Many smartphone cameras now include an optical image stabilization (OIS) system that can rotate the camera module to improve image quality, for example through image stabilization and deblurring. In this paper, we propose a novel depth estimation technique using a single camera equipped with a tilting mechanism. Instead of using the tilting mechanism for its typical purpose, motion compensation in optical image stabilization, we exploit it to capture multi-view images from a single viewpoint by rotating the camera module to varying angles. The captured images are inside-out views, which are more challenging for depth estimation than outside-in views. We train a network based on a structure-from-motion algorithm to estimate high-accuracy depth maps of a scene. A synthetic dataset is created from 3D indoor models to train and validate the network. Evaluation results demonstrate that our method achieves state-of-the-art performance in single-camera depth estimation.
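Basic pinhole geometry can illustrate why views captured "from a single viewpoint" by tilting the module still carry some depth information, and why that information is weak. A rotation exactly about the optical centre relates the views by a depth-independent homography, so there is no parallax at all; parallax appears only if the module pivots about a point offset from the optical centre, and the optical centre then moves by just 2·b·sin(θ/2) for a pivot offset b and tilt angle θ. The Python sketch below is an illustration under stated assumptions, not the authors' method or their hardware: it assumes an ideal pinhole camera with the principal point at the origin, a tilt about the camera x-axis, and a pivot located `pivot_offset` metres behind the optical centre. The function names, the 500-pixel focal length, the 2° tilt, and the 5 mm offset are all hypothetical values chosen for illustration.

```python
import math


def rot_x(theta):
    """3x3 rotation about the camera x-axis (a tilt); theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def tilted_centre(theta, pivot_offset):
    """Optical centre after tilting the module about a pivot on the optical axis.

    Assumption (hypothetical geometry): the pivot sits `pivot_offset` metres
    behind the original optical centre, which is taken as the world origin.
    """
    r = rot_x(theta)
    pivot = [0.0, 0.0, -pivot_offset]
    arm = [-p for p in pivot]  # vector from the pivot to the original optical centre
    return [pivot[i] + v for i, v in enumerate(mat_vec(r, arm))]


def baseline(theta, pivot_offset):
    """Distance the optical centre moves; equals 2 * pivot_offset * sin(theta / 2)."""
    return math.sqrt(sum(x * x for x in tilted_centre(theta, pivot_offset)))


def project_tilted(point, theta, pivot_offset, f):
    """Pinhole projection (principal point at 0) of a world point into the tilted view."""
    r = rot_x(theta)
    centre = tilted_centre(theta, pivot_offset)
    r_t = [[r[j][i] for j in range(3)] for i in range(3)]  # world-to-camera rotation R^T
    x, y, z = mat_vec(r_t, [point[i] - centre[i] for i in range(3)])
    return f * x / z, f * y / z


def ray_parallax(depth_near, depth_far, theta, pivot_offset, f=500.0):
    """Pixel gap, in the tilted view, between two points on one original viewing ray."""
    ray = [0.1, 0.05, 1.0]  # arbitrary ray through the original optical centre
    u1, v1 = project_tilted([depth_near * a for a in ray], theta, pivot_offset, f)
    u2, v2 = project_tilted([depth_far * a for a in ray], theta, pivot_offset, f)
    return math.hypot(u1 - u2, v1 - v2)


if __name__ == "__main__":
    tilt = math.radians(2.0)  # hypothetical tilt angle
    for offset in (0.0, 0.005):  # pure rotation vs. a hypothetical 5 mm pivot offset
        print(f"pivot offset {offset * 1000:.0f} mm: "
              f"baseline {baseline(tilt, offset) * 1000:.4f} mm, "
              f"parallax {ray_parallax(0.5, 5.0, tilt, offset):.4f} px")
```

With a zero offset, two points on the same original ray project to the same pixel in the tilted view whatever their depths. With the assumed 5 mm offset and 2° tilt, the baseline is under 0.2 mm and the parallax between points at 0.5 m and 5 m stays well below one pixel. Under these assumed numbers, this small-baseline geometry is one way to see why inside-out views are harder to recover depth from than outside-in views.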






        Information & Contributors


        Published In

        ACM Other conferences
        ICMVA '21: Proceedings of the 2021 International Conference on Machine Vision and Applications
        February 2021
        75 pages


        Association for Computing Machinery

        New York, NY, United States





        Author Tags

        1. Depth Estimation
        2. Monocular Image
        3. Point-of-view


        • Research-article
        • Research
        • Refereed limited


        ICMVA 2021


        Bibliometrics & Citations

        Article Metrics

        • Total Citations: 0
        • Total Downloads: 90
        • Downloads (last 12 months): 19
        • Downloads (last 6 weeks): 0
        Reflects downloads up to 18 Feb 2025
