ABSTRACT
Depth estimation from RGB images plays an important role in many computer vision applications, such as pose tracking for navigation, computational photography, and 3D reconstruction. Depth estimation with a single camera has relatively low accuracy compared to conventional stereo camera systems and time-of-flight (ToF) sensors. Modern smartphone cameras include an optical image stabilization (OIS) system that can rotate the camera module to improve image quality, e.g., for stabilization and deblurring. In this paper, we propose a novel depth estimation technique using a single camera equipped with a tilting mechanism. Instead of using the tilting mechanism for motion compensation, its typical role in optical image stabilization, we exploit it to capture multi-view images from a single viewpoint by rotating the camera module at varying angles. The captured images have inside-out views, which are more challenging for depth estimation than outside-in views. To obtain high-accuracy depth maps, we train a network based on a structure-from-motion algorithm to estimate the depth of a scene. A synthetic dataset created from 3D indoor models is used to train and validate the network. Evaluation results demonstrate that we achieve state-of-the-art performance in depth estimation using a single camera.
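The core of structure-from-motion-based depth training is view synthesis: the depth predicted for a target view, together with the known relative pose between the tilted camera positions, is used to warp a second view into the target view, and the photometric difference supervises the network. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of that inverse-warping step, with hypothetical function names, nearest-neighbour sampling instead of the bilinear sampling a trainable pipeline would use, and an L1 photometric loss.

```python
import numpy as np

def pixel_grid(h, w):
    """Homogeneous pixel coordinates, shape (3, h*w)."""
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    return np.stack([xs, ys, ones]).reshape(3, -1).astype(np.float64)

def inverse_warp(src, depth, K, R, t):
    """Warp the source view into the target view.

    Back-projects each target pixel using the predicted depth, moves it
    into the source camera frame via the relative pose (R, t), projects
    it with the intrinsics K, and samples the source image there.
    """
    h, w = depth.shape
    p = pixel_grid(h, w)                                # (3, N)
    cam = np.linalg.inv(K) @ p * depth.reshape(1, -1)   # back-project
    cam_src = R @ cam + t.reshape(3, 1)                 # source frame
    proj = K @ cam_src                                  # reproject
    u = proj[0] / proj[2]
    v = proj[1] / proj[2]
    # Nearest-neighbour sampling; clip handles out-of-bounds pixels.
    ui = np.clip(np.round(u).astype(int), 0, w - 1)
    vi = np.clip(np.round(v).astype(int), 0, h - 1)
    return src[vi, ui].reshape(h, w)

def photometric_loss(target, warped):
    """L1 photometric error minimized during training."""
    return float(np.mean(np.abs(target - warped)))
```

With an identity rotation and zero translation the warp is the identity mapping and the loss vanishes; during training, gradients of this loss with respect to the predicted depth (and, if estimated, the pose) drive the network toward geometrically consistent depth.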
Index Terms
- Learning of Monocular Camera Depth Estimation Using Point-of-view Shots