
DoCam: depth sensing with an optical image stabilization supported RGB camera

Published: 14 October 2022 Publication History

Abstract

Optical image stabilizers (OIS) are widely used in digital cameras to counteract motion blur caused by camera shake when capturing videos and photos. In this paper, we seek to expand the applicability of lens-shift OIS technology to metric depth estimation, i.e., to let an RGB camera achieve a function similar to that of a time-of-flight (ToF) camera. Instead of having to move the entire camera for depth estimation, we propose DoCam, which controls the lens motion in the OIS module to achieve 3D reconstruction. After controlling the lens motion by altering the MEMS gyroscope readings through acoustic injection, we improve the traditional bundle adjustment algorithm by establishing additional constraints from the linearity of the lens control model for high-precision camera pose estimation. We then develop a dense depth reconstruction algorithm to compute depth maps at real-world scale from multiple captures with micro lens motion (i.e., ≤ 3 mm). Extensive experiments demonstrate that DoCam enables a 2D color camera to estimate high-accuracy depth information of the captured scene by controlling lens motion in the OIS. DoCam is suitable for a variety of applications that require depth information of the scene, especially when only a single color camera is available and located at a fixed position.
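To build intuition for why a micro lens shift (≤ 3 mm) can recover metric depth, the shifted lens can be viewed as a tiny-baseline stereo pair, where depth follows from standard triangulation: Z = f·b/d. The sketch below is illustrative only, not the authors' pipeline; the focal length, baseline, and disparity values are assumed for the example.

```python
# Illustrative sketch (not the DoCam implementation): metric depth from
# two captures separated by a micro lens shift, treated as a stereo pair.
#
#     Z = f * b / d
#
# where f is the focal length (pixels), b is the baseline, i.e., the
# lens shift (meters), and d is the observed disparity (pixels).

def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Metric depth of a point whose projection shifts by `disparity_px`
    between two captures whose optical centers are `baseline_m` apart."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Assumed numbers: a 3 mm lens shift and a 3000 px focal length.
# A point 1 m away then yields a disparity of f*b/Z = 3000*0.003/1 = 9 px,
# so inverting the relation recovers the 1 m depth.
print(depth_from_disparity(3000.0, 0.003, 9.0))  # → 1.0
```

The tiny baseline is why pose estimation must be high-precision: at 3 mm, a sub-pixel disparity error translates into a large relative depth error, which motivates the paper's added linearity constraints in bundle adjustment.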


Cited By

  • (2024) M3Cam: Extreme Super-resolution via Multi-Modal Optical Flow for Mobile Cameras. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 10.1145/3666025.3699371, 744–756. Online publication date: 4-Nov-2024.
  • (2024) Mirror Never Lies: Unveiling Reflective Privacy Risks in Glass-laden Short Videos. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 10.1145/3636534.3690706, 1485–1499. Online publication date: 4-Dec-2024.


Published In

MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
October 2022
932 pages
ISBN:9781450391818
DOI:10.1145/3495243

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. acoustic injection
  2. depth estimation
  3. optical image stabilization

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 440 of 2,972 submissions, 15%


