
DoCam: depth sensing with an optical image stabilization supported RGB camera

Published: 14 October 2022 Publication History

Abstract

Optical image stabilizers (OIS) are widely used in digital cameras to counteract motion blur caused by camera shake when capturing videos and photos. In this paper, we seek to expand the applicability of lens-shift OIS technology to metric depth estimation, i.e., to let an RGB camera achieve a function similar to that of a time-of-flight (ToF) camera. Instead of having to move the entire camera for depth estimation, we propose DoCam, which controls the lens motion in the OIS module to achieve 3D reconstruction. After controlling the lens motion by altering the MEMS gyroscope readings through acoustic injection, we improve the traditional bundle adjustment algorithm by establishing additional constraints from the linearity of the lens control model for high-precision camera pose estimation. We then develop a dense depth reconstruction algorithm to compute depth maps at real-world scale from multiple captures with micro lens motion (i.e., ≤ 3 mm). Extensive experiments demonstrate that DoCam enables a 2D color camera to estimate high-accuracy depth information of the captured scene by controlling lens motion in the OIS. DoCam is suitable for a variety of applications that require depth information of the scene, especially when only a single color camera is available and located at a fixed position.
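To build intuition for why a micro lens shift (≤ 3 mm) can recover metric depth, the shifted lens can be viewed as a tiny-baseline stereo pair, where depth follows from standard triangulation: Z = f·b/d. The sketch below is illustrative only, not the authors' pipeline; the focal length, baseline, and disparity values are assumed for the example.

```python
# Illustrative sketch (not the DoCam implementation): metric depth from
# two captures separated by a micro lens shift, treated as a stereo pair.
#
#     Z = f * b / d
#
# where f is the focal length (pixels), b is the baseline, i.e., the
# lens shift (meters), and d is the observed disparity (pixels).

def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Metric depth of a point whose projection shifts by `disparity_px`
    between two captures whose optical centers are `baseline_m` apart."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Assumed numbers: a 3 mm lens shift and a 3000 px focal length.
# A point 1 m away then yields a disparity of f*b/Z = 3000*0.003/1 = 9 px,
# so inverting the relation recovers the 1 m depth.
print(depth_from_disparity(3000.0, 0.003, 9.0))  # → 1.0
```

The tiny baseline is why pose estimation must be high-precision: at 3 mm, a sub-pixel disparity error translates into a large relative depth error, which motivates the paper's added linearity constraints in bundle adjustment.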


Cited By

  • (2024) M3Cam: Extreme Super-resolution via Multi-Modal Optical Flow for Mobile Cameras. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 10.1145/3666025.3699371, 744–756. Online publication date: 4-Nov-2024.
  • (2024) Mirror Never Lies: Unveiling Reflective Privacy Risks in Glass-laden Short Videos. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 10.1145/3636534.3690706, 1485–1499. Online publication date: 4-Dec-2024.


Published In

MobiCom '22: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
October 2022
932 pages
ISBN:9781450391818
DOI:10.1145/3495243

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. acoustic injection
  2. depth estimation
  3. optical image stabilization

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 440 of 2,972 submissions, 15%


