
Revisiting the Deep Learning-Based Eavesdropping Attacks via Facial Dynamics from VR Motion Sensors

  • Conference paper
Information and Communications Security (ICICS 2023)

Abstract

Virtual Reality (VR) head-mounted displays (HMDs) are equipped with a range of sensors, which have recently been exploited to infer users' sensitive and private information through a deep learning-based eavesdropping attack that leverages facial dynamics. Because the attack relies on facial dynamics, which vary across race and gender, we evaluate its robustness under varying user characteristics. Our evaluation builds on anthropological research showing statistically significant differences in face width, face length, and lip length across ethnic/racial groups, suggesting that a "challenger" who shares a victim's features (ethnicity/race and gender) may deceive the eavesdropper more easily than one who does not. By replicating the classification model in [17] and examining its accuracy under six scenarios that vary the victim and attacker by ethnicity/race and gender, we show that an adversary with the same ethnicity/race and gender as the victim impersonates the victim most accurately: the average accuracy difference between the original and adversarial settings is the lowest among all scenarios. Conversely, an adversary whose ethnicity/race and gender both differ from the victim's yields the highest average accuracy difference, highlighting an inherent demographic bias in the approach's robustness to impersonation.
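The attack pipeline the abstract refers to (per the Face-Mic paper [17] and the gyroscope-eavesdropping literature [23, 30]) turns raw motion-sensor traces into time-frequency features before classification. The following is a minimal sketch of one plausible front end for that step, computing a short-time Fourier transform (STFT) magnitude spectrogram from a 1-D sensor trace. The sampling rate, frame length, and hop size are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=128, hop=64):
    """Frame the 1-D signal, apply a Hann window, and return the STFT magnitude.

    Output shape is (n_frames, frame_len // 2 + 1): one row per analysis
    frame, one column per non-negative frequency bin.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: a synthetic 50 Hz vibration sampled at 1 kHz stands in for a
# gyroscope trace excited by speech-induced facial dynamics.
fs = 1000
t = np.arange(fs) / fs
trace = np.sin(2 * np.pi * 50 * t)
spec = stft_magnitude(trace)
print(spec.shape)  # (14, 65) with the defaults above
```

A spectrogram like this would then be fed to the classifier (e.g., a network trained with the sparse categorical cross-entropy loss referenced in [8]); the evaluation in this paper compares that classifier's accuracy between the original and impersonation settings.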


References

  1. Oculus Quest 2 tech specs deep dive (2023). https://business.oculus.com/products/specs/

  2. MediaRecorder overview (2023). https://developer.android.com/guide/topics/media/mediarecorder

  3. Get Raw Sensor Data (2023). https://developer.oculus.com/documentation/unreal/unreal-blueprints-get-raw-sensor-data

  4. Oculus SDK for developers (2023). https://developer.oculus.com/downloads/

  5. Oculus Device Specifications (2023). https://developer.oculus.com/resources/oculus-device-specs/

  6. Unity documentation: CommonUsages (2023). https://docs.unity3d.com/ScriptReference/XR.CommonUsages.html

  7. How Facebook protects the privacy of your Voice Commands and Voice Dictation (2023). https://support.oculus.com/articles/in-vr-experiences/oculus-features/privacy-protection-with-voice-commands

  8. tf.keras.losses.SparseCategoricalCrossentropy (2023). https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy

  9. Roark, D.A., Barrett, S.E., Spence, M.J., Abdi, H., O'Toole, A.J.: Psychological and neural perspectives on the role of motion in face recognition. Behav. Cogn. Neurosci. Rev. 2(1), 15–46 (2003)

  10. Anand, S.A., Saxena, N.: Speechless: analyzing the threat to speech privacy from smartphone motion sensors. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 1000–1017. IEEE (2018)

  11. Akansu, A.N., Haddad, R.A.: Time-frequency representations. In: Multiresolution Signal Decomposition, 2nd edn., pp. 331–390. Academic Press, San Diego (2001). https://doi.org/10.1016/B978-012047141-6/50005-7

  12. Cheng, A., Yang, L., Andersen, E.: Teaching language and culture with a virtual reality game. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 541–549 (2017)

  13. Ferracani, A., Faustino, M., Giannini, G.X., Landucci, L., Del Bimbo, A.: Natural experiences in museums through virtual reality and voice commands. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1233–1234 (2017)

  14. Dantcheva, A., Brémond, F.: Gender estimation based on smile-dynamics. IEEE Trans. Inf. Forensics Secur. 12(3), 719–729 (2016)

  15. Arons, B.: A review of the cocktail party effect. J. Am. Voice I/O Soc. 12(7), 35–50 (1992)

  16. Burdea, G.C., Coiffet, P.: Virtual Reality Technology. Wiley, Hoboken (2003)

  17. Shi, C., et al.: Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors. In: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, pp. 478–490 (2021)

  18. Shi, C., Wang, Y., Chen, Y., Saxena, N., Wang, C.: WearID: low-effort wearable-assisted authentication of voice commands via cross-domain comparison without training. In: Annual Computer Security Applications Conference, pp. 829–842 (2020)

  19. Kern, F., et al.: Using hand tracking and voice commands to physically align virtual surfaces in AR for handwriting and sketching with HoloLens 2. In: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, pp. 1–3 (2021)

  20. Segura, R.J., del Pino, F.J., Ogáyar, C.J., Rueda, A.J.: VR-OCKS: a virtual reality game for learning the basic concepts of programming. Comput. Appl. Eng. Educ. 28(1), 31–41 (2020)

  21. Radianti, J., Majchrzak, T.A., Fromm, J., Stieglitz, S., Vom Brocke, J.: Virtual reality applications for higher education: a market analysis (2021)

  22. Zhang, L., Pathak, P.H., Wu, M., Zhao, Y., Mohapatra, P.: AccelWord: energy-efficient hotword detection through accelerometer. In: Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 301–315 (2015)

  23. Durak, L., Arikan, O.: Short-time Fourier transform: two fundamental properties and an optimal implementation. IEEE Trans. Sig. Process. 51(5), 1231–1242 (2003)

  24. Johns Hopkins Medicine: Vocal Cord Disorders (2023). https://www.hopkinsmedicine.org/health/conditions-and-diseases/vocal-cord-disorders

  25. Thelwell, M., Chiu, C.Y., Bullas, A., Hart, J., Wheat, J., Choppin, S.: How shape-based anthropometry can complement traditional anthropometric techniques: a cross-sectional study. Sci. Rep. 10(1), 1–11 (2020)

  26. Nikiforakis, N., Kapravelos, A., Joosen, W., Kruegel, C., Piessens, F., Vigna, G.: Cookieless monster: exploring the ecosystem of web-based device fingerprinting. In: 2013 IEEE Symposium on Security and Privacy, pp. 541–555. IEEE (2013)

  27. Parent, R., King, S., Fujimura, O.: Issues with lip sync animation: can you read my lips? In: Proceedings of Computer Animation 2002 (CA 2002), pp. 3–10. IEEE (2002)

  28. Giannakopoulos, T.: A method for silence removal and segmentation of speech signals, implemented in Matlab. University of Athens, Athens 2 (2009)

  29. Meteriz-Yıldıran, Ü., Yıldıran, N.F., Awad, A., Mohaisen, D.: A keylogging inference attack on air-tapping keyboards in virtual environments. In: 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 765–774. IEEE (2022)

  30. Michalevsky, Y., Boneh, D., Nakibly, G.: Gyrophone: recognizing speech from gyroscope signals. In: 23rd USENIX Security Symposium (USENIX Security 2014), pp. 1053–1067 (2014)

  31. Zhuang, Z., Guan, J., Hsiao, H., Bradtmiller, B.: Evaluating the representativeness of the LANL respirator fit test panels for the current US civilian workers. J. Int. Soc. Respir. Prot. 21, 83–93 (2004)

  32. Ba, Z., et al.: Learning-based practical smartphone eavesdropping with built-in accelerometer. In: NDSS (2020)

  33. Zhuang, Z., Landsittel, D., Benson, S., Roberge, R., Shaffer, R.: Facial anthropometric differences among gender, ethnicity, and age groups. Ann. Occup. Hyg. 54(4), 391–402 (2010)

Author information

Correspondence to Soohyeon Choi.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Choi, S., Mohaisen, M., Nyang, D., Mohaisen, D. (2023). Revisiting the Deep Learning-Based Eavesdropping Attacks via Facial Dynamics from VR Motion Sensors. In: Wang, D., Yung, M., Liu, Z., Chen, X. (eds) Information and Communications Security. ICICS 2023. Lecture Notes in Computer Science, vol 14252. Springer, Singapore. https://doi.org/10.1007/978-981-99-7356-9_24

  • DOI: https://doi.org/10.1007/978-981-99-7356-9_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7355-2

  • Online ISBN: 978-981-99-7356-9

  • eBook Packages: Computer Science, Computer Science (R0)
