
Revisiting the Deep Learning-Based Eavesdropping Attacks via Facial Dynamics from VR Motion Sensors

  • Conference paper
Information and Communications Security (ICICS 2023)

Abstract

Virtual Reality (VR) head-mounted displays (HMDs) are equipped with a range of sensors, which have recently been exploited to infer users' sensitive and private information through a deep learning-based eavesdropping attack that leverages facial dynamics. Because the attack relies on facial dynamics, which vary across race and gender, we evaluate its robustness under varying user characteristics. Our evaluation builds on anthropological research showing statistically significant differences in face width, face length, and lip length across ethnic/racial groups, suggesting that a "challenger" who shares a victim's features (ethnicity/race and gender) may deceive the eavesdropper more easily than one who does not. By replicating the classification model in [17] and examining its accuracy under six scenarios that vary the victim and attacker by ethnicity/race and gender, we show that an adversary with the same ethnicity/race and gender as the victim impersonates the victim most accurately: the average accuracy difference between the original and adversarial settings is the lowest among all scenarios. Conversely, an adversary whose ethnicity/race and gender both differ from the victim's yields the highest average accuracy difference, highlighting an inherent demographic bias in the approach's robustness to impersonation.
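The attack pipeline the abstract refers to (per the Face-Mic paper [17] and the gyroscope-eavesdropping literature [23, 30]) turns raw motion-sensor traces into time-frequency features before classification. The following is a minimal sketch of one plausible front end for that step, computing a short-time Fourier transform (STFT) magnitude spectrogram from a 1-D sensor trace. The sampling rate, frame length, and hop size are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=128, hop=64):
    """Frame the 1-D signal, apply a Hann window, and return the STFT magnitude.

    Output shape is (n_frames, frame_len // 2 + 1): one row per analysis
    frame, one column per non-negative frequency bin.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: a synthetic 50 Hz vibration sampled at 1 kHz stands in for a
# gyroscope trace excited by speech-induced facial dynamics.
fs = 1000
t = np.arange(fs) / fs
trace = np.sin(2 * np.pi * 50 * t)
spec = stft_magnitude(trace)
print(spec.shape)  # (14, 65) with the defaults above
```

A spectrogram like this would then be fed to the classifier (e.g., a network trained with the sparse categorical cross-entropy loss referenced in [8]); the evaluation in this paper compares that classifier's accuracy between the original and impersonation settings.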


References

  1. Oculus Quest 2 tech specs deep dive (2023). https://business.oculus.com/products/specs/

  2. MediaRecorder overview (2023). https://developer.android.com/guide/topics/media/mediarecorder

  3. Get Raw Sensor Data (2023). https://developer.oculus.com/documentation/unreal/unreal-blueprints-get-raw-sensor-data

  4. Oculus SDK for developers (2023). https://developer.oculus.com/downloads/

  5. Oculus Device Specifications (2023). https://developer.oculus.com/resources/oculus-device-specs/

  6. Unity documentation: CommonUsages (2023). https://docs.unity3d.com/ScriptReference/XR.CommonUsages.html

  7. How Facebook protects the privacy of your Voice Commands and Voice Dictation (2023). https://support.oculus.com/articles/in-vr-experiences/oculus-features/privacy-protection-with-voice-commands

  8. tf.keras.losses.SparseCategoricalCrossentropy (2023). https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy

  9. Roark, D.A., Barrett, S.E., Spence, M.J., Abdi, H., O'Toole, A.J.: Psychological and neural perspectives on the role of motion in face recognition. Behav. Cogn. Neurosci. Rev. 2(1), 15–46 (2003)

  10. Anand, S.A., Saxena, N.: Speechless: analyzing the threat to speech privacy from smartphone motion sensors. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 1000–1017. IEEE (2018)

  11. Akansu, A.N., Haddad, R.A.: Time-frequency representations. In: Multiresolution Signal Decomposition, 2nd edn., pp. 331–390. Academic Press, San Diego (2001). https://doi.org/10.1016/B978-012047141-6/50005-7

  12. Cheng, A., Yang, L., Andersen, E.: Teaching language and culture with a virtual reality game. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 541–549 (2017)

  13. Ferracani, A., Faustino, M., Giannini, G.X., Landucci, L., Del Bimbo, A.: Natural experiences in museums through virtual reality and voice commands. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1233–1234 (2017)

  14. Dantcheva, A., Brémond, F.: Gender estimation based on smile-dynamics. IEEE Trans. Inf. Forensics Secur. 12(3), 719–729 (2016)

  15. Arons, B.: A review of the cocktail party effect. J. Am. Voice I/O Soc. 12(7), 35–50 (1992)

  16. Burdea, G.C., Coiffet, P.: Virtual Reality Technology. Wiley, Hoboken (2003)

  17. Shi, C., et al.: Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors. In: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, pp. 478–490 (2021)

  18. Shi, C., Wang, Y., Chen, Y., Saxena, N., Wang, C.: WearID: low-effort wearable-assisted authentication of voice commands via cross-domain comparison without training. In: Annual Computer Security Applications Conference, pp. 829–842 (2020)

  19. Kern, F., et al.: Using hand tracking and voice commands to physically align virtual surfaces in AR for handwriting and sketching with HoloLens 2. In: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, pp. 1–3 (2021)

  20. Segura, R.J., del Pino, F.J., Ogáyar, C.J., Rueda, A.J.: VR-OCKS: a virtual reality game for learning the basic concepts of programming. Comput. Appl. Eng. Educ. 28(1), 31–41 (2020)

  21. Radianti, J., Majchrzak, T.A., Fromm, J., Stieglitz, S., Vom Brocke, J.: Virtual reality applications for higher education: a market analysis (2021)

  22. Zhang, L., Pathak, P.H., Wu, M., Zhao, Y., Mohapatra, P.: AccelWord: energy-efficient hotword detection through accelerometer. In: Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 301–315 (2015)

  23. Durak, L., Arikan, O.: Short-time Fourier transform: two fundamental properties and an optimal implementation. IEEE Trans. Sig. Process. 51(5), 1231–1242 (2003)

  24. Johns Hopkins Medicine: Vocal Cord Disorders (2023). https://www.hopkinsmedicine.org/health/conditions-and-diseases/vocal-cord-disorders

  25. Thelwell, M., Chiu, C.Y., Bullas, A., Hart, J., Wheat, J., Choppin, S.: How shape-based anthropometry can complement traditional anthropometric techniques: a cross-sectional study. Sci. Rep. 10(1), 1–11 (2020)

  26. Nikiforakis, N., Kapravelos, A., Joosen, W., Kruegel, C., Piessens, F., Vigna, G.: Cookieless monster: exploring the ecosystem of web-based device fingerprinting. In: 2013 IEEE Symposium on Security and Privacy, pp. 541–555. IEEE (2013)

  27. Parent, R., King, S., Fujimura, O.: Issues with lip sync animation: can you read my lips? In: Proceedings of Computer Animation 2002 (CA 2002), pp. 3–10. IEEE (2002)

  28. Giannakopoulos, T.: A method for silence removal and segmentation of speech signals, implemented in Matlab. University of Athens, Athens 2 (2009)

  29. Meteriz-Yıldıran, Ü., Yıldıran, N.F., Awad, A., Mohaisen, D.: A keylogging inference attack on air-tapping keyboards in virtual environments. In: 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 765–774. IEEE (2022)

  30. Michalevsky, Y., Boneh, D., Nakibly, G.: Gyrophone: recognizing speech from gyroscope signals. In: 23rd USENIX Security Symposium (USENIX Security 2014), pp. 1053–1067 (2014)

  31. Zhuang, Z., Guan, J., Hsiao, H., Bradtmiller, B.: Evaluating the representativeness of the LANL respirator fit test panels for the current US civilian workers. J. Int. Soc. Respir. Prot. 21, 83–93 (2004)

  32. Ba, Z., et al.: Learning-based practical smartphone eavesdropping with built-in accelerometer. In: NDSS (2020)

  33. Zhuang, Z., Landsittel, D., Benson, S., Roberge, R., Shaffer, R.: Facial anthropometric differences among gender, ethnicity, and age groups. Ann. Occup. Hyg. 54(4), 391–402 (2010)

Author information

Correspondence to Soohyeon Choi.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Choi, S., Mohaisen, M., Nyang, D., Mohaisen, D. (2023). Revisiting the Deep Learning-Based Eavesdropping Attacks via Facial Dynamics from VR Motion Sensors. In: Wang, D., Yung, M., Liu, Z., Chen, X. (eds) Information and Communications Security. ICICS 2023. Lecture Notes in Computer Science, vol 14252. Springer, Singapore. https://doi.org/10.1007/978-981-99-7356-9_24

  • DOI: https://doi.org/10.1007/978-981-99-7356-9_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7355-2

  • Online ISBN: 978-981-99-7356-9

  • eBook Packages: Computer Science, Computer Science (R0)
