Abstract
Eavesdropping on the human voice is one of the most common and harmful threats to personal privacy. Glasses are in direct contact with the wearer's face and therefore move with the facial motions produced during speech, so speech content can be inferred by sensing the movements of the glasses. In this paper, we present RF-Mic, a live voice eavesdropping method that uses ordinary glasses fitted with a low-cost RFID tag to sense subtle facial speech dynamics and infer the likely voice content. When a user wearing glasses with an RFID tag attached to the bridge speaks, RF-Mic first collects RF signals through forward propagation and backscattering. It then removes body motion interference from the collected RF signals using a proposed Conditional Denoising AutoEncoder (CDAE) network. Next, RF-Mic extracts three kinds of facial speech dynamic features (facial movements, bone-borne vibrations, and airborne vibrations) with three purpose-built deep-learning models. Based on the extracted features, a facial speech dynamics model is constructed for live voice eavesdropping. Extensive experiments in different real environments demonstrate that RF-Mic achieves robust and accurate live voice eavesdropping.
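To make the denoising step more concrete, the following is a minimal sketch of the general idea behind a conditional denoising autoencoder: the encoder and decoder both receive a condition vector alongside the signal, and the training objective is the reconstruction error against a clean reference. This is not the CDAE architecture from the paper, which the abstract does not specify. The class name, layer sizes, the toy sinusoidal signals, and the two-element condition vector are all assumptions made for illustration, and the model is untrained.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: weights is a list of rows, x is a list of floats."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ConditionalDenoisingAE:
    """Hypothetical one-hidden-layer conditional denoising autoencoder."""

    def __init__(self, signal_dim, cond_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        # Both encoder and decoder see the condition vector.
        self.enc = init_layer(latent_dim, signal_dim + cond_dim, rng)
        self.dec = init_layer(signal_dim, latent_dim + cond_dim, rng)

    def forward(self, noisy, cond):
        # Encode the corrupted signal together with the condition.
        latent = [math.tanh(h) for h in linear(*self.enc, noisy + cond)]
        # Decode back to signal space, again conditioned.
        return linear(*self.dec, latent + cond)


def mse(a, b):
    """Mean squared error, the usual denoising reconstruction loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy data (assumed): a fast "speech" component plus a slow "body motion" drift.
n = 64
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
motion = [0.5 * math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [c + m for c, m in zip(clean, motion)]

# Hypothetical condition: a coarse descriptor of the interference.
cond = [0.5, 1.0]

model = ConditionalDenoisingAE(signal_dim=n, cond_dim=2, latent_dim=8)
recon = model.forward(noisy, cond)
loss = mse(recon, clean)  # the quantity a training loop would minimize
```

In a real system the weights would be fitted by gradient descent on pairs of corrupted and clean signals; the sketch only shows how the condition enters the network and what the loss compares.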
Index Terms
- RF-Mic: Live Voice Eavesdropping via Capturing Subtle Facial Speech Dynamics Leveraging RFID