DOI: 10.1145/3539490.3539599
research-article

Beyond Microphone: mmWave-Based Interference-Resilient Voice Activity Detection

Published: 27 June 2022

Abstract

Microphone-based voice activity detection (VAD) systems usually require hotword detection and perform poorly in the presence of interference and noise. Users attending online meetings in noisy environments usually mute and unmute their microphones manually due to the limited performance of interference-resilient VAD. To automate voice detection in challenging environments without dictionary limitations, we look beyond microphones and propose mmWave-based sensing, which is already available in many smartphones and IoT devices. Our preliminary experiments in multiple places with several users indicate that mmWave-based VAD can match and surpass the performance of an audio-based VAD in noisy conditions, while being robust against interference.

Published In

IASA '22: Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications
July 2022
42 pages
ISBN:9781450394031
DOI:10.1145/3539490

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. mmwave sensing
  2. voice activity detection
  3. wireless sensing

Qualifiers

  • Research-article

Funding Sources

  • Key Bridge Foundation

Conference

MobiSys '22

Acceptance Rates

IASA '22 Paper Acceptance Rate 6 of 6 submissions, 100%;
Overall Acceptance Rate 6 of 6 submissions, 100%
