ABSTRACT
A prerequisite for field research on audio data is privacy-preserving recordings that exclusively contain the target speaker who gave consent. For this purpose, we investigated the potential of a simple but robust wearable technology consisting of three parts: first, a standard air-conduction microphone providing the audio quality necessary for speech analysis; second, a throat microphone used as a speech activity filter; and third, a custom ESP32-based recording device enabling on-device real-time processing. The system was evaluated in two challenging free-discussion settings with two and four participants each (total N=16). Manual annotations yield an Equal Error Rate of M=23.4-29.69 %. Following simple instructions, our participants maintained a False Acceptance Rate below 5 % while recording more than half of their own utterances.
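The core idea of using the throat microphone as a speech activity filter can be sketched as a frame-level gate: because a throat microphone picks up the wearer's body-conducted speech strongly but bystander speech only weakly, short-term energy on the throat channel indicates when the wearer is speaking, and only those frames of the air-conduction channel are retained. The following is a minimal illustrative sketch, not the authors' implementation; the frame length and energy threshold are assumed values for demonstration.

```python
import numpy as np

def throat_gated_speech(air, throat, sr, frame_ms=20, energy_thresh=0.01):
    """Keep only air-microphone frames during which the throat channel
    shows own-voice activity. frame_ms and energy_thresh are illustrative."""
    frame = int(sr * frame_ms / 1000)
    n = min(len(air), len(throat)) // frame
    kept, mask = [], []
    for i in range(n):
        seg_throat = throat[i * frame:(i + 1) * frame]
        # Short-term RMS energy of the throat channel: body-conducted
        # speech of the wearer is strong here, ambient speech is not.
        rms = np.sqrt(np.mean(seg_throat.astype(np.float64) ** 2))
        active = rms > energy_thresh
        mask.append(active)
        if active:
            kept.append(air[i * frame:(i + 1) * frame])
    gated = np.concatenate(kept) if kept else np.zeros(0, dtype=air.dtype)
    return gated, np.array(mask)
```

A threshold set too low raises the False Acceptance Rate (bystander speech leaks into the recording), while one set too high discards the wearer's own utterances; the Equal Error Rate reported in the abstract characterizes exactly this trade-off.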
Index Terms
- Privacy Preserving Continuous Speech Recording using Throat Microphones