Poster · UbiComp Conference Proceedings · DOI: 10.1145/3544794.3558480

Privacy Preserving Continuous Speech Recording using Throat Microphones

Published: 27 December 2022

ABSTRACT

A prerequisite for field research on audio data is privacy-preserving recordings that contain only the target speaker who gave consent. For this purpose, we investigated the potential of a simple but robust wearable technology consisting of three parts: first, a standard air-conduction microphone providing the audio quality necessary for speech analysis; second, a throat microphone used as a speech activity filter; and third, a custom ESP32-based recording device enabling on-device real-time processing. The system was evaluated in two challenging free-discussion settings with two and four participants each (total N=16). Results from manual annotations show Equal Error Rates of M=23.4–29.69 %. Following simple instructions, our participants maintained a False Acceptance Rate below 5 % while recording more than half of their utterances.
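The core idea, using the throat microphone as a speech activity filter that gates what the air-conduction microphone stores, can be sketched as a simple energy-threshold check per frame. This is a minimal illustration, not the authors' implementation: the frame length, the RMS-energy criterion, and the `threshold` value are all assumptions made for the example.

```python
import math

def gate_recording(air_frames, throat_frames, threshold=0.05):
    """Retain air-mic frames only when the throat mic indicates wearer speech.

    air_frames / throat_frames: synchronized lists of equal-length
    sample lists. `threshold` is a hypothetical RMS-energy cutoff; the
    on-device filter described in the paper may use a different rule.
    """
    kept = []
    for air, throat in zip(air_frames, throat_frames):
        # Throat-mic RMS energy: high only when the wearer vocalizes,
        # since body conduction barely picks up bystanders.
        rms = math.sqrt(sum(s * s for s in throat) / len(throat))
        if rms >= threshold:
            kept.append(air)          # wearer speaking -> store frame
        # else: drop the frame, so bystander speech is never recorded
    return kept

# Toy example: frame 0 has a near-silent throat signal, frame 1 an active one.
air = [[0.2, 0.3], [0.4, 0.5]]
throat = [[0.0, 0.01], [0.3, 0.4]]
print(gate_recording(air, throat))  # only the second air frame survives
```

Because the decision uses only the throat channel, frames where a bystander speaks (loud air signal, quiet throat signal) are discarded before anything is written to storage, which is what makes the recording privacy-preserving by construction.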


