research-article

StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin

Published: 09 September 2024

Abstract

We introduce StethoSpeech, a silent speech interface that transforms flesh-conducted vibrations behind the ear into speech. The system is designed to improve social interactions for people with voice disorders and to enable discreet communication in public. Unlike prior efforts, StethoSpeech does not require (a) paired speech data for the recorded vibrations or (b) a specialized device for recording vibrations, as it works with an off-the-shelf clinical stethoscope. The novelty of our framework lies in its overall design, the simulation of ground-truth speech, and a sequence-to-sequence translation network that operates in the latent space. We present comprehensive experiments on the existing CSTR NAM TIMIT Plus corpus and on our proposed StethoText, a large-scale synchronized database of non-audible murmur and text for speech research. Our results show that StethoSpeech produces natural-sounding and intelligible speech, significantly outperforming existing methods on several quantitative and qualitative metrics. Additionally, we demonstrate that it generalizes to speakers not seen during training and remains effective in challenging, noisy environments. Speech samples are available at https://stethospeech.github.io/StethoSpeech/.
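To make the described pipeline concrete, the sketch below (in PyTorch) illustrates the kind of latent-space sequence-to-sequence mapping the abstract refers to: NAM feature frames captured through the stethoscope are translated into per-frame predictions over discrete speech units, which a unit vocoder could then render as audible speech. The class name, feature dimensions, unit vocabulary size, and layer counts are assumptions for illustration only, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): a sequence-to-sequence model that
# maps NAM-derived feature frames to discrete speech-unit logits in a latent space.
# All dimensions and hyperparameters below are assumed values for demonstration.
import torch
import torch.nn as nn


class NAMToUnitTranslator(nn.Module):
    """Maps a sequence of NAM feature frames to HuBERT-style discrete unit logits."""

    def __init__(self, nam_dim=80, d_model=256, n_units=100, n_layers=4, n_heads=4):
        super().__init__()
        # Project NAM frames (e.g., log-mel features of the stethoscope signal) into the model dimension.
        self.input_proj = nn.Linear(nam_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Per-frame logits over a vocabulary of discrete speech units.
        self.unit_head = nn.Linear(d_model, n_units)

    def forward(self, nam_feats):
        # nam_feats: (batch, time, nam_dim)
        x = self.input_proj(nam_feats)
        x = self.encoder(x)
        return self.unit_head(x)  # (batch, time, n_units)


if __name__ == "__main__":
    model = NAMToUnitTranslator()
    dummy_nam = torch.randn(2, 200, 80)            # two utterances, 200 frames each
    unit_logits = model(dummy_nam)
    predicted_units = unit_logits.argmax(dim=-1)   # discrete unit sequence per frame
    print(predicted_units.shape)                   # torch.Size([2, 200])
```

In a full system, the predicted unit sequence would be passed to a unit-based neural vocoder (e.g., a HiFi-GAN variant) to synthesize the waveform, and training would rely on the simulated ground-truth speech discussed in the paper.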


      Published In

      Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 8, Issue 3
      September 2024
      1782 pages
      EISSN: 2474-9567
      DOI: 10.1145/3695755

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 September 2024
      Published in IMWUT Volume 8, Issue 3

      Author Tags

      1. HuBERT
      2. NAM-to-speech conversion
      3. StethoSpeech
      4. artificial learning
      5. self-supervised learning
      6. silent speech
      7. zero-pair setting
