DOI: 10.1145/3458709.3458985

SilentMask: Mask-type Silent Speech Interface with Measurement of Mouth Movement

Published: 11 July 2021

Abstract

Silent Speech Interaction (SSI) enables speech-based input without vocalization, serving both as an input method for speech recognition devices such as smartphones and as a support tool for people with speech difficulties. Conventional SSI methods based on lip reading, electromyography (EMG), ultrasonic echo, and electrostatic position sensing on the palate have been proposed, but they suffer from issues such as requiring the use of one hand and being easily noticeable to others.
In this study, we propose a mask-based SSI that recognizes silent speech by measuring the motion around the mouth with acceleration and angular velocity sensors attached to a mask.
Using two acceleration and angular velocity sensors to acquire 12-dimensional motion information around the mouth and analyzing it with deep learning, we identified a total of 22 states (21 voice commands and no speech) with 79.9% accuracy.
The results also showed that the device can be worn for longer periods than methods that attach sensors directly to the skin. This research presents new possibilities for masks as a non-contact, unobtrusive interface that does not rely on camera images and is therefore independent of lighting conditions.
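
The abstract does not detail the recognition model, but the pipeline it describes (two 6-axis IMUs on the mask producing a 12-channel motion signal, classified by a deep neural network into 22 classes) can be sketched roughly as below. This is a minimal, hypothetical PyTorch sketch: the window length, layer sizes, and sampling details are assumptions, not the authors' published architecture.

    # Hypothetical sketch of the pipeline described in the abstract:
    # two 6-axis IMUs on a mask -> 12-channel motion windows -> 22-way classifier.
    # Window length, layer sizes, and training details are assumed, not taken
    # from the paper.
    import torch
    import torch.nn as nn

    NUM_CHANNELS = 12   # 2 sensors x (3-axis acceleration + 3-axis angular velocity)
    NUM_CLASSES = 22    # 21 silent voice commands + "no speech"
    WINDOW_LEN = 128    # samples per classification window (assumed)

    class SilentMaskNet(nn.Module):
        def __init__(self):
            super().__init__()
            # 1-D convolutions extract short-term motion features from each window.
            self.features = nn.Sequential(
                nn.Conv1d(NUM_CHANNELS, 32, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.classifier = nn.Linear(64, NUM_CLASSES)

        def forward(self, x):
            # x: (batch, 12, WINDOW_LEN) raw accelerometer/gyro window
            h = self.features(x).squeeze(-1)   # (batch, 64)
            return self.classifier(h)          # (batch, 22) class logits

    if __name__ == "__main__":
        model = SilentMaskNet()
        dummy = torch.randn(8, NUM_CHANNELS, WINDOW_LEN)  # batch of synthetic IMU windows
        print(model(dummy).shape)  # torch.Size([8, 22])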

Supplementary Material

Supplementary video (3458709.3458985.mp4)




Published In

AHs '21: Proceedings of the Augmented Humans International Conference 2021
February 2021
321 pages

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Mask
  2. Silent Speech Interface
  3. Wearable Device

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AHs '21: Augmented Humans International Conference 2021
February 22-24, 2021
Rovaniemi, Finland

