ABSTRACT
Sensing movements and gestures inside the oral cavity has been a long-standing challenge for the wearable research community. This paper introduces EchoNose, a novel nose interface that recognizes mouth, breathing, and tongue gestures by analyzing acoustic signal reflections inside the nasal and oral cavities. The interface places a speaker and a microphone at the nostrils, emitting inaudible acoustic signals and capturing the corresponding reflections. The received signals are processed by a customized data processing and machine learning pipeline that distinguishes 16 gestures involving speech, tongue movement, and breathing. A user study with 10 participants shows that EchoNose recognizes these 16 gestures with an average accuracy of 93.7%. Based on these promising results, we discuss the opportunities and challenges of applying this nose interface in future applications.
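The abstract does not detail the signal design, but the active acoustic sensing approach it describes can be sketched as follows: transmit a near-inaudible sweep from the speaker, then cross-correlate each received microphone frame with the transmitted template to obtain an echo profile whose peaks correspond to reflections at different time-of-flight delays. The sample rate, sweep band, frame length, and the `make_template`/`echo_profile` helpers below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the active acoustic sensing idea described in the
# abstract: emit a near-inaudible chirp, then cross-correlate the received
# microphone signal with the transmitted template to obtain an "echo
# profile" of nearby reflections. All parameters here (sample rate, sweep
# band, frame length) are assumptions for illustration only.
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000              # sample rate (Hz), assumed
FRAME = 600              # samples per transmitted frame (12.5 ms), assumed
F0, F1 = 17_000, 20_000  # near-inaudible sweep band (Hz), assumed

def make_template() -> np.ndarray:
    """Linear frequency sweep used as the transmitted signal."""
    t = np.arange(FRAME) / FS
    return chirp(t, f0=F0, t1=t[-1], f1=F1)

def echo_profile(received: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Cross-correlate one received frame with the template.

    Peaks in the result correspond to reflections at different
    time-of-flight delays, i.e., different distances from the nostrils.
    """
    return np.abs(correlate(received, template, mode="valid"))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tmpl = make_template()
    # Simulate a received frame: one attenuated, delayed echo plus noise.
    frame = np.zeros(2 * FRAME)
    frame[40:40 + FRAME] += 0.3 * tmpl   # echo delayed by 40 samples
    frame += 0.01 * rng.standard_normal(frame.size)
    profile = echo_profile(frame, tmpl)
    print("strongest reflection at delay (samples):", int(np.argmax(profile)))
```

Stacking consecutive echo profiles over time yields a 2D representation in which mouth, tongue, and breathing movements appear as changing reflection patterns; a classifier trained on such representations is one plausible realization of the gesture-recognition pipeline the abstract mentions.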