Mouth State Detection From Low-Frequency Ultrasonic Reflection

McLoughlin, Ian Vince; Song, Yan

doi:10.1007/s00034-014-9904-4

Published: 09 October 2014

Volume 34, pages 1279–1304, (2015)

Circuits, Systems, and Signal Processing

Abstract

This paper develops, simulates and experimentally evaluates a novel non-contact method that uses the airborne reflection of low-frequency (LF) ultrasound to determine whether the lips of a subject are open or closed. The method accurately distinguishes between open and closed lip states using a low-complexity detection algorithm, and is highly robust to interfering audible noise. A novel voice activity detector is implemented and evaluated using the proposed method and shown to detect voice activity with high accuracy, even in the presence of high levels of background noise. The lip state detector is evaluated at a number of angles of incidence to the mouth and under various conditions of background noise. The underlying mouth state detection technique relies upon an inaudible LF ultrasonic excitation generated in front of the face of a user. When the mouth is closed, the excitation reflects back from the face as a simple echo; when the mouth is open, it resonates inside the mouth and vocal tract, altering the spectral response of the reflected wave. The difference between echo and resonance behaviours is used as the basis for automated lip opening detection, that is, determining whether the mouth is open or closed at the lips. Other potential applications include use in voice generation prostheses for speech-impaired patients, or as a hands-free control for electrolarynx and similar rehabilitation devices. The method is also applicable to silent speech interfaces and may have use for speech authentication.
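The echo-versus-resonance distinction described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' detection algorithm: the 20 to 24 kHz chirp, the 96 kHz sample rate, the two-pole resonator standing in for the open mouth, the in-band ripple feature and the 1 dB threshold are all assumptions chosen for illustration. It only shows that a plain echo leaves the in-band spectrum flat relative to the excitation, whereas a resonant reflection does not.

```python
import numpy as np

FS = 96_000               # sample rate in Hz (assumed)
BAND = (20_000, 24_000)   # excitation band in Hz (assumed, not from the paper)
PAD = 2048                # zero padding so the delay and filter tail fit in the frame


def make_chirp(duration=0.05):
    """Linear chirp sweeping across BAND (hypothetical excitation)."""
    t = np.arange(int(duration * FS)) / FS
    f0, f1 = BAND
    return np.sin(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t ** 2))


def resonate(x, centre_hz=22_000, radius=0.98):
    """Two-pole resonator used as a toy stand-in for an open mouth cavity."""
    w0 = 2 * np.pi * centre_hz / FS
    b1, b2 = 2 * radius * np.cos(w0), -radius ** 2
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += b1 * y[n - 1]
        if n >= 2:
            y[n] += b2 * y[n - 2]
    return y


def simulate_reflection(tx, mouth_open, gain=0.3, delay=100):
    """Closed mouth: attenuated, delayed echo. Open mouth: echo shaped by a resonance."""
    padded = np.concatenate([tx, np.zeros(PAD)])
    shaped = resonate(padded) if mouth_open else padded
    rx = np.zeros(len(padded))
    rx[delay:] = gain * shaped[:len(padded) - delay]
    return rx


def band_ripple_db(tx, rx):
    """Standard deviation (dB) of the received/transmitted magnitude ratio inside BAND."""
    n = len(rx)
    freqs = np.fft.rfftfreq(n, 1 / FS)
    in_band = (freqs >= BAND[0]) & (freqs <= BAND[1])
    tx_mag = np.abs(np.fft.rfft(tx, n=n))[in_band]
    rx_mag = np.abs(np.fft.rfft(rx, n=n))[in_band]
    return float(np.std(20 * np.log10((rx_mag + 1e-12) / (tx_mag + 1e-12))))


def is_mouth_open(tx, rx, threshold_db=1.0):
    """Flag an open mouth when the in-band response is not flat (assumed threshold)."""
    return band_ripple_db(tx, rx) > threshold_db


if __name__ == "__main__":
    tx = make_chirp()
    for state in (False, True):
        rx = simulate_reflection(tx, mouth_open=state)
        print(f"open={state}: ripple={band_ripple_db(tx, rx):.2f} dB, "
              f"detected open={is_mouth_open(tx, rx)}")
```

A real system would also have to cope with noise, direct transmitter-to-microphone leakage and the transducer response, none of which this sketch models.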

Notes

The speed of sound is approximated as 1,600 m/s in muscle and 343 m/s in air.
The six vowel geometries are: /i/, /æ/, /u/, /ɛ/, /ɔ/, /o/ as in heed, had, who, head, paw, and hoe respectively.
The conversion was made using the lpcaa2rf() and lpcrf2ar() functions from the excellent Voicebox package [9].
Office and Car recordings were obtained as 96 kHz, 24- and 32-bit sample files from Freesound.org (nos. 108695 and 193780 respectively), recorded on a Tascam DR-100 mk-II using its on-board directional condenser microphones (TEAC Corp., Tokyo, Japan). Other recordings were made by the author using the on-board directional condenser microphones of a Zoom H4n (Zoom Corp., Tokyo, Japan), recorded at a 96 kHz sample rate with 16-bit resolution. The original recordings are available upon request.
Note that, since the system detects lip opening rather than speaking, it is possible that some of these false detections did actually correspond to non-speech lip opening events if the subject opened their lips, for example to breathe through their mouth.
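Judging by the function names in the Voicebox note above, the conversion runs from a tube area function to reflection coefficients and then to autoregressive (AR) filter coefficients. The following is a minimal sketch of that chain, not a port of Voicebox: the sign and ordering conventions are assumptions and may differ from those of lpcaa2rf() and lpcrf2ar(), and the area values are hypothetical rather than taken from the paper.

```python
import numpy as np


def areas_to_reflection(areas):
    """Reflection coefficient at each junction between adjacent tube sections.

    Uses r = (A[k+1] - A[k]) / (A[k+1] + A[k]); the sign convention is an assumption.
    """
    a = np.asarray(areas, dtype=float)
    return (a[1:] - a[:-1]) / (a[1:] + a[:-1])


def reflection_to_ar(refl):
    """Step-up (Levinson) recursion from reflection coefficients to an AR polynomial."""
    ar = np.array([1.0])
    for k in refl:
        padded = np.append(ar, 0.0)
        ar = padded + k * padded[::-1]
    return ar


def ar_to_reflection(ar):
    """Step-down recursion, the inverse of reflection_to_ar (requires |k| < 1)."""
    ar = np.asarray(ar, dtype=float)
    refl = []
    while len(ar) > 1:
        k = ar[-1]
        refl.append(k)
        ar = ((ar - k * ar[::-1]) / (1.0 - k * k))[:-1]
    return np.array(refl[::-1])


if __name__ == "__main__":
    areas = [1.0, 2.0, 0.5, 3.0]   # hypothetical section areas in cm^2
    refl = areas_to_reflection(areas)
    ar = reflection_to_ar(refl)
    print("reflection coefficients:", refl)
    print("AR polynomial:", ar)
    print("round trip OK:", np.allclose(ar_to_reflection(ar), refl))
```

A uniform tube has no area change at any junction, so every reflection coefficient is zero and the AR polynomial reduces to a leading 1 followed by zeros, which is a quick sanity check on the recursion.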

References

F. Ahmadi, Voice replacement for the severely speech impaired through sub-ultrasonic excitation of the vocal tract. Ph.D. Thesis, Nanyang Technological University (2013). http://repository.ntu.edu.sg/handle/10356/52661
F. Ahmadi, M. Ahmadi, I.V. McLoughlin, Human mouth state detection using low frequency ultrasound, in INTERSPEECH, (2013), pp. 1806–1810
F. Ahmadi, I.V. McLoughlin, The use of low-frequency ultrasonics in speech processing, in Signal Processing, ed. by Sebastian Miron (InTech, 2010). ISBN: 978-953-7619-91-6
F. Ahmadi, I.V. McLoughlin, Measuring resonances of the vocal tract using frequency sweeps at the lips, in 2012 5th International Symposium on Communications Control and Signal Processing (ISCCSP) (2012)
F. Ahmadi, I.V. McLoughlin, S. Chauhan, G. ter Haar, Bio-effects and safety of low-intensity, low-frequency ultrasonic exposure. Progr. Biophys. Mol. Biol. 108, 3 (2012)
F. Ahmadi, I.V. McLoughlin, H.R. Sharifzadeh, Autoregressive modelling for linear prediction of ultrasonic speech, in INTERSPEECH, (2010), pp. 1616–1619
S.P. Arjunan, H. Weghorn, D.K. Kumar, W.C. Yau, Vowel recognition of English and German language using facial movement (SEMG) for speech control based HCI, in Proceedings of the HCSNet workshop on Use of vision in human–computer interaction—Volume 56, VisHCI ’06, (Australian Computer Society, Inc., 2006), pp. 13–18
D. Beautemps, P. Badin, R. Laboissière, Deriving vocal-tract area functions from midsagittal profiles and formant frequencies: a new model for vowels and fricative consonants based on experimental data. Speech Commun. 16, 27–47 (1995)
M. Brookes, et al., Voicebox: Speech processing toolbox for MATLAB. Software, available [Mar. 2011] from www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html (1997)
G.L. Calhoun, G.R. McMillan, Hands-free input devices for wearable computers, in Proceedings of the Fourth Symposium on Human Interaction with Complex Systems, HICS ’98, (IEEE Computer Society 1998) p. 118
B.G. Douglass, Apparatus and Method for Detecting Speech Using Acoustic Signals Outside the Audible Frequency Range (United States Patent and Trademark Office, United States, 2006)
J. Epps, J.R. Smith, J. Wolfe, A novel instrument to measure acoustic resonances of the vocal tract during speech. Meas. Sci. Technol. 8, 1112–1121 (1997)
L.J. Eriksson, Higher order mode effects in circular ducts and expansion chambers. J. Acoust. Soc. Am. 68(2), 545–550 (1980)
J.-P. Fouque, J. Garnier, G. Papanicolaou, K. Solna, Wave Propagation and Time Reversal in Randomly Layered Media (Springer, 2010)
J. Freitas, A. Teixeira, M.S. Dias, Towards a silent speech interface for Portuguese: surface electromyography and the nasality challenge, in Proceedings of the International Conference on Bio-inspired Systems and Signal Processing BIOSIGNALS 2012 (Vilamoura, Algarve, Portugal, 2012)
C. Jorgensen, S. Dusan, Speech interfaces based upon surface electromyography. Speech Commun. 52(4), 354–366 (2010)
K. Kalgaonkar, R. Hu, B. Raj, Ultrasonic doppler sensor for voice activity detection. IEEE Signal Process. Lett. 14(10), 754–757 (2007)
R. Kaucic, B. Dalton, A. Blake, Real-time lip tracking for audio-visual speech recognition applications, in Computer Vision ECCV ’96, vol. 1065, Lecture Notes in Computer Science, ed. by B. Buxton, R. Cipolla (Springer, Berlin / Heidelberg, 1996), pp. 376–387
M. Kob, C. Neuschaefer-Rube, A method for measurement of the vocal tract impedance at the mouth. Med. Eng. Phys. 24, 467–471 (2002)
R.J. Lahr, Head-worn, Trimodal Device to Increase Transcription Accuracy in a Voice Recognition System and to Process Unvocalized Speech (United States Patent and Trademark Office, United States, 2002)
I. McLoughlin, Super-audible voice activity detection. IEEE/ACM Trans. Audio Speech Lang. Process. 22(9), 1424–1433 (2014). doi:10.1109/TASLP.2014.2335055
I.V. McLoughlin, Applied Speech and Audio Processing (Cambridge University Press, Cambridge, 2009)
I.V. McLoughlin, F. Ahmadi, Method and apparatus for determining mouth state using low frequency ultrasonics. UK Patent Office (pending) (2012)
I.V. McLoughlin, F. Ahmadi, A new mechanical index for gauging the human bioeffects of low frequency ultrasound, in Proceedings of the IEEE Engineering in Medicine and Biology Conference, (2013), pp. 1964–1967
B. Rivet, L. Girin, C. Jutten, Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures. IEEE Trans. Audio Speech Lang. Process. 15(1), 96–108 (2007)
H.R. Sharifzadeh, I.V. McLoughlin, F. Ahmadi, Speech rehabilitation methods for laryngectomised patients, in Electronic Engineering and Computing Technology, vol. 60, Lecture Notes in Electrical Engineering, ed. by S.I. Ao, L. Gelman (Springer, Netherlands, 2010), pp. 597–607
D.J. Sinder, Speech synthesis using an aeroacoustic fricative model (PhD Thesis). The State University of New Jersey (1999)
M.M. Sondhi, B. Gopinath, Determination of vocal-tract shape from impulse response at the lips. J. Acoust. Soc. Am. 49(6), 1867–1873 (1971)
B.H. Story, Physiologically-based speech simulation using an enhanced wave-reflection model of the vocal tract (PhD Thesis). The University of Iowa (1995)
B.H. Story, I.R. Titze, E.A. Hoffman, Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am. 100, 1 (1996)
Texas Instruments: TIMIT database (Texas Instruments and MIT). A CD-ROM database of phonetically classified recordings of sentences spoken by a number of different male and female speakers (1990)
C.A. Tosaya, J.W. Sliwa, Signal Injection Coupling into the Human Vocal Tract for Robust Audible and Inaudible Voice Recognition (United States Patent and Trademark Office, United States, 1999)
H.K. Vorperian, S. Wang, M.K. Chung, E.M. Schimek, R.B. Durtschi, R.D. Kent, A.J. Ziegert, L.R. Gentry, Anatomic development of the oral and pharyngeal portions of the vocal tract: an imaging study. J. Acoust. Soc. Am. 125, 1666 (2009)
J. Wolfe, M. Garnier, J. Smith, Vocal tract resonances in speech, singing and playing musical instruments. Hum. Front. Sci. Progr. J. 3, 6–23 (2009)
J.A. Zagzebski, Essentials of Ultrasound Physics (Mosby, Elsevier, St. Louis, 1996)
A.J. Zuckerwar, Speed of sound in fluids, in Handbook of Acoustics, ed. by M.J. Crocker (Wiley, New York, 1998)

Acknowledgments

Some of the data for this paper was recorded and processed at the School of Computer Engineering, Nanyang Technological University (NTU), Singapore by student assistants Farzaneh Ahmadi, Mark Huan, and Chu Thanh Minh. Their contribution to this work is gratefully acknowledged, particularly the PhD research of Farzaneh Ahmadi [1]. Thanks are also due to Prof. Eng Siong Chng of NTU, and Jingjie Li of USTC for their assistance with the experimental work.

Author information

Authors and Affiliations

National Engineering Laboratory of Speech and Language Information Processing, The University of Science & Technology of China, Hefei, 230027, Anhui, China
Ian Vince McLoughlin & Yan Song

Corresponding author

Correspondence to Ian Vince McLoughlin.

About this article

Cite this article

McLoughlin, I.V., Song, Y. Mouth State Detection From Low-Frequency Ultrasonic Reflection. Circuits Syst Signal Process 34, 1279–1304 (2015). https://doi.org/10.1007/s00034-014-9904-4

Received: 25 February 2014
Revised: 24 September 2014
Accepted: 24 September 2014
Published: 09 October 2014
Issue Date: April 2015
DOI: https://doi.org/10.1007/s00034-014-9904-4
