Abstract
Automatic detection and demarcation of non-speech sounds in speech is critical for developing sophisticated human-machine interaction systems. The main objective of this study is to develop acoustic features capturing the production differences between speech and breath sounds in terms of both excitation source and vocal tract system characteristics. Using these features, a rule-based algorithm is proposed for automatic detection of breath sounds in spontaneous speech. The proposed algorithm outperforms previous methods for detection of breath sounds in spontaneous speech. Further, the importance of breath detection for speaker recognition is analyzed by considering an i-vector-based speaker recognition system. Experimental results show that detecting breath sounds prior to i-vector extraction is essential to nullify the effect of breath sounds occurring in test samples, which would otherwise degrade the performance of i-vector-based speaker recognition systems.
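The overall pipeline described in the abstract, detecting breath segments with a rule-based front end and discarding them before speaker modelling, can be illustrated with the minimal sketch below. This is not the paper's algorithm: the actual method relies on excitation source and vocal tract system features, whereas this sketch uses frame energy and spectral flatness purely as hypothetical stand-ins, with arbitrary placeholder thresholds, to show only the structure of a rule-based detector feeding a speaker recognition back end.

```python
"""Illustrative sketch (not the paper's method): flag breath-like frames with
simple rules and drop them before downstream speaker modelling. Frame energy
and spectral flatness are hypothetical stand-ins for the paper's excitation
source and vocal tract features; thresholds are arbitrary placeholders."""

import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def breath_mask(frames, energy_thr=0.01, flatness_thr=0.5):
    """Rule-based flagging: a frame is breath-like if it is low in energy
    AND spectrally flat (noise-like). These rules only mimic the structure
    of a rule-based detector; they are not the features of the paper."""
    energy = np.mean(frames ** 2, axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
    flatness = np.exp(np.mean(np.log(spec), axis=1)) / np.mean(spec, axis=1)
    return (energy < energy_thr) & (flatness > flatness_thr)


def frames_for_speaker_modelling(x, frame_len=400, hop=160):
    """Return only the frames kept after removing breath-like frames,
    i.e. the material that would be passed to i-vector extraction."""
    frames = frame_signal(x, frame_len, hop)
    return frames[~breath_mask(frames)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)  # 1 s of dummy audio
    total = frame_signal(signal).shape[0]
    kept = frames_for_speaker_modelling(signal)
    print(f"kept {kept.shape[0]} of {total} frames for speaker modelling")
```

In this arrangement, breath removal acts as a preprocessing step: only the retained frames would be used for feature extraction and i-vector computation, which mirrors the paper's finding that removing breath segments from test samples protects speaker recognition performance.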
Acknowledgments
The authors would like to thank Dr. Sunil Kumar Kopparapu of TCS Innovation Labs - Mumbai for his critical comments and suggestions, which helped improve the content of this paper.