Abstract
Automatic detection and demarcation of non-speech sounds in speech is critical for developing sophisticated human-machine interaction systems. The main objective of this study is to develop acoustic features capturing the production differences between speech and breath sounds in terms of both excitation source and vocal tract system characteristics. Using these features, a rule-based algorithm is proposed for automatic detection of breath sounds in spontaneous speech. The proposed algorithm outperforms previous methods for detection of breath sounds in spontaneous speech. Further, the importance of breath detection for speaker recognition is analyzed by considering an i-vector-based speaker recognition system. Experimental results show that detecting breath sounds prior to i-vector extraction is essential to nullify the effect of breath sounds occurring in test samples, which would otherwise degrade the performance of i-vector-based speaker recognition systems.
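The overall pipeline described in the abstract, detecting breath segments with a rule-based front end and discarding them before speaker modelling, can be illustrated with the minimal sketch below. This is not the paper's algorithm: the actual method relies on excitation source and vocal tract system features, whereas this sketch uses frame energy and spectral flatness purely as hypothetical stand-ins, with arbitrary placeholder thresholds, to show only the structure of a rule-based detector feeding a speaker recognition back end.

```python
"""Illustrative sketch (not the paper's method): flag breath-like frames with
simple rules and drop them before downstream speaker modelling. Frame energy
and spectral flatness are hypothetical stand-ins for the paper's excitation
source and vocal tract features; thresholds are arbitrary placeholders."""

import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def breath_mask(frames, energy_thr=0.01, flatness_thr=0.5):
    """Rule-based flagging: a frame is breath-like if it is low in energy
    AND spectrally flat (noise-like). These rules only mimic the structure
    of a rule-based detector; they are not the features of the paper."""
    energy = np.mean(frames ** 2, axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
    flatness = np.exp(np.mean(np.log(spec), axis=1)) / np.mean(spec, axis=1)
    return (energy < energy_thr) & (flatness > flatness_thr)


def frames_for_speaker_modelling(x, frame_len=400, hop=160):
    """Return only the frames kept after removing breath-like frames,
    i.e. the material that would be passed to i-vector extraction."""
    frames = frame_signal(x, frame_len, hop)
    return frames[~breath_mask(frames)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)  # 1 s of dummy audio
    total = frame_signal(signal).shape[0]
    kept = frames_for_speaker_modelling(signal)
    print(f"kept {kept.shape[0]} of {total} frames for speaker modelling")
```

In this arrangement, breath removal acts as a preprocessing step: only the retained frames would be used for feature extraction and i-vector computation, which mirrors the paper's finding that removing breath segments from test samples protects speaker recognition performance.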
Acknowledgments
The authors would like to thank Dr. Sunil Kumar Kopparapu of TCS Innovation Labs - Mumbai for his critical comments and suggestions, which helped improve the content of this paper.