Abstract
Empirical Mode Decomposition (EMD) is a multi-scale analysis method proposed for nonlinear and non-stationary data. Introduced by Huang et al. as an alternative to the traditional Fourier and wavelet techniques for examining signals, it decomposes a signal into several components called intrinsic mode functions. This paper applies this new tool to the detection of usable speech in co-channel speech. The co-channel speech signal is decomposed by EMD into intrinsic oscillatory modes, and the detected usable speech segments are organized into speaker streams, which are passed to a speaker identification system. The system is evaluated on co-channel speech across various Target-to-Interferer Ratios (TIR). The evaluation shows that empirical mode decomposition performs better than linear multi-scale decomposition by discrete wavelet transform for usable speech detection.
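As a rough illustration of what evaluating "across various Target-to-Interferer Ratios" involves, the sketch below mixes two signals at a chosen TIR, taking TIR as the target-to-interferer power ratio in decibels. It is not the authors' procedure: the function names (`mix_at_tir`, `measure_tir_db`), the 8 kHz sampling rate, and the sinusoidal stand-ins for the two talkers are all assumptions made for the example.

```python
import math

def power(x):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)

def measure_tir_db(target, interferer):
    """Target-to-interferer ratio in dB: 10*log10(P_target / P_interferer)."""
    return 10 * math.log10(power(target) / power(interferer))

def mix_at_tir(target, interferer, tir_db):
    """Scale the interferer so the mixture has the requested TIR, then add.

    Hypothetical helper: returns the co-channel mixture and the scaled interferer.
    """
    n = min(len(target), len(interferer))  # truncate to the common length
    target, interferer = target[:n], interferer[:n]
    # Solve P_target / (gain^2 * P_interferer) = 10^(tir_db / 10) for gain.
    gain = math.sqrt(power(target) / (power(interferer) * 10 ** (tir_db / 10)))
    scaled = [gain * s for s in interferer]
    mixture = [t + i for t, i in zip(target, scaled)]
    return mixture, scaled

# Synthetic stand-ins for two talkers (assumed, not real speech).
fs = 8000
target = [math.sin(2 * math.pi * 200 * k / fs) for k in range(fs)]
interferer = [0.3 * math.sin(2 * math.pi * 310 * k / fs) for k in range(fs)]

mixture, scaled = mix_at_tir(target, interferer, 5.0)
achieved = measure_tir_db(target, scaled)  # TIR actually present in the mixture
```

Sweeping `tir_db` over a range of values would give one co-channel mixture per operating point, each of which could then be passed to a usable speech detector.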
References
Smolenski BY, Ramachandran RP (2011) Usable speech processing: a filterless approach in the presence of interference. IEEE Circuits and Systems Magazine
Carlson BA, Clements MA (1991) A computationally compact divergence measure for speech processing. IEEE Transactions on Pattern Analysis and Machine Intelligence 13:1–6
Daqrouq K (2011) Wavelet entropy and neural network for text-independent speaker identification. Eng Appl Artif Intell 24:796–802
Flandrin P, Rilling G, Goncalves P (2004) Empirical mode decomposition as a filter bank. IEEE Signal Process Letters 11:112–114
Ghezaiel W, Ben Slimane A, Braiek EB (2010) Usable speech detection for speaker identification system under co-channel conditions. International conference on electrical system and automatic control JTEA 2010 Tunisia
Ghezaiel W, Slimane AB, Braiek EB (2011) Evaluation of a multi-resolution dyadic wavelet transform method for usable speech detection. World Academy of Science, Engineering and Technology Journal WASET 5(7):851–855
Ghezaiel W, Slimane AB, Braiek EB (2012) Usable speech assignment for speaker identification under co-channel situation. Int J Comput Appl 59(18):7–11
Ghezaiel W, Slimane AB, Braiek EB (2013a) Usable speech detection based on empirical mode decomposition. IET Electronic Letters 49(7):503–504
Ghezaiel W, Slimane AB, Braiek EB (2013b) Multi-resolution analysis by empirical mode decomposition for usable speech detection. International Multi-Conference on Systems, Signals & Devices, Conference on Communication & Signal Processing, SSD 2013 Tunisia
Ghezaiel W, Slimane AB, Braiek EB (2013c) Improved EMD usable speech detection for co-channel speaker identification. Lecture Notes in Computer Science, Advances in Non-Linear Speech Processing 7911:184–191
Ghoraani B, Krishnan S (2011) Time-frequency matrix feature extraction and classification of environmental audio signals. Audio, Speech, and Language Processing, IEEE Transactions on 19(7):2197–2209
Hershey JR, Rennie SJ, Olsen PA, Kristjansson TT (2010) Super-human multi-talker speech recognition: a graphical model approach. Comput Speech Lang 24(1):45–66
Huang NE, Shen Z, Long SR et al (1998) The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A 454:903–995
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Comm 52(1):12–40
Kinnunen T, Karpov E, Franti P (2006) Real-time speaker identification and verification. Audio, Speech, and Language Processing, IEEE Transactions on 14(1):277–288
Kizhanatham A, Yantorno RE (2003) Peak difference autocorrelation of wavelet transform algorithm based usable speech measure. 7th World Multi-conference on Systemics, Cybernetics, and Informatics
Krishnamachari KR, Yantorno RE, Benincasa DS, Wenndt SJ (2000) Spectral autocorrelation ratio as a usability measure of speech segments under co-channel conditions. IEEE International Symposium Intelligent Sig. Process. and Comm Sys
Kullback S (1968) Information theory and statistics. Dover Publications, New York
Lovekin J, Yantorno RE, Benincasa S, Wenndt S, Huggins M (2001) Developing usable speech criteria for speaker identification. Proc. ICASSP pp. 421–424
Morgan DP, George EB, Lee LT, Kay SM (1997) Co-channel speaker separation by harmonic enhancement and suppression. IEEE Transactions on Speech and Audio Processing 5:407–424
Naylor JA, Porter J (1991) An effective speech separation system which requires no a priori information. Proc IEEE ICASSP 937–940. doi:10.1109/ICASSP.1991.150494
Quatieri TF, Danisewicz RG (1990) An approach to co-channel talker interference suppression using a sinusoidal model for speech. IEEE Trans Acoust Speech Signal Process 38(1):56–69
Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Comm 17:91–108
Saeidi R, Mowlaee P, Kinnunen T, Tan ZH, Christensen MG, Franti P, Jensen SH (2010) Signal-to-signal ratio independent speaker identification for co-channel speech signals. Proc IEEE Int Conf Pattern Recognition, pp. 4545–4548
Shao Y, Wang DL (2003) Co-channel speaker identification using usable speech extraction based on multi-pitch tracking. Proceedings of ICASSP-03 II:205–208
Wu J-D, Tsai Y-J (2011) Speaker identification system using empirical mode decomposition and an artificial neural network. Expert Syst Appl 38(5):6112–6117
Yantorno RE (2008) Method for improving speaker identification by determining usable speech. Journal of the Acoustical Society of America 124
Zissman MA, Seward DC (1991) Two-talker pitch tracking for co-channel talker interference suppression. Technical Report, MIT Lincoln Laboratory
Cite this article
Ghezaiel, W., Ben Slimane, A. & Ben Braiek, E. Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification. Multimed Tools Appl 76, 20973–20988 (2017). https://doi.org/10.1007/s11042-016-4044-4
DOI: https://doi.org/10.1007/s11042-016-4044-4