Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification

Ghezaiel, Wajdi; Ben Slimane, Amel; Ben Braiek, Ezzedine

doi:10.1007/s11042-016-4044-4

Published: 17 October 2016

Volume 76, pages 20973–20988, (2017)

Multimedia Tools and Applications

Wajdi Ghezaiel¹,
Amel Ben Slimane² &
Ezzedine Ben Braiek¹

Abstract

Empirical Mode Decomposition (EMD) is a multi-scale analysis method proposed for nonlinear and non-stationary data. Introduced by Huang et al. as an alternative to traditional Fourier and wavelet techniques, it decomposes a signal into several components called intrinsic mode functions (IMFs). This paper uses EMD to detect usable speech in co-channel speech. We apply EMD to decompose the co-channel speech signal into intrinsic oscillatory modes. The detected usable speech segments are organized into speaker streams, which are then passed to a speaker identification system. The system is evaluated on co-channel speech across various Target-to-Interferer Ratios (TIR). The evaluation shows that EMD performs better than linear multi-scale decomposition by discrete wavelet transform for usable speech detection.
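To make the decomposition step concrete, the following is a minimal sketch of the sifting idea behind EMD, not the authors' implementation. It rests on several simplifying assumptions: envelopes are piecewise linear (Huang et al. use cubic splines), each mode is extracted with a fixed number of sifting passes rather than a stopping criterion, and the function names (`local_extrema`, `envelope`, `sift`, `emd`) and the two-tone test signal are hypothetical. It does not cover usable speech detection or speaker identification.

```python
import math


def local_extrema(x):
    """Return indices of strict local maxima and minima of the sequence x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through x at the indices idx.

    Assumption: linear interpolation stands in for the usual cubic spline,
    and the envelope is held constant before the first and after the last
    extremum.
    """
    env = [0.0] * len(x)
    for i in range(idx[0]):
        env[i] = x[idx[0]]
    for i in range(idx[-1], len(x)):
        env[i] = x[idx[-1]]
    for a, b in zip(idx, idx[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            env[i] = (1 - t) * x[a] + t * x[b]
    return env


def sift(x, n_sift=10):
    """Extract one candidate mode by repeatedly removing the envelope mean."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=6, n_sift=10):
    """Decompose x into a list of modes plus a residue (x = sum(modes) + residue)."""
    residue = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to build envelopes
        imf = sift(residue, n_sift)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Hypothetical test signal: a fast 50 Hz tone plus a slower 5 Hz tone at fs = 1000 Hz.
fs = 1000
signal = [
    math.sin(2 * math.pi * 50 * n / fs) + 0.5 * math.sin(2 * math.pi * 5 * n / fs)
    for n in range(1000)
]
imfs, residue = emd(signal)

# The modes and the residue add back up to the input, by construction.
recon = list(residue)
for imf in imfs:
    recon = [a + b for a, b in zip(recon, imf)]
max_err = max(abs(a - b) for a, b in zip(signal, recon))
```

Because each mode is subtracted from the running residue, the decomposition is complete: summing the modes and the final residue recovers the input up to floating-point error. Unlike a wavelet filter bank, no basis is fixed in advance; the modes are derived from the signal's own extrema.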

References

Smolenski BY, Ramachandran RP (2011) Usable speech processing: a filterless approach in the presence of interference. IEEE Circuits and Systems Magazine
Carlson BA, Clements MA (1991) A computationally compact divergence measure for speech processing. IEEE Transactions on Pattern Analysis Machine Intelligence 13:1–6
Daqrouq K (2011) Wavelet entropy and neural network for text-independent speaker identification. Eng Appl Artif Intell 24:796–802
Flandrin P, Rilling G, Goncalves P (2004) Empirical mode decomposition as a filter bank. IEEE Signal Process Letters 11:112–114
Ghezaiel W, Slimane AB, Braiek EB (2010) Usable speech detection for speaker identification system under co-channel conditions. International conference on electrical system and automatic control, JTEA 2010, Tunisia
Ghezaiel W, Slimane AB, Braiek EB (2011) Evaluation of a multi-resolution dyadic wavelet transform method for usable speech detection. World Academy of Science, Engineering and Technology Journal WASET 5(7):851–855
Ghezaiel W, Slimane AB, Braiek EB (2012) Usable speech assignment for speaker identification under Co-Channel situation. Int J Comput Appl 59(18):7–11
Ghezaiel W, Slimane AB, Braiek EB (2013a) Usable speech detection based on empirical mode decomposition. IET Electronics Letters 49(7):503–504
Ghezaiel W, Slimane AB, Braiek EB (2013b) Multi-resolution analysis by empirical mode decomposition for usable speech detection. International Multi-Conference on Systems, Signals & Devices, Conference on Communication & Signal Processing, SSD 2013, Tunisia
Ghezaiel W, Slimane AB, Braiek EB (2013c) Improved EMD usable speech detection for co-channel speaker identification. Lecture Notes in Computer Science, Advances in Non-Linear Speech Processing 7911:184–191
Ghoraani B, Krishnan S (2011) Time-frequency matrix feature extraction and classification of environmental audio signals. Audio, Speech, and Language Processing, IEEE Transactions on 19(7):2197–2209
Hershey JR, Rennie SJ, Olsen PA, Kristjansson TT (2010) Super-human multi-talker speech recognition: a graphical model approach. Comput Speech Lang 24(1):45–66
Huang NE, Shen Z, Long SR et al (1998) The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A 454:903–995
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Comm 52(1):12–40
Kinnunen T, Karpov E, Franti P (2006) Real-time speaker identification and verification. Audio, Speech, and Language Processing, IEEE Transactions on 14(1):277–288
Kizhanatham A, Yantorno RE (2003) Peak difference autocorrelation of wavelet transform algorithm based usable speech measure. 7th World Multi-Conference on Systemics, Cybernetics and Informatics
Krishnamachari KR, Yantorno RE, Benincasa DS, Wenndt SJ (2000) Spectral autocorrelation ratio as a usability measure of speech segments under co-channel conditions. IEEE International Symposium Intelligent Sig. Process. and Comm Sys
Kullback S (1968) Information theory and statistics. Dover Publications, New York
Lovekin J, Yantorno RE, Benincasa S, Wenndt S, Huggins M (2001) Developing usable speech criteria for speaker identification. Proc. ICASSP, pp. 421–424
Morgan DP, George EB, Lee LT, Kay SM (1997) Co-channel speaker separation by harmonic enhancement and suppression. IEEE Transactions on Speech and Audio Processing 5:407–424
Naylor JA, Porter J (1991) An effective speech separation system which requires no a priori information. Proc IEEE ICASSP 937–940. doi:10.1109/ICASSP.1991.150494
Quatieri TF, Danisewicz RG (1990) An approach to co-channel talker interference suppression using a sinusoidal model for speech. IEEE Trans Acoust Speech Signal Process 38(1):56–69
Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Comm 17:91–108
Saeidi R, Mowlaee P, Kinnunen T, Tan ZH, Christensen MG, Franti P, Jensen SH (2010) Signal-to-signal ratio independent speaker identification for co-channel speech signals. Proc IEEE Int Conf Pattern Recognition, pp. 4545–4548
Shao Y, Wang DL (2003) Co-channel speaker identification using usable speech extraction based on multi-pitch tracking. Proceedings of ICASSP-03 II:205–208
Wu J-D, Tsai Y-J (2011) Speaker identification system using empirical mode decomposition and an artificial neural network. Expert Syst Appl 38(5):6112–6117
Yantorno RE (2008) Method for improving speaker identification by determining usable speech. Journal of the Acoustical Society of America 124
Zissman MA, Seward DC (1991) Two-talker pitch tracking for co-channel talker interference suppression. Technical Report, MIT Lincoln Laboratory

Author information

Authors and Affiliations

CEREP-ENSIT, University of Tunis, Tunis, Tunisia
Wajdi Ghezaiel & Ezzedine Ben Braiek
ENSI, University of Manouba, Manouba, Tunisia
Amel Ben Slimane

Corresponding author

Correspondence to Wajdi Ghezaiel.

About this article

Cite this article

Ghezaiel, W., Ben Slimane, A. & Ben Braiek, E. Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification. Multimed Tools Appl 76, 20973–20988 (2017). https://doi.org/10.1007/s11042-016-4044-4

Received: 11 August 2015
Revised: 27 September 2016
Accepted: 05 October 2016
Published: 17 October 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11042-016-4044-4
