Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification

Multimedia Tools and Applications

Abstract

Empirical Mode Decomposition (EMD) is a multi-scale analysis method proposed for analyzing nonlinear and non-stationary data. Introduced by Huang et al. as an alternative to the traditional Fourier and wavelet techniques for examining signals, EMD decomposes a signal into a set of components called intrinsic mode functions (IMFs). This paper applies EMD to detect usable speech in co-channel speech. The co-channel speech signal is decomposed into intrinsic oscillatory modes, and the detected usable speech segments are organized into speaker streams, which are then fed to a speaker identification system. The system is evaluated on co-channel speech across various Target-to-Interferer Ratios (TIR). Performance evaluation shows that EMD outperforms linear multi-scale decomposition by the discrete wavelet transform for usable speech detection.
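The sifting procedure behind EMD can be sketched as follows. This is a minimal NumPy illustration under assumptions of our own, not the authors' implementation: upper and lower envelopes are natural cubic splines through the local extrema, with the signal endpoints added as knots (a crude boundary treatment); sifting stops when a normalized squared change falls below the hypothetical parameter `sd_tol` or after `max_sift` passes; and the hypothetical helper `mix_at_tir` assumes TIR is the target-to-interferer energy ratio in dB. Two synthetic tones stand in for the target and interfering speakers.

```python
import numpy as np


def natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through knots (xk, yk) at points x.

    xk must be strictly increasing; second derivatives are zero at both ends.
    """
    n = len(xk)
    h = np.diff(xk)
    m = np.zeros(n)  # second derivatives at the knots
    if n > 2:
        # Tridiagonal system for the interior second derivatives m[1:-1].
        sub = h[:-1]
        diag = 2.0 * (h[:-1] + h[1:])
        sup = h[1:]
        rhs = 6.0 * (np.diff(yk[1:]) / h[1:] - np.diff(yk[:-1]) / h[:-1])
        # Thomas algorithm: forward elimination, then back substitution.
        for j in range(1, n - 2):
            w = sub[j] / diag[j - 1]
            diag[j] -= w * sup[j - 1]
            rhs[j] -= w * rhs[j - 1]
        sol = np.zeros(n - 2)
        sol[-1] = rhs[-1] / diag[-1]
        for j in range(n - 4, -1, -1):
            sol[j] = (rhs[j] - sup[j] * sol[j + 1]) / diag[j]
        m[1:-1] = sol
    i = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)  # spline segment per point
    seg = xk[i + 1] - xk[i]
    left, right = xk[i + 1] - x, x - xk[i]
    return ((m[i] * left**3 + m[i + 1] * right**3) / (6.0 * seg)
            + (yk[i] / seg - m[i] * seg / 6.0) * left
            + (yk[i + 1] / seg - m[i + 1] * seg / 6.0) * right)


def _extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    return maxima, minima


def _envelope_mean(h, t):
    """Mean of the upper and lower spline envelopes (endpoints used as knots)."""
    maxima, minima = _extrema(h)
    last = len(h) - 1
    up_idx = np.concatenate(([0], maxima, [last]))
    lo_idx = np.concatenate(([0], minima, [last]))
    upper = natural_cubic_spline(t[up_idx], h[up_idx], t)
    lower = natural_cubic_spline(t[lo_idx], h[lo_idx], t)
    return 0.5 * (upper + lower)


def emd(x, t, max_imfs=8, max_sift=50, sd_tol=0.05):
    """Decompose x (sampled at strictly increasing times t) into IMFs + residue."""
    residue = np.asarray(x, dtype=float).copy()
    t = np.asarray(t, dtype=float)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residue is (nearly) monotonic: nothing left to extract
        h = residue.copy()
        for _ in range(max_sift):
            h_new = h - _envelope_mean(h, t)  # one sifting pass
            # Assumed Cauchy-type stopping rule on the change between passes.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:
                break
        imfs.append(h)
        residue = residue - h
    return np.array(imfs).reshape(-1, len(residue)), residue


def mix_at_tir(target, interferer, tir_db):
    """Scale interferer so that 10*log10(E_target / E_interferer) == tir_db."""
    gain = np.sqrt(np.sum(target**2) / (np.sum(interferer**2) * 10 ** (tir_db / 10)))
    scaled = gain * interferer
    return target + scaled, scaled


if __name__ == "__main__":
    fs = 8000                                  # assumed sampling rate (Hz)
    t = np.arange(400) / fs                    # 50 ms of signal
    target = np.sin(2 * np.pi * 200 * t)       # synthetic stand-in for speaker 1
    interferer = np.sin(2 * np.pi * 1200 * t)  # synthetic stand-in for speaker 2
    mixture, _ = mix_at_tir(target, interferer, tir_db=0.0)
    imfs, residue = emd(mixture, t)
    print(imfs.shape, np.allclose(imfs.sum(axis=0) + residue, mixture))
```

By construction the IMFs and the final residue sum back to the input, which is a convenient sanity check. Mirror extension at the signal ends and an explicit check of the IMF conditions of Huang et al. (numbers of extrema and zero crossings equal or differing by at most one) would be natural refinements.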

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

References

  1. Smolenski BY, Ramachandran RP (2011) Usable speech processing: a filterless approach in the presence of interference. IEEE Circuits and Systems Magazine

  2. Carlson BA, Clements MA (1991) A computationally compact divergence measure for speech processing. IEEE Transactions on Pattern Analysis and Machine Intelligence 13:1–6

  3. Daqrouq K (2011) Wavelet entropy and neural network for text-independent speaker identification. Eng Appl Artif Intell 24:796–802

  4. Flandrin P, Rilling G, Goncalves P (2004) Empirical mode decomposition as a filter bank. IEEE Signal Process Letters 11:112–114

  5. Ghezaiel W, Ben Slimane A, Braiek EB (2010) Usable speech detection for speaker identification system under co-channel conditions. International Conference on Electrical System and Automatic Control (JTEA 2010), Tunisia

  6. Ghezaiel W, Slimane AB, Braiek EB (2011) Evaluation of a multi-resolution dyadic wavelet transform method for usable speech detection. World Academy of Science, Engineering and Technology Journal WASET 5(7):851–855

  7. Ghezaiel W, Slimane AB, Braiek EB (2012) Usable speech assignment for speaker identification under Co-Channel situation. Int J Comput Appl 59(18):7–11

  8. Ghezaiel W, Slimane AB, Braiek EB (2013a) Usable speech detection based on empirical mode decomposition. Electronics Letters 49(7):503–504

  9. Ghezaiel W, Slimane AB, Braiek EB (2013b) Multi-resolution analysis by empirical mode decomposition for usable speech detection. International Multi-Conference on Systems, Signals & Devices (SSD 2013), Conference on Communication & Signal Processing, Tunisia

  10. Ghezaiel W, Slimane AB, Braiek EB (2013c) Improved EMD usable speech detection for co-channel speaker identification. Lecture Notes in Computer Science, Advances in Non-Linear Speech Processing 7911:184–191

  11. Ghoraani B, Krishnan S (2011) Time-frequency matrix feature extraction and classification of environmental audio signals. IEEE Transactions on Audio, Speech, and Language Processing 19(7):2197–2209

  12. Hershey JR, Rennie SJ, Olsen PA, Kristjansson TT (2010) Super-human multi-talker speech recognition: a graphical model approach. Comput Speech Lang 24(1):45–66

  13. Huang NE, Shen Z, Long SR et al (1998) The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A 454:903–995

  14. Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Comm 52(1):12–40

  15. Kinnunen T, Karpov E, Franti P (2006) Real-time speaker identification and verification. IEEE Transactions on Audio, Speech, and Language Processing 14(1):277–288

  16. Kizhanatham A, Yantorno RE (2003) Peak difference autocorrelation of wavelet transform algorithm based usable speech measure. 7th World Multi-Conference on Systemics, Cybernetics and Informatics

  17. Krishnamachari KR, Yantorno RE, Benincasa DS, Wenndt SJ (2000) Spectral autocorrelation ratio as a usability measure of speech segments under co-channel conditions. IEEE International Symposium on Intelligent Signal Processing and Communication Systems

  18. Kullback S (1968) Information theory and statistics. Dover Publications, New York

  19. Lovekin J, Yantorno RE, Benincasa S, Wenndt S, Huggins M (2001) Developing usable speech criteria for speaker identification. Proc IEEE ICASSP 421–424

  20. Morgan DP, George EB, Lee LT, Kay SM (1997) Co-channel speaker separation by harmonic enhancement and suppression. IEEE Transactions on Speech and Audio Processing 5:407–424

  21. Naylor JA, Porter J (1991) An effective speech separation system which requires no a priori information. Proc IEEE ICASSP 937–940. doi:10.1109/ICASSP.1991.150494

  22. Quatieri TF, Danisewicz RG (1990) An approach to co-channel talker interference suppression using a sinusoidal model for speech. IEEE Trans Acoust Speech Signal Process 38(1):56–69

  23. Reynolds DA (1995) Speaker identification and verification using Gaussian mixture speaker models. Speech Comm 17:91–108

  24. Saeidi R, Mowlaee P, Kinnunen T, Tan ZH, Christensen MG, Franti P, Jensen SH (2010) Signal-to-signal ratio independent speaker identification for co-channel speech signals. Proc IEEE Int Conf Pattern Recognition 4545–4548

  25. Shao Y, Wang DL (2003) Co-channel speaker identification using usable speech extraction based on multi-pitch tracking. Proceedings of ICASSP-03 II:205–208

  26. Wu J-D, Tsai Y-J (2011) Speaker identification system using empirical mode decomposition and an artificial neural network. Expert Syst Appl 38(5):6112–6117

  27. Yantorno RE (2008) Method for improving speaker identification by determining usable speech. Journal of the Acoustical Society of America 124

  28. Zissman MA, Seward DC (1991) Two-talker pitch tracking for co-channel talker interference suppression. Technical Report, MIT Lincoln Laboratory

Author information

Correspondence to Wajdi Ghezaiel.

About this article

Cite this article

Ghezaiel, W., Ben Slimane, A. & Ben Braiek, E. Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification. Multimed Tools Appl 76, 20973–20988 (2017). https://doi.org/10.1007/s11042-016-4044-4

  • DOI: https://doi.org/10.1007/s11042-016-4044-4
