Linear Versus Nonlinear Multi-scale Decomposition for Co-channel Speaker Identification System

Ghezaiel, Wajdi; Ben Slimane, Amel; Ben Braiek, Ezzedine

doi:10.1007/978-3-319-28109-4_17

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 48)

Abstract

Co-channel speech is a combination of speech utterances from several speakers over a single communication channel. The traditional approach to co-channel speech processing is to attempt to extract the speech of the speaker of interest (target speech) from the other (interfering) speech. Usable speech criteria have been proposed to extract minimally corrupted speech segments for speaker identification in co-channel speech. In this paper, we present a usable speech extraction method based on pitch information obtained from two decompositions: a linear multi-scale decomposition by the dyadic wavelet transform and a nonlinear multi-scale decomposition by empirical mode decomposition. Detected usable speech segments are organized into speaker streams and applied to a speaker identification system. The proposed methods are evaluated and compared across various Target-to-Interferer Ratios (TIR) for the speaker identification system.
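As an illustration of the evaluation setup, the sketch below shows one way to build a co-channel mixture at a chosen TIR. It is not taken from the chapter: it assumes the conventional definition of TIR as the ratio of target power to interferer power in decibels, and the function names `mix_at_tir` and `measure_tir_db`, as well as the two sinusoids standing in for voiced speech, are hypothetical.

```python
import numpy as np


def measure_tir_db(target, interferer):
    """Return the target-to-interferer power ratio in dB (assumed definition)."""
    p_target = np.mean(np.square(target))
    p_interf = np.mean(np.square(interferer))
    return 10.0 * np.log10(p_target / p_interf)


def mix_at_tir(target, interferer, tir_db):
    """Scale the interferer so that target + interferer has the requested TIR.

    Returns the co-channel mixture and the scaled interferer.
    """
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)

    # Truncate both signals to a common length before mixing.
    n = min(len(target), len(interferer))
    target, interferer = target[:n], interferer[:n]

    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    if p_target == 0.0 or p_interf == 0.0:
        raise ValueError("both signals must have non-zero power")

    # Solve 10*log10(p_target / (gain**2 * p_interf)) = tir_db for gain.
    gain = np.sqrt(p_target / (p_interf * 10.0 ** (tir_db / 10.0)))
    scaled = gain * interferer
    return target + scaled, scaled


if __name__ == "__main__":
    fs = 8000                                   # sampling rate in Hz (assumption)
    t = np.arange(fs) / fs                      # one second of samples
    speaker_a = np.sin(2 * np.pi * 120 * t)     # stand-in for a 120 Hz voice
    speaker_b = np.sin(2 * np.pi * 210 * t)     # stand-in for a 210 Hz voice
    for level in (-10.0, 0.0, 10.0):
        mixture, scaled_b = mix_at_tir(speaker_a, speaker_b, level)
        print(level, round(measure_tir_db(speaker_a, scaled_b), 3))
```

With real recordings, the two sinusoids would be replaced by utterances from the target and interfering speakers, and the mixture would then be passed to the usable speech detector.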

Author information

Authors and Affiliations

CEREP, ENSIT University of Tunis, Tunis, Tunisia
Wajdi Ghezaiel & Ezzedine Ben Braiek
ENSI University of Mannouba, Mannouba, Tunisia
Amel Ben Slimane

Corresponding author

Correspondence to Wajdi Ghezaiel.

Editor information

Editors and Affiliations

Department of Psychology, Seconda Università di Napoli and IIASS, Caserta, Italy
Anna Esposito
(Pompeu Fabra University), Escola Superior Politècnica Tecnocampus, Mataró, Spain
Marcos Faundez-Zanuy
sezione di Napoli Osservatorio, Istituto Nazionale di Geofisica e Vulcan, Napoli, Italy
Antonietta M. Esposito
Department of Psychology, Seconda Università di Napoli and IIASS, Caserta, Italy
Gennaro Cordasco
Boulevard Dolez, University of Mons, TCTS Lab.31, Mons, Belgium
Thomas Drugman
Data and Signal Processing Research Group, University of Vic, Vic, Spain
Jordi Solé-Casals
NeuroLab, Università degli Studi "Mediterranea" di, Reggio Calabria, Italy
Francesco Carlo Morabito

About this chapter

Cite this chapter

Ghezaiel, W., Ben Slimane, A., Ben Braiek, E. (2016). Linear Versus Nonlinear Multi-scale Decomposition for Co-channel Speaker Identification System. In: Esposito, A., et al. Recent Advances in Nonlinear Speech Processing. Smart Innovation, Systems and Technologies, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-28109-4_17

DOI: https://doi.org/10.1007/978-3-319-28109-4_17
Published: 23 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28107-0
Online ISBN: 978-3-319-28109-4
eBook Packages: Engineering, Engineering (R0)
