Enhancing Speaker Discrimination at the Feature Level

Koreman, Jacques; Wu, Dalei; Morris, Andrew C.

doi:10.1007/978-3-540-74200-5_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

This chapter describes a method for enhancing the differences between speaker classes at the feature level (feature enhancement) in an automatic speaker recognition system. The original Mel-frequency cepstral coefficient (MFCC) space is projected onto a new feature space by a neural network trained on a subset of speakers that is representative of the whole target population. The new feature space discriminates between the target classes (speakers) better than the original feature space does. The chapter focuses on the method for selecting a representative subset of speakers and compares several approaches to speaker selection. The effect of feature enhancement is tested on both clean speech and several types of noisy speech to evaluate its applicability under practical conditions. It is shown that the proposed method leads to a substantial improvement in speaker recognition performance. The method can also be applied to other automatic speaker classification tasks.
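The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which network layer supplies the new features, so the sketch assumes the hidden-layer activations of a small speaker-classifying MLP are used. The function names `train_speaker_mlp` and `enhance`, the network size, and the synthetic Gaussian frames standing in for real MFCC vectors are all hypothetical, and the speaker-selection step that the chapter focuses on is not covered.

```python
import numpy as np

def train_speaker_mlp(frames, speaker_ids, n_hidden=8, n_epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer MLP to classify frames by speaker.

    Uses full-batch gradient descent on the softmax cross-entropy loss.
    Returns the input-to-hidden parameters and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_frames, n_in = frames.shape
    n_speakers = int(speaker_ids.max()) + 1
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_speakers))
    b2 = np.zeros(n_speakers)
    targets = np.eye(n_speakers)[speaker_ids]  # one-hot speaker labels
    losses = []
    for _ in range(n_epochs):
        # Forward pass: tanh hidden layer, softmax output over speakers.
        hidden = np.tanh(frames @ W1 + b1)
        logits = hidden @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        correct = probs[np.arange(n_frames), speaker_ids]
        losses.append(float(-np.mean(np.log(correct + 1e-12))))
        # Backward pass (gradients are taken before any weight is updated).
        d_logits = (probs - targets) / n_frames
        d_hidden = (d_logits @ W2.T) * (1.0 - hidden ** 2)
        W2 -= lr * (hidden.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (frames.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return (W1, b1), losses

def enhance(frames, projection):
    """Project frames onto the hidden-layer space (assumed 'enhanced' features)."""
    W1, b1 = projection
    return np.tanh(frames @ W1 + b1)

# Synthetic stand-in for MFCC frames from a small subset of training speakers.
rng = np.random.default_rng(1)
n_speakers, n_mfcc, frames_per_speaker = 4, 13, 200
speaker_means = rng.normal(0.0, 1.0, (n_speakers, n_mfcc))
train_ids = np.repeat(np.arange(n_speakers), frames_per_speaker)
train_frames = speaker_means[train_ids] + rng.normal(0.0, 1.0, (train_ids.size, n_mfcc))

projection, losses = train_speaker_mlp(train_frames, train_ids)

# Frames from any speaker, including ones unseen in training, can be projected.
new_frames = rng.normal(0.0, 1.0, (50, n_mfcc))
enhanced = enhance(new_frames, projection)
print(enhanced.shape)  # (50, 8)
```

In a full system under these assumptions, the projected frames would replace the raw MFCC vectors as input to the speaker models; the network's output layer is only needed during training.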

Author information

Authors and Affiliations

Department of Language and Communication Studies, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
Jacques Koreman & Dalei Wu
SpinVox Ltd., Wethered House, Pound Lane, Marlow, Bucks, SL7 2AF, United Kingdom
Andrew C. Morris

Editor information

Christian Müller

Cite this chapter

Koreman, J., Wu, D., Morris, A.C. (2007). Enhancing Speaker Discrimination at the Feature Level. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_15

DOI: https://doi.org/10.1007/978-3-540-74200-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science, Computer Science (R0)
