Modelling Speaker Variability Using Covariance Learning

Ekpenyong, Moses; Umoren, Imeh

doi:10.1007/978-3-319-91253-0_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10841)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

In this contribution, we investigate the relationship between speakers and speech utterances, and propose a speaker normalisation/adaptation model that incorporates the correlation among utterance classes produced by male and female speakers in four age categories (children: 0–15; youths: 16–30; adults: 31–50; seniors: >50). A speaker space was constructed using Principal Component Analysis (PCA), and the first three principal components (PCs) were visualised from the speaker covariance matrix obtained directly from the speech signals. For effective covariance learning, the weight vectors of the covariance matrix were normalised component-wise, and a machine learning algorithm, the self-organising map (SOM), was implemented to model the variability of selected speaker features (F0, intensity, pulse). The results reveal that, of the selected features, F0 gave the most variance, with both genders exhibiting high variability. For male speakers, PC1 captured the most variance (87%), while PC2 and PC3 captured the least (7% and 3%, respectively). For female speakers, PC1 captured the most variance (97%), while PC2 and PC3 captured the least (2% and 1%, respectively). Further, the intensity and pulse features show closely similar patterns and are therefore less relevant for modelling speaker variability. Component-plane visualisation of the speech patterns learned from the feature covariance revealed consistent patterns, which are therefore useful in speaker recognition systems.
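
The PCA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for per-utterance F0, intensity and pulse values, the function names `pca_explained_variance` and `normalise_columns` are hypothetical, and scaling each column of the covariance matrix to unit length is only one possible reading of "component-wise normalisation". The sketch shows how the share of variance captured by each PC follows from the eigenvalues of the feature covariance matrix.

```python
import numpy as np


def pca_explained_variance(features):
    """Eigendecompose the feature covariance; return variance ratios and PCs."""
    X = np.asarray(features, dtype=float)
    cov = np.cov(X, rowvar=False)            # columns are features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    ratios = eigvals / eigvals.sum()         # fraction of variance per PC
    return ratios, eigvecs


def normalise_columns(matrix):
    """Scale each column to unit Euclidean length (assumed normalisation)."""
    M = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(M, axis=0)
    norms[norms == 0] = 1.0                  # leave all-zero columns unchanged
    return M / norms


# Synthetic data only; these numbers are NOT taken from the paper.
rng = np.random.default_rng(0)
n = 200
f0 = rng.normal(180.0, 40.0, n)          # hypothetical F0 values
intensity = rng.normal(70.0, 5.0, n)     # hypothetical intensity values
pulse = rng.normal(100.0, 10.0, n)       # hypothetical pulse values
X = np.column_stack([f0, intensity, pulse])

ratios, components = pca_explained_variance(X)
cov_normalised = normalise_columns(np.cov(X, rowvar=False))
print(np.round(ratios, 3))               # variance share of PC1, PC2, PC3
```

In a pipeline like the one the abstract outlines, the normalised covariance vectors would then serve as training input to a SOM; that step is omitted here because the abstract does not give the map's configuration.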

Acknowledgments

This research is funded by the Tertiary Education Trust Fund (TETFund), Nigeria. We also thank the students, staff and other participants who agreed to contribute their voices to this experiment.

Author information

Authors and Affiliations

University of Uyo, Nwaniba Campus, Uyo, Nigeria
Moses Ekpenyong
Akwa Ibom State University, Ikot Akpaden Campus, Mkpat-Enin, Nigeria
Imeh Umoren

Corresponding author

Correspondence to Moses Ekpenyong.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada

Cite this paper

Ekpenyong, M., Umoren, I. (2018). Modelling Speaker Variability Using Covariance Learning. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10841. Springer, Cham. https://doi.org/10.1007/978-3-319-91253-0_4

DOI: https://doi.org/10.1007/978-3-319-91253-0_4
Published: 11 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91252-3
Online ISBN: 978-3-319-91253-0
eBook Packages: Computer Science, Computer Science (R0)
