Modelling Speaker Variability Using Covariance Learning

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10841)

Abstract

In this contribution, we investigate the relationship between speakers and speech utterances, and propose a speaker normalisation/adaptation model that incorporates correlation amongst the utterance classes produced by male and female speakers of varying age categories (children: 0–15; youths: 16–30; adults: 31–50; seniors: \({>}50\)). Using Principal Component Analysis (PCA), a speaker space was constructed, and the first three principal components (PCs) were visualised based on the speaker covariance matrix obtained directly from the speech signals. For effective covariance learning, the weight vectors of the covariance matrix were normalised component-wise, and a machine learning algorithm, the self-organising map (SOM), was implemented to model the variability of selected speaker features (F0, intensity, pulse). The results reveal that, of the features selected, F0 gave the most variance, as both genders exhibited high variability. For male speakers, PC1 captured the most variance (87%), while PC2 and PC3 captured the least (7% and 3%, respectively). For female speakers, PC1 captured the most variance (97%), while PC2 and PC3 captured the least (2% and 1%, respectively). Further, the intensity and pulse features show closely similar patterns and are not the most relevant for speaker variability modelling. Component-plane visualisation of the speech patterns learned from the feature covariance revealed consistent patterns, which makes them useful in speaker recognition systems.
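
The pipeline described in the abstract (covariance matrix, PCA variance shares, component-wise normalisation, SOM training) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data are synthetic stand-ins for F0, intensity and pulse, the column-wise z-score is only one plausible reading of "component-wise normalisation", and the function names, grid size and learning schedule are all assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio(features):
    """PCA variance shares (descending) for an (n_samples, n_features) array."""
    cov = np.cov(features, rowvar=False)          # feature covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negatives
    return eigvals / eigvals.sum()


def normalise_columns(matrix):
    """Assumed normalisation: scale each column to zero mean and unit variance."""
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0                           # avoid division by zero
    return (matrix - mean) / std


def train_som(data, grid=(3, 3), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.normal(size=(rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                       # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1           # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distances to BMU
            h = np.exp(-dist2 / (2.0 * sigma ** 2))             # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights.reshape(rows, cols, -1)


# Synthetic stand-ins for (F0, intensity, pulse); the scales are arbitrary assumptions.
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 3)) * np.array([30.0, 5.0, 2.0])

ratios = explained_variance_ratio(features)
cov_norm = normalise_columns(np.cov(features, rowvar=False))
som = train_som(cov_norm, grid=(3, 3))
print("variance share per PC:", np.round(ratios, 3))
```

Each slice `som[:, :, k]` is the component plane for feature `k`, the kind of map the abstract refers to when it mentions component-plane visualisation. A dominant first variance share, as reported for PC1, appears here simply because one synthetic feature was given a much larger scale than the others.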

Acknowledgments

This research is funded by the Tertiary Education Trust Fund (TETFund), Nigeria. We also thank the students, staff and other participants who agreed to lend their voices for this experiment.

Author information

Corresponding author

Correspondence to Moses Ekpenyong.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ekpenyong, M., Umoren, I. (2018). Modelling Speaker Variability Using Covariance Learning. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science (LNAI), vol 10841. Springer, Cham. https://doi.org/10.1007/978-3-319-91253-0_4

  • DOI: https://doi.org/10.1007/978-3-319-91253-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91252-3

  • Online ISBN: 978-3-319-91253-0

  • eBook Packages: Computer Science, Computer Science (R0)