
Emotional speech analysis using harmonic plus noise model and Gaussian mixture model

International Journal of Speech Technology

Abstract

Extracting valuable information from emotional speech is one of the major challenges in emotion recognition and human-machine interfaces. Most research in emotion recognition is based on the analysis of fundamental frequency, energy contour, duration of silence, formants, Mel-band energies, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients. It has been observed that emotion classification using sinusoidal features performs better than classification using linear prediction and cepstral features. Harmonic models are a variant of the sinusoidal model. To improve the emotional speech classification rate and the conversion of neutral speech to emotional speech, analysis of the harmonic features of emotional speech is a critical step. In this paper, investigations are carried out on the Berlin emotional speech database to analyze gender-based emotional speech using harmonic plus noise model (HNM) features and Gaussian mixture models (GMMs). The analysis is performed on HNM features such as pitch, harmonic amplitudes, maximum voiced frequency, and noise components. The results show that the emotional speech of male and female speakers can be represented by a GMM with K components, where the optimal number of components is selected on the basis of the Akaike information criterion (AIC) score.
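
As a rough illustration of the model-selection step described in the abstract, the sketch below fits GMMs with an increasing number of components to a matrix of per-frame features and keeps the model with the lowest AIC score. This is a minimal sketch under stated assumptions, not the authors' implementation: the matrix `hnm_features` is a hypothetical stand-in for the output of a separate HNM analysis stage (rows are analysis frames; columns are pitch, harmonic amplitudes, maximum voiced frequency, and noise-part descriptors), and scikit-learn's `GaussianMixture` is used for convenience.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for per-frame HNM features of one emotion/gender
# class; replace with the output of a real HNM analysis stage.
rng = np.random.default_rng(0)
hnm_features = rng.normal(size=(500, 6))

def select_gmm_by_aic(X, max_components=10):
    """Fit GMMs with 1..max_components components and return the model
    with the lowest Akaike information criterion (AIC) score."""
    best_gmm, best_aic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k,
                              covariance_type="diag",
                              random_state=0).fit(X)
        aic = gmm.aic(X)  # AIC = 2 * n_params - 2 * log-likelihood
        if aic < best_aic:
            best_gmm, best_aic = gmm, aic
    return best_gmm, best_aic

gmm, aic = select_gmm_by_aic(hnm_features)
print(f"optimal K = {gmm.n_components}, AIC = {aic:.1f}")
```

In the setting described here, one such model would presumably be fitted per emotion and gender, and a test utterance could then be scored against each model's log-likelihood; the diagonal covariance and the range of K are arbitrary choices for the sketch.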



Author information

Correspondence to Jang Bahadur Singh.


Cite this article

Singh, J.B., Lehana, P.K. Emotional speech analysis using harmonic plus noise model and Gaussian mixture model. Int J Speech Technol 22, 483–496 (2019). https://doi.org/10.1007/s10772-018-9549-y

