Abstract
Most speech processing applications suffer a degradation in performance when operated in emotional environments, mostly due to a mismatch between the development and operating environments. Model adaptation and feature adaptation schemes have been employed to adapt speech systems developed in neutral environments to emotional environments. In this study, we consider only the anger emotion and investigate signal-level conversion from anger speech to neutral speech. Emotion in human speech is concentrated in small regions of an utterance; the regions most strongly influenced by the emotive state of the speaker are considered the emotionally significant regions. Physiological constraints of the human speech production mechanism are exploited to detect these regions. The variation of prosody parameters (pitch, duration, and energy) with their position in the sentence is analyzed to obtain modification factors, and the speech signal in the emotionally significant regions is modified with the corresponding factor to generate a neutral version of the anger speech. Speech samples from the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) are used in this study, and a subjective listening test is performed to evaluate the effectiveness of the proposed conversion.
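The region-wise prosody modification described in the abstract can be sketched minimally as follows. This is an illustrative stand-in, not the paper's actual method: the function name, the energy-factor convention, and the use of linear resampling for duration change are assumptions (the literature cited by such work typically uses pitch-synchronous techniques such as PSOLA or epoch-based modification instead), and pitch modification is omitted for brevity.

```python
import numpy as np

def modify_region(signal, start, end, energy_factor, duration_factor):
    """Apply energy and duration modification factors to signal[start:end]
    (a hypothetical 'emotionally significant region') and splice it back.

    energy_factor scales the region's energy (amplitude scaled by its
    square root); duration_factor stretches/compresses the region's length.
    """
    region = signal[start:end].astype(float)
    # Energy scaling: energy is proportional to amplitude squared,
    # so scale amplitudes by sqrt(energy_factor).
    region = region * np.sqrt(energy_factor)
    # Duration scaling by linear resampling -- a crude stand-in for
    # pitch-synchronous duration modification.
    new_len = max(1, int(round(len(region) * duration_factor)))
    old_positions = np.linspace(0, len(region) - 1, num=new_len)
    region = np.interp(old_positions, np.arange(len(region)), region)
    # Splice the modified region back between the untouched segments.
    return np.concatenate([signal[:start].astype(float),
                           region,
                           signal[end:].astype(float)])
```

For example, halving the duration (`duration_factor=0.5`) and quartering the energy (`energy_factor=0.25`) of a 20-sample region halves its length and its amplitude, leaving the rest of the signal untouched.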
© 2015 Springer International Publishing Switzerland
Cite this paper
Vydana, H.K., Raju, V.V.V., Gangashetty, S.V., Vuppala, A.K. (2015). Significance of Emotionally Significant Regions of Speech for Emotive to Neutral Conversion. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3