Vocal emotion recognition in five native languages of Assam using new wavelet features

International Journal of Speech Technology

Abstract

The present work investigates the following research questions concerning vocal emotion recognition: whether (i) vocal expressions of discrete emotions can be distinguished from no emotion (i.e. neutral), (ii) vocal expressions of discrete emotions can be distinguished from one another, (iii) surprise, which is essentially a cognitive component that may be present with any emotion, can also be recognized as a distinct emotion, and (iv) vocal emotion expressions can be recognized cross-lingually. This study is intended to provide more information about the nature and function of emotion. It is also intended to help in developing a generalized vocal emotion recognition system, which would increase the efficiency of human-machine interaction systems. In this work, an emotional utterance database is created for five native languages of Assam, with 140 acted utterances per speaker consisting of short sentences expressing six full-blown basic emotions and neutral. The database is validated by a listening test. Four feature sets are extracted: WPCC2 (wavelet packet cepstral coefficients computed by method 2), MFCC (mel-frequency cepstral coefficients), tfWPCC2 (WPCC2 with the Teager energy operated in the transform domain) and tfMFCC (the corresponding Teager-energy variant of MFCC). A Gaussian mixture model (GMM) is used as the classifier. The classification accuracies of the four feature sets are compared in two experiments: (i) text- and speaker-independent vocal emotion recognition in individual languages, and (ii) cross-lingual vocal emotion recognition. tfWPCC2 is a new wavelet feature set proposed by the same authors in a recent paper presented at a national seminar in India (Kandali et al. 2008a).
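
The tfWPCC2 and tfMFCC feature sets build on the Teager energy operator (Kaiser 1990a, 1990b). The sketch below applies only the operator to a sampled signal, using the standard discrete-time form Ψ[x(n)] = x²(n) − x(n−1)x(n+1). The function name `teager_energy` is hypothetical. This abstract does not describe how the operator is applied inside the wavelet-packet or mel transform domain, so that step is not shown.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator (Kaiser's form):
    psi[n] = x[n]**2 - x[n-1] * x[n+1].
    Defined only for interior samples, so the output has len(x) - 2 values."""
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n + phi), the operator output is constant and
# equal to A**2 * sin(omega)**2, so it depends on both amplitude and frequency.
A, omega, phi = 2.0, 0.3, 0.1
tone = [A * math.cos(omega * n + phi) for n in range(100)]
psi = teager_energy(tone)
print(psi[0], A ** 2 * math.sin(omega) ** 2)
```

For the pure tone, the two printed values agree to within floating-point error. This dependence on amplitude and frequency together is what separates the operator from plain squared-amplitude energy.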

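The classification stage can be illustrated with the usual GMM scheme (Reynolds and Rose 1995). One mixture model is trained per emotion class on frame-level feature vectors. A test utterance is then assigned to the class whose model gives the highest average log-likelihood over its frames. The pure-Python sketch below makes several assumptions that do not come from the paper:

- diagonal covariances
- a fixed number of EM iterations
- a variance floor
- initialization from randomly chosen training frames

The class `DiagGMM`, the functions `classify` and `toy_frames`, and all parameter values are hypothetical. The toy data are synthetic two-dimensional points, not features from the emotional speech database.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Hypothetical diagonal-covariance GMM trained with plain EM."""

    def __init__(self, n_components=4, n_iter=30, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor  # keeps variances away from zero
        self.seed = seed

    def _component_logs(self, x):
        # log(weight_j) + log N(x; mean_j, var_j) for every component j
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        rng = random.Random(self.seed)
        n, dim = len(frames), len(frames[0])
        # Assumed initialization: means from random frames, global variances.
        self.means = [list(f) for f in rng.sample(frames, self.k)]
        mu = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for f in frames:
                logs = self._component_logs(f)
                norm = logsumexp(logs)
                resp.append([math.exp(l - norm) for l in logs])
            # M-step: re-estimate weights, means and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * f[d] for r, f in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (f[d] - self.means[j][d]) ** 2
                                        for r, f in zip(resp, frames)) / nj,
                                    self.var_floor)
                                for d in range(dim)]
        return self

    def score(self, frames):
        """Average per-frame log-likelihood of an utterance."""
        return sum(logsumexp(self._component_logs(f)) for f in frames) / len(frames)


def classify(models, frames):
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda label: models[label].score(frames))


# Usage on synthetic 2-D "frames" (toy data only).
rng = random.Random(1)


def toy_frames(center, count):
    return [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(count)]


models = {
    "anger": DiagGMM(n_components=2).fit(toy_frames(3.0, 200)),
    "neutral": DiagGMM(n_components=2).fit(toy_frames(-3.0, 200)),
}
print(classify(models, toy_frames(3.0, 20)))  # expected: anger
```

Diagonal covariances keep the number of parameters per component linear in the feature dimension. This is a common choice for cepstral features, but this abstract does not say which covariance structure or mixture order the paper uses.
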
References

  • Bahoura, M., & Rouat, J. (2006). Wavelet speech enhancement based on time-scale adaptation. Speech Communication, 48, 1620–1637.

  • Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.

  • Borden, G. J., Harris, K. S., & Raphael, L. J. (1994). Speech science primer: Physiology, acoustics and perception of speech (3rd ed.). Baltimore: Williams and Wilkins.

  • Boruah, B. K. (2003). Asamar Bhasa. Dibrugarh: Banalata.

  • Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.

  • Darwin, C. (1872/1965). The expression of the emotions in man and animals. Chicago: Chicago University Press.

  • Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–365.

  • Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.

  • Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. London: Wiley. Chap. 3.

  • Farooq, O., & Datta, S. (2001). Mel filter-like admissible wavelet packet structure for speech recognition. IEEE Signal Processing Letters, 8(7), 196–198.

  • Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). New York: Morgan Kaufmann, Academic Press.

  • Furui, S. (1989). Digital speech processing, synthesis and recognition. New York: Dekker.

  • Goswami, G. C., & Tamuli, J. (2003). Asamiya. In G. Cardona & D. Jain (Eds.), Routledge language family series: Vol. 2. The Indo-Aryan languages (pp. 391–404). London: Routledge.

  • Hammond, K. R., & Stewart, T. R. (2001). The essential Brunswik—beginnings, explications and applications. Oxford: Oxford University Press.

  • Holmes, J., & Holmes, W. (2001). Speech synthesis and recognition (2nd ed.). New York: Taylor & Francis.

  • Hui, G., Shanguang, C., & Guangchuan, S. (2007). Emotion classification of Mandarin speech based on TEO nonlinear features. In Proc. IEEE 8th ACIS int. conf. SNPD (Vol. 3, pp. 394–398).

  • Jacquesson, F. (2008). A Dimasa grammar. Internet.

  • Jurafsky, D., & Martin, J. H. (2000). Speech and language processing. Englewood Cliffs: Prentice-Hall.

  • Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance. Psychological Bulletin, 129(5), 770–814.

  • Kaiser, J. F. (1990a). On a simple algorithm to calculate the ‘energy’ of a signal. In Proc. IEEE int. conf. acoustics, speech, and signal processing (Vol. 1, pp. 381–384), Albuquerque, NM.

  • Kaiser, J. F. (1990b). On Teager’s energy algorithm and its generalization to continuous signals. In Proc. 4th IEEE digital signal processing workshop, Mohonk (New Paltz), NY.

  • Kakati, B. (1995). Assamese, its formation and development. Guwahati: LBS Publications.

  • Kandali, A. B., Routray, A., & Basu, T. K. (2008a). Emotion recognition from speeches of some native languages of Assam independent of text and speaker. National Seminar on Devices, Circuits and Communication, Department of E.C.E., B.I.T. Mesra, Ranchi, Jharkhand, India, 6–7 Nov.

  • Kandali, A. B., Routray, A., & Basu, T. K. (2008b). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In Proc. IEEE region 10 conference TENCON 2008, 19–21 Nov., Hyderabad, India (pp. 1–5).

  • LaPolla, R. J., & Thurgood, G. (Eds.) (2002). Routledge language family series. The Sino-Tibetan languages. London: Routledge.

  • Laukka, P. (2004). Vocal expression of emotion—discrete-emotion and dimensional accounts. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences 141, ACTA Universitatis Upsaliensis, Uppsala. Experiments.

  • Lazarus, R. S. (1991). Emotion & adaptation. New York: Oxford University Press.

  • Linde, Y., Buzo, A., & Gray, R. M. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1), 84–95.

  • Mallat, S. (2006). A wavelet tour of signal processing (2nd ed.). New Delhi: Academic Press, Elsevier.

  • Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 93(2), 1097–1108.

  • Murray, I. R., & Arnott, J. L. (1995). Implementation and testing of a system for producing emotion-by-rule in synthetic speech. Speech Communication, 16, 369–390.

  • Nwe, T. L., Foo, S. W., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.

  • Oatley, K., & Johnson-Laird, P. N. (1987). Towards a cognitive theory of emotions. Cognition and Emotion, 1, 29–50.

  • Pathak, R. (2008). Asomiya Bhasar Itihas. Guwahati: Ashok Book Stall.

  • Patil, H. A., Dutta, P. K., & Basu, T. K. (2006). The wavelet packet based cepstral features for open set speaker classification in Marathi. In M. Spiliopoulou et al. (Eds.), Studies in classification, data analysis, and knowledge organization (pp. 134–141). Berlin: Springer.

  • Picard, R. W. (1997). Affective computing. Cambridge: MIT Press.

  • Plutchik, R. (1994). The psychology and biology of emotion. New York: Harper Collins.

  • Power, M., & Dalgleish, T. (2008). Cognition and emotion—from order to disorder. Hove: Psychology Press.

  • Quatieri, T. F. (2002). Discrete time speech signal processing. Upper Saddle River: Prentice-Hall.

  • Rabiner, L. R., & Juang, B. H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.

  • Ramamohan, S., & Dandapat, S. (2006). Sinusoidal model-based analysis and classification of stressed speech. IEEE Transactions on Audio, Speech and Language Processing, 14(3), 737–746.

  • Razak, A. A., Isa, A. H. M., & Komiya, R. (2004). A neural network approach for emotion recognition in speech. In Proc. 2nd int. conf. art. intell. in engineering & technology, Kota Kinabalu, Sabah, Malaysia.

  • Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83.

  • Rose, P. (2002). Forensic speaker identification. New York: Taylor & Francis, 302.

  • Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178.

  • Sarikaya, R., Pellom, B. L., & Hansen, J. H. L. (1998). Wavelet packet transform features with application to speaker identification. In Proc. IEEE Nordic signal processing symposium (pp. 81–84).

  • Scherer, K. R. (1986). Vocal affect expression: a review and a model for future research. Psychological Bulletin, 99(2), 143–165.

  • Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Communication, 40, 227–256.

  • Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32(1), 76–92.

  • Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective science (1st ed.). Oxford: Oxford University Press. Part IV, Chap. 23.

  • Singha, D. (2003). The phonology & morphology of Dimasa. M.A. Dissertation, Assam University, Silchar, Assam, India.

  • Singha, D. (2008). An introduction to Dimasa phonology. New Delhi: Saujanya Books.

  • Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: resources, features, and methods. Speech Communication, 48, 1162–1181.

  • Ververidis, D., Kotropoulos, C., & Pitas, I. (2004). Automatic emotional speech classification. In ICASSP, 2004 (pp. I-593–I-596).

  • Vogt, T., & Andre, E. (2005). Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. In Proc. IEEE.

  • Wang, Y., & Guan, L. (2004). An investigation of speech-based human emotion recognition. In IEEE 6th workshop on multimedia signal processing (pp. 15–18).

  • Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: some acoustical correlates. The Journal of the Acoustical Society of America, 52, 1238–1250.

Author information

Corresponding author

Correspondence to Aditya Bihar Kandali.

About this article

Cite this article

Kandali, A.B., Routray, A. & Basu, T.K. Vocal emotion recognition in five native languages of Assam using new wavelet features. Int J Speech Technol 12, 1–13 (2009). https://doi.org/10.1007/s10772-009-9046-4

  • DOI: https://doi.org/10.1007/s10772-009-9046-4