Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features

Published in Journal of Zhejiang University SCIENCE C

Abstract

Functional paralanguage conveys considerable emotional information and is insensitive to speaker changes. To improve emotion recognition accuracy under speaker-independent conditions, a fusion method that combines functional paralanguage features with accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. In this method, functional paralanguage such as laughter, crying, and sighing is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database containing six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is put forward to recognize speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions exceeds 67% in the speaker-independent condition when the functional paralanguage features are used.
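
The abstract describes decision-level fusion of two streams (one driven by functional paralanguage such as laughter, crying, and sighing, the other by accompanying paralanguage features) based on confidences and probabilities. The sketch below is an illustrative, assumed weighting scheme, not the authors' published algorithm: each classifier is assumed to supply a posterior distribution over the six emotions plus a scalar confidence, and the fused posterior is a confidence-weighted mixture. The emotion label set and the fuse_decisions helper are hypothetical.

```python
import numpy as np

# Hypothetical label set for illustration; the paper uses six typical emotions,
# but the exact labels are an assumption here.
EMOTIONS = ["happiness", "anger", "sadness", "fear", "surprise", "neutral"]

def fuse_decisions(p_functional, conf_functional, p_accompanying, conf_accompanying):
    """Confidence-weighted fusion of two per-class probability vectors.

    p_functional / p_accompanying: posterior probabilities over EMOTIONS from a
    classifier using functional-paralanguage features (laughter, crying, sighing, ...)
    and one using accompanying-paralanguage features, respectively.
    conf_*: scalar confidences in [0, 1] attached to each classifier's decision.
    This weighting scheme is illustrative only, not the authors' algorithm.
    """
    p_f = np.asarray(p_functional, dtype=float)
    p_a = np.asarray(p_accompanying, dtype=float)
    # Weight each stream by its relative confidence.
    w_f = conf_functional / (conf_functional + conf_accompanying + 1e-12)
    w_a = 1.0 - w_f
    fused = w_f * p_f + w_a * p_a
    fused /= fused.sum()  # renormalize to a proper distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: the functional-paralanguage stream is confident it heard laughter-like
# cues, which pulls the fused decision toward "happiness".
label, probs = fuse_decisions(
    p_functional=[0.55, 0.05, 0.05, 0.05, 0.25, 0.05], conf_functional=0.8,
    p_accompanying=[0.30, 0.10, 0.10, 0.10, 0.30, 0.10], conf_accompanying=0.5,
)
print(label, probs)
```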

Author information

Corresponding author

Correspondence to Qi-rong Mao.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61272211 and 61170126), the Natural Science Foundation of Jiangsu Province (No. BK2011521), and the Research Foundation for Talented Scholars of Jiangsu University (No. 10JDG065), China

About this article

Cite this article

Mao, Q.-r., Zhao, X.-l., Huang, Z.-w., et al. Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features. J. Zhejiang Univ. - Sci. C 14, 573–582 (2013). https://doi.org/10.1631/jzus.CIDE1310

