Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features

Published in Journal of Zhejiang University SCIENCE C

Abstract

Functional paralanguage conveys considerable emotional information and is insensitive to speaker changes. To improve emotion recognition accuracy under speaker-independent conditions, a fusion method that combines functional paralanguage features with accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. In this method, functional paralanguage such as laughter, crying, and sighing is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database containing six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is put forward to recognize speech emotions in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features for speech emotion recognition. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. The overall recognition accuracy achieved for six emotions exceeds 67% in the speaker-independent condition when the functional paralanguage features are used.
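
The abstract describes decision-level fusion of two streams (one driven by functional paralanguage such as laughter, crying, and sighing, the other by accompanying paralanguage features) based on confidences and probabilities. The sketch below is an illustrative, assumed weighting scheme, not the authors' published algorithm: each classifier is assumed to supply a posterior distribution over the six emotions plus a scalar confidence, and the fused posterior is a confidence-weighted mixture. The emotion label set and the fuse_decisions helper are hypothetical.

```python
import numpy as np

# Hypothetical label set for illustration; the paper uses six typical emotions,
# but the exact labels are an assumption here.
EMOTIONS = ["happiness", "anger", "sadness", "fear", "surprise", "neutral"]

def fuse_decisions(p_functional, conf_functional, p_accompanying, conf_accompanying):
    """Confidence-weighted fusion of two per-class probability vectors.

    p_functional / p_accompanying: posterior probabilities over EMOTIONS from a
    classifier using functional-paralanguage features (laughter, crying, sighing, ...)
    and one using accompanying-paralanguage features, respectively.
    conf_*: scalar confidences in [0, 1] attached to each classifier's decision.
    This weighting scheme is illustrative only, not the authors' algorithm.
    """
    p_f = np.asarray(p_functional, dtype=float)
    p_a = np.asarray(p_accompanying, dtype=float)
    # Weight each stream by its relative confidence.
    w_f = conf_functional / (conf_functional + conf_accompanying + 1e-12)
    w_a = 1.0 - w_f
    fused = w_f * p_f + w_a * p_a
    fused /= fused.sum()  # renormalize to a proper distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: the functional-paralanguage stream is confident it heard laughter-like
# cues, which pulls the fused decision toward "happiness".
label, probs = fuse_decisions(
    p_functional=[0.55, 0.05, 0.05, 0.05, 0.25, 0.05], conf_functional=0.8,
    p_accompanying=[0.30, 0.10, 0.10, 0.10, 0.30, 0.10], conf_accompanying=0.5,
)
print(label, probs)
```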

Author information

Corresponding author

Correspondence to Qi-rong Mao.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61272211 and 61170126), the Natural Science Foundation of Jiangsu Province (No. BK2011521), and the Research Foundation for Talented Scholars of Jiangsu University (No. 10JDG065), China

About this article

Cite this article

Mao, Q.-r., Zhao, X.-l., Huang, Z.-w., et al. Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features. J. Zhejiang Univ. - Sci. C 14, 573–582 (2013). https://doi.org/10.1631/jzus.CIDE1310

