Vocal-based emotion recognition using random forests and decision tree

Noroozi, Fatemeh; Sapiński, Tomasz; Kamińska, Dorota; Anbarjafari, Gholamreza

doi:10.1007/s10772-017-9396-2

Published: 09 February 2017

Volume 20, pages 239–246, (2017)

International Journal of Speech Technology

Fatemeh Noroozi¹, Tomasz Sapiński², Dorota Kamińska² & Gholamreza Anbarjafari³,⁴ (ORCID: orcid.org/0000-0001-8460-5717)

Abstract

This paper proposes a new vocal-based emotion recognition method using random forests, in which pairs of features computed over the whole speech signal, namely pitch, intensity, the first four formants, the bandwidths of the first four formants, mean autocorrelation, mean noise-to-harmonics ratio and standard deviation, are used to recognize the emotional state of a speaker. The proposed technique adopts random forests to represent the speech signals, together with the decision-tree approach, to classify them into different categories. The emotions are broadly categorised into six groups: happiness, fear, sadness, neutral, surprise and disgust. The experiments use the Surrey Audio-Visual Expressed Emotion (SAVEE) database. Under leave-one-out cross-validation, and by combining the most significant prosodic features, the proposed method achieves an average recognition rate of \(66.28\%\); the highest per-class recognition rate, \(78\%\), is obtained for the happiness voice signals. Compared with linear discriminant analysis, the proposed method has a \(13.78\%\) higher average recognition rate and a \(28.1\%\) higher best recognition rate; it also has a \(6.58\%\) higher average recognition rate than deep neural networks. Both baselines were implemented on the same database.
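The abstract describes the pipeline only at a high level: one feature vector per utterance, a random-forest classifier built from decision trees, and leave-one-out evaluation reported as per-class and average recognition rates. The standard-library Python sketch below illustrates that evaluation loop; it is not the authors' implementation. Assumptions: features are already extracted into numeric vectors (the hypothetical helper `make_toy_data` generates synthetic pitch and intensity values in place of real measurements), each tree is a small CART-style tree using Gini impurity and midpoint thresholds, each node considers a random subset of about √d features, and the "average recognition rate" is taken to be the unweighted mean of the per-class rates. The hyperparameters `n_trees` and `max_depth` are illustrative, not the paper's settings.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, n_sub, rng):
    """Grow a depth-limited decision tree; leaves are labels, inner nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Random-forest ingredient 1: each node sees only a random subset of features.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [lab for lab, g in zip(y, go_left) if g], max_depth - 1, n_sub, rng)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [lab for lab, g in zip(y, go_left) if not g], max_depth - 1, n_sub, rng)
    return (f, t, left, right)


def predict_tree(node, x):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees=15, max_depth=3, seed=0):
    """Random-forest ingredient 2: every tree is grown on a bootstrap resample."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_sub, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


def leave_one_out(X, y, **forest_kwargs):
    """Per-class recognition rates and their unweighted mean under leave-one-out CV."""
    correct, total = Counter(), Counter()
    for i in range(len(X)):
        # Train on every utterance except the i-th, then test on the held-out one.
        forest = train_forest(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **forest_kwargs)
        total[y[i]] += 1
        correct[y[i]] += int(predict_forest(forest, X[i]) == y[i])
    per_class = {c: correct[c] / total[c] for c in total}
    return per_class, sum(per_class.values()) / len(per_class)


def make_toy_data(seed=1, n_per_class=10):
    """Hypothetical toy data: [mean pitch (Hz), mean intensity (dB)] per utterance."""
    rng = random.Random(seed)
    centres = {"happiness": (220.0, 70.0), "sadness": (140.0, 55.0), "neutral": (180.0, 62.0)}
    X, y = [], []
    for label, (pitch, intensity) in centres.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(pitch, 5.0), rng.gauss(intensity, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    per_class, average = leave_one_out(X, y, n_trees=15, max_depth=3)
    print(per_class, round(average, 3))
```

Because the synthetic classes are well separated, the rates on the toy data should be close to 1; they say nothing about performance on real emotional speech. Leave-one-out retrains the whole forest once per utterance, which is affordable for a small database but becomes expensive as the number of recordings grows.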

Acknowledgements

This work has been partially supported by the Estonian Research Grant (PUT638), the Estonian Centre of Excellence in IT (EXCITE) funded by the European Regional Development Fund, the Estonian-Polish Joint Research Project and the European Network on Integrating Vision and Language (iV&L Net) ICT COST Action IC1307.

Author information

Authors and Affiliations

Institute of Technology, University of Tartu, Nooruse 1, 50411, Tartu, Estonia
Fatemeh Noroozi
Institute of Mechatronics and Information Systems, Łódź University of Technology, Łódź, Poland
Tomasz Sapiński & Dorota Kamińska
iCV Research Group, Institute of Technology, University of Tartu, Nooruse 1, 50411, Tartu, Estonia
Gholamreza Anbarjafari
Department of Electrical and Electronic Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
Gholamreza Anbarjafari

Corresponding author

Correspondence to Gholamreza Anbarjafari.

About this article

Cite this article

Noroozi, F., Sapiński, T., Kamińska, D. et al. Vocal-based emotion recognition using random forests and decision tree. Int J Speech Technol 20, 239–246 (2017). https://doi.org/10.1007/s10772-017-9396-2

Received: 29 June 2016
Accepted: 05 January 2017
Published: 09 February 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10772-017-9396-2
