
Text-independent speech emotion recognition using frequency adaptive features


Abstract

In this paper, a text-independent emotional speech feature extraction method is studied based on the spectral frequency bands of speech formants. First, emotional speech feature analysis is performed across different text contents: sentences with varied phonetic content are used to study phonetic influences, and because these features are sensitive to phonetic changes within a sentence, formant frequencies are grouped into classes to reduce text variability. Speaker emotions are then modeled within each formant group. Second, adaptive fundamental frequency and Teager Energy Operator (TEO) features are constructed in different frequency bands, with the TEO bands dynamically adapted to the pitch and formant distributions of each utterance. The proposed adaptation is sensitive to emotional changes in speech, as reflected in its mean, variance, maximum, and minimum statistics; these statistics over the basic acoustic parameters are used as the emotional features. Experimental results show that the proposed emotional features are robust against text changes, with a lowest variance value of 0.034. The final recognition results for six major emotion types are consistently improved, with a 5 percent improvement for sadness and a 3.3 percent improvement for boredom.
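As a concrete illustration of the band-wise TEO statistics described in the abstract, below is a minimal Python sketch. The band edges, filter design, and function names are illustrative assumptions; the paper's adaptive selection of bands from per-utterance pitch and formant estimates is not reproduced here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def band_teo_stats(signal, fs, bands):
    """Band-pass the signal into each (low, high) band in Hz, apply the TEO,
    and collect the mean/variance/maximum/minimum of the result per band."""
    feats = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="band", fs=fs, output="sos")
        band_sig = sosfiltfilt(sos, signal)
        teo = teager_energy(band_sig)
        feats.extend([teo.mean(), teo.var(), teo.max(), teo.min()])
    return np.array(feats)

# Fixed, illustrative band edges; in the proposed method the bands would be
# re-estimated per utterance from the pitch (F0) and formant distribution.
fs = 16000
bands = [(80, 500), (500, 1500), (1500, 3000)]
x = np.random.randn(fs)          # stand-in for one second of speech at 16 kHz
features = band_teo_stats(x, fs, bands)
print(features.shape)            # (12,) -> 4 statistics x 3 bands
```

Under these assumptions, each utterance yields four statistics per band, concatenated into a single feature vector for the emotion classifier.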




Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 11401412) and the Natural Science Foundation of Jiangsu Province of China (No. BK20150342). The authors would like to thank NVIDIA for the generous donation of a Titan X GPU, which eased the computational burden of expression modeling.

Author information

Correspondence to Hong Chen.


About this article


Cite this article

Wu, C., Huang, C. & Chen, H. Text-independent speech emotion recognition using frequency adaptive features. Multimed Tools Appl 77, 24353–24363 (2018). https://doi.org/10.1007/s11042-018-5742-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-018-5742-x

Keywords

Navigation