
Segment based emotion recognition using combined reduced features

  • S.I.: Emotion Recognition in Speech
  • Published in: International Journal of Speech Technology

Abstract

Human attitude is closely tied to emotion. Emotions can be observed verbally, visually, or both. Verbal emotion recognition is a difficult task within speech processing, with applications in almost all fields. In this work, the authors recognize five types of emotion: anger, sadness, happiness, fear, and neutral. The work focuses on the choice of spectral features. For this purpose, Mel-frequency cepstral coefficients (MFCC), spectral roll-off, spectral centroid, and spectral flux are extracted at frame level. Some of these features are then reduced, combined, and balanced, and the effectiveness of the combined feature sets is verified experimentally. The resulting features are used with neural network (NN) based models for recognition. Multilayer perceptron (MLP), radial basis function network (RBFN), probabilistic neural network (PNN), and deep neural network (DNN) models are trained and tested on the chosen features. It is observed that a small number of features provides reliable accuracy in the case of the PNN, and that the MLP, RBFN, and PNN also require less time for training and testing. The DNN, in contrast, is not suitable for low-dimensional feature sets; it requires large amounts of data to reach good accuracy in this field. The results favour the PNN, with an average accuracy of 96.9% on low-dimensional feature sets, whereas the average accuracies of the MLP, RBFN, and DNN models are found to be 90.1%, 92.7%, and 73.6%, respectively.
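As a concrete illustration of the frame-level features named in the abstract, the following is a minimal sketch using librosa. It is not the authors' code: the file name, sampling rate, number of MFCC coefficients, and the exact spectral-flux formulation are assumptions made for illustration.

```python
import numpy as np
import librosa

# Load one utterance (file name and sampling rate are assumptions).
y, sr = librosa.load("utterance.wav", sr=16000)

# Frame-level spectral features named in the abstract.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, n_frames)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # shape (1, n_frames)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # shape (1, n_frames)

# Spectral flux: frame-to-frame change of the magnitude spectrum,
# padded with a leading zero so all features share n_frames.
S = np.abs(librosa.stft(y))
flux = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0))
flux = np.concatenate([[0.0], flux])

# One frame-level feature matrix of shape (n_frames, 16).
features = np.vstack([mfcc, centroid, rolloff, flux[None, :]]).T
```

The PNN that the abstract reports as the most accurate model is, in essence, a Parzen-window classifier. Below is a minimal sketch of that decision rule, assuming Gaussian kernels; the smoothing parameter sigma is an assumed hyperparameter, not a value taken from the paper.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.1):
    """Minimal probabilistic neural network (Parzen-window classifier).

    Each test vector is assigned to the class whose training samples
    produce the largest mean Gaussian-kernel response.
    """
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    classes = np.unique(y_train)
    preds = []
    for x in np.asarray(X_test):
        d2 = np.sum((X_train - x) ** 2, axis=1)              # squared distances
        k = np.exp(-d2 / (2.0 * sigma ** 2))                 # Gaussian kernel layer
        scores = [k[y_train == c].mean() for c in classes]   # per-class summation layer
        preds.append(classes[np.argmax(scores)])             # decision layer
    return np.array(preds)
```

In such a sketch, accuracy is sensitive to sigma, which would normally be tuned on held-out data. Note that the PNN has no iterative training phase, which is consistent with the abstract's observation that it trains and tests quickly on low-dimensional feature sets.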



Author information


Corresponding author

Correspondence to Mihir Narayan Mohanty.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mohanty, M.N., Palo, H.K. Segment based emotion recognition using combined reduced features. Int J Speech Technol 22, 865–884 (2019). https://doi.org/10.1007/s10772-019-09628-3
