
Emotional information processing based on feature vector enhancement and selection for human–computer interaction via speech

Published in: Telecommunication Systems

Abstract

This paper proposes techniques for the enhancement and selection of emotional feature vectors so that emotional information in users' spoken data can be processed correctly. In real-world devices, speech signals may carry emotional information that is distorted or anomalous owing to environmental noise and the acoustic similarity between emotions. To enhance the harmonics of noise-contaminated speech and thereby use them as emotional features, we propose a modified adaptive comb filter in which the frequency response of the conventional comb filter is re-estimated on the basis of speech presence probability. In addition, to eliminate acoustically anomalous emotional data, we propose a feature vector classification scheme: emotional feature vectors are iteratively categorized as either discriminative or indiscriminative, and only the discriminative vectors are retained for emotional information processing. In emotion recognition experiments on noise-contaminated emotional speech data, our approach outperformed conventional approaches.
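The comb-filter re-estimation described above can be sketched in a few lines. The following Python fragment is an illustrative assumption, not the authors' implementation: the Gaussian-shaped comb response, the a-posteriori-SNR-based speech presence estimator, and all function names and parameter values (`bandwidth`, `prior`) are placeholders chosen for clarity.

```python
import numpy as np

def comb_filter_gain(freqs, f0, bandwidth=20.0):
    """Conventional comb filter: near-unit gain around each pitch harmonic of f0."""
    gain = np.zeros_like(freqs)
    for k in range(1, int(freqs.max() // f0) + 1):
        gain += np.exp(-0.5 * ((freqs - k * f0) / bandwidth) ** 2)
    return np.clip(gain, 0.0, 1.0)

def speech_presence_probability(noisy_power, noise_power, prior=0.5):
    """Crude per-bin speech presence probability from the a posteriori SNR (sketch)."""
    snr = noisy_power / np.maximum(noise_power, 1e-12)
    lr = np.exp(np.minimum(snr - 1.0, 50.0))      # capped likelihood ratio
    return prior * lr / (prior * lr + (1.0 - prior))

def modified_comb_gain(freqs, f0, noisy_power, noise_power):
    """Re-estimated response: comb gain attenuated where speech is unlikely."""
    return comb_filter_gain(freqs, f0) * speech_presence_probability(noisy_power, noise_power)
```

In the paper's setting this re-estimation would be applied frame by frame, with the pitch `f0` and noise spectrum estimated from the signal; here they are simply passed in as arguments.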


Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2014R1A1A2057751).

Author information

Corresponding author: Ji-hwan Kim.

Cite this article

Park, J.-S., & Kim, J.-H. (2015). Emotional information processing based on feature vector enhancement and selection for human–computer interaction via speech. Telecommunication Systems, 60, 201–213. https://doi.org/10.1007/s11235-015-0023-8
