
Robust emotional speech classification in the presence of babble noise

International Journal of Speech Technology

Abstract

Emotional speech recognition (ESR) is a new field of research in the realm of human-computer interaction. Most studies in this field have been performed in clean environments. In real-world conditions, however, various kinds of noise and disturbance, such as car noise, background music, and buzz, can degrade the performance of such recognition systems. One of the most common noises encountered in everyday settings is babble noise. Because of its similarity to the desired speech, babble, or cross-talk, is highly challenging for speech-related systems. In this paper, in order to find the most appropriate features for ESR in the presence of babble noise at different signal-to-noise ratios, 286 features are extracted from speech utterances of two emotional speech datasets, one in German and one in Persian. The best features are then selected among them using different filter and wrapper methods. Finally, classifiers such as Bayes, KNN, GMM, ANN, and SVM are applied to the selected features in two settings, namely multi-class and binary classification.
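The abstract outlines the full experimental pipeline: mix babble noise with clean emotional utterances at several signal-to-noise ratios, extract acoustic features, select the most discriminative subset, and classify. Since the full text is behind the paywall, the following is only a minimal sketch of such a pipeline under stated assumptions: it uses NumPy and scikit-learn, the `extract_features` function is a hypothetical four-dimensional stand-in for the paper's 286 features, and random signals stand in for the German and Persian datasets, which are not reproduced here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mix_at_snr(clean, babble, snr_db):
    """Add babble noise to a clean utterance at a target SNR (dB)."""
    # Tile or trim the noise so it covers the whole utterance.
    if len(babble) < len(clean):
        babble = np.tile(babble, int(np.ceil(len(clean) / len(babble))))
    babble = babble[:len(clean)]
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(babble ** 2)
    # Choose a gain g so that p_signal / (g**2 * p_noise) == 10**(snr_db / 10).
    g = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + g * babble

def extract_features(utterance):
    """Hypothetical stand-in for the paper's 286-dimensional feature vector
    (pitch, formant, energy, MFCC/PLP statistics, etc.)."""
    frames = utterance.reshape(-1, 160)  # 10 ms frames at 16 kHz
    log_energy = np.log(np.mean(frames ** 2, axis=1) + 1e-12)
    return np.array([log_energy.mean(), log_energy.std(),
                     log_energy.max(), log_energy.min()])

# Filter-style feature selection (ANOVA F-score) followed by an SVM,
# one of the five classifiers the abstract lists.
clf = make_pipeline(
    SelectKBest(f_classif, k=3),
    StandardScaler(),
    SVC(kernel="rbf"),
)

# X: one feature vector per noisy utterance; y: toy binary emotion labels.
rng = np.random.default_rng(0)
X = np.vstack([
    extract_features(mix_at_snr(rng.standard_normal(16000),
                                rng.standard_normal(16000), snr_db=5))
    for _ in range(40)
])
y = np.arange(40) % 2  # e.g., neutral vs. angry (binary classification case)
print(cross_val_score(clf, X, y, cv=5).mean())
```

The SelectKBest/SVM pairing shown here is only one of the combinations the abstract mentions; the same skeleton accommodates the other classifiers (Bayes, KNN, GMM, ANN) and wrapper-style selection in place of the filter method.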



Author information

Correspondence to Salman Karimi.


About this article

Cite this article

Karimi, S., Sedaaghi, M.H. Robust emotional speech classification in the presence of babble noise. Int J Speech Technol 16, 215–227 (2013). https://doi.org/10.1007/s10772-012-9176-y
