Abstract
Emotional speech recognition (ESR) is an emerging field of research in human-computer interaction. Most studies in this field are performed in clean environments. In real-world conditions, however, various noises and disturbances, such as car noise, background music, and buzz, can degrade the performance of such recognition systems. One of the most common noises encountered in everyday settings is babble noise. Because of its similarity to the desired speech, babble, or cross-talk, is highly challenging for speech-related systems. In this paper, in order to find the most appropriate features for ESR in the presence of babble noise at different signal-to-noise ratios, 286 features are extracted from speech utterances of two emotional speech datasets, one German and one Persian. The best features are then selected using different filter and wrapper methods. Finally, classifiers such as Bayes, KNN, GMM, ANN, and SVM are applied to the selected features in two settings, namely multi-class and binary classification.
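The experimental setup above contaminates clean emotional speech with babble noise at prescribed signal-to-noise ratios. As a minimal sketch of that step, not taken from the paper, the snippet below mixes a clean signal with noise at a target SNR; the synthetic sine tone and Gaussian stand-in for babble are illustrative assumptions:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_clean / (gain^2 * p_noise)) = snr_db for the noise gain.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Illustrative signals: a 220 Hz tone as "speech", white noise as "babble".
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
babble = rng.standard_normal(16000)
noisy = mix_at_snr(clean, babble, snr_db=5.0)

# Verify the achieved SNR of the mixture.
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))  # → 5.0
```

In a study like this one, the same mixing would be repeated over a range of SNRs (e.g. 0 to 20 dB) before feature extraction, so that classifier robustness can be compared across noise levels.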
Cite this article
Karimi, S., Sedaaghi, M.H. Robust emotional speech classification in the presence of babble noise. Int J Speech Technol 16, 215–227 (2013). https://doi.org/10.1007/s10772-012-9176-y