A Study of Speech Emotion Recognition and Its Application to Mobile Services

Yoon, Won-Joong; Cho, Youn-Ho; Park, Kyu-Sik

doi:10.1007/978-3-540-73549-6_74

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4611)

Included in the following conference series:

International Conference on Ubiquitous Intelligence and Computing

Abstract

In this paper, a speech emotion recognition agent for mobile communication services is proposed. The proposed system recognizes five emotional states (neutral, happiness, sadness, anger, and annoyance) from speech captured by a cellular phone in real time, and then calculates the degree of affection (such as love, truthfulness, weariness, trick, and friendship) of the person the user is interested in knowing through the mobile phone. In general, speech acquired by a cellular phone contains noise from the mobile network and from the environment, which can cause serious performance degradation by distorting the emotional features of the query speech. To alleviate the effect of this noise, we adopt an MA (Moving Average) filter, which has a relatively simple structure and low computational complexity. A feature optimization method is then implemented to further improve and stabilize the system performance. As a practical application, we create an agent that measures the degree of affection of the person the user wants to know about over the mobile phone. Two pattern classification methods, k-NN and SVM with probability estimates, are compared for estimating the degree of affection. The experimental results indicate that the proposed method provides stable and successful emotion classification performance of 72.5% over the five emotional states, and they show the feasibility of the agent for mobile communication services.
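
As an illustration only, the two building blocks named in the abstract can be sketched in a few lines of Python. The abstract does not say whether the MA filter is applied to the waveform or to feature trajectories, nor how classifier outputs are mapped to a degree of affection, so the window length, the feature vectors, and the function names `moving_average` and `knn_class_proportions` below are assumptions and not the authors' implementation. The k-NN vote proportions are used here as a simple stand-in for a probability estimate.

```python
from collections import Counter
from math import dist


def moving_average(values, window=5):
    """Causal moving-average (MA) filter.

    Each output is the mean of the current sample and up to
    `window - 1` preceding samples (fewer at the start of the sequence).
    The window length is an assumed parameter, not taken from the paper.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


def knn_class_proportions(train, query, k=3):
    """Return {label: fraction of the k nearest neighbours with that label}.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; the abstract does not name a distance measure.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / len(nearest) for label, n in counts.items()}
```

A hypothetical use would smooth a noisy per-frame feature track (for example pitch values) with `moving_average(track, window=3)` before computing utterance-level features, and then call `knn_class_proportions(training_pairs, feature_vector, k=3)` to obtain per-emotion proportions that sum to one.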

Author information

Authors and Affiliations

Dankook University, Division of Information and Computer Science, San 8, Hannam-Dong, Yongsan-Ku, Seoul, 140-714, Korea
Won-Joong Yoon, Youn-Ho Cho & Kyu-Sik Park

Editor information

Jadwiga Indulska, Jianhua Ma, Laurence T. Yang, Theo Ungerer, Jiannong Cao

Cite this paper

Yoon, WJ., Cho, YH., Park, KS. (2007). A Study of Speech Emotion Recognition and Its Application to Mobile Services. In: Indulska, J., Ma, J., Yang, L.T., Ungerer, T., Cao, J. (eds) Ubiquitous Intelligence and Computing. UIC 2007. Lecture Notes in Computer Science, vol 4611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73549-6_74

DOI: https://doi.org/10.1007/978-3-540-73549-6_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73548-9
Online ISBN: 978-3-540-73549-6
eBook Packages: Computer Science, Computer Science (R0)
