Abstract
With the development of Interactive Voice Response (IVR) systems, people can not only operate computer systems through task-oriented conversation but also enjoy non-task-oriented conversation with the computer. When an IVR system generates a response, it usually refers only to the verbal information in the user’s utterance. However, when a person gloomily says “I’m fine,” a human listener will respond not with “That’s wonderful” but with “Really?” or “Are you OK?” because humans consider both verbal and non-verbal information, such as tone of voice, facial expressions, and gestures. In this article, we propose an intelligent IVR system that considers not only verbal but also non-verbal information. To estimate a speaker’s emotion (positive, negative, or neutral), 384 acoustic features extracted from the speaker’s utterance are used as input to a machine learning classifier, a support vector machine (SVM). Artificial Intelligence Markup Language (AIML)-based response generation rules are extended so that they can take the speaker’s emotion into account. In the experiment, subjects felt that the proposed dialog system was more likable and enjoyable and that it did not give machine-like reactions.
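As a rough illustration of emotion-conditioned response selection, the following Python sketch picks a reply from both the utterance text and an estimated emotion label. It is not the chapter's AIML extension: the rule table, the `respond` and `normalize` helpers, the neutral reply, and the fallback string are all hypothetical, and the emotion label is assumed to come from a separate classifier such as the SVM described above.

```python
from typing import Dict, Tuple

# Hypothetical rule table: (normalized utterance, estimated emotion) -> response.
# The positive and negative replies follow the example in the abstract;
# the neutral reply is an assumption made for illustration.
RULES: Dict[Tuple[str, str], str] = {
    ("I'M FINE", "positive"): "That's wonderful.",
    ("I'M FINE", "neutral"): "Good to hear.",
    ("I'M FINE", "negative"): "Really? Are you OK?",
}

# Hypothetical reply used when no rule matches the utterance at all.
FALLBACK = "I see."


def normalize(utterance: str) -> str:
    """Trim whitespace and final punctuation, unify apostrophes, upper-case."""
    text = utterance.strip().rstrip(".!?")
    return text.replace("\u2019", "'").upper()


def respond(utterance: str, emotion: str) -> str:
    """Select a response using both the words and the estimated emotion.

    `emotion` is assumed to be "positive", "negative", or "neutral",
    produced elsewhere by an acoustic emotion classifier.
    """
    key = normalize(utterance)
    if (key, emotion) in RULES:
        return RULES[(key, emotion)]
    # No emotion-specific rule: try the neutral rule, then the fallback.
    return RULES.get((key, "neutral"), FALLBACK)


if __name__ == "__main__":
    print(respond("I'm fine.", "negative"))
    print(respond("I'm fine.", "positive"))
```

Keying the rules on the pair (utterance, emotion) lets the same words map to different replies, which is the behavior the abstract motivates with the “I’m fine” example.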
Acknowledgements
This research is supported by JSPS KAKENHI Grant Number 26330313 and the Center of Innovation Program from Japan Science and Technology Agency, JST.
Copyright information
© 2017 Springer Science+Business Media Singapore
About this chapter
Cite this chapter
Takahashi, T., Mera, K., Nhat, T.B., Kurosawa, Y., Takezawa, T. (2017). Natural Language Dialog System Considering Speaker’s Emotion Calculated from Acoustic Features. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_11
DOI: https://doi.org/10.1007/978-981-10-2585-3_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2584-6
Online ISBN: 978-981-10-2585-3
eBook Packages: Engineering, Engineering (R0)