Abstract
This paper reports an analysis of the effects of text-to-speech (TTS) and an avatar agent in evoking users’ spontaneous backchannels. We constructed an HMM-based dialogue-style TTS system that generates human-like cues that evoke users’ backchannels. We also constructed an avatar agent that can produce several listener’s reactions. A spoken dialogue system for information navigation was implemented and evaluated in terms of evoked user backchannels. We conducted user experiments, and the results indicated that (1) the user backchannels evoked by our TTS are more informative for the system in detecting users’ feelings than those evoked by conventional reading-style TTS, and (2) the use of an avatar agent can invite more user backchannels.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this paper
Cite this paper
Misu, T., Mizukami, E., Shiga, Y., Kawamoto, S., Kawai, H., Nakamura, S. (2011). Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_10
DOI: https://doi.org/10.1007/978-1-4614-1335-6_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1334-9
Online ISBN: 978-1-4614-1335-6
eBook Packages: Engineering, Engineering (R0)