Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions

Misu, Teruhisa; Mizukami, Etsuo; Shiga, Yoshinori; Kawamoto, Shinichi; Kawai, Hisashi; Nakamura, Satoshi

doi:10.1007/978-1-4614-1335-6_10


Abstract

This paper reports an analysis of the effects of text-to-speech (TTS) and an avatar agent in evoking users’ spontaneous backchannels. We constructed an HMM-based dialogue-style TTS system that generates human-like cues that evoke users’ backchannels. We also constructed an avatar agent that can perform several listener’s reactions. A spoken dialogue system for information navigation was implemented and evaluated in terms of the user backchannels it evoked. We conducted user experiments, and the results indicated that (1) the user backchannels evoked by our TTS are more informative for the system in detecting users’ feelings than those evoked by conventional reading-style TTS, and (2) the use of an avatar agent can invite more user backchannels.



Author information

Authors and Affiliations

National Institute of Information and Communications Technology (NICT), Kyoto, Japan
Teruhisa Misu, Etsuo Mizukami, Yoshinori Shiga, Shinichi Kawamoto & Hisashi Kawai
Nara Institute of Science and Technology (NAIST), Nara, Japan
Satoshi Nakamura


Corresponding author

Correspondence to Teruhisa Misu.

Editor information

Editors and Affiliations

Dept. of Languages and Computer Systems, University of Granada, Granada, 18071, Spain
Ramón López-Cózar Delgado
Dept. of Computer Science & Engineering, Waseda University, Okubo 3-4-1, Tokyo, 169-8555, Japan
Tetsunori Kobayashi


About this paper

Cite this paper

Misu, T., Mizukami, E., Shiga, Y., Kawamoto, S., Kawai, H., Nakamura, S. (2011). Analysis on Effects of Text-to-Speech and Avatar Agent in Evoking Users’ Spontaneous Listener’s Reactions. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_10


DOI: https://doi.org/10.1007/978-1-4614-1335-6_10
Published: 12 August 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1334-9
Online ISBN: 978-1-4614-1335-6
eBook Packages: Engineering, Engineering (R0)
