Abstract
This paper presents an advanced call center that adapts its presentation and interaction strategy to properties of the caller, such as age, gender, and emotional state. User studies on interactive voice response (IVR) systems have shown that these properties can be used effectively to “tailor” services to users or user groups who do not maintain personal preferences, e.g., because they do not use the service regularly. The approach adopted to individualize services that cannot be personalized is based on the analysis of a caller’s voice. The paper shows how this approach benefits service providers, who can target entertainment and recommendation options, and how the same analysis benefits customers, as it can increase the accessibility of IVR systems for user segments that have particular expectations or that do not cope well with a “one size fits all” system. The paper summarizes the authors’ current work on component technologies, such as emotion detection and age and gender recognition on telephony speech, and presents results of usability and acceptability tests as well as an architecture for integrating these technologies into future multi-modal contact centers. It is envisioned that these will eventually serve customers with an avatar representation of an agent and tailored interaction strategies, matching powerful output capabilities with advanced analysis of the user’s input.
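The adaptation described above can be pictured as a simple decision layer between the speaker classifiers and the dialog manager. The following is a minimal illustrative sketch, not the paper's actual implementation; all class labels and strategy names are assumptions chosen for the example.

```python
# Illustrative sketch (assumption, not the authors' code): map classified
# caller properties (age group, gender, emotional state) to a dialog
# strategy, as a tailored IVR front end might do.

def select_strategy(age_group: str, gender: str, emotion: str) -> str:
    """Choose an interaction strategy from hypothetical classifier outputs.

    The labels and strategies below are illustrative placeholders; the
    paper does not specify its decision logic at this level of detail.
    """
    if emotion == "angry":
        # De-escalate by handing the caller over to a human agent early.
        return "escalate_to_agent"
    if age_group == "senior":
        # Slower, more verbose prompts with explicit confirmations.
        return "slow_verbose_prompts"
    if age_group == "child":
        # Restrict the menu to age-appropriate options.
        return "restricted_menu"
    return "default_prompts"

# Example: an angry adult caller is routed to a human agent.
print(select_strategy("adult", "male", "angry"))  # escalate_to_agent
```

The point of such a layer is that the same classifier outputs can drive different tailoring policies per application, keeping recognition and strategy selection decoupled.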
Notes
These were “children” plus “young”, “adult”, and “senior” variants for “males” and “females”. These classes were chosen so that different application scenarios initially envisaged could be tested by collapsing classes, i.e., without having to retrain the classifiers for every scenario.
Children were represented by separate male and female models.
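The class-collapsing idea in the note above can be sketched as follows: instead of retraining a classifier per scenario, the fine-grained class posteriors are summed into the coarser classes a given application needs. This is an illustrative sketch under assumed class labels and an assumed posterior representation, not the authors' code.

```python
# Illustrative sketch (assumption, not the authors' code): collapse
# fine-grained age/gender classes into coarser scenario classes by
# summing classifier posteriors, so no per-scenario retraining is needed.

# Hypothetical fine-grained classes: "child" plus young/adult/senior
# variants for each gender.
GENDER_SCENARIO = {
    "child": "child",
    "young_male": "male", "adult_male": "male", "senior_male": "male",
    "young_female": "female", "adult_female": "female",
    "senior_female": "female",
}

def collapse(posteriors: dict, mapping: dict) -> str:
    """Sum fine-grained posteriors into coarse classes; return the winner."""
    coarse = {}
    for fine_class, p in posteriors.items():
        target = mapping[fine_class]
        coarse[target] = coarse.get(target, 0.0) + p
    return max(coarse, key=coarse.get)

# Example posteriors from a hypothetical seven-class recognizer:
p = {"child": 0.05, "young_male": 0.30, "young_female": 0.05,
     "adult_male": 0.25, "adult_female": 0.10,
     "senior_male": 0.20, "senior_female": 0.05}
print(collapse(p, GENDER_SCENARIO))  # male (0.30 + 0.25 + 0.20 = 0.75)
```

A different scenario (e.g., age-only) would supply a different mapping over the same classifier outputs.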
Acknowledgments
The authors would like to thank Martin Eckert, Wiebke Johannsen, Bernhard Kaspar, and Ralf Kirchherr for providing many helpful comments on the analysis of the data and earlier versions of this paper.
Metze, F., Englert, R., Bub, U. et al. Getting closer: tailored human–computer speech dialog. Univ Access Inf Soc 8, 97–108 (2009). https://doi.org/10.1007/s10209-008-0133-0