Abstract
This paper presents an advanced call center that adapts its presentation and interaction strategy to properties of the caller, such as age, gender, and emotional state. User studies on interactive voice response (IVR) systems have shown that these properties can be used effectively to “tailor” services to users or user groups who do not maintain personal preferences, e.g., because they do not use the service regularly. The approach adopted to individualize services that cannot be personalized is based on the analysis of a caller’s voice. The paper shows how this approach benefits service providers, who can target entertainment and recommendation options, and how the same analysis benefits customers, as it can increase the accessibility of IVR systems for user segments that have particular expectations or that do not cope well with a “one size fits all” system. The paper summarizes the authors’ current work on component technologies, such as emotion detection and age and gender recognition on telephony speech, and presents results of usability and acceptability tests as well as an architecture for integrating these technologies into future multi-modal contact centers. It is envisioned that these will eventually serve customers with an avatar representation of an agent and tailored interaction strategies, matching powerful output capabilities with advanced analysis of the user’s input.
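The adaptation described above can be pictured as a simple decision layer between the speaker classifiers and the dialog manager. The following is a minimal illustrative sketch, not the paper's actual implementation; all class labels and strategy names are assumptions chosen for the example.

```python
# Illustrative sketch (assumption, not the authors' code): map classified
# caller properties (age group, gender, emotional state) to a dialog
# strategy, as a tailored IVR front end might do.

def select_strategy(age_group: str, gender: str, emotion: str) -> str:
    """Choose an interaction strategy from hypothetical classifier outputs.

    The labels and strategies below are illustrative placeholders; the
    paper does not specify its decision logic at this level of detail.
    """
    if emotion == "angry":
        # De-escalate by handing the caller over to a human agent early.
        return "escalate_to_agent"
    if age_group == "senior":
        # Slower, more verbose prompts with explicit confirmations.
        return "slow_verbose_prompts"
    if age_group == "child":
        # Restrict the menu to age-appropriate options.
        return "restricted_menu"
    return "default_prompts"

# Example: an angry adult caller is routed to a human agent.
print(select_strategy("adult", "male", "angry"))  # escalate_to_agent
```

The point of such a layer is that the same classifier outputs can drive different tailoring policies per application, keeping recognition and strategy selection decoupled.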
Notes
These were “children” plus “young”, “adult”, and “senior” variants for “males” and “females”. These classes were chosen so that different application scenarios initially envisaged could be tested by collapsing classes, i.e., without having to retrain the classifiers for every scenario.
Children were represented by separate male and female models.
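The class-collapsing idea in the note above can be sketched as follows: instead of retraining a classifier per scenario, the fine-grained class posteriors are summed into the coarser classes a given application needs. This is an illustrative sketch under assumed class labels and an assumed posterior representation, not the authors' code.

```python
# Illustrative sketch (assumption, not the authors' code): collapse
# fine-grained age/gender classes into coarser scenario classes by
# summing classifier posteriors, so no per-scenario retraining is needed.

# Hypothetical fine-grained classes: "child" plus young/adult/senior
# variants for each gender.
GENDER_SCENARIO = {
    "child": "child",
    "young_male": "male", "adult_male": "male", "senior_male": "male",
    "young_female": "female", "adult_female": "female",
    "senior_female": "female",
}

def collapse(posteriors: dict, mapping: dict) -> str:
    """Sum fine-grained posteriors into coarse classes; return the winner."""
    coarse = {}
    for fine_class, p in posteriors.items():
        target = mapping[fine_class]
        coarse[target] = coarse.get(target, 0.0) + p
    return max(coarse, key=coarse.get)

# Example posteriors from a hypothetical seven-class recognizer:
p = {"child": 0.05, "young_male": 0.30, "young_female": 0.05,
     "adult_male": 0.25, "adult_female": 0.10,
     "senior_male": 0.20, "senior_female": 0.05}
print(collapse(p, GENDER_SCENARIO))  # male (0.30 + 0.25 + 0.20 = 0.75)
```

A different scenario (e.g., age-only) would supply a different mapping over the same classifier outputs.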
Acknowledgments
The authors would like to thank Martin Eckert, Wiebke Johannsen, Bernhard Kaspar, and Ralf Kirchherr for providing many helpful comments on the analysis of the data and earlier versions of this paper.
Metze, F., Englert, R., Bub, U. et al. Getting closer: tailored human–computer speech dialog. Univ Access Inf Soc 8, 97–108 (2009). https://doi.org/10.1007/s10209-008-0133-0