
Getting closer: tailored human–computer speech dialog

  • Long Paper
  • Published in: Universal Access in the Information Society (2009)

Abstract

This paper presents an advanced call center which adapts its presentation and interaction strategy to properties of the caller such as age, gender, and emotional state. User studies on interactive voice response (IVR) systems have shown that these properties can be used effectively to “tailor” services to users or user groups who do not maintain personal preferences, e.g., because they do not use the service on a regular basis. The adopted approach, which achieves individualization of services without being able to personalize them, is based on the analysis of a caller’s voice. The paper shows how this approach benefits service providers by enabling them to target entertainment and recommendation options, and how the same analysis also benefits customers, as it can increase the accessibility of IVR systems for user segments that have particular expectations or that do not cope well with a “one size fits all” system. The paper summarizes the authors’ current work on component technologies, such as emotion detection and age and gender recognition on telephony speech, and presents results of usability and acceptability tests as well as an architecture for integrating these technologies into future multi-modal contact centers. It is envisioned that these contact centers will eventually serve customers with an avatar representation of an agent and tailored interaction strategies, matching powerful output capabilities with advanced analysis of the user’s input.


Notes

  1. http://www.w3.org/TR/voicexml20/.

  2. http://java.sun.com/javaee/.

  3. http://www.w3.org/2005/Incubator/emotion/.

  4. These were “children” plus “young”, “adult”, and “senior” variants for “males” and “females”. These classes were chosen so that the different application scenarios initially envisaged could be tested by collapsing classes, i.e., without having to retrain the classifiers for every scenario (a sketch of such class collapsing follows these notes).

  5. Children were represented by separate male and female models.


Acknowledgments

The authors would like to thank Martin Eckert, Wiebke Johannsen, Bernhard Kaspar, and Ralf Kirchherr for providing many helpful comments on the analysis of the data and earlier versions of this paper.

Corresponding author

Correspondence to Florian Metze.



Cite this article

Metze, F., Englert, R., Bub, U. et al. Getting closer: tailored human–computer speech dialog. Univ Access Inf Soc 8, 97–108 (2009). https://doi.org/10.1007/s10209-008-0133-0
