Gibberish speech as a tool for the study of affective expressiveness for robotic agents

Published in: Multimedia Tools and Applications

Abstract

Recent technological advancements are bringing virtual agents, avatars, and social robotic characters into our daily lives. These characters must acquire the ability to express (simulated) emotions vocally and gesturally. In the vocal channel, Natural Language Interaction technologies still have limitations when used in real-world natural environments, and the expressivity models of text-to-speech synthesis engines are not yet mature enough. To address these limitations, this paper introduces an alternative form of vocal communication: gibberish speech. Gibberish speech consists of vocalizations of meaningless strings of speech sounds and thus carries no semantic meaning. It is occasionally used by performing artists and in cartoon animations and games to express intended emotions (e.g. the Teletubbies and The Sims). This paper describes our approach for constructing expressive gibberish speech and reports the experimental evaluations with the intended robotic agents. The results show that the generated gibberish speech can contribute to a significant extent to studies of emotion expression for robotic agents and can be further utilized in affective human-robot interaction studies.
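
As a rough illustration of what "meaningless strings of speech sounds" can look like in practice, the sketch below generates random consonant-vowel pseudo-words. It is purely illustrative and is not the gibberish construction method described in this paper; the phoneme inventory and syllable structure are assumptions made only for the example.

```python
# Illustrative sketch only: NOT the gibberish construction method of this paper.
# It concatenates random consonant-vowel syllables into pseudo-words that
# sound speech-like but carry no semantic meaning.
import random

CONSONANTS = list("bdfgklmnprst")   # assumed toy phoneme inventory
VOWELS = list("aeiou")              # assumed vowel set

def gibberish_utterance(n_words=4, seed=None):
    """Return a string of pseudo-words with no semantic content."""
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        n_syllables = rng.randint(2, 4)
        word = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                       for _ in range(n_syllables))
        words.append(word)
    return " ".join(words)

print(gibberish_utterance())   # e.g. a string of four random pseudo-words
```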


Notes

  1. In this context, a sample is considered natural when it sounds like an unrecognized real language rather than an unnatural or random combination of sounds.

  2. In hypothesis testing (statistical significance testing), the p value quantifies the significance of the sample statistic [1]. It is the probability of obtaining the observed effect (or a larger one) under the null hypothesis. An effect is claimed to be significant when the p value is smaller than a conventional significance level (typically 0.05); a brief worked sketch is given after these notes.

  3. ETRO audio-visual lab, http://www.etro.vub.ac.be/Research/Nosey_Elephant_Studios/.

  4. Annosoft Lipsync Tool 4.1 can be downloaded from: http://www.annosoft.com/lipsync-tool.
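
As a concrete reading of note 2, the following minimal sketch (with hypothetical rating data, assuming SciPy is available) compares two groups of listener ratings with an independent-samples t-test and checks the resulting p value against the conventional 0.05 significance level.

```python
# Hypothetical worked example for note 2: a t-test and its p value.
from scipy import stats

ratings_a = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9]   # hypothetical ratings, condition A
ratings_b = [3.2, 3.5, 3.1, 3.6, 3.0, 3.4]   # hypothetical ratings, condition B

t_stat, p_value = stats.ttest_ind(ratings_a, ratings_b)

# Under the null hypothesis (no difference between conditions), p_value is the
# probability of observing an effect at least as large as the one in the data.
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("significant at the 0.05 level" if p_value < 0.05 else "not significant")
```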

References

  1. Argyrous G (2005) Statistics for Research. Sage Publications Ltd, London

  2. Ayesh A (2009) Emotionally expressive music based interaction language for social robots. ICGST Int J Autom Robot Auton Syst 9(1):1–10

  3. Bamidis PD, Luneski A, Vivas A, Papadelis C, Maglaveras N (2007) Multi-channel physiological sensing of human emotion: insights into emotion-aware computing using affective protocols, avatars and emotion specifications. In: Medinfo 2007: Proceedings of the 12th world congress on health (medical) informatics, building sustainable health systems. IOS Press

  4. Breazeal C (2000) Sociable machines: expressive social exchange between humans and robots. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

  5. Breazeal C (2001) Emotive qualities in robot speech. In: Proceedings of the 2001 IEEE/RSJ international conference on intelligent robots and systems. pp 1388–1394

  6. Burleson W (2006) Affective learning companions: strategies for empathetic agents with real-time multimodal affective sensing to foster meta-cognitive and meta-affective approaches to learning, motivation. PhD thesis, Massachusetts Institute of Technology

  7. Busso C, Narayanan S (2008) Recording audio-visual emotional databases from actors: a closer look. In: 2nd international workshop on emotion: corpora for research on emotion and affect, international conference on language resources and evaluation (LREC 2008), pp 17–22

  8. Carlson R, Granström B, Nord L (1991) Segmental evaluation using the Esprit/SAM test procedures and monosyllabic words. In: The ESCA workshop on speech synthesis

  9. Chomsky N (1956) Three models for the description of language. IRE Trans Inf Theory 2(3):113–124

  10. Corveleyn S, Coose B, Verhelst W (2002) Voice modification and conversion using PLAR-Parameters. In: IEEE Benelux workshop on model based processing and coding of audio (MPCA)

  11. Goodrich MA, Schultz AC (2007) Human-robot interaction: a survey. Found Trends Hum-Comput Interact 1(3):203–275

  12. Gouaillier D, Hugel V, Blazevic P, Kilner C, Monceaux J, Lafourcade P, Marnier B, Serre J, Maisonnier B (2008) The NAO humanoid: a combination of performance and affordability. CoRR abs/0807.3223

  13. Hart M (1971) Project Gutenberg. http://www.gutenberg.org. Accessed March 2014

  14. Jee ES, Jeong YJ, Kim CH, Kobayashi H (2010) Sound design for emotion and intention expression of socially interactive robots. Intell Serv Robot 3:199–206

  15. Juslin PN, Laukka P (2003) Communication of emotions in vocal expression and music performance: different channels, same code? Psychol Bull 129(5):770–814

  16. Latacz L, Kong Y, Mattheyses W, Verhelst W (2008) An overview of the VUB entry for the 2008 blizzard challenge. In: Proceedings of the interspeech blizzard challenge

  17. Libin AV, Libin EV (2004) Person-robot interactions from the robopsychologists’ point of view: the robotic psychology and robotherapy approach. Proc IEEE 92(11):1789–1803

  18. Lisetti C, Nasoz F, LeRouge C, Ozyer O, Alvarez K (2003) Developing multimodal intelligent affective interfaces for tele-home health care. Int J Human-Comput Stud 59(1):245–255

  19. Luneski A, Konstantinidis E, Bamidis P (2010) Affective medicine: a review of affective computing efforts in medical informatics. Methods Inf Med 49(3):207–218

  20. Mubin O, Bartneck C, Feijs L (2009) What you say is not what you get: arguing for artificial languages instead of natural languages in human robot speech interaction. In: The spoken dialogue and human-robot interaction workshop at IEEE RoMan 2009. IEEE, Japan

  21. Nijholt A, Tan D (2007) Playing with your brain: brain-computer interfaces and games. In: Proceedings of the international conference on advances in computer entertainment technology. ACM, pp 305–306

  22. Olive J, Buchsbaum A (1987) Changing voice characteristics in text to speech synthesis. AT&T Bell-Labs, Technical Memorandum

  23. Oudeyer PY (2003) The production and recognition of emotions in speech: features and algorithms. Int J Human-Comput Stud 59(1):157–183

  24. Prendinger H, Ishizuka M (2004) What affective computing and life-like character technology can do for tele-home health care. In: Proceedings workshop HCI and homecare, Citeseer

  25. Read R, Belpaeme T (2012) How to use non-linguistic utterances to convey emotion in child-robot interaction. In: Proceedings of the 7th annual ACM/IEEE international conference on human-robot interaction. ACM, Boston, MA, pp 219–220

  26. Riek LD (2012) Wizard of Oz studies in HRI: a systematic review and new reporting guidelines. J Human-Robot Interact 1(1):119–136

  27. Saldien J, Goris K, Yilmazyildiz S, Verhelst W, Lefeber D (2008) On the design of the huggable robot Probo. J Phys Agents 2(2):3–12

  28. Saldien J, Goris K, Vanderborght B, Vanderfaeillie J, Lefeber D (2010) Expressing emotions with the social robot Probo. Int J Soc Robot 2(4):377–389

  29. Schröder M (2003) Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD thesis, University of Saarland

  30. Schröder M (2009) Expressive speech synthesis: past, present, and possible futures. In: Tao J, Tan T (eds) Affective information processing. Springer, London, pp 111–126

  31. Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377

  32. Schröder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen SC (2001) Acoustic correlates of emotion dimensions in view of speech synthesis. In: INTERSPEECH, pp 87–90

  33. Smith RN, Frawley WJ (1999) Affective computing: medical applications. In: Proceedings of HCI International (the 8th international conference on human-computer interaction) on human-computer interaction: ergonomics and user interfaces, vol I. L. Erlbaum Associates Inc., pp 843–847

  34. Verhelst W, Roelands M (1993) An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol 2. IEEE, pp 554–557

  35. Wang W, Athanasopoulos G, Yilmazyildiz S, Patsis G, Enescu V, Sahli H, Verhelst W, Hiolle A, Lewis M, Canamero L (2014) Natural emotion elicitation for emotion modeling in child-robot interactions, (accepted)

  36. Winters RM, Wanderley MM (2013) Sonification of emotion: strategies for continuous display of arousal and valence. In: Luck G, Brabant O (eds) Proceedings of the 3rd international conference on music & emotion (ICME3). University of Jyväskylä, Department of Music, Jyväskylä, Finland

  37. Yang PF, Stylianou Y (1998) Real-time voice alteration based on linear prediction. In: Proceedings of ICSLP, Citeseer, Sydney, Australia, pp 1667–1670

  38. Yilmazyildiz S, Mattheyses W, Patsis Y, Verhelst W (2006) Expressive speech recognition and synthesis as enabling technologies for affective robot-child communication. In: Zhuang Y, Yang SQ, Rui Y, He Q (eds) Advances in multimedia information processing - PCM 2006, lecture notes in computer science, vol 4261. Springer, Berlin Heidelberg, pp 1–8

  39. Yilmazyildiz S, Latacz L, Mattheyses W, Verhelst W (2010) Expressive gibberish speech synthesis for affective human-computer interaction. In: Sojka P, Horák A, Kopeček I, Pala K (eds) Text, speech and dialogue, lecture notes in computer science, vol 6231. Springer, Berlin Heidelberg, pp 584–590

  40. Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2011) EMOGIB: emotional gibberish speech database for affective human-robot interaction. In: DMello S, Graesser A, Schuller B, Martin JC (eds) Affective computing and intelligent interaction, lecture notes in computer science, vol 6975. Springer Berlin, Heidelberg. Memphis, Tennessee, pp 163–172

  41. Yilmazyildiz S, Athanasopoulos G, Patsis G, Wang W, Oveneke MC, Latacz L, Verhelst W, Sahli H, Henderickx D, Vanderborght B, Soetens E, Lefeber D (2013) Voice modification for wizard-of-oz experiments in robot-child interaction. In: Workshop on affective social speech signals (WASSS 2013)

  42. Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2013) Multi-modal emotion expression for affective human-robot interaction. In: Workshop on affective social speech signals (WASSS 2013)

Acknowledgements

The research reported in this paper was supported in part by the Research Council of the Vrije Universiteit Brussel through the horizontal research action HOA16 and by the European Commission (EU-FP7 project ALIZ-E, ICT-248116).

Author information

Corresponding author

Correspondence to Selma Yilmazyildiz.

About this article

Cite this article

Yilmazyildiz, S., Verhelst, W. & Sahli, H. Gibberish speech as a tool for the study of affective expressiveness for robotic agents. Multimed Tools Appl 74, 9959–9982 (2015). https://doi.org/10.1007/s11042-014-2165-1
