Gibberish speech as a tool for the study of affective expressiveness for robotic agents

Published in: Multimedia Tools and Applications

Abstract

Recent technological advancements are bringing virtual agents, avatars, and social robotic characters into our daily lives. These characters must acquire the ability to express (simulated) emotions vocally and gesturally. In the vocal channel, Natural Language Interaction technologies still have limitations when used in real-world natural environments, and the expressivity models of text-to-speech synthesis engines are not yet mature enough. To address these limitations, this paper introduces an alternative form of vocal communication: gibberish speech. Gibberish speech consists of vocalizations of meaningless strings of speech sounds and thus carries no semantic meaning. It is occasionally used by performing artists and in cartoon animations and games to express intended emotions (e.g. the Teletubbies and The Sims). This paper describes our approach for constructing expressive gibberish speech and reports the experimental evaluations with the intended robotic agents. The results show that the generated gibberish speech can contribute to a significant extent to studies of emotion expression for robotic agents and can be further utilized in affective human-robot interaction studies.
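
As a rough illustration of what "meaningless strings of speech sounds" can look like in practice, the sketch below generates random consonant-vowel pseudo-words. It is purely illustrative and is not the gibberish construction method described in this paper; the phoneme inventory and syllable structure are assumptions made only for the example.

```python
# Illustrative sketch only: NOT the gibberish construction method of this paper.
# It concatenates random consonant-vowel syllables into pseudo-words that
# sound speech-like but carry no semantic meaning.
import random

CONSONANTS = list("bdfgklmnprst")   # assumed toy phoneme inventory
VOWELS = list("aeiou")              # assumed vowel set

def gibberish_utterance(n_words=4, seed=None):
    """Return a string of pseudo-words with no semantic content."""
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        n_syllables = rng.randint(2, 4)
        word = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                       for _ in range(n_syllables))
        words.append(word)
    return " ".join(words)

print(gibberish_utterance())   # e.g. a string of four random pseudo-words
```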


Notes

  1. In this context, a sample is considered natural when it sounds like an unrecognized real language rather than an unnatural or random combination of sounds.

  2. In hypothesis testing (statistical significance testing), the p value quantifies the significance of the sample statistic [1]. It is the probability of obtaining the observed effect (or a larger one) under the null hypothesis. An effect is claimed to be significant when the p value is smaller than a conventional significance level (typically 0.05); a brief worked sketch is given after these notes.

  3. ETRO audio-visual lab, http://www.etro.vub.ac.be/Research/Nosey_Elephant_Studios/.

  4. Annosoft Lipsync Tool 4.1 can be downloaded from: http://www.annosoft.com/lipsync-tool.
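
As a concrete reading of note 2, the following minimal sketch (with hypothetical rating data, assuming SciPy is available) compares two groups of listener ratings with an independent-samples t-test and checks the resulting p value against the conventional 0.05 significance level.

```python
# Hypothetical worked example for note 2: a t-test and its p value.
from scipy import stats

ratings_a = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9]   # hypothetical ratings, condition A
ratings_b = [3.2, 3.5, 3.1, 3.6, 3.0, 3.4]   # hypothetical ratings, condition B

t_stat, p_value = stats.ttest_ind(ratings_a, ratings_b)

# Under the null hypothesis (no difference between conditions), p_value is the
# probability of observing an effect at least as large as the one in the data.
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("significant at the 0.05 level" if p_value < 0.05 else "not significant")
```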

References

  1. Argyrous G (2005) Statistics for Research. Sage Publications Ltd, London

  2. Ayesh A (2009) Emotionally expressive music based interaction language for social robots. ICGST Int J Autom Robot Auton Syst 9(1):1–10

  3. Bamidis PD, Luneski A, Vivas A, Papadelis C, Maglaveras N (2007) Multi-channel physiological sensing of human emotion: insights into emotion-aware computing using affective protocols, avatars and emotion specifications. In: Medinfo 2007: Proceedings of the 12th world congress on health (medical) informatics, building sustainable health systems. IOS Press

  4. Breazeal C (2000) Sociable machines: expressive social exchange between humans and robots. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

  5. Breazeal C (2001) Emotive qualities in robot speech. In: Proceedings of the 2001 IEEE/RSJ international conference on intelligent robots and systems. pp 1388–1394

  6. Burleson W (2006) Affective learning companions: strategies for empathetic agents with real-time multimodal affective sensing to foster meta-cognitive and meta-affective approaches to learning, motivation. PhD thesis, Massachusetts Institute of Technology

  7. Busso C, Narayanan S (2008) Recording audio-visual emotional databases from actors: a closer look. In: 2nd international workshop on emotion: corpora for research on emotion and affect, international conference on language resources and evaluation (LREC 2008), pp 17–22

  8. Carlson R, Granström B, Nord L (1991) Segmental evaluation using the Esprit/SAM test procedures and monosyllabic words. In: The ESCA workshop on speech synthesis

  9. Chomsky N (1956) Three models for the description of language. IRE Trans Inf Theory 2(3):113–124

  10. Corveleyn S, Coose B, Verhelst W (2002) Voice modification and conversion using PLAR-Parameters. In: IEEE Benelux workshop on model based processing and coding of audio (MPCA)

  11. Goodrich MA, Schultz AC (2007) Human-robot interaction: a survey. Found Trends Hum-Comput Interact 1(3):203–275

  12. Gouaillier D, Hugel V, Blazevic P, Kilner C, Monceaux J, Lafourcade P, Marnier B, Serre J, Maisonnier B (2008) The NAO humanoid: a combination of performance and affordability. CoRR abs/0807.3223

  13. Hart M (1971) Project Gutenberg. http://www.gutenberg.org. Accessed March 2014

  14. Jee ES, Jeong YJ, Kim CH, Kobayashi H (2010) Sound design for emotion and intention expression of socially interactive robots. Intell Serv Robot 3:199–206

  15. Juslin PN, Laukka P (2003) Communication of emotions in vocal expression and music performance: different channels, same code? Psychol Bull 129(5):770–814

  16. Latacz L, Kong Y, Mattheyses W, Verhelst W (2008) An overview of the VUB entry for the 2008 blizzard challenge. In: Proceedings of the interspeech blizzard challenge

  17. Libin AV, Libin EV (2004) Person-robot interactions from the robopsychologists’ point of view: the robotic psychology and robotherapy approach. Proc IEEE 92(11):1789–1803

  18. Lisetti C, Nasoz F, LeRouge C, Ozyer O, Alvarez K (2003) Developing multimodal intelligent affective interfaces for tele-home health care. Int J Human-Comput Stud 59(1):245–255

  19. Luneski A, Konstantinidis E, Bamidis P (2010) Affective medicine: a review of affective computing efforts in medical informatics. Methods Inf Med 49(3):207–218

  20. Mubin O, Bartneck C, Feijs L (2009) What you say is not what you get: arguing for artificial languages instead of natural languages in human robot speech interaction. In: The spoken dialogue and human-robot interaction workshop at IEEE RoMan 2009. IEEE, Japan

  21. Nijholt A, Tan D (2007) Playing with your brain: brain-computer interfaces and games. In: Proceedings of the international conference on advances in computer entertainment technology. ACM, pp 305–306

  22. Olive J, Buchsbaum A (1987) Changing voice characteristics in text to speech synthesis. AT&T Bell-Labs, Technical Memorandum

  23. Oudeyer PY (2003) The production and recognition of emotions in speech: features and algorithms. Int J Human-Comput Stud 59(1):157–183

  24. Prendinger H, Ishizuka M (2004) What affective computing and life-like character technology can do for tele-home health care. In: Proceedings workshop HCI and homecare, Citeseer

  25. Read R, Belpaeme T (2012) How to use non-linguistic utterances to convey emotion in child-robot interaction. In: Proceedings of the 7th annual ACM/IEEE international conference on human-robot interaction. ACM, Boston, MA, pp 219–220

  26. Riek LD (2012) Wizard of Oz studies in HRI: a systematic review and new reporting guidelines. J Human-Robot Interact 1(1):119–136

  27. Saldien J, Goris K, Yilmazyildiz S, Verhelst W, Lefeber D (2008) On the design of the huggable robot Probo. J Phys Agents 2(2):3–12

  28. Saldien J, Goris K, Vanderborght B, Vanderfaeillie J, Lefeber D (2010) Expressing emotions with the social robot Probo. Int J Soc Robot 2(4):377–389

  29. Schröder M (2003) Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD thesis, University of Saarland

  30. Schröder M (2009) Expressive speech synthesis: past, present, and possible futures. In: Tao J, Tan T (eds) Affective information processing. Springer, London, pp 111–126

  31. Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377

  32. Schröder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen SC (2001) Acoustic correlates of emotion dimensions in view of speech synthesis. In: INTERSPEECH, pp 87–90

  33. Smith RN, Frawley WJ (1999) Affective computing: medical applications. In: Proceedings of HCI International (the 8th international conference on human-computer interaction) on human-computer interaction: ergonomics and user interfaces, vol I. L. Erlbaum Associates Inc., pp 843–847

  34. Verhelst W, Roelands M (1993) An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol 2. IEEE, pp 554–557

  35. Wang W, Athanasopoulos G, Yilmazyildiz S, Patsis G, Enescu V, Sahli H, Verhelst W, Hiolle A, Lewis M, Canamero L (2014) Natural emotion elicitation for emotion modeling in child-robot interactions, (accepted)

  36. Winters RM, Wanderley MM (2013) Sonification of emotion: strategies for continuous display of arousal and valence. In: Luck G, Brabant O (eds) Proceedings of the 3rd international conference on music & emotion (ICME3). University of Jyväskylä, Department of Music, Jyväskylä, Finland

  37. Yang PF, Stylianou Y (1998) Real-time voice alteration based on linear prediction. In: Proceedings of ICSLP, Citeseer, Sydney, Australia, pp 1667–1670

  38. Yilmazyildiz S, Mattheyses W, Patsis Y, Verhelst W (2006) Expressive speech recognition and synthesis as enabling technologies for affective robot-child communication. In: Zhuang Y, Yang SQ, Rui Y, He Q (eds) Advances in multimedia information processing - PCM 2006, lecture notes in computer science, vol 4261. Springer, Berlin Heidelberg, pp 1–8

  39. Yilmazyildiz S, Latacz L, Mattheyses W, Verhelst W (2010) Expressive gibberish speech synthesis for affective human-computer interaction. In: Sojka P, Horák A, Kopeček I, Pala K (eds) Text, speech and dialogue, lecture notes in computer science, vol 6231. Springer, Berlin Heidelberg, pp 584–590

  40. Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2011) EMOGIB: emotional gibberish speech database for affective human-robot interaction. In: DMello S, Graesser A, Schuller B, Martin JC (eds) Affective computing and intelligent interaction, lecture notes in computer science, vol 6975. Springer Berlin, Heidelberg. Memphis, Tennessee, pp 163–172

  41. Yilmazyildiz S, Athanasopoulos G, Patsis G, Wang W, Oveneke MC, Latacz L, Verhelst W, Sahli H, Henderickx D, Vanderborght B, Soetens E, Lefeber D (2013) Voice modification for wizard-of-oz experiments in robot-child interaction. In: Workshop on affective social speech signals (WASSS 2013)

  42. Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2013) Multi-modal emotion expression for affective human-robot interaction. In: Workshop on affective social speech signals (WASSS 2013)

Acknowledgements

The research reported in this paper was supported in part by the Research Council of the Vrije Universiteit Brussel through the horizontal research action HOA16 and by the European Commission (EU-FP7 project ALIZ-E, ICT-248116).

Author information

Corresponding author

Correspondence to Selma Yilmazyildiz.

About this article

Cite this article

Yilmazyildiz, S., Verhelst, W. & Sahli, H. Gibberish speech as a tool for the study of affective expressiveness for robotic agents. Multimed Tools Appl 74, 9959–9982 (2015). https://doi.org/10.1007/s11042-014-2165-1
