DOI: 10.1145/3379336.3381478 (IUI conference poster)

Voice Puppetry: Speech Synthesis Adventures in Human Centred AI

Published: 17 March 2020

Abstract

State-of-the-art speech synthesis owes much to modern machine learning, with recurrent neural networks becoming the new standard. However, how you say something is just as important as what you say. If we draw inspiration from human dramatic performance, ideas such as artistic direction can help us design interactive speech synthesis systems that can be finely controlled by a human voice. This "voice puppetry" has many possible applications, from film dubbing to the pre-creation of prompts for a conversational agent. Previous work in voice puppetry has raised the question of how such a system should work and how we might interact with it. Here, we share the results of a focus group that discussed voice puppetry and responded to a voice puppetry demo. The results highlight a central challenge in user-centred AI: where is the trade-off between control and automation, and how can users control that trade-off?




Published In

IUI '20 Companion: Companion Proceedings of the 25th International Conference on Intelligent User Interfaces
March 2020, 153 pages
ISBN: 9781450375139
DOI: 10.1145/3379336
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. personification
    2. social robots
    3. speech synthesis

    Qualifiers

    • Poster
    • Research
    • Refereed limited

Conference

IUI '20

    Acceptance Rates

    Overall Acceptance Rate 746 of 2,811 submissions, 27%


    Article Metrics

    • Downloads (Last 12 months)28
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 17 Jan 2025

Cited By

    • (2024) Perceptual differences between AI and human compositions: the impact of musical factors and cultural background. Rast Müzikoloji Dergisi 12(4), 463–490. DOI: 10.12975/rastmd.20241245. Online: 30 Dec 2024.
    • (2024) Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern? In Proceedings of the 29th International Conference on Intelligent User Interfaces, 181–194. DOI: 10.1145/3640543.3645202. Online: 18 Mar 2024.
    • (2024) An enhanced governance measure for deep synthesis applications. Information and Management 61(5). DOI: 10.1016/j.im.2024.103982. Online: 1 Jul 2024.
    • (2022) Peter 2.0: Building a Cyborg. In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, 169–175. DOI: 10.1145/3529190.3529209. Online: 29 Jun 2022.
    • (2022) What drives the ethical acceptance of deep synthesis applications? A fuzzy set qualitative comparative analysis. Computers in Human Behavior 133. DOI: 10.1016/j.chb.2022.107286. Online: 1 Aug 2022.
    • (2021) KinVoices: Using Voices of Friends and Family in Voice Interfaces. Proceedings of the ACM on Human-Computer Interaction 5(CSCW2), 1–25. DOI: 10.1145/3479590. Online: 18 Oct 2021.
