poster

Contour: An Efficient Voice-enabled Workflow for Producing Text-to-Speech Content

Authors: Yuan-Yi Fan, Soyoung Shin, Vids Samanta

UIST '17 Adjunct: Adjunct Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology

Pages 133 - 135

https://doi.org/10.1145/3131785.3131835

Published: 20 October 2017

Abstract

Voice assistant technology has expanded the design space for voice-activated consumer products and audio-centric user experiences. To navigate this emerging design space, Speech Synthesis Markup Language (SSML) provides a standard for characterizing synthetic speech through parametric control of the prosody elements, i.e., pitch, rate, volume, contour, range, and duration. However, existing voice assistants that use Text-to-Speech (TTS) lack expressiveness. The need for a new production workflow that yields more efficient and emotional audio content using TTS is discussed. A prototype that allows a user to produce TTS-based content in any emotional tone using voice input is presented. To evaluate the new workflow enabled by the prototype, an initial comparative study is conducted against the parametric approach. Preliminary quantitative and qualitative results suggest the new workflow is more efficient, based on time to complete tasks and number of design iterations, while maintaining the same level of user-preferred production quality.
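To make the parametric approach concrete, the sketch below builds an SSML prosody element from the six attributes named in the abstract and derives a contour value from a pitch track. It is an illustration only: the abstract does not describe how the prototype maps voice input to SSML, so the helper names (contour_from_f0, prosody_ssml), the even sampling of the pitch track, and the percent-offset encoding are all assumptions, not the authors' method.

```python
import xml.etree.ElementTree as ET

# The six prosody attributes listed in the abstract (as defined by SSML).
PROSODY_ATTRS = ("pitch", "contour", "range", "rate", "duration", "volume")


def contour_from_f0(f0_hz, baseline_hz, n_points=5):
    """Turn a list of voiced F0 values (Hz) into an SSML contour string.

    Hypothetical helper: samples n_points evenly across the track and
    writes each one as a percent offset from baseline_hz.
    """
    if not f0_hz or n_points < 2:
        raise ValueError("need a non-empty track and at least two points")
    last = len(f0_hz) - 1
    pairs = []
    for i in range(n_points):
        pos = i / (n_points - 1)  # 0.0 .. 1.0 along the utterance
        f0 = f0_hz[round(pos * last)]
        change = (f0 - baseline_hz) / baseline_hz * 100
        pairs.append(f"({pos * 100:.0f}%,{change:+.0f}%)")
    return " ".join(pairs)


def prosody_ssml(text, **attrs):
    """Wrap text in <speak><prosody ...>; rejects non-prosody attributes.

    A complete SSML document also carries version and namespace
    attributes on <speak>; they are omitted here for brevity.
    """
    unknown = set(attrs) - set(PROSODY_ATTRS)
    if unknown:
        raise ValueError(f"not SSML prosody attributes: {sorted(unknown)}")
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", attrs)
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # Made-up pitch track that rises and falls around a 100 Hz baseline.
    contour = contour_from_f0([100, 110, 120, 110, 100], baseline_hz=100)
    print(prosody_ssml("Good morning", rate="slow", contour=contour))
```

In the parametric workflow, a designer would tune values such as rate and contour by hand; a voice-driven workflow would instead fill them in from a recorded utterance, which is the step contour_from_f0 stands in for here.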

Supplementary Material

PDF File (uistpp0178-file4.pdf), 1.06 MB

Cited By

Kim, Y., Reza, M., McGrenere, J., and Yoon, D. Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (2021), 1-13. Online publication date: 6 May 2021. https://dl.acm.org/doi/10.1145/3411764.3445579

Khan, M., and Saiyeda, A. Reader: Speech Synthesizer and Speech Recognizer. In International Conference on Innovative Computing and Communications (2020), 877-886. Online publication date: 31 July 2020. https://doi.org/10.1007/978-981-15-5148-2_76

Published In

UIST '17 Adjunct: Adjunct Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology

October 2017

217 pages

ISBN:9781450354196

DOI:10.1145/3131785

General Chair: Krzysztof Gajos (Harvard University)

Program Chairs: Jennifer Mankoff (Carnegie Mellon University), Chris Harrison (Carnegie Mellon University)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

UIST '17

UIST '17: The 30th Annual ACM Symposium on User Interface Software and Technology

October 22 - 25, 2017

Québec City, QC, Canada

Acceptance Rates

UIST '17 Adjunct Paper Acceptance Rate 73 of 324 submissions, 23%;

Overall Acceptance Rate 355 of 1,733 submissions, 20%
