DOI: 10.1145/3131785.3131835
poster

Contour: An Efficient Voice-enabled Workflow for Producing Text-to-Speech Content

Published: 20 October 2017

Abstract

Voice assistant technology has expanded the design space for voice-activated consumer products and audio-centric user experiences. To navigate this emerging design space, Speech Synthesis Markup Language (SSML) provides a standard for characterizing synthetic speech through parametric control of its prosody elements, i.e., pitch, rate, volume, contour, range, and duration. However, existing voice assistants that use Text-to-Speech (TTS) lack expressiveness. The need for a new workflow that produces emotionally expressive TTS audio content more efficiently is discussed. A prototype that allows a user to produce TTS-based content in any emotional tone using voice input is presented. To evaluate the new workflow enabled by the prototype, an initial comparative study is conducted against the parametric approach. Preliminary quantitative and qualitative results suggest that the new workflow is more efficient, as measured by task completion time and number of design iterations, while maintaining the same level of user-preferred production quality.
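
For readers unfamiliar with the parametric approach that the abstract uses as a baseline, the following is a minimal sketch of setting SSML prosody parameters by hand. It is not the authors' prototype. The helper name `prosody_ssml` and the example attribute values are hypothetical and chosen only for illustration; the six attribute names are the prosody elements listed in the abstract.

```python
import xml.etree.ElementTree as ET

# The six prosody attributes listed in the abstract (SSML <prosody> element).
PROSODY_ATTRS = ("pitch", "contour", "range", "rate", "duration", "volume")


def prosody_ssml(text, **attrs):
    """Wrap `text` in <speak><prosody ...>, accepting only known prosody attributes.

    Hypothetical helper for illustration; attribute values are passed through
    unchanged and are not validated against the SSML value grammar.
    """
    unknown = sorted(set(attrs) - set(PROSODY_ATTRS))
    if unknown:
        raise ValueError(f"not an SSML prosody attribute: {unknown}")
    speak = ET.Element("speak", {"version": "1.1"})
    prosody = ET.SubElement(speak, "prosody", dict(attrs))
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


# Example values are assumptions, not settings taken from the study.
print(prosody_ssml("I can't believe we won!", pitch="+15%", rate="fast", volume="loud"))
```

Each change of tone in this style means editing attribute values and re-synthesizing, which is the iteration cost the voice-driven workflow aims to reduce.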

Supplementary Material

PDF File (uistpp0178-file4.pdf)

Cited By

  • (2021) Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-13. DOI: 10.1145/3411764.3445579. Online publication date: 6-May-2021
  • (2020) Reader: Speech Synthesizer and Speech Recognizer. International Conference on Innovative Computing and Communications, 877-886. DOI: 10.1007/978-981-15-5148-2_76. Online publication date: 31-Jul-2020

Published In

UIST '17 Adjunct: Adjunct Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology
October 2017
217 pages
ISBN: 9781450354196
DOI: 10.1145/3131785
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. audio production workflow
  2. text-to-speech
  3. voice user interface

Qualifiers

  • Poster

Conference

UIST '17

Acceptance Rates

UIST '17 Adjunct Paper Acceptance Rate 73 of 324 submissions, 23%;
Overall Acceptance Rate 355 of 1,733 submissions, 20%
