poster

TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication

Authors:

Riku Arakawa,

Shinnosuke Takamichi,

Hiroshi Saruwatari

UIST '19 Adjunct: Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology

Pages 33 - 35

https://doi.org/10.1145/3332167.3357106

Published: 14 October 2019

Abstract

Despite promising initial studies, applying real-time voice conversion (data-driven speaker conversion) in daily life, specifically in near-field communication, is hampered by the speaker's original voice: it overlaps with the converted speech and degrades the listener's sense of immersion in it. We present TransVoice, a real-time voice conversion system that physically confines the original speech with a mask-shaped device. Our preliminary study shows that the device significantly reduces the volume of the original speech, while an integrated filter that attenuates the low-frequency range mitigates the deterioration in the conversion quality of the deep neural network (DNN). We also discuss novel applications of TransVoice that can augment our communication.
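The abstract does not say how the low-frequency-attenuating filter is realized (acoustically in the mask or digitally) or what its cutoff is. Purely as an illustration of "weakening the low frequency range", the sketch below assumes a digital first-order high-pass filter with a hypothetical 300 Hz cutoff at a hypothetical 16 kHz sample rate, and reports the level reduction in dB for a low tone and a higher tone. All function names and parameter values are hypothetical and are not taken from TransVoice.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order (RC-style) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]).

    Hypothetical illustration only; attenuates content below cutoff_hz.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def attenuation_db(before, after):
    """Level reduction in dB (positive means quieter after filtering)."""
    return 20.0 * math.log10(rms(before) / rms(after))


def tone(freq_hz, sample_rate, seconds):
    """Pure sine tone used as a stand-in for speech energy at one frequency."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    SR = 16_000      # hypothetical sample rate (Hz)
    CUTOFF = 300.0   # hypothetical cutoff (Hz)
    for f in (100.0, 2000.0):
        x = tone(f, SR, 1.0)
        y = high_pass(x, CUTOFF, SR)
        print(f"{f:6.0f} Hz tone: {attenuation_db(x, y):5.1f} dB attenuation")
```

Running it should show the 100 Hz tone attenuated by several dB more than the 2 kHz tone. A first-order filter is chosen only for readability; a practical system would more likely use a higher-order design with a steeper roll-off.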

Cited By

Chen Q, Gu Z, Lu L, Xu X, Ba Z, Lin F, Liu Z, Ren K. 2024. Conan's Bow Tie: A Streaming Voice Conversion for Real-Time VTuber Livestreaming. In Proceedings of the 29th International Conference on Intelligent User Interfaces, 35-50. DOI: 10.1145/3640543.3645146. Online publication date: 18 March 2024.
https://dl.acm.org/doi/10.1145/3640543.3645146

Index Terms

TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Sound-based input / output

Information & Contributors

Information

Published In

UIST '19 Adjunct: Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology

October 2019

192 pages

ISBN:9781450368179

DOI:10.1145/3332167

General Chair: François Guimbretière (Cornell University, USA)

Program Chairs: Michael Bernstein (Stanford University, USA), Katharina Reinecke (University of Washington, USA)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 October 2019

Qualifiers

Poster

Funding Sources

MIC/SCOPE

Conference

UIST '19

UIST '19: The 32nd Annual ACM Symposium on User Interface Software and Technology

October 20 - 23, 2019

New Orleans, LA, USA

Acceptance Rates

Overall Acceptance Rate 355 of 1,733 submissions, 20%

Bibliometrics & Citations

Total Citations: 1
Total Downloads: 220
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Feb 2025
