DOI: 10.1145/3332167.3357106
poster

TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication

Published: 14 October 2019

Abstract

Despite promising initial studies, a speaker's original voice causes problems when real-time voice conversion (data-driven speaker conversion) technology is applied in daily life, specifically in near-field communication, because the original speech overlaps the converted speech and degrades the sense of immersion in it. We present TransVoice, a real-time voice conversion system that physically confines the original speech with a mask-shaped device. Our preliminary study shows that the proposed device significantly reduces the volume of the original speech, and that an integrated filter that weakens the low-frequency range mitigates the degradation in conversion quality of the deep neural network (DNN). We discuss novel applications of TransVoice that can augment our communication.

Cited By

  • (2024) Conan's Bow Tie: A Streaming Voice Conversion for Real-Time VTuber Livestreaming. Proceedings of the 29th International Conference on Intelligent User Interfaces, 35-50. DOI: 10.1145/3640543.3645146. Online publication date: 18-Mar-2024

    Published In

    UIST '19 Adjunct: Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology
    October 2019
    192 pages
    ISBN:9781450368179
    DOI:10.1145/3332167
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep neural network
    2. speech communication
    3. voice conversion

    Qualifiers

    • Poster

    Funding Sources

    • MIC/SCOPE

    Conference

    UIST '19

    Acceptance Rates

    Overall Acceptance Rate 355 of 1,733 submissions, 20%
