DOI: 10.1145/1667780.1667858
research-article

Unit selection using k-nearest neighbor search for concatenative speech synthesis

Published: 03 December 2009

Abstract

We propose a new approach to rapidly identifying adequate synthesis units in extremely large speech corpora. Our aim is to develop a concatenative speech synthesis system with high performance (in both speech quality and throughput) for various practical applications. Using very large speech corpora allows more natural-sounding synthesized speech to be created; the downside is an increase in the time taken to locate the synthesis units needed. The key to overcoming this problem is introducing state-of-the-art database retrieval technologies. The first selection step, based on simple hash search, tabulates all synthesis unit candidates. The second step selects the N best candidates using nearest neighbor search, a typical database retrieval technique. Finally, the best sequence of synthesis units is determined by Viterbi search. A runtime measurement test and a subjective experiment were carried out. Their results confirm that, for a 15-hour corpus, the proposed approach reduces the runtime by about 40% compared with using hash search alone, with no degradation in the quality of the synthesized speech.
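
The three steps described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the abstract does not specify the unit features, the cost functions, or the index structure, so the tuple layout `(unit_id, label, feature_vector)`, the Euclidean `distance`, the `join_weight` parameter, and all function names here are assumptions. The nearest neighbor step is a brute-force scan standing in for whatever indexed search a real system would use.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical unit record: (unit_id, label, feature_vector).

def build_hash_index(units):
    """Step 1 support: map each unit label to every corpus unit with that label."""
    index = defaultdict(list)
    for unit in units:
        index[unit[1]].append(unit)
    return index

def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def n_best(candidates, target_vec, n):
    """Step 2: keep the n candidates nearest to the target (brute-force k-NN)."""
    return heapq.nsmallest(n, candidates, key=lambda u: distance(u[2], target_vec))

def viterbi(lattice, targets, join_weight=1.0):
    """Step 3: choose the sequence minimizing target cost plus join cost."""
    # best[i][j] = (cumulative cost, back-pointer) for candidate j at position i
    best = [[(distance(u[2], targets[0]), None) for u in lattice[0]]]
    for i in range(1, len(lattice)):
        row = []
        for u in lattice[i]:
            target_cost = distance(u[2], targets[i])
            cost, back = min(
                (best[i - 1][k][0] + join_weight * distance(p[2], u[2]), k)
                for k, p in enumerate(lattice[i - 1])
            )
            row.append((cost + target_cost, back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    return path[::-1]

def select_units(index, labels, targets, n=2):
    """Run hash lookup, N-best pruning, and Viterbi search; return unit ids."""
    lattice = []
    for label, vec in zip(labels, targets):
        candidates = index.get(label, [])      # step 1: hash search
        if not candidates:
            raise ValueError(f"no corpus unit for label {label!r}")
        lattice.append(n_best(candidates, vec, n))  # step 2: pruning
    return [u[0] for u in viterbi(lattice, targets)]  # step 3
```

Pruning each position to N candidates before the Viterbi pass is what bounds the search cost: the dynamic program compares at most N × N unit pairs per position, regardless of how many units the hash lookup returns.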

    Published In

    IUCS '09: Proceedings of the 3rd International Universal Communication Symposium
    December 2009
    404 pages
    ISBN:9781605586410
    DOI:10.1145/1667780
    • General Chair: Kazumasa Enami

    Sponsors

    • NICT: National Institute of Information and Communications Technology

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. concatenative speech synthesis
    2. nearest neighbor search
    3. synthesis unit selection
    4. text to speech
