research-article

Unit selection using k-nearest neighbor search for concatenative speech synthesis

Authors:

Hideyuki Mizuno,

Satoshi Takahashi

IUCS '09: Proceedings of the 3rd International Universal Communication Symposium

Pages 379 - 382

https://doi.org/10.1145/1667780.1667858

Published: 03 December 2009

Abstract

We propose a new approach to rapidly identifying adequate synthesis units in extremely large speech corpora. Our aim is to develop a concatenative speech synthesis system with high performance (both speech quality and throughput) for various practical applications. Utilizing very large speech corpora allows more natural-sounding synthesized speech to be created; the downside is an increase in the time taken to locate the synthesis units needed. The key to overcoming this problem is introducing state-of-the-art database retrieval technologies. The first selection step, based on simple hash search, tabulates all synthesis unit candidates. The second step selects the N best candidates using nearest neighbor search, a typical database retrieval technique. Finally, the best sequence of synthesis units is determined by Viterbi search. A runtime measurement test and a subjective experiment are carried out. Their results confirm that, for a 15-hour corpus, the proposed approach reduces the runtime by about 40% compared to using only hash search, with no degradation in the quality of synthesized speech.
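The three selection steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpus, the two-dimensional feature vectors (pitch and duration), the Euclidean target cost, the pitch-mismatch join cost, and all function names are assumptions made for illustration. The nearest neighbor step is a brute-force scan standing in for whatever index structure the paper actually uses.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical toy corpus: (unit_id, phoneme_label, feature_vector).
# The features (pitch in Hz, duration in ms) are illustrative assumptions.
CORPUS = [
    (0, "a", (120.0, 80.0)),
    (1, "a", (200.0, 60.0)),
    (2, "a", (125.0, 85.0)),
    (3, "k", (0.0, 40.0)),
    (4, "k", (0.0, 70.0)),
    (5, "i", (130.0, 90.0)),
    (6, "i", (210.0, 50.0)),
]


def build_hash_index(corpus):
    """Step 1 support: map each phoneme label to every unit carrying it."""
    index = defaultdict(list)
    for unit_id, label, features in corpus:
        index[label].append((unit_id, features))
    return index


def n_best(candidates, target, n):
    """Step 2: keep the n candidates nearest to the target features.

    Brute-force Euclidean k-NN; a tree index could replace this scan.
    """
    return heapq.nsmallest(n, candidates, key=lambda c: math.dist(c[1], target))


def join_cost(prev_features, next_features):
    """Assumed concatenation cost: pitch mismatch at the join."""
    return abs(prev_features[0] - next_features[0])


def viterbi(lattice, targets):
    """Step 3: pick one unit per position minimizing target + join cost."""
    # best[i][j] = (cost of best path ending at candidate j, backpointer)
    best = [[(math.dist(f, targets[0]), None) for _, f in lattice[0]]]
    for i in range(1, len(lattice)):
        row = []
        for _, feat in lattice[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(pf, feat), k)
                for k, (_, pf) in enumerate(lattice[i - 1])
            )
            row.append((cost + math.dist(feat, targets[i]), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][j][0])
        j = best[i][j][1]
    return path[::-1]


def select_units(corpus, labels, targets, n=2):
    """Run hash lookup, N-best pruning, then Viterbi search."""
    index = build_hash_index(corpus)
    lattice = [n_best(index[lab], tgt, n) for lab, tgt in zip(labels, targets)]
    return viterbi(lattice, targets)


selected = select_units(
    CORPUS,
    labels=["k", "a", "i"],
    targets=[(0.0, 45.0), (122.0, 82.0), (128.0, 88.0)],
)
print(selected)
```

Pruning each position to N candidates before the Viterbi pass is what bounds the search cost: the dynamic program evaluates on the order of N squared joins per position instead of one join for every pair of hash-search candidates.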

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Published In

IUCS '09: Proceedings of the 3rd International Universal Communication Symposium

December 2009

404 pages

ISBN: 9781605586410

DOI: 10.1145/1667780

General Chair:
Kazumasa Enami
National Institute of Information and Communications Technology (NICT), Japan

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

IUCS '09

Sponsor:

NICT

IUCS '09: 3rd International Universal Communication Symposium

December 3 - 4, 2009

Tokyo, Japan
