Speech Database Design for a Concatenative Text-to-Speech Synthesis System for Individuals with Communication Disorders

Iida, Akemi; Campbell, Nick

doi:10.1023/A:1025761017833

Published: October 2003

Volume 6, pages 379–392, (2003)

International Journal of Speech Technology

Akemi Iida¹ &
Nick Campbell²

Abstract

ATR's CHATR is a corpus-based text-to-speech (TTS) synthesis system that selects concatenation units from a natural speech database. The system's approach enables us to create a voice output communication aid (VOCA) using the voices of individuals who are anticipating the loss of phonatory functions. The advantage of CHATR is that individuals can use their own voice for communication even after vocal loss. This paper reports on a case study of the development of a VOCA using recordings of Japanese read speech (i.e., oral reading) from an individual with amyotrophic lateral sclerosis (ALS). In addition to using the individual's speech, we designed a speech database that could reproduce the characteristics of natural utterances in both general and specific situations. We created three speech corpora in Japanese to synthesize ordinary daily speech (i.e., in a normal speaking style): (1) a phonetically balanced sentence set, to ensure that the system was able to synthesize all speech sounds; (2) readings of manuscripts written by the same individual, for synthesizing the talks he regularly gave and as a source of natural intonation, articulation, and voice quality; and (3) words and short phrases, to provide daily vocabulary entries for reproducing natural utterances in predictable situations. By combining one or more corpora, we were able to create four kinds of source database for CHATR synthesis. Using each source database, we synthesized speech from six test sentences. We chose the source database by examining the units selected for the synthesized speech and by performing perceptual experiments in which we presented the speech to 20 native speakers of Japanese. Based on the results of both the observations and the evaluations, we selected a source database compiled from all three corpora. Incorporating CHATR, the selected source database, and an input acceleration function, we developed a VOCA for the individual to use in his daily life. We also created emotional speech source databases designed to be loaded into the VOCA separately, in addition to the compiled speech database.
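
As a rough illustration of the "combining one or more corpora" step described in the abstract, the sketch below enumerates every non-empty combination of the three corpora. The corpus identifiers and the function name are hypothetical, chosen only for this example. The abstract does not say which four of the seven possible combinations the authors built; it states only that the database finally selected was compiled from all three corpora.

```python
from itertools import combinations

# Hypothetical identifiers that paraphrase the three corpora named in the abstract.
CORPORA = ("phonetically_balanced", "manuscript_readings", "words_and_phrases")


def candidate_source_databases(corpora=CORPORA):
    """Return every non-empty combination of corpora, smallest combinations first."""
    return [
        combo
        for size in range(1, len(corpora) + 1)
        for combo in combinations(corpora, size)
    ]


candidates = candidate_source_databases()
for combo in candidates:
    # Each combination is one possible source database for synthesis.
    print(" + ".join(combo))
```

Three corpora give seven possible combinations, so the four source databases evaluated in the study are a subset of this list; the last entry, which combines all three corpora, corresponds to the one the authors report selecting.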

Author information

Authors and Affiliations

Keio Research Institute at SFC, Keio University; JST (Japan Science and Technology), CREST, Japan
Akemi Iida
CREST; ATR Human Information Sciences Research Laboratories, JST (Japan Science and Technology), Japan
Nick Campbell

About this article

Cite this article

Iida, A., Campbell, N. Speech Database Design for a Concatenative Text-to-Speech Synthesis System for Individuals with Communication Disorders. International Journal of Speech Technology 6, 379–392 (2003). https://doi.org/10.1023/A:1025761017833

Issue Date: October 2003
DOI: https://doi.org/10.1023/A:1025761017833
