Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Using a Phase-Optimizing Neural Network

Tsuda, Keiichi; Hirose, Akira

doi:10.1007/978-3-540-45226-3_46


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2774)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems


Abstract

The pitch-synchronous overlap-add (PSOLA) method is the conventional choice for high-quality waveform-concatenation speech synthesis. It relies on the periodic structure of voiced speech, which is marked by pitchmarks. PSOLA-synthesized sound is of high quality as long as pitchmark detection succeeds, but it can be severely degraded when detection fails or, more fundamentally, when the sound is an unvoiced consonant. In this paper, we propose a pitch-asynchronous waveform-concatenation speech synthesis method. It is based on adaptive phase optimization using complex-valued neural processing to maintain a desirable degree of pulse sharpness. Experimental results demonstrate successful generation of high-quality sound.
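
To illustrate what "pitch-asynchronous" overlap-add means in the simplest case, the sketch below cuts a signal into frames at a fixed spacing, with no pitchmark detection, and sums them back together. It is a generic overlap-add under stated assumptions and is not the authors' method: the complex-valued phase-optimizing network described in the abstract is not modeled, and the function names, the Hann window, the frame length, and the synthetic test tone are all hypothetical choices made for illustration.

```python
import numpy as np

def split_frames(signal, frame_len, hop):
    """Cut a signal into fixed-spacing frames weighted by a periodic Hann window.

    No pitchmarks are used: frame positions depend only on `hop`.
    """
    # Periodic Hann: at hop = frame_len / 2 the shifted windows sum to exactly 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    count = 1 + (len(signal) - frame_len) // hop
    return np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(count)]
    )

def overlap_add(frames, hop):
    """Sum equally spaced, already-windowed frames into one signal."""
    frame_len = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

# Synthetic 120 Hz tone standing in for a speech segment (assumption).
rate = 8000
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 120 * t)

frame_len, hop = 256, 128  # 50% overlap, chosen for illustration
frames = split_frames(signal, frame_len, hop)
rebuilt = overlap_add(frames, hop)
```

With 50% overlap the windows sum to one, so away from the first and last half-frame `rebuilt` matches `signal`. Real concatenation joins frames taken from different recordings, where the phases of overlapping frames generally disagree; that mismatch is the problem the paper's phase optimization is aimed at.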



Author information

Authors and Affiliations

Dept. Frontier Informatics, The University of Tokyo
Keiichi Tsuda & Akira Hirose


Editor information

Editors and Affiliations

Computing Laboratory, Oxford University, Parks Road, OX1 3QD, Oxford, United Kingdom
Vasile Palade
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
Knowledge-Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, SA 5095, Adelaide, Australia
Lakhmi Jain

About this paper

Cite this paper

Tsuda, K., Hirose, A. (2003). Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Using a Phase-Optimizing Neural Network. In: Palade, V., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2003. Lecture Notes in Computer Science, vol 2774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45226-3_46

DOI: https://doi.org/10.1007/978-3-540-45226-3_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40804-8
Online ISBN: 978-3-540-45226-3
eBook Packages: Springer Book Archive
