Abstract
The pitch-synchronous overlap-add (PSOLA) speech synthesis method has conventionally been used for high-quality waveform concatenation. It relies on the periodic structure of voiced speech, which is marked by pitchmarks. PSOLA-synthesized sound is of high quality as long as pitchmark detection succeeds, but it can degrade severely when pitchmark detection fails or, more fundamentally, when the sound is an unvoiced consonant. In this paper, we propose a pitch-asynchronous waveform-concatenation speech synthesis method. It is based on adaptive phase optimization using complex-valued neural processing to maintain a desirable degree of pulse sharpness. Experimental results demonstrate successful generation of high-quality sound.
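As a rough illustration of the overlap-add step that both PSOLA and the proposed method build on, the sketch below windows a toy signal at a fixed frame spacing (that is, without pitchmarks) and sums the frames back together. It is only a minimal sketch under stated assumptions: the function names, the Hann window, the frame length, and the 50% hop are illustrative choices not taken from the paper, and the phase-optimizing complex-valued neural network is not reproduced here.

```python
import math

def hann(n_len):
    """Periodic Hann window; copies shifted by n_len // 2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]

def split_frames(signal, n_len, hop):
    """Cut windowed frames every `hop` samples, ignoring pitchmarks (assumed setup)."""
    win = hann(n_len)
    return [[signal[s + n] * win[n] for n in range(n_len)]
            for s in range(0, len(signal) - n_len + 1, hop)]

def overlap_add(frames, hop):
    """Sum equal-length frames placed every `hop` samples."""
    n_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for n, value in enumerate(frame):
            out[start + n] += value
    return out

# Toy "voiced" signal: a 200 Hz sine sampled at 8 kHz (hypothetical values).
fs = 8000
signal = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(800)]

frames = split_frames(signal, n_len=160, hop=80)
rebuilt = overlap_add(frames, hop=80)
```

With this window and hop, every interior sample is covered by two frames whose window weights sum to one, so the interior of `rebuilt` matches `signal`. Concatenating frames taken from different recordings does not enjoy that guarantee, which is where phase alignment between overlapping frames becomes important.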
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsuda, K., Hirose, A. (2003). Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Using a Phase-Optimizing Neural Network. In: Palade, V., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2003. Lecture Notes in Computer Science, vol 2774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45226-3_46
DOI: https://doi.org/10.1007/978-3-540-45226-3_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40804-8
Online ISBN: 978-3-540-45226-3
eBook Packages: Springer Book Archive