Abstract
Synthesizing Vietnamese tones plays an important role in Vietnamese text-to-speech systems. The first step toward this is to determine the pitch markers of speech utterances, which remains an open problem. In this paper, we propose a simple and efficient algorithm that places the pitch markers at the peaks of the cumulative signal of each voiced part of the input utterance. Experiments show that the proposed algorithm locates pitch markers with high accuracy. Based on this result, we have synthesized complex Vietnamese tones, such as the drop and broken tones, for isolated syllables, and the synthesized syllables sound clear.
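The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the "cumulative signal" is the running sum of the mean-removed samples of one voiced segment, and that markers are taken at its local maxima. The function name `pitch_markers`, the `f0_max` parameter, and the minimum-spacing rule are hypothetical and are not taken from the paper.

```python
import math


def pitch_markers(voiced, fs, f0_max=400.0):
    """Return sample indices of candidate pitch markers in one voiced segment.

    Assumption (not from the paper): the cumulative signal is the running
    sum of the mean-removed samples, and markers sit at its local maxima.
    """
    n = len(voiced)
    if n < 3:
        return []

    # Remove the DC offset so the running sum does not drift steadily.
    mean = sum(voiced) / n
    cumulative = []
    total = 0.0
    for sample in voiced:
        total += sample - mean
        cumulative.append(total)

    # Hypothetical guard: two markers cannot be closer than one period
    # of the highest fundamental frequency we expect.
    min_dist = max(1, int(fs / f0_max))

    marks = []
    for i in range(1, n - 1):
        is_peak = cumulative[i] > cumulative[i - 1] and cumulative[i] >= cumulative[i + 1]
        if not is_peak:
            continue
        if marks and i - marks[-1] < min_dist:
            # Too close to the previous marker: keep the higher peak only.
            if cumulative[i] > cumulative[marks[-1]]:
                marks[-1] = i
        else:
            marks.append(i)
    return marks


if __name__ == "__main__":
    # Synthetic 100 Hz "voiced" tone sampled at 8 kHz (80 samples per period).
    fs = 8000
    tone = [math.sin(2 * math.pi * 100 * k / fs) for k in range(800)]
    print(pitch_markers(tone, fs))
```

On a clean periodic input, successive markers should fall roughly one pitch period apart. Real speech would first need a voiced/unvoiced segmentation step, which this sketch leaves out.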
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ta, T.Y., Van Nguyen, H., Van Dao, T., Ngo, H.H., Sergey, A. (2018). An Effective Algorithm for Determining Pitch Markers of Vietnamese Speech Sentences. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds) Advances in Neural Networks – ISNN 2018. ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham. https://doi.org/10.1007/978-3-319-92537-0_72
DOI: https://doi.org/10.1007/978-3-319-92537-0_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92536-3
Online ISBN: 978-3-319-92537-0
eBook Packages: Computer Science, Computer Science (R0)