An Effective Algorithm for Determining Pitch Markers of Vietnamese Speech Sentences

  • Conference paper
Advances in Neural Networks – ISNN 2018 (ISNN 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10878)

Abstract

Tone synthesis plays an important role in Vietnamese text-to-speech systems. Its first step is to determine the pitch markers of speech utterances, which remains an open problem. In this paper, we propose a simple and efficient algorithm that places the pitch markers at the peaks of the cumulative signal of each voiced part of the input utterance. Experiments show that the proposed algorithm locates pitch markers with high accuracy. Building on this result, we have synthesized complex Vietnamese tones, such as the drop and the broken tones, for isolated syllables that sound clear to listeners.
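
The peak-picking idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the "cumulative signal", so the code below assumes it is the running sum of the mean-removed samples of one voiced segment. The function names and the `max_f0` guard against closely spaced peaks are hypothetical choices made for illustration and may differ from the authors' algorithm.

```python
import math
from itertools import accumulate


def cumulative_signal(samples):
    """Running sum of the mean-removed samples (an assumed definition)."""
    mean = sum(samples) / len(samples)
    return list(accumulate(s - mean for s in samples))


def pitch_markers(voiced, sample_rate, max_f0=400.0):
    """Return sample indices of local peaks in the cumulative signal.

    `voiced` holds the samples of one voiced segment. Peaks closer together
    than one period of `max_f0` are merged, keeping the higher one
    (an illustrative guard, not taken from the paper).
    """
    cum = cumulative_signal(voiced)
    min_gap = int(sample_rate / max_f0)
    markers = []
    for i in range(1, len(cum) - 1):
        if cum[i - 1] < cum[i] >= cum[i + 1]:  # local maximum
            if markers and i - markers[-1] < min_gap:
                if cum[i] > cum[markers[-1]]:
                    markers[-1] = i  # keep the higher of two close peaks
            else:
                markers.append(i)
    return markers


if __name__ == "__main__":
    # Synthetic "voiced" segment: a 100 Hz sine sampled at 8 kHz for 50 ms.
    rate = 8000
    segment = [math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]
    print(pitch_markers(segment, rate))
```

On the synthetic sine, the markers fall about one pitch period (80 samples) apart. Real speech would first need voiced/unvoiced segmentation, which this sketch leaves out.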


Author information

Corresponding author

Correspondence to Thai Yen Ta.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ta, T.Y., Van Nguyen, H., Van Dao, T., Ngo, H.H., Sergey, A. (2018). An Effective Algorithm for Determining Pitch Markers of Vietnamese Speech Sentences. In: Huang, T., Lv, J., Sun, C., Tuzikov, A. (eds) Advances in Neural Networks – ISNN 2018. ISNN 2018. Lecture Notes in Computer Science, vol 10878. Springer, Cham. https://doi.org/10.1007/978-3-319-92537-0_72

  • DOI: https://doi.org/10.1007/978-3-319-92537-0_72

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92536-3

  • Online ISBN: 978-3-319-92537-0

  • eBook Packages: Computer Science, Computer Science (R0)
