Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System

Circuits, Systems, and Signal Processing

Abstract

This paper proposes a new approach to parameterizing the excitation signal in order to improve the quality of an HMM-based speech synthesis system. The proposed method models the excitation (residual) signal by segregating its regions according to their perceptual importance. First, the characteristics of the residual signal around the glottal closure instant (GCI) are studied using principal component analysis (PCA). Based on this study and on previous work (Adiga and Prasanna in Proceedings of Interspeech, pp 1677–1681, 2013; Cabral in Proceedings of Interspeech, pp 1082–1086, 2013), the segment of the residual signal around the GCI, which carries perceptually important information, is treated as the deterministic component, and the remaining part of the residual signal is treated as the noise component. The deterministic component is compactly represented using PCA coefficients (with about 95% accuracy), and the noise component is parameterized in terms of spectral and amplitude envelopes. The proposed excitation model is incorporated into the HMM-based speech synthesis system. Subjective evaluations show a significant improvement in quality for both female and male speakers’ speech synthesized by the proposed method, compared with three existing excitation modeling methods. This improvement is attributed to the accurate parameterization of the residual segment around the GCI. Synthesized speech samples of the proposed and existing source models are available online at http://www.sit.iitkgp.ernet.in/~ksrao/parametric-hts/pcd-hts.html.
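The abstract describes the decomposition only in words. The Python/NumPy sketch below illustrates the general idea under several assumptions that are not taken from the paper: GCI locations are already known (for example, from an epoch-extraction method such as that of reference 25); the deterministic part is a fixed window of ±`half_width` samples around each GCI; and PCA is computed via SVD, keeping the fewest components whose cumulative variance reaches 95%, which stands in for the paper's "about 95% accuracy" criterion. All function names, the window length, and the synthetic signal are hypothetical. The spectral and amplitude envelopes used for the noise component are not shown.

```python
import numpy as np


def split_residual(residual, gcis, half_width):
    """Split a residual into a 'deterministic' part (samples within
    +/- half_width of each GCI) and a 'noise' part (all other samples).
    The fixed symmetric window is a hypothetical choice for illustration."""
    mask = np.zeros(len(residual), dtype=bool)
    for g in gcis:
        mask[max(0, g - half_width):min(len(residual), g + half_width + 1)] = True
    deterministic = np.where(mask, residual, 0.0)
    noise = np.where(mask, 0.0, residual)
    return deterministic, noise


def gci_segments(residual, gcis, half_width):
    """Stack fixed-length segments centred on each GCI, skipping GCIs whose
    window would run past either end of the signal."""
    n = len(residual)
    segs = [residual[g - half_width:g + half_width + 1]
            for g in gcis if g - half_width >= 0 and g + half_width < n]
    return np.array(segs)


def pca_fit(segments, var_target=0.95):
    """PCA via SVD of mean-removed segments. Keeps the smallest number of
    principal components whose cumulative variance reaches var_target."""
    mean = segments.mean(axis=0)
    _, s, vt = np.linalg.svd(segments - mean, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, var_target)) + 1
    return mean, vt[:k]


def pca_encode(segments, mean, basis):
    """Project segments onto the retained components (one coefficient row per segment)."""
    return (segments - mean) @ basis.T


def pca_decode(coeffs, mean, basis):
    """Reconstruct segments from their PCA coefficients."""
    return coeffs @ basis + mean


# Synthetic example (hypothetical data): 1 s at 16 kHz, a 100 Hz pulse train
# with random pulse amplitudes, plus low-level Gaussian noise.
rng = np.random.default_rng(0)
fs, period, half_width = 16000, 160, 40
gcis = np.arange(200, fs - 200, period)
residual = 0.05 * rng.standard_normal(fs)
residual[gcis] += rng.uniform(0.5, 1.5, size=len(gcis))

det, noise = split_residual(residual, gcis, half_width)
segs = gci_segments(residual, gcis, half_width)
mean, basis = pca_fit(segs)
coeffs = pca_encode(segs, mean, basis)
recon = pca_decode(coeffs, mean, basis)
print(segs.shape, basis.shape[0], "components retained")
```

Because the retained components are orthonormal, the squared reconstruction error of the mean-removed segments equals the variance of the discarded components. With `var_target=0.95`, that error is therefore at most 5% of the total variance. In a synthesis system, only the `k` coefficients per segment (plus the shared mean and basis) would need to be modeled.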

Figs. 1–16 (images not reproduced)

References

  1. N. Adiga, S.R.M. Prasanna, Significance of instants of significant excitation for source modeling, in Proceedings of Interspeech (2013), pp. 1677–1681

  2. P. Alku, Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Commun. 11(2–3), 109–118 (1992)

  3. J.P. Cabral, S. Renals, J. Yamagishi, K. Richmond, HMM-based speech synthesiser using the LF-model of the glottal source, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2011), pp. 4704–4707

  4. J.P. Cabral, Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification, in Proceedings of Interspeech (2013), pp. 1082–1086

  5. CMU ARCTIC speech synthesis databases (online). http://festvox.org/cmu_arctic/

  6. T.G. Csapó, G. Németh, A novel irregular voice model for HMM-based speech synthesis, in Proceedings of ISCA Speech Synthesis Workshop (2013), pp. 229–234

  7. T.G. Csapó, G. Németh, Modeling irregular voice in statistical parametric speech synthesis with residual codebook based excitation. IEEE J. Sel. Top. Signal Process. 8(2), 209–220 (2014)

  8. T. Drugman, A. Moinet, T. Dutoit, G. Wilfart, Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2009), pp. 3793–3796

  9. T. Drugman, G. Wilfart, T. Dutoit, A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis, in Proceedings of Interspeech (2009), pp. 1779–1782

  10. T. Drugman, G. Wilfart, T. Dutoit, Eigenresiduals for improved parametric speech synthesis, in Proceedings of European Signal Processing Conference (EUSIPCO) (2009), pp. 2177–2180

  11. T. Drugman, T. Dutoit, The deterministic plus stochastic model of the residual signal and its applications. IEEE Trans. Audio Speech Lang. Process. 20(3), 968–981 (2012)

  12. T. Drugman, T. Raitio, Excitation modeling for HMM-based speech synthesis: breaking down the impact of periodic and aperiodic components, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014), pp. 260–264

  13. HMM-based speech synthesis system (HTS) (online). http://hts.sp.nitech.ac.jp/

  14. X. Huang, A. Acero, H.W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development (Prentice Hall, Upper Saddle River, 2001)

  15. ITU-T Draft Recommendation P.862, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs (2000)

  16. H. Kawahara, I. Masuda-Katsuse, A. de Cheveigne, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27, 187–207 (1998)

  17. H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, H. Banno, Tandem-STRAIGHT: a temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2008), pp. 3933–3936

  18. S. Kim, J. Kim, M. Hahn, HMM-based Korean speech synthesis system for hand-held devices. IEEE Trans. Consum. Electron. 52, 1384–1390 (2006)

  19. P. Loizou, Speech Enhancement: Theory and Practice (CRC Press, Boca Raton, 2007)

  20. S.L. Maguer, N. Barbot, O. Boeffard, Evaluation of contextual descriptors for HMM-based speech synthesis in French, in Proceedings of ISCA Speech Synthesis Workshop (2013), pp. 153–158

  21. R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda, An excitation model for HMM-based speech synthesis based on residual modeling, in Proceedings of the 6th ISCA Speech Synthesis Workshop (SSW6) (2007), pp. 131–136

  22. J.D. Markel, A.H. Gray, Linear Prediction of Speech (Springer, Berlin, 1976)

  23. A. McCree, K. Truong, E. George, T. Barnwell, V. Viswanathan, A 2.4 kbit/s MELP coder candidate for the new U.S. Federal Standard, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1996), pp. 200–203

  24. A. McCree, A 14 kb/s wideband speech coder with a parametric highband model, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2000), pp. 1153–1156

  25. K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)

  26. N.P. Narendra, K.S. Rao, K. Ghosh, R.R. Vempada, S. Maity, Development of syllable-based text to speech synthesis system in Bengali. Int. J. Speech Technol. 14(3), 167–181 (2011)

  27. N.P. Narendra, K.S. Rao, K. Ghosh, V.R. Reddy, S. Maity, Development of Bengali screen reader using Festival speech synthesizer, in Proceedings of IEEE India Conference (INDICON) (2011), pp. 1–4

  28. N.P. Narendra, K.S. Rao, Robust voicing detection and F0 estimation for HMM-based speech synthesis. Circuits Syst. Signal Process. 34(8), 2597–2619 (2015)

  29. N.P. Narendra, K.S. Rao, A deterministic plus noise model of excitation signal using principal component analysis for parametric speech synthesis, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016), pp. 5635–5639

  30. J.J. Odell, The Use of Context in Large Vocabulary Speech Recognition. Ph.D. thesis, Cambridge University, Cambridge (1995)

  31. K.K. Paliwal, W.B. Kleijn, Quantization of LPC parameters, in Speech Coding and Synthesis, ed. by W.B. Kleijn, K.K. Paliwal (Elsevier, Amsterdam, 1995)

  32. Y. Pantazis, Y. Stylianou, Improving the modeling of the noise part in the harmonic plus noise model of speech, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2008), pp. 4609–4612

  33. B. Picart, T. Drugman, T. Dutoit, HMM-based speech synthesis with various degrees of articulation: a perceptual study. Neurocomputing 132, 142–147 (2014)

  34. T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice (Prentice Hall, Upper Saddle River, 2002)

  35. E.V. Raghavendra, K. Prahallad, A multilingual screen reader in Indian languages, in Proceedings of National Conference on Communications (NCC) (2010), pp. 1–5

  36. T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, P. Alku, HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Trans. Audio Speech Lang. Process. 19(1), 153–165 (2011)

  37. T. Raitio, A. Suni, H. Pulakka, M. Vainio, P. Alku, Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2011), pp. 4564–4567

  38. K. Shinoda, T. Watanabe, MDL-based context-dependent subword modeling for speech recognition. J. Acoust. Soc. Jpn. (E) 21(2), 79–86 (2000)

  39. F. Soong, B. Juang, Line spectrum pair (LSP) and speech data compression, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1984), pp. 37–40

  40. Y. Stylianou, Harmonic Plus Noise Models for Speech, Combined with Statistical Methods, for Speech and Speaker Modification. Ph.D. thesis, Ecole Nationale Supérieure des Télécommunications (1996)

  41. T. Toda, K. Tokuda, A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Trans. Inform. Syst. 90(5), 816–824 (2007)

  42. K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2000), pp. 1315–1318

  43. K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura, Speech synthesis based on hidden Markov models. Proc. IEEE 101(5), 1234–1252 (2013)

  44. Z. Wen, J. Tao, S. Pan, Y. Wang, Pitch-scaled spectrum based excitation model for HMM-based speech synthesis. J. Signal Process. Syst. 74(3), 423–435 (2013)

  45. T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Mixed-excitation for HMM-based speech synthesis, in Proceedings of Eurospeech (2001), pp. 2259–2262

  46. E. Yumoto, W. Gould, T. Baer, Harmonics-to-noise ratio as an index of the degree of hoarseness. J. Acoust. Soc. Am. 71(6), 1544–1550 (1982)

  47. H. Zen, T. Toda, M. Nakamura, K. Tokuda, Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans. Inform. Syst. E90-D, 325–333 (2007)

  48. H. Zen, T. Toda, K. Tokuda, The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. IEICE Trans. Inform. Syst. E91-D(6), 1764–1773 (2008)

Author information


Correspondence to N. P. Narendra.

About this article

Cite this article

Narendra, N.P., Rao, K.S. Parameterization of Excitation Signal for Improving the Quality of HMM-Based Speech Synthesis System. Circuits Syst Signal Process 36, 3650–3673 (2017). https://doi.org/10.1007/s00034-016-0476-3

  • DOI: https://doi.org/10.1007/s00034-016-0476-3
