Unconstrained Pitch Contour Modification Using Instants of Significant Excitation

Circuits, Systems, and Signal Processing

Abstract

This paper proposes a flexible method for modifying the pitch contour of speech using the instants of significant excitation of the vocal tract system during speech production. In voiced speech, these instants correspond to the instants of glottal closure (epochs); in nonvoiced speech, they correspond to random excitations such as the onset of a burst. The instants are computed from the Linear Prediction (LP) residual of the speech signal using the average group-delay property of minimum-phase signals. The pitch contour is modified by manipulating the LP residual with knowledge of these instants, and the modified residual is then used to excite a time-varying filter whose parameters are derived from the original speech signal. The method is evaluated using waveforms, spectrograms, and listening tests; the synthesized speech is perceived to be of good quality, without significant distortion. The listening tests are carried out on a voice conversion application in which the source speaker’s pitch contour is modified according to the target speaker’s pitch contour, and the proposed method is compared with the Linear Prediction Pitch Synchronous Overlap and Add (LP-PSOLA) method for the same application.
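
The analysis and synthesis chain described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single analysis frame with a fixed (not time-varying) filter, the autocorrelation method for LP analysis, and a constant pitch scale factor. The group-delay epoch detector is not shown, so epoch locations are taken as given. All function names (`lp_coefficients`, `lp_residual`, `lp_synthesize`, `rescaled_epochs`) are hypothetical.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP: a[k-1] weights s[n-k] in the prediction of s[n]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny ridge for conditioning (assumption)
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(signal, a):
    """Inverse filter A(z) = 1 - sum_k a[k] z^-k with zero initial state."""
    inverse = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(np.asarray(signal, dtype=float), inverse)[:len(signal)]

def lp_synthesize(residual, a):
    """All-pole filter 1/A(z) driven by the (possibly modified) residual."""
    p = len(a)
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        past = out[max(0, n - p):n][::-1]  # out[n-1], out[n-2], ...
        out[n] = residual[n] + np.dot(a[:len(past)], past)
    return out

def rescaled_epochs(epochs, factor):
    """Place new epochs so that each local epoch interval is divided by `factor`.

    Simplified constant-factor scheme (assumption); `factor` must be > 0.
    """
    epochs = np.asarray(epochs, dtype=float)
    intervals = np.diff(epochs)
    t = epochs[0]
    new = [t]
    while True:
        local = np.interp(t, epochs[:-1], intervals)  # interval near position t
        t = t + local / factor
        if t > epochs[-1]:
            break
        new.append(t)
    return np.array(new)
```

With an unmodified residual, `lp_synthesize(lp_residual(x, a), a)` returns `x` up to rounding error, which is a useful sanity check before any residual manipulation is introduced. In a fuller version, the residual samples around each original epoch would be copied to the new epoch locations before synthesis.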

Acknowledgements

The author would like to thank the reviewers for their valuable comments and suggested corrections, which helped greatly in improving the quality of the paper.

Author information

Corresponding author

Correspondence to Krothapalli Sreenivasa Rao.

About this article

Cite this article

Rao, K.S. Unconstrained Pitch Contour Modification Using Instants of Significant Excitation. Circuits Syst Signal Process 31, 2133–2152 (2012). https://doi.org/10.1007/s00034-012-9428-8

  • DOI: https://doi.org/10.1007/s00034-012-9428-8
