
Dynamic prosody modification using zero frequency filtered signal

International Journal of Speech Technology

Abstract

Modifying prosodic parameters such as pitch, duration and strength of excitation by a desired factor is termed prosody modification. The objective of this work is to develop a dynamic prosody modification method based on the zero frequency filtered signal (ZFFS), a byproduct of zero frequency filtering (ZFF). Existing epoch-based prosody modification techniques use epochs as pitch markers and achieve the required modification by interpolating the plot of epoch intervals. As an alternative, this work proposes a method for prosody modification by resampling the ZFFS. In addition, the existing epoch-based method is refined to modify the prosodic parameters at every epoch, thus providing more flexibility for prosody modification. The general framework for deriving the modified epoch locations can also be used to obtain dynamic prosody modification from the existing PSOLA and epoch-based prosody modification methods. The quality of the prosody-modified speech is evaluated using waveforms, spectrograms and subjective studies. The usefulness of the proposed dynamic prosody modification is demonstrated on the task of converting neutral speech to emotional speech. Subjective evaluations of the emotion conversion indicate that dynamic prosody modification is more effective than fixed prosody modification for this task. The dynamic prosody-modified speech files synthesized using the proposed, epoch-based and TD-PSOLA methods are available at http://www.iitg.ac.in/eee/emstlab/demos/demo5.php.
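The abstract rests on two ideas: epochs obtained from the ZFFS, and prosody factors applied at every epoch rather than once per utterance. As an illustration only, the sketch below implements zero frequency filtering as it is commonly described (differencing, two cascaded zero-frequency resonators, repeated local-mean trend removal, epochs at negative-to-positive zero crossings), followed by a generic way of placing modified epochs from per-interval pitch and duration factors, in the style of time-varying PSOLA synthesis marks. It is not the authors' ZFFS-resampling method. The function names, the window of about one average pitch period, the three trend-removal passes and the factor conventions are all assumptions.

```python
import numpy as np


def zero_frequency_filter(s, win, passes=3):
    """Zero frequency filtered signal (ZFFS) of s, as commonly described for ZFF.

    win    -- trend-removal window in samples; assumed to be about one average
              pitch period (forced odd so the moving average stays centred).
    passes -- number of local-mean subtractions (assumed 3).
    """
    s = np.asarray(s, dtype=np.float64)
    if win % 2 == 0:
        win += 1
    x = np.diff(s, prepend=s[0])            # differencing removes any DC offset
    y = x
    # Two cascaded zero-frequency resonators, y[n] = 2y[n-1] - y[n-2] + x[n];
    # each resonator is equivalent to two running sums, so four in total.
    for _ in range(4):
        y = np.cumsum(y)
    kernel = np.ones(win) / win
    for _ in range(passes):
        # Subtract the local mean to remove the polynomial trend. np.convolve
        # zero-pads, so samples within a few windows of either end are unreliable.
        y = y - np.convolve(y, kernel, mode="same")
    return y


def epochs_from_zffs(y):
    """Epochs taken as negative-to-positive zero crossings of the ZFFS."""
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1


def modified_epochs(epochs, pitch_factor, dur_factor):
    """Hypothetical helper: synthesis epochs from per-interval prosody factors.

    pitch_factor[i] and dur_factor[i] apply to the interval epochs[i]..epochs[i+1];
    pitch_factor > 1 raises F0 and dur_factor > 1 lengthens the speech.
    Returns the new epoch locations and, for each, the analysis interval it
    was derived from (the segment a PSOLA-style synthesizer would copy).
    """
    epochs = np.asarray(epochs, dtype=np.float64)
    t_a = t_s = epochs[0]                    # analysis and synthesis time pointers
    out, src = [], []
    while t_a < epochs[-1]:
        # Analysis interval that the analysis pointer currently falls in.
        i = min(np.searchsorted(epochs, t_a, side="right") - 1, len(epochs) - 2)
        period = (epochs[i + 1] - epochs[i]) / pitch_factor[i]  # new local period
        out.append(t_s)
        src.append(i)
        t_s += period                        # next synthesis epoch
        t_a += period / dur_factor[i]        # matching advance in analysis time
    return np.array(out), np.array(src, dtype=int)


if __name__ == "__main__":
    fs, period = 8000, 81                    # synthetic 8 kHz signal, about 99 Hz pitch
    s = np.zeros(fs)
    s[50::period] = -1.0                     # negative impulses: crude stand-in for glottal closures
    ep = epochs_from_zffs(zero_frequency_filter(s, win=period))
    ep = ep[(ep > 3 * period) & (ep < len(s) - 3 * period)]   # drop edge-affected epochs
    rising = np.linspace(1.0, 1.5, len(ep) - 1)  # pitch raised gradually (dynamic)
    new_ep, _ = modified_epochs(ep, rising, np.ones(len(ep) - 1))
    print(len(ep), len(new_ep))
```

Because each interval carries its own factor, letting the factors vary along the utterance gives dynamic modification, while constant arrays reduce to the fixed case. On this synthetic impulse train the detected epochs fall within a sample or two of the impulses; real speech would need a window chosen from an estimate of the average pitch period.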





Acknowledgements

This work was funded by the ongoing UK-India Education Research Initiative (UKIERI) project titled “Study of source features for speech synthesis and speaker recognition” between IIT Guwahati, IIIT Hyderabad and the University of Edinburgh.

Author information


Correspondence to D. Govind.


About this article

Cite this article

Govind, D., Mahadeva Prasanna, S.R. Dynamic prosody modification using zero frequency filtered signal. Int J Speech Technol 16, 41–54 (2013). https://doi.org/10.1007/s10772-012-9155-3


