Dynamic prosody modification using zero frequency filtered signal

Govind, D.; Mahadeva Prasanna, S. R.

doi:10.1007/s10772-012-9155-3

Published: 09 June 2012

Volume 16, pages 41–54, (2013)

International Journal of Speech Technology

D. Govind¹ &
S. R. Mahadeva Prasanna¹

477 Accesses
11 Citations

Abstract

Modifying prosodic parameters such as pitch, duration and strength of excitation by a desired factor is termed prosody modification. The objective of this work is to develop a dynamic prosody modification method based on the zero frequency filtered signal (ZFFS), a byproduct of zero frequency filtering (ZFF). Existing epoch-based prosody modification techniques use epochs as pitch markers and achieve the required modification by interpolating the epoch interval plot. As an alternative, this work proposes a method for prosody modification by resampling the ZFFS. The existing epoch-based prosody modification method is also refined to modify the prosodic parameters at every epoch, which provides more flexibility for prosody modification. The general framework for deriving the modified epoch locations can also be used to obtain dynamic prosody modification from the existing PSOLA and epoch-based prosody modification methods. The quality of the prosody-modified speech is evaluated using waveforms, spectrograms and subjective studies. The usefulness of the proposed dynamic prosody modification is demonstrated on a neutral-to-emotional speech conversion task. Subjective evaluations of emotion conversion indicate that dynamic prosody modification is more effective than fixed prosody modification. The dynamic prosody-modified speech files synthesized using the proposed, epoch-based and TD-PSOLA methods are available at http://www.iitg.ac.in/eee/emstlab/demos/demo5.php.
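The abstract rests on two mechanisms: locating epochs from the ZFFS, and deriving new epoch locations when the modification factor changes from one epoch to the next. The Python/NumPy sketch below illustrates both under stated assumptions. `zero_frequency_filter` follows the commonly described ZFF recipe (difference the signal, pass it through two cascaded zero-frequency resonators, then remove the trend by subtracting a local mean over roughly one to two pitch periods); the 10 ms window and three trend-removal passes are assumed defaults. Epochs are taken as negative-to-positive zero crossings, which assumes the usual polarity of the glottal excitation (negate the input for an inverted recording). `dynamic_pitch_epochs` is a hypothetical helper that interpolates the epoch interval plot with a per-epoch pitch factor. It is a generic illustration of epoch-level (dynamic) modification, not the ZFFS resampling method proposed in the paper. All function names are illustrative.

```python
import numpy as np


def zero_frequency_filter(x, fs, win_ms=10.0, passes=3):
    """Return a zero frequency filtered signal (ZFFS) for speech samples x.

    Sketch of the commonly described ZFF recipe. win_ms (trend-removal
    window, about one to two average pitch periods) and passes are assumed.
    """
    x = np.asarray(x, dtype=float)
    # Difference the signal to suppress any slowly varying bias.
    d = np.diff(x, prepend=x[0])
    # Two cascaded zero-frequency resonators, y[n] = d[n] + 2y[n-1] - y[n-2].
    # With zero initial conditions each resonator equals two running sums.
    y = d
    for _ in range(2):
        y = np.cumsum(np.cumsum(y))
    # Remove the polynomial trend by repeatedly subtracting a centred local mean.
    half = max(1, int(round(win_ms * 1e-3 * fs)) // 2)
    if y.size < 2 * half + 1:
        raise ValueError("signal shorter than the trend-removal window")
    kernel = np.ones(2 * half + 1)
    counts = np.convolve(np.ones_like(y), kernel, mode="same")  # shrinks at edges
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same") / counts
    return y


def positive_zero_crossings(y):
    """Indices n where y goes from negative (at n-1) to non-negative (at n)."""
    y = np.asarray(y)
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1


def dynamic_pitch_epochs(epochs, factors):
    """Hypothetical helper: new epoch instants for per-epoch pitch factors.

    factors[i] > 1 raises the pitch around epochs[i]; overall duration is kept.
    The pitch period contour is the linearly interpolated epoch interval plot.
    """
    epochs = np.asarray(epochs, dtype=float)
    if epochs.size < 2:
        raise ValueError("need at least two epochs")
    factors = np.broadcast_to(np.asarray(factors, dtype=float), epochs.shape)
    if np.any(factors <= 0) or np.any(np.diff(epochs) <= 0):
        raise ValueError("factors must be positive and epochs increasing")
    t_pts = epochs[:-1]          # each interval is attached to its left epoch
    periods = np.diff(epochs)    # the epoch interval plot
    new = [epochs[0]]
    t = epochs[0]
    while True:
        # Local pitch period divided by the local (interpolated) factor.
        t += np.interp(t, t_pts, periods) / np.interp(t, t_pts, factors[:-1])
        if t > epochs[-1]:
            break
        new.append(t)
    return np.round(new).astype(int)


if __name__ == "__main__":
    fs, period = 8000, 80                  # synthetic 100 Hz excitation
    x = np.zeros(int(0.3 * fs))
    x[37::period] = -1.0                   # negative-going "glottal closures"
    epochs = positive_zero_crossings(zero_frequency_filter(x, fs))
    # Rising pitch: factor grows linearly from 1.0 to 1.5 across the epochs.
    new_epochs = dynamic_pitch_epochs(epochs, np.linspace(1.0, 1.5, epochs.size))
    print(len(epochs), len(new_epochs))
```

In a complete system each new epoch would typically be paired with the nearest original epoch to decide which excitation or waveform segment to place there, as in TD-PSOLA-style synthesis; that synthesis step is omitted from this sketch. Near the signal edges the trend removal is only approximate, so zero crossings there may be spurious.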


Similar content being viewed by others

Modification of energy spectra, epoch parameters and prosody for emotion conversion in speech

Article 27 October 2016

Arijul Haque & Krothapalli Sreenivasa Rao

Significance of Epoch Identification Accuracy in Prosody Modification for Effective Emotion Conversion

Improving the Flexibility of Dynamic Prosody Modification Using Instants of Significant Excitation

Article 04 September 2015

D. Govind & Tinu T. Joy


Acknowledgements

The work reported in this paper is funded by the ongoing UK-India Education Research Initiative (UKIERI) project titled "Study of source features for speech synthesis and speaker recognition" between IIT Guwahati, IIIT Hyderabad and the University of Edinburgh.

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
D. Govind & S. R. Mahadeva Prasanna


Corresponding author

Correspondence to D. Govind.


About this article

Cite this article

Govind, D., Mahadeva Prasanna, S.R. Dynamic prosody modification using zero frequency filtered signal. Int J Speech Technol 16, 41–54 (2013). https://doi.org/10.1007/s10772-012-9155-3


Received: 02 March 2012
Accepted: 28 May 2012
Published: 09 June 2012
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10772-012-9155-3
