Improving the Flexibility of Dynamic Prosody Modification Using Instants of Significant Excitation

Govind, D.; Joy, Tinu T.

doi:10.1007/s00034-015-0159-5

Published: 04 September 2015

Volume 35, pages 2518–2543, (2016)

Circuits, Systems, and Signal Processing

Abstract

Modification of suprasegmental features such as the pitch and duration of original speech by fixed scaling factors is referred to as static prosody modification. In dynamic prosody modification, the prosodic scaling factors (time-varying modification factors) are defined for every pitch cycle present in the original speech. The present work focuses on improving the naturalness of prosody-modified speech by reducing the generation of piecewise constant segments in the modified pitch contour. The prosody modification is performed by anchoring around the accurate instants of significant excitation estimated from the original speech. Dividing longer pitch intervals into many equal intervals over long speech segments introduces step-like discontinuities, in the form of piecewise constant segments, in the modified pitch contours. The effectiveness of the proposed dynamic modification method is initially confirmed from the smooth modified pitch contour plots obtained for finer static prosody scaling factors, and from waveforms, spectrogram plots and comparative subjective evaluations. In addition, the average \(F_0\) jitter computed from the pitch segments of each glottal activity region in the modified speech is proposed as an objective measure for prosody modification. The naturalness of the speech modified using the proposed method is compared, objectively and subjectively, with that of the existing zero frequency filtered signal-based dynamic prosody modification. The proposed algorithm also effectively preserves the dynamics of the prosodic patterns in singing voices, wherein the \(F_0\) parameters fluctuate rapidly and continuously within a higher \(F_0\) range.
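The distinction between static and dynamic scaling, and the idea of an average \(F_0\) jitter per glottal activity region, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the jitter formula, so the sketch assumes the common relative-jitter definition (mean absolute difference of consecutive \(F_0\) values divided by the mean \(F_0\)), and the function names `modified_f0`, `relative_jitter` and `average_jitter` are hypothetical.

```python
from statistics import mean


def modified_f0(f0_hz, factors):
    """Scale the F0 of each pitch cycle by its own factor.

    A single number is treated as a static factor (the same for all cycles);
    a sequence supplies one dynamic factor per pitch cycle.
    """
    if isinstance(factors, (int, float)):
        factors = [factors] * len(f0_hz)
    if len(factors) != len(f0_hz):
        raise ValueError("need exactly one scaling factor per pitch cycle")
    return [f * a for f, a in zip(f0_hz, factors)]


def relative_jitter(f0_hz):
    """Assumed definition: mean |F0[i+1] - F0[i]| divided by mean F0."""
    if len(f0_hz) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(f0_hz, f0_hz[1:])]
    return mean(diffs) / mean(f0_hz)


def average_jitter(regions):
    """Average the per-region jitter over all glottal activity regions.

    `regions` is a list of F0 sequences, one per voiced region; regions
    with fewer than two pitch cycles are skipped.
    """
    values = [relative_jitter(r) for r in regions if len(r) >= 2]
    return mean(values) if values else 0.0
```

For instance, `modified_f0([100.0, 110.0], 1.5)` applies one static factor to both cycles, whereas `modified_f0([100.0, 100.0], [1.0, 1.2])` applies a separate factor to each cycle. Computing the jitter per region and then averaging keeps unvoiced gaps between regions from being counted as pitch jumps.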

Notes

The terms epochs and instants of significant excitation (ISE) are used interchangeably throughout this article.
The epoch intervals and instantaneous pitch periods are treated as the same parameter in the context of prosody modification.
Since pitch cycles are either repeated or dropped in duration modification, successive pitch intervals do not overlap, and hence the samples in the pitch intervals are not copied in an overlap-add manner. For pitch modification, the samples in each pitch interval of the original speech signal are copied in an overlap-add manner to reduce the effect of truncating and expanding pitch cycles during waveform reconstruction.
Methods for subjective determination of transmission quality: ITU-T Recommendation P.800 is available from the ITU Web site: http://www.itu.int/rec/T-REC-P.800-199608-I/en.
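The note on duration modification can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure from the article: the function name `modify_duration`, the uniform mapping from output cycles to source cycles, and the use of plain lists for the signal are all hypothetical choices. It covers only the duration case, where whole pitch cycles are repeated or dropped and concatenated without overlap-add; the overlap-add copying used for pitch modification is not shown.

```python
def modify_duration(signal, epochs, factor):
    """Change duration by repeating or dropping whole pitch cycles.

    signal : sequence of samples
    epochs : increasing sample indices of the epochs (cycle boundaries)
    factor : duration scaling factor (> 1 lengthens, < 1 shortens)
    """
    if factor <= 0:
        raise ValueError("duration factor must be positive")
    # One pitch cycle spans the samples between two successive epochs.
    cycles = [list(signal[a:b]) for a, b in zip(epochs, epochs[1:])]
    if not cycles:
        return []
    n_out = max(1, round(len(cycles) * factor))
    out = []
    for k in range(n_out):
        # Map each output cycle back to a source cycle (assumed uniform mapping).
        src = min(int(k / factor), len(cycles) - 1)
        # Cycles are concatenated directly: no overlap, so no overlap-add.
        out.extend(cycles[src])
    return out
```

With `signal = list(range(12))` and `epochs = [0, 4, 8, 12]`, a factor of 2 repeats each of the three cycles once, and a factor of 1 returns the original samples between the first and last epoch.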

Acknowledgments

The present work is supported by the Department of Science and Technology sponsored project entitled “Analysis, processing and synthesis of emotions in speech” (project Reference No. SB/FTP/ETA-370/2012).

Author information

Authors and Affiliations

Center for Computational Engineering and Networking, Amrita Vishwa Vidyapeetham (University), Coimbatore, Tamilnadu, India
D. Govind & Tinu T. Joy

Corresponding author

Correspondence to D. Govind.

About this article

Cite this article

Govind, D., Joy, T.T. Improving the Flexibility of Dynamic Prosody Modification Using Instants of Significant Excitation. Circuits Syst Signal Process 35, 2518–2543 (2016). https://doi.org/10.1007/s00034-015-0159-5

Received: 03 July 2014
Revised: 23 August 2015
Accepted: 23 August 2015
Published: 04 September 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s00034-015-0159-5
