Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition

Lal, G. Jyothish; Gopalakrishnan, E. A.; Govind, D.

doi:10.1007/s00034-018-0804-x

Published: 20 March 2018

Circuits, Systems, and Signal Processing, volume 37, pages 3245–3274 (2018)

Abstract

This paper presents a novel approach for estimating epochs from emotional speech signals. Epochs are the instants of significant excitation of the vocal tract during the production of voiced sound by the vibration of the vocal folds. Estimating epoch locations is essential for deriving the instantaneous pitch contours needed for accurate emotion analysis. Many well-known epoch extraction algorithms show degraded performance on emotional speech because its excitation characteristics vary. The proposed approach uses a recent adaptive time series decomposition technique, variational mode decomposition (VMD), to estimate epochs. The VMD algorithm is applied to the emotional speech signal to decompose it into several sub-signals (modes). Analysis of these sub-signals shows that VMD, through its modes, captures a center frequency close to the fundamental frequency defined for each glottal cycle of the emotional speech utterance. This center frequency characteristic of the corresponding mode signal helps in the accurate estimation of epoch locations from the emotional speech signal. The proposed method is evaluated on six different emotions taken from the German emotional speech database, which includes simultaneous electroglottographic signals. Experimental results on clean emotive speech signals show that the proposed method provides an identification rate and accuracy comparable to those of the best performing algorithm. In addition, the proposed method provides better reliability in epoch estimation from emotive speech signals degraded by noise.
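
The claim that a VMD mode settles on a center frequency close to the fundamental frequency can be illustrated with a short sketch. It is not the authors' implementation: it is a simplified VMD that keeps only the Wiener-filter mode update and the center-of-gravity frequency update, and omits the mirror extension and the Lagrangian (dual ascent) term of the full algorithm. The function name `vmd_sketch`, the parameter values, and the synthetic two-tone signal (a 100 Hz component standing in for the fundamental frequency) are all assumptions made for illustration, and the epoch-picking stage of the paper is not reproduced.

```python
import numpy as np


def vmd_sketch(signal, num_modes=2, alpha=2000.0, num_iters=200):
    """Simplified VMD (hypothetical helper, not the paper's code).

    Omits mirror extension and the dual-ascent term; keeps the
    alternating mode / center-frequency updates in the frequency domain.
    """
    n = len(signal)
    freqs = np.fft.rfftfreq(n)        # normalized frequency, cycles/sample
    f_hat = np.fft.rfft(signal)       # one-sided spectrum of the input
    u_hat = np.zeros((num_modes, len(freqs)), dtype=complex)
    # Uniformly spaced initial center frequencies in [0, 0.5).
    omega = 0.5 * np.arange(num_modes) / num_modes

    for _ in range(num_iters):
        for k in range(num_modes):
            # Part of the spectrum not explained by the other modes.
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-like filter centered on the current omega[k].
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # New center frequency: power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / np.sum(power)

    modes = np.fft.irfft(u_hat, n=n, axis=1)   # back to the time domain
    return modes, omega


# Synthetic stand-in for voiced speech (assumed values): a 100 Hz
# "fundamental" plus a weaker 1000 Hz component, sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

modes, omega = vmd_sketch(signal)
center_hz = omega * fs   # convert cycles/sample to Hz
print("estimated center frequencies (Hz):", np.round(center_hz, 1))
```

On this synthetic input the first mode's center frequency should land close to the 100 Hz component. Real emotional speech is far less stationary, so the number of modes and the bandwidth penalty `alpha` would need to be chosen with care.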

Acknowledgements

The authors gratefully acknowledge Amrita Vishwa Vidyapeetham for supporting the first author in pursuing his Ph.D. The authors would like to thank Dr. K.P. Soman and Ms. M. Neethu (Amrita Vishwa Vidyapeetham) for lucidly explaining the concept of VMD.

Author information

Authors and Affiliations

Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
G. Jyothish Lal, E. A. Gopalakrishnan & D. Govind

Corresponding author

Correspondence to E. A. Gopalakrishnan.

About this article

Cite this article

Lal, G.J., Gopalakrishnan, E.A. & Govind, D. Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition. Circuits Syst Signal Process 37, 3245–3274 (2018). https://doi.org/10.1007/s00034-018-0804-x

Received: 04 August 2017
Revised: 13 March 2018
Accepted: 15 March 2018
Published: 20 March 2018
Issue Date: August 2018
DOI: https://doi.org/10.1007/s00034-018-0804-x
