Abstract
This paper presents a novel approach for estimating epochs from emotional speech signals. Epochs are the instants of significant excitation of the vocal tract during the production of voiced speech by the vibration of the vocal folds. Estimating epoch locations is essential for deriving the instantaneous pitch contours needed for accurate emotion analysis. Many well-known epoch extraction algorithms show degraded performance on emotional speech because its excitation characteristics vary widely. The proposed approach exploits a new adaptive time-series decomposition technique, variational mode decomposition (VMD), to estimate epochs. The VMD algorithm is applied to the emotional speech signal to decompose it into several sub-signals (modes). Analysis of these modes shows that VMD captures, in one of its modes, a center frequency close to the fundamental frequency of each glottal cycle of the emotional speech utterance. This center-frequency characteristic of the corresponding mode signal enables accurate estimation of epoch locations from the emotional speech signal. The proposed method is evaluated on six emotions from the German emotional speech database, which provides simultaneously recorded electroglottographic (EGG) signals. Experimental results on clean emotional speech show that the proposed method achieves an identification rate and accuracy comparable to those of the best-performing algorithm. In addition, the proposed method estimates epochs more reliably from emotional speech degraded by noise.
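The abstract describes the pipeline only at a high level, so the following Python/NumPy code is a minimal sketch of the idea, not the authors' implementation. It assumes a simplified VMD after Dragomiretskiy and Zosso (no mirror extension, uniform initialisation of centre frequencies, alpha = 2000, tau = 0), treats the lowest mode whose centre frequency lies in an assumed F0 range of 50 to 500 Hz as the "F0 mode", and marks one epoch per cycle at that mode's negative-to-positive zero crossings. The function names `vmd`, `estimate_epochs` and `instantaneous_f0`, all parameter values, and the epoch-picking rule are hypothetical. The instantaneous pitch is taken as the reciprocal of successive epoch intervals.

```python
import numpy as np


def vmd(signal, n_modes=3, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch (after Dragomiretskiy & Zosso, 2014).

    Assumed simplifications: mean removed, no mirror extension, centre
    frequencies initialised uniformly on [0, 0.5) cycles/sample.
    Returns (modes, omega) sorted by centre frequency (cycles/sample).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = x.size
    freqs = np.fft.fftfreq(n)                  # normalised frequency of each FFT bin
    f_hat = np.fft.fft(x)
    f_hat[freqs < 0] = 0.0                     # keep the one-sided (analytic) spectrum
    u_hat = np.zeros((n_modes, n), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes
    lam_hat = np.zeros(n, dtype=complex)       # Lagrange multiplier (dual variable)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(n_modes):
            # Wiener-filter update of mode k against what the other modes leave over
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Centre frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k]) ** 2
            if power.sum() > 0:
                omega[k] = np.sum(freqs * power) / power.sum()
        # Dual ascent; tau = 0 disables strict reconstruction (often used for noisy data)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))  # back to real time-domain modes
    order = np.argsort(omega)
    return modes[order], omega[order]


def estimate_epochs(signal, fs, n_modes=3, f0_range=(50.0, 500.0)):
    """Hypothetical epoch picker (not the paper's rule).

    Selects the lowest mode whose centre frequency lies in f0_range and
    returns the sample indices of its negative-to-positive zero crossings.
    """
    modes, omega = vmd(signal, n_modes=n_modes)
    centre_hz = omega * fs
    in_range = np.flatnonzero((centre_hz >= f0_range[0]) & (centre_hz <= f0_range[1]))
    k = in_range[0] if in_range.size else 0    # fall back to the lowest-frequency mode
    m = modes[k]
    epochs = np.flatnonzero((m[:-1] < 0) & (m[1:] >= 0)) + 1
    return epochs, centre_hz[k]


def instantaneous_f0(epochs, fs):
    """F0 (Hz) of each cycle from the interval between successive epochs."""
    return fs / np.diff(epochs)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(int(0.5 * fs)) / fs
    # Synthetic stand-in for voiced speech: 120 Hz "source" plus a 1100 Hz component
    x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 1100 * t)
    epochs, centre = estimate_epochs(x, fs, n_modes=2)
    print(f"F0-mode centre: {centre:.1f} Hz, epochs found: {len(epochs)}")
    print("median F0:", np.median(instantaneous_f0(epochs, fs)))
```

On this synthetic signal the lowest mode should settle near 120 Hz and give one epoch per cycle. Real emotional speech would need the paper's actual mode-selection and epoch-picking rules, and evaluation against EGG-derived reference epochs.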
Acknowledgements
The authors gratefully acknowledge Amrita Vishwa Vidyapeetham for supporting the first author in pursuing his Ph.D. The authors would also like to thank Dr. K.P. Soman and Ms. M. Neethu (Amrita Vishwa Vidyapeetham) for lucidly explaining the concept of VMD.
About this article
Cite this article
Lal, G.J., Gopalakrishnan, E.A. & Govind, D. Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition. Circuits Syst Signal Process 37, 3245–3274 (2018). https://doi.org/10.1007/s00034-018-0804-x