
Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition

Circuits, Systems, and Signal Processing

Abstract

This paper presents a novel approach for estimating epochs from emotional speech signals. Epochs are the instants of significant excitation of the vocal tract during the production of voiced sounds by vocal-fold vibration. Accurate epoch locations are essential for deriving the instantaneous pitch contours used in emotion analysis. Many well-known epoch extraction algorithms show degraded performance on emotional speech because its excitation characteristics vary widely. The proposed approach exploits a recent adaptive time-series decomposition technique, variational mode decomposition (VMD), to estimate epochs. VMD is applied to the emotional speech signal to decompose it into a set of sub-signals (modes). Analysis of these modes shows that VMD captures, through one of its modes, a center frequency close to the fundamental frequency of each glottal cycle of the emotional speech utterance. This center-frequency property of the corresponding mode signal enables accurate estimation of epoch locations from the emotional speech signal. The proposed method is evaluated on six emotions from the German emotional speech database, which provides simultaneously recorded electroglottographic (EGG) signals. Experimental results on clean emotive speech show that the proposed method achieves an identification rate and accuracy comparable to those of the best-performing algorithm. In addition, the proposed method is more reliable in estimating epochs from emotive speech signals degraded by noise.
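The decompose-then-select idea described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. `vmd` is a compact version of the ADMM updates from Dragomiretskiy and Zosso's VMD that works on the one-sided (rfft) spectrum and omits the boundary mirroring used in their reference code. `estimate_epochs` is hypothetical, and so are its rough F0 prior `f0_hz` and its rule of marking negative-to-positive zero crossings of the selected mode as epochs. The values of `K`, `alpha` and `tau` are also assumptions, since the abstract does not state them.

```python
import numpy as np


def vmd(signal, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch (after Dragomiretskiy & Zosso, 2014).

    Frequencies are normalised to cycles/sample in [0, 0.5], so `alpha`
    is tied to that unit. Boundary mirroring is omitted (assumption).
    Returns (modes, centre_freqs); modes has shape (K, len(signal)).
    """
    x = np.asarray(signal, dtype=float)
    T = len(x)
    f_hat = np.fft.rfft(x)                       # one-sided spectrum
    freqs = np.fft.rfftfreq(T)                   # 0 ... 0.5 cycles/sample
    u_hat = np.zeros((K, len(freqs)), dtype=complex)
    omega = 0.5 * np.arange(K) / K               # initial centres spread over [0, 0.5)
    lam = np.zeros(len(freqs), dtype=complex)    # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_old = u_hat.copy()
        for k in range(K):
            # Gauss-Seidel sweep: modes i < k were already updated in this sweep
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter-like update centred on the current omega[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # New centre frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # Dual ascent on the reconstruction constraint (tau = 0 disables it)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Stop when the summed relative change of all modes is below tol
        diff = np.sum(np.abs(u_hat - u_old) ** 2, axis=1)
        ref = np.sum(np.abs(u_old) ** 2, axis=1) + 1e-30
        if np.sum(diff / ref) < tol:
            break

    modes = np.fft.irfft(u_hat, n=T, axis=1)     # back to real time-domain modes
    return modes, omega


def estimate_epochs(signal, fs, f0_hz, K, alpha=2000.0):
    """Hypothetical epoch picker: choose the mode whose centre frequency is
    closest to a rough F0 prior and return the sample indices of its
    negative-to-positive zero crossings (an assumed rule, not the paper's)."""
    modes, omega = vmd(signal, K, alpha=alpha)
    centre_hz = omega * fs
    idx = int(np.argmin(np.abs(centre_hz - f0_hz)))
    m = modes[idx]
    epochs = np.where((m[:-1] < 0) & (m[1:] >= 0))[0] + 1
    return epochs, centre_hz[idx]


if __name__ == "__main__":
    fs = 8000
    n = np.arange(2000)                          # 0.25 s synthetic "voiced" signal
    x = np.sin(2 * np.pi * 120 * n / fs) + 0.5 * np.sin(2 * np.pi * 1000 * n / fs)
    epochs, cf = estimate_epochs(x, fs, f0_hz=110.0, K=2)
    print(f"selected mode centre: {cf:.1f} Hz")
    print(f"F0 from epoch spacing: {fs / np.median(np.diff(epochs)):.1f} Hz")
```

On this synthetic two-tone signal, one mode should settle near 120 Hz and its zero crossings should be spaced about fs/120 samples apart. Real emotional speech would additionally need voiced/unvoiced handling, boundary mirroring and a justified choice of `K` and `alpha`, none of which this sketch attempts.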


Figs. 1–8



Acknowledgements

The authors gratefully acknowledge Amrita Vishwa Vidyapeetham for supporting the first author in pursuing his Ph.D. The authors would like to thank Dr. K.P. Soman and Ms. M. Neethu (Amrita Vishwa Vidyapeetham) for lucidly explaining the concept of VMD.

Author information


Correspondence to E. A. Gopalakrishnan.


About this article


Cite this article

Lal, G.J., Gopalakrishnan, E.A. & Govind, D. Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition. Circuits Syst Signal Process 37, 3245–3274 (2018). https://doi.org/10.1007/s00034-018-0804-x


  • DOI: https://doi.org/10.1007/s00034-018-0804-x
