Abstract
The objective of this work is to accurately estimate glottal closure instants (GCIs) and glottal opening instants (GOIs) from electroglottographic (EGG) signals, and to address the shortcomings of existing EGG-based GCI/GOI detection methods. GCIs are the instants at which the excitation of the vocal tract is maximum, whereas the excitation at GOIs is weaker than at GCIs. Both instants recur once per glottal cycle, at a rate given by the fundamental frequency of the EGG signal. Accurate detection of these instants from the EGG signal is essential because they serve as the reference for evaluating GCIs and GOIs estimated directly from the speech signal. This work proposes a new method for accurate detection of GCIs and GOIs from the EGG signal using the variational mode decomposition (VMD) algorithm. The EGG signal is decomposed into sub-signals (modes) using VMD. It is shown that one of the VMD modes has a center frequency close to the fundamental frequency of the EGG signal, and GCIs and GOIs are estimated from this mode. In addition, the instantaneous pitch frequency is estimated from the detected GCIs. The proposed method has been evaluated on the CMU-arctic database for GCI/GOI estimation and on the Keele pitch extraction reference database for instantaneous pitch frequency estimation. Experimental results show that the proposed method achieves better accuracy and a higher identification rate than state-of-the-art methods.
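The last step mentioned in the abstract, estimating instantaneous pitch frequency from GCIs, can be illustrated with a minimal sketch. It assumes the GCI locations are already available as strictly increasing times in seconds within a single voiced segment; the function name `instantaneous_pitch` and the choice to report each value at the midpoint of its glottal cycle are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def instantaneous_pitch(gci_times_s):
    """Estimate per-cycle pitch frequency (Hz) from GCI times (seconds).

    Assumption: all GCIs belong to one voiced segment, so every gap
    between neighbouring GCIs is a single glottal cycle.
    """
    gci = np.asarray(gci_times_s, dtype=float)
    if gci.size < 2:
        raise ValueError("at least two GCIs are needed")

    # Duration of each glottal cycle: spacing between successive GCIs.
    periods = np.diff(gci)
    if np.any(periods <= 0):
        raise ValueError("GCI times must be strictly increasing")

    # Pitch frequency is the reciprocal of the cycle duration.
    f0 = 1.0 / periods

    # Report each estimate at the midpoint of its cycle (a convention
    # chosen here for illustration).
    t_mid = gci[:-1] + periods / 2.0
    return t_mid, f0


if __name__ == "__main__":
    # Hypothetical GCIs spaced 8 ms apart, i.e. a 125 Hz voice.
    times, f0 = instantaneous_pitch([0.000, 0.008, 0.016, 0.024])
    print(times, f0)
```

In practice the GCI sequence would need to be split at unvoiced gaps before applying this, since the spacing across a gap does not correspond to a glottal cycle.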
References
C. Aneesh, S.S. Kumar, P.M. Hisham, K.P. Soman, Performance comparison of variational mode decomposition over empirical wavelet transform for the classification of power quality disturbances using support vector machine. Proc. Comput. Sci. 46, 372–380 (2015)
A. Bouzid, N. Ellouze, Multiscale product of electroglottogram signal for glottal closure and opening instant detection, in Multiconference on Computational Engineering in Systems Applications (2006), pp. 106–109
A. Bouzid, N. Ellouze, Voice source parameter measurement based on multi-scale analysis of electroglottographic signal. Speech Commun. 51, 782–792 (2009)
M. Brookes, VOICEBOX: speech processing toolbox for MATLAB (Online). http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
J. Deller, Some notes on closed phase glottal inverse filtering. IEEE Trans. Acoust. Speech Signal Process. 29(4), 917–919 (1981)
K. Dragomiretskiy, D. Zosso, Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2014)
T. Drugman, T. Dutoit, Glottal closure and opening instant detection from speech signals, in Interspeech (2009), pp. 2891–2894
T. Drugman, P. Alku, A. Alwan, B. Yegnanarayana, Glottal source processing: from analysis to applications. Comput. Speech Lang. 28(5), 1117–1138 (2014)
J. Gilles, Empirical wavelet transform. IEEE Trans. Signal Process. 61(16), 3999–4010 (2013)
D. Govind, P. Hisham, D. Pravena, Effectiveness of polarity detection for improved epoch extraction from speech, in National Conference on Communication (2016), pp. 1–6
J. Gudnason, M. Brookes, Voice source cepstrum coefficients for speaker identification, in IEEE International Conference on Acoustics, Speech and Signal Processing (2008), pp. 4821–4824
N. Henrich, C. d’Alessandro, B. Doval, M. Castellengo, On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation. J. Acoust. Soc. Am. 115(3), 1321–1332 (2004)
W. Hess, H. Indefrey, Accurate pitch determination of speech signals by means of a laryngograph, in IEEE International Conference on Acoustics, Speech and Signal Processing (1984), pp. 73–76
N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A: Math. Phys. Eng. Sci. 454, 903–995 (1998)
M.A. Huckvale, Speech filing system: tools for speech (Online). http://www.phon.ucl.ac.uk/resource/sfs/
J. Kominek, A. Black, CMU-arctic speech databases, in ISCA Speech Synthesis Workshop (2004), pp. 223–224
A.I. Koutrouvelis, G.P. Kafentzis, N.D. Gaubitch, R. Heusdens, A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech. IEEE/ACM Trans. Audio Speech Lang. Process. 24(2), 316–328 (2016)
A. Mert, ECG feature extraction based on the bandwidth properties of variational mode decomposition. Physiol. Meas. 37(4), 530–543 (2016)
K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)
P.A. Naylor, A. Kounoudes, J. Gudnason, M. Brookes, Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Trans. Audio Speech Lang. Process. 15(1), 34–43 (2007)
F. Plante, G.F. Meyer, W.A. Ainsworth, A pitch extraction reference database, in Eur. Conf. Speech Commun. (Eurospeech) (1995), pp. 827–840
E. Prabhakararao, M.S. Manikandan, On the use of variational mode decomposition for removal of baseline wander in ECG signals, in National Conference on Communication (2016), pp. 1–6
T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice (Prentice-Hall, Upper Saddle River, 2002)
L.R. Rabiner, M.J. Cheng, A.E. Rosenberg, C.A. McGonegal, A comparative performance study of several pitch detection algorithms. IEEE Trans. Acoust. Speech Signal Process. 24(5), 399–418 (1976)
K. Ramesh, S.R.M. Prasanna, D. Govind, Detection of glottal opening instants using Hilbert envelope, in Interspeech (2013), pp. 44–48
K. Ramesh, S.R.M. Prasanna, R.K. Das, Significance of glottal activity detection and glottal signature for text dependent speaker verification, in International Conference on Signal Processing and Communications (SPCOM) (2014), pp. 1–5
K.S. Rao, B. Yegnanarayana, Prosody modification using instants of significant excitation. IEEE Trans. Audio Speech Lang. Process. 14, 972–980 (2006)
K.P. Soman, P. Prabaharan, S. Athira, K. Harikumar, Recursive variational mode decomposition algorithm for real time power signal decomposition. Proc. Technol. 21, 540–546 (2015)
D. Talkin, A robust algorithm for pitch tracking, in Speech Coding and Synthesis, ed. by W.B. Kleijn, K.K. Paliwal (Elsevier, New Providence, 1995), pp. 495–518
M.R.P. Thomas, P.A. Naylor, The SIGMA algorithm: a glottal activity detector for electroglottographic signals. IEEE Trans. Audio Speech Lang. Process. 17, 1557–1566 (2009)
M.R.P. Thomas, J. Gudnason, P.A. Naylor, Data-driven voice source waveform modelling, in IEEE International Conference on Acoustics, Speech and Signal Processing (2009), pp. 3965–3968
M.R.P. Thomas, J. Gudnason, P.A. Naylor, Estimation of glottal closing and opening instants in voiced speech using the YAGA algorithm. IEEE Trans. Audio Speech Lang. Process. 20(1), 82–91 (2012)
D. Thotappa, S.R.M. Prasanna, Reference and automatic marking of glottal opening instants using EGG signal, in International Conference on Signal Processing and Communications (SPCOM) (2014), pp. 1–5
A. Upadhyay, R.B. Pachori, A new method for determination of instantaneous pitch frequency from speech signals, in IEEE Signal Processing and Signal Processing Education Workshop (2015), pp. 325–330
A. Upadhyay, R.B. Pachori, Instantaneous voiced/non-voiced detection in speech signals based on variational mode decomposition. J. Frankl. Inst. 352, 2679–2707 (2015)
D. Veeneman, S. BeMent, Automatic glottal inverse filtering from speech and electroglottographic signals. IEEE Trans. Acoust. Speech Signal Process. 33(4), 369–377 (1985)
E. Wechsler, A laryngographic study of voice disorders. Int. J. Lang. Commun. Disord. 12, 9–22 (1977)
Y.J. Xue, J.X. Cao, D.X. Wang, H.K. Du, Y. Yao, Application of the variational-mode decomposition for seismic time-frequency analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9(8), 3821–3831 (2016)
B. Yegnanarayana, K.S.R. Murty, Event-based instantaneous fundamental frequency estimation from speech signals. IEEE Trans. Audio Speech Lang. Process. 17(4), 614–624 (2009)
B. Yegnanarayana, R.N.J. Veldhuis, Extraction of vocal-tract system characteristics from speech signals. IEEE Trans. Speech Audio Process. 6, 313–327 (1998)
Acknowledgements
We gratefully acknowledge the generous funding provided by Amrita University. The authors would like to thank Dr. K. P. Soman and Ms. M. Neethu for their help in understanding the VMD algorithm, Mr. M. A. Huckvale for providing the Speech Filing System toolbox, and Mr. M. Brookes for providing easy access to the VOICEBOX toolbox. Finally, the authors would like to thank Mr. F. Plante and Mr. J. Kominek for the EGG reference databases used.
Cite this article
Lal, G.J., Gopalakrishnan, E.A. & Govind, D. Accurate Estimation of Glottal Closure Instants and Glottal Opening Instants from Electroglottographic Signal Using Variational Mode Decomposition. Circuits Syst Signal Process 37, 810–830 (2018). https://doi.org/10.1007/s00034-017-0582-x
DOI: https://doi.org/10.1007/s00034-017-0582-x