Abstract
This study jointly uses the stationary wavelet transform (SWT) and harmonic superposition (HS) to locate zero-crossings closely related to glottal closure instants (GCIs). The entire process operates directly on voiced speech signals, without reference to the linear prediction residual or to a voiced source signal derived by inverse filtering. After the multi-scale SWT decomposition, a linear-phase FIR filter is introduced to translate positive zero-crossings into pulse-like features. The product of the approximation coefficients across the SWT levels sharpens these impulse features, and HS is then employed to sieve out the main pulses corresponding to GCIs. The advantages of the proposed SWT–HS scheme for GCI detection are examined using the PTDB-TUG database. Compared with two other advanced methods, SEDREAMS and ZFR, the proposed SWT–HS scheme, without the assistance of any refining process, not only achieves better accuracy in GCI positioning but also exhibits superior robustness against additive noise.
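To make the multi-scale product idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the wavelet, the number of levels, the FIR filter design, or the HS stage, so the sketch assumes a Haar low-pass filter normalized to unit gain, three levels, and circular boundary handling, and it omits the FIR and HS stages entirely. All function names are hypothetical. It shows how undecimated approximations can be computed with the à trous scheme, how their pointwise product narrows a pulse compared with the coarsest level alone, and how positive-going zero-crossings can be picked from a sequence.

```python
def haar_swt_approximations(x, levels):
    """Undecimated Haar approximations a_1..a_levels (a trous scheme).

    Assumption: unit-gain Haar low-pass (0.5, 0.5) and circular boundaries.
    """
    n = len(x)
    approximations = []
    current = list(x)
    for j in range(levels):
        step = 2 ** j  # taps are spread apart by 2**j instead of downsampling
        current = [0.5 * (current[i] + current[(i - step) % n]) for i in range(n)]
        approximations.append(current)
    return approximations


def multiscale_product(approximations):
    """Pointwise product of the approximation sequences across all levels."""
    product = [1.0] * len(approximations[0])
    for level in approximations:
        product = [p * v for p, v in zip(product, level)]
    return product


def positive_zero_crossings(x):
    """Indices n where the sequence goes from negative to non-negative."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]


# Toy input (hypothetical): two unit pulses standing in for pulse-like features.
pulses = [0.0] * 64
pulses[10] = 1.0
pulses[40] = 1.0

approx = haar_swt_approximations(pulses, levels=3)
product = multiscale_product(approx)

# The coarsest level spreads each pulse over 8 samples,
# while the product keeps only the 2 samples shared by every level.
coarse_support = [i for i, v in enumerate(approx[-1]) if v != 0.0]
product_support = [i for i, v in enumerate(product) if v != 0.0]
print(product_support)  # [10, 11, 40, 41]
```

In this toy case the product is non-zero only where every level responds, which is one way to read the abstract's statement that the cross-level product sharpens impulse features. On real speech the levels would overlap less cleanly, and the choice of wavelet and level count would matter.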
Acknowledgments
This work was supported by the Ministry of Science and Technology, Taiwan, ROC, under Grant MOST 102-2221-E-197-020.
Cite this article
Hu, HT., Hsu, LY. Robust glottal closure instant detection by jointly exploiting stationary wavelet transform and harmonic superposition. Int J Speech Technol 18, 685–695 (2015). https://doi.org/10.1007/s10772-015-9316-2
DOI: https://doi.org/10.1007/s10772-015-9316-2