Robust glottal closure instant detection by jointly exploiting stationary wavelet transform and harmonic superposition

International Journal of Speech Technology

Abstract

This study combines the stationary wavelet transform (SWT) with harmonic superposition (HS) to locate the zero-crossings closely related to glottal closure instants (GCIs). The entire process operates directly on voiced speech signals, without reference to the linear prediction residual or to a voiced source signal derived by inverse filtering. Following the multi-scale SWT decomposition, a linear-phase FIR filter translates positive zero-crossings into pulse-like features. The product of the approximation coefficients across SWT levels sharpens these impulse features, and HS is then employed to sieve out the main pulses corresponding to GCIs. The proposed SWT–HS scheme is evaluated on the PTDB-TUG database. Compared with two other advanced methods, SEDREAMS and ZFR, and without the assistance of any refining process, the proposed SWT–HS scheme not only positions GCIs more accurately but is also more robust to additive noise.
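To make two of the building blocks named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple two-tap averaging kernel applied in undecimated (à trous) fashion as a stand-in for the SWT approximation branch, with periodic borders. The function names `atrous_approximations`, `multiscale_product`, and `positive_zero_crossings` are hypothetical. The linear-phase FIR filter and the HS stage are not covered, because the abstract does not specify them.

```python
import math


def atrous_approximations(x, levels):
    """Undecimated (a trous) approximations of x for levels 1..levels.

    Assumption: a two-tap averaging kernel replaces the paper's wavelet.
    At level j the two taps are 2**(j-1) samples apart, and the signal
    is treated as periodic at the borders.
    """
    n = len(x)
    current = list(x)
    approximations = []
    for j in range(levels):
        step = 2 ** j
        current = [0.5 * (current[i] + current[(i - step) % n]) for i in range(n)]
        approximations.append(current)
    return approximations


def multiscale_product(coeff_sets):
    """Sample-by-sample product of several equal-length coefficient sequences."""
    product = [1.0] * len(coeff_sets[0])
    for coeffs in coeff_sets:
        product = [p * c for p, c in zip(product, coeffs)]
    return product


def positive_zero_crossings(x):
    """Indices i where x goes from negative (x[i-1] < 0) to non-negative (x[i] >= 0)."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]


# Toy input: a sinusoid with a 100-sample period, offset by half a sample
# so that no sample falls exactly on a zero.
period = 100
signal = [math.sin(2 * math.pi * (i + 0.5) / period) for i in range(4 * period)]

approximations = atrous_approximations(signal, levels=3)
product = multiscale_product(approximations)
crossings = positive_zero_crossings(signal)
print(crossings)
```

With an odd number of levels the product keeps the sign of the individual approximations wherever they agree, which is one reason a three-level product is used in this toy example. On real speech the kernel, the number of levels, and the border handling would all need to follow the paper itself.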

Acknowledgments

This work was supported by the Ministry of Science and Technology, Taiwan, ROC, under Grant MOST 102-2221-E-197-020.

Author information

Corresponding author

Correspondence to Hwai-Tsu Hu.

About this article

Cite this article

Hu, HT., Hsu, LY. Robust glottal closure instant detection by jointly exploiting stationary wavelet transform and harmonic superposition. Int J Speech Technol 18, 685–695 (2015). https://doi.org/10.1007/s10772-015-9316-2

  • DOI: https://doi.org/10.1007/s10772-015-9316-2