Robust glottal closure instant detection by jointly exploiting stationary wavelet transform and harmonic superposition

Hu, Hwai-Tsu; Hsu, Ling-Yuan

doi:10.1007/s10772-015-9316-2

Published: 27 October 2015

Volume 18, pages 685–695, (2015)

International Journal of Speech Technology

Hwai-Tsu Hu¹ & Ling-Yuan Hsu²

Abstract

This study jointly uses the stationary wavelet transform (SWT) and harmonic superposition (HS) to locate zero-crossings closely related to glottal closure instants (GCIs). The entire process operates directly on voiced speech signals, without reference to the linear prediction residual or a voiced source signal derived by inverse filtering. After the multi-scale SWT decomposition, a linear-phase FIR filter is introduced to translate positive zero-crossings into pulse-like features. While the product across the approximation coefficients at various SWT levels sharpens impulse features, HS is employed to sieve out the main pulses corresponding to GCIs. The advantages of the proposed SWT–HS scheme for GCI detection are examined using the PTDB-TUG database. Compared with two other advanced methods, SEDREAMS and ZFR, the proposed SWT–HS, without the assistance of any refining process, not only yields better accuracy in GCI positioning but also exhibits superior robustness against additive noise.
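The multi-scale product step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3-tap smoothing kernel, the number of levels, the relative threshold, and the function names `atrous_approximations`, `multiscale_product`, and `pick_peaks` are all assumptions made for illustration. The FIR zero-crossing stage and the harmonic superposition stage are not reproduced here, since the abstract does not specify them.

```python
import numpy as np

def atrous_approximations(x, levels=3):
    """Undecimated ("a trous") approximation signals, one per level.

    Assumption: a symmetric 3-tap low-pass kernel [0.25, 0.5, 0.25] stands in
    for the wavelet scaling filter; the paper's actual wavelet is not given
    in the abstract.
    """
    a = np.asarray(x, dtype=float)
    approx = []
    for j in range(levels):
        # Insert 2**j - 1 zeros between taps so no downsampling is needed.
        kernel = np.zeros(2 ** (j + 1) + 1)
        kernel[0], kernel[2 ** j], kernel[-1] = 0.25, 0.5, 0.25
        # The kernel is symmetric with odd length, so "same" keeps alignment.
        a = np.convolve(a, kernel, mode="same")
        approx.append(a)
    return approx

def multiscale_product(approx):
    """Pointwise product across levels; features present at every scale survive."""
    return np.prod(np.stack(approx), axis=0)

def pick_peaks(p, rel_threshold=0.5):
    """Indices of local maxima above a fraction of the global maximum (assumed rule)."""
    thr = rel_threshold * p.max()
    return [i for i in range(1, len(p) - 1)
            if p[i] > thr and p[i] > p[i - 1] and p[i] >= p[i + 1]]

# Toy pulse train standing in for the pulse-like features described above.
x = np.zeros(200)
x[[50, 100, 150]] = 1.0
product = multiscale_product(atrous_approximations(x, levels=3))
print(pick_peaks(product))
```

Because the product is taken sample by sample, a pulse must line up across all levels to remain prominent, which is why an undecimated (shift-invariant) decomposition is a natural fit. In practice a library routine such as PyWavelets' `pywt.swt` could supply the per-level coefficients in place of the hand-rolled kernel used here.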

Acknowledgments

This work was supported by the Ministry of Science and Technology, Taiwan, ROC, under Grant MOST 102-2221-E-197-020.

Author information

Authors and Affiliations

Department of Electronic Engineering, National I-Lan University, Yi-Lan, 26041, Taiwan, ROC
Hwai-Tsu Hu
Department of Information Management, St. Mary’s Junior College of Medicine, Nursing and Management, Yi-Lan, 26644, Taiwan, ROC
Ling-Yuan Hsu

Corresponding author

Correspondence to Hwai-Tsu Hu.

About this article

Cite this article

Hu, HT., Hsu, LY. Robust glottal closure instant detection by jointly exploiting stationary wavelet transform and harmonic superposition. Int J Speech Technol 18, 685–695 (2015). https://doi.org/10.1007/s10772-015-9316-2

Received: 07 July 2015
Accepted: 20 October 2015
Published: 27 October 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10772-015-9316-2
