Effectiveness of Teager energy operator for epoch detection from speech signals

International Journal of Speech Technology

Abstract

In this paper, we present the problem of epoch detection from a different perspective, one that deals not only with the estimation of epoch instants (i.e., glottal activity) but also with the quantification of the absence of epochs (i.e., no glottal activity) in the unvoiced regions of the speech signal. Most epoch detection methods perform well in the voiced regions of speech but are not robust in the unvoiced regions, i.e., they detect a number of pseudo epochs there. We propose a simple method based on the Teager Energy Operator (TEO) that not only determines the epochs in voiced regions (owing to its superior temporal resolution and its ability to capture the properties of airflow through the glottis) but is also very effective in unvoiced regions. Recently proposed methods such as the 0-Hz resonator-based method and the DYPSA method gave a combined rate (CR) (for detecting epochs in voiced and unvoiced regions of speech) of 74.7% and 60%, respectively, and a pseudo epoch rate (PER) (i.e., spurious epochs in the unvoiced regions of speech) of 62.9% and 54.04%, respectively. In contrast, the proposed method gave a CR of 87% and a PER of 0.27%. This result suggests that the proposed method captures glottal activity more efficiently in both the voiced and unvoiced regions of the speech signal. The performance of the proposed method is demonstrated on the publicly available CMU-Arctic database, using the epoch information derived from the electroglottograph (EGG) as the reference signal, i.e., as ground truth for the glottal closure instants (GCIs). Owing to the noise-suppression capability of the TEO, the proposed method is robust against signal degradations such as white, babble, high-frequency and vehicle noise (its performance is affected little, if at all), as compared to the 0-Hz resonator and DYPSA methods.
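The abstract relies on the discrete Teager energy operator, Ψ[x(n)] = x²(n) − x(n−1)x(n+1) (Kaiser, 1990), but does not spell out the paper's detection steps. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' method: `teager_energy` implements Kaiser's three-sample operator, while `pick_epochs` (a global relative threshold plus a minimum spacing between peaks) and `synthetic_speech` (an impulse train through a two-pole resonator, followed by low-level white noise standing in for an unvoiced stretch) are hypothetical helpers. For an isolated excitation of a damped resonance with pole radius r at instant n0, the TEO equals r^{2(n−n0)}, so it peaks exactly at the excitation instant, which hints at the temporal resolution mentioned above; the low-energy noise segment stays below the threshold, so no pseudo epochs are reported there.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator (Kaiser, 1990):
    psi[n] = x[n]**2 - x[n-1] * x[n+1].
    The first and last samples lack a two-sided neighbourhood and are set to 0."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def pick_epochs(x, fs, min_period_s=0.0025, rel_threshold=0.3):
    """Hypothetical epoch picker (NOT the paper's algorithm): keep local maxima
    of the TEO that exceed rel_threshold * max(TEO) and lie at least
    min_period_s apart. Low-energy stretches fall below the threshold and
    therefore yield no epochs."""
    psi = teager_energy(x)
    threshold = rel_threshold * psi.max()
    min_gap = int(round(min_period_s * fs))
    epochs = []
    for n in range(1, len(psi) - 1):
        is_peak = psi[n] >= psi[n - 1] and psi[n] >= psi[n + 1]
        if is_peak and psi[n] > threshold:
            if not epochs or n - epochs[-1] >= min_gap:
                epochs.append(n)
    return np.array(epochs, dtype=int)


def synthetic_speech(fs=8000, f0=100.0, formant_hz=500.0, r=0.9, offset=40, seed=0):
    """Hypothetical toy signal: a "voiced" first half (impulse train at f0
    filtered by a two-pole resonator) and an "unvoiced" second half containing
    only low-level white noise. Returns the signal and the true impulse positions."""
    n_half = fs // 10                          # 100 ms per half
    excitation = np.zeros(2 * n_half)
    true_epochs = np.arange(offset, n_half, int(fs / f0))
    excitation[true_epochs] = 1.0
    theta = 2 * np.pi * formant_hz / fs        # resonance angle in rad/sample
    a1, a2 = 2 * r * np.cos(theta), -r ** 2    # all-pole recursion coefficients
    y = np.zeros_like(excitation)
    for n in range(len(y)):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = excitation[n] + a1 * y1 + a2 * y2
    rng = np.random.default_rng(seed)
    y[n_half:] += 0.02 * rng.standard_normal(n_half)   # noise-only segment
    return y, true_epochs


if __name__ == "__main__":
    y, true_epochs = synthetic_speech()
    print("true epochs:    ", true_epochs)
    print("detected epochs:", pick_epochs(y, fs=8000))
```

On this toy signal the picker should report one epoch per impulse in the first half and none in the noise-only half. A fixed global threshold is far too crude for real speech, where voiced amplitude varies widely; the CR and PER figures quoted above come from the authors' own method and evaluation on CMU-Arctic, not from this sketch.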

References

  • Ananthapadmanabha, T. V., & Yegnanarayana, B. (1979). Epoch extraction from linear prediction residual for identification of closed glottis interval. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(4), 309–319.

  • Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America, 50(2), 637–655.

  • Bahoura, M., & Rouat, J. (2001). Wavelet speech enhancement based on the Teager energy operator. IEEE Signal Processing Letters, 8(1), 10–12.

  • Boudraa, A. O., Cexus, J. C., & Karim, A. M. (2008). Cross ΨB-energy operator based signal detection. The Journal of the Acoustical Society of America, 123(6), 4283–4289.

  • Brookes, M. (2006). Voicebox: A speech processing toolbox for MATLAB. [Online]. Available: http://www.ee.imperial.ac.uk/hp/staff/dmb/voicebox/voicebox.html.

  • Cairns, D. A., & Hansen, J. H. L. (1996). A noninvasive technique for detecting hypernasal speech using a nonlinear operator. IEEE Transactions on Signal Processing, 43(1), 35–44.

  • Cairns, D. A., Hansen, J. H. L., & Kaiser, J. F. (1996). Recent advances in hypernasal speech detection using the nonlinear Teager energy operator. In Proc. int. conf. spoken lang. process., ICSLP (Vol. 2, pp. 780–783).

  • “CMU-ARCTIC Speech Synthesis Databases.” [Online]. Available: http://festvox.org/cmu_arctic/index.html.

  • Dorman, M. F., Raphael, L. J., & Liberman, A. M. (1979). Some experiments on the sound of silence in phonetic perception. The Journal of the Acoustical Society of America, 65(6), 1518–1532.

  • Hamila, R., Lohan, S., & Renfors, M. (2003). Subchip multipath delay estimation for downlink WCDMA system based on Teager operator. IEEE Communications Letters, 7(1), 1–3.

  • Jabloun, F., Cetin, A. E., & Erzin, E. (1999). Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Processing Letters, 6(10), 259–261.

  • Kaiser, J. F. (1990). On a simple algorithm to calculate the ‘energy’ of a signal. In Proc. IEEE int. conf. acoustics, speech, and signal processing, Albuquerque, NM (Vol. 1, pp. 381–384).

  • Kaushik, L., & O’Shaughnessy, D. (2009). A novel method for epoch extraction from speech signals. In Interspeech 2009, Brighton, UK (pp. 2883–2886).

  • Kominek, J., & Black, A. (2004). The CMU-Arctic speech databases. In 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA (pp. 223–224).

  • Maragos, P., Kaiser, J. F., & Quatieri, T. F. (1991). Speech nonlinearities, modulations and energy operators. In Proc. int. conf. acoustics, speech, and signal processing, Toronto, Canada (pp. 421–424).

  • Markel, J. E., & Gray, A. H. (1982). Linear prediction of speech. New York: Springer.

  • Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1613.

  • Murty, K. S. R., Yegnanarayana, B., & Joseph, M. A. (2009). Characterization of glottal activity from speech signals. IEEE Signal Processing Letters, 16(6), 469–472.

  • Naylor, P. A., Kounoudes, A., Gudnason, J., & Brookes, M. (2007). Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 34–43.

  • Ney, H. (1981). A dynamic programming technique for non-linear smoothing. In Proc. IEEE int. conf. acoust, speech, signal processing (pp. 62–65).

  • NOISEX-92 [Online]. Available: http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html.

  • Patil, H. A., & Parhi, K. (2010a). Novel variable length Teager energy based features for person recognition from their Hum. In IEEE international conference on acoustics speech and signal processing (ICASSP) (pp. 4526–4529).

  • Patil, H. A., & Parhi, K. (2010b). Development of TEO phase for speaker recognition. In Proc. of int. conf. on signal process. and Comm, SPCOM’10, Bangalore (pp. 1–5).

  • Quatieri, T. F. (2002). Discrete-time speech signal processing: Principles and practices. Upper Saddle River: Pearson Education.

  • Shikhah, N., & Deriche, M. (1999). A novel pitch estimation technique using the Teager energy function. In Int. symposium on signal process. and its applications, ISSPA, Brisbane, Australia (pp. 135–138).

  • Sinder, D. J. (1999). Speech synthesis using an aeroacoustic fricative model. Ph.D. Thesis, Rutgers University, New Brunswick, NJ.

  • Smits, R., & Yegnanarayana, B. (1995). Determination of instants of significant excitation in speech using group delay function. IEEE Transactions on Speech and Audio Processing, 3, 325–333.

  • Strube, H. W. (1974). Determination of the instant of glottal closures from the speech wave. The Journal of the Acoustical Society of America, 56, 1625–1629.

  • Teager, H. M., & Teager, S. M. (1990). Evidence for nonlinear sound production mechanisms in the vocal tract. In W. J. Hardcastle & A. Marchal (Eds.), Speech production and speech modeling (pp. 241–261). Dordrecht: Kluwer Academic.

  • Veeneman, D., & BeMent, S. (1985). Automatic glottal inverse filtering from speech and electroglottographic signals. IEEE Transactions on Signal Processing, SP-33(4), 369–377.

  • Yegnanarayana, B., & Veldhuis, R. N. J. (1998). Extraction of vocal tract system characteristics from speech signals. IEEE Transactions on Speech and Audio Processing, 6(4), 313–327.

  • Zhou, G., Hansen, J. H. L., & Kaiser, J. F. (2001). Non linear feature based classification of speech under stress. IEEE Transactions on Speech and Audio Processing, 9(3), 201–216.

Author information

Correspondence to Hemant A. Patil.

About this article

Cite this article

Patil, H.A., Viswanath, S. Effectiveness of Teager energy operator for epoch detection from speech signals. Int J Speech Technol 14, 321–337 (2011). https://doi.org/10.1007/s10772-011-9110-8

  • DOI: https://doi.org/10.1007/s10772-011-9110-8