Effectiveness of Teager energy operator for epoch detection from speech signals

Patil, Hemant A.; Viswanath, Srikant

doi:10.1007/s10772-011-9110-8

Published: 23 September 2011

Volume 14, pages 321–337 (2011)

International Journal of Speech Technology

Hemant A. Patil¹ &
Srikant Viswanath¹

Abstract

In this paper, we present the problem of epoch detection from a different perspective, one that deals not only with the estimation of epoch instants (i.e., glottal activity) but also with quantifying the absence of epochs (i.e., no glottal activity) in the unvoiced regions of the speech signal. Most epoch detection methods perform well in the voiced regions of speech but are not robust in the unvoiced regions, where they detect a number of pseudo epochs. We propose a simple method based on the Teager Energy Operator (TEO) that determines the epochs in voiced regions (owing to its superior temporal resolution and its ability to capture airflow properties through the glottis) and is also very effective in unvoiced regions. Recently proposed methods such as the 0-Hz resonator-based method and the DYPSA method gave a combined rate (CR) (for detecting epochs in voiced and unvoiced regions of speech) of 74.7% and 60%, respectively, and a pseudo epoch rate (PER) (i.e., the rate of spurious epochs in the unvoiced regions of speech) of 62.9% and 54.04%, respectively. In contrast, the proposed method gave a CR of 87% and a PER of 0.27%. This result suggests that the proposed method captures glottal activity more effectively in both voiced and unvoiced regions of the speech signal. The performance of the proposed method is demonstrated on the publicly available CMU-Arctic database, using the epoch information from the electroglottograph (EGG) signal as the ground truth for estimating glottal closure instants (GCIs). Owing to the noise suppression capability of the TEO, the proposed method is largely unaffected by (i.e., robust to) signal degradations such as white, babble, high-frequency, and vehicle noise, compared with the 0-Hz resonator and DYPSA methods.
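The abstract does not state the operator itself. As a minimal sketch, assuming the standard discrete-time form Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), the TEO can be computed as below. The function name `teager_energy` and the zero-padding of the two boundary samples are illustrative choices, and this is only the operator, not the authors' full epoch detection method.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so they are
    set to zero here (an assumption made for this sketch).
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

# Pure tone: for x[n] = A*cos(w*n + phi) the operator is constant,
# equal to A^2 * sin(w)^2, so it tracks amplitude and frequency jointly.
A, w = 0.5, 0.2
n = np.arange(200)
tone = A * np.cos(w * n + 0.3)
psi_tone = teager_energy(tone)

# Unit impulse: the output is nonzero only at the impulse location, which
# illustrates the three-sample temporal resolution of the operator.
impulse = np.zeros(10)
impulse[5] = 1.0
psi_impulse = teager_energy(impulse)
```

Because each output sample depends on only three input samples, a sharp excitation such as a glottal closure shows up as a localized peak in the TEO profile, which is consistent with the temporal-resolution argument made in the abstract.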

Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India
Hemant A. Patil & Srikant Viswanath

Corresponding author

Correspondence to Hemant A. Patil.

About this article

Cite this article

Patil, H.A., Viswanath, S. Effectiveness of Teager energy operator for epoch detection from speech signals. Int J Speech Technol 14, 321–337 (2011). https://doi.org/10.1007/s10772-011-9110-8

Received: 20 June 2011
Accepted: 13 August 2011
Published: 23 September 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s10772-011-9110-8
