Toward Improving the Performance of Epoch Extraction from Telephonic Speech

  • Short Paper
  • Published in: Circuits, Systems, and Signal Processing

Abstract

An epoch is the abrupt closure event within a glottal cycle at which significant excitation of the vocal-tract system occurs during the production of voiced speech. The state-of-the-art zero frequency filtering technique is a simple and efficient method that extracts epochs robustly from clean speech. However, it performs poorly on telephonic-quality speech because spurious zero crossings in the epoch evidence lead to a high false alarm rate. Recently, the zero-phase zero frequency resonator (ZP-ZFR) was proposed as an alternative to the zero frequency filter for a stable implementation of the zero frequency filtering technique. In this study, a higher-order ZP-ZFR is investigated to improve the performance of zero frequency filtering for epoch extraction from telephonic speech. The proposed ZP-ZFR method is evaluated quantitatively on telephonic speech simulated from six standard databases that have simultaneous electroglottograph recordings as ground truth. Experimental results suggest that the proposed method performs significantly better than state-of-the-art methods in terms of identification rate and false alarm rate.
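
For orientation, the conventional zero frequency filtering baseline that the abstract refers to can be sketched roughly as below. This is only an illustration of the baseline idea, not the higher-order ZP-ZFR method proposed in the article. The function name `zff_epochs`, the parameters `avg_pitch_ms` and `n_trend_passes`, and their default values are assumptions made for this sketch.

```python
import numpy as np


def zff_epochs(signal, fs, avg_pitch_ms=8.0, n_trend_passes=3):
    """Candidate epoch locations (sample indices) from conventional
    zero frequency filtering. Illustrative sketch; names are hypothetical."""
    x = np.asarray(signal, dtype=np.float64)

    # Half-width of the trend-removal window (about half a pitch period).
    half = max(1, int(round(avg_pitch_ms * 1e-3 * fs / 2)))
    if len(x) <= 2 * n_trend_passes * half + 1:
        raise ValueError("signal too short for the chosen window")

    # 1. Difference the signal to suppress any constant offset.
    dx = np.diff(x, prepend=x[0])

    # 2. Cascade of two zero frequency resonators. Each resonator is
    #    y[n] = 2*y[n-1] - y[n-2] + x[n], i.e. a double integrator,
    #    so the cascade is equivalent to four running sums.
    y = dx
    for _ in range(4):
        y = np.cumsum(y)

    # 3. Remove the polynomial trend by repeatedly subtracting a local
    #    mean over a symmetric window. Only samples with a full window
    #    are kept, so each pass trims `half` samples from both ends.
    window = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(n_trend_passes):
        trend = np.convolve(y, window, mode="valid")
        y = y[half:-half] - trend
    offset = n_trend_passes * half  # original index of y[0]

    # 4. Negative-to-positive zero crossings are the epoch candidates.
    rising = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
    return rising + offset
```

The sketch assumes that the excitation at glottal closure has negative polarity, so that epochs show up as negative-to-positive zero crossings; for a polarity-inverted signal the opposite crossings would be the relevant ones. The running sums in step 2 grow without bound, so this direct form is best suited to short segments. On band-limited telephone speech, step 4 is where spurious zero crossings of the kind described above would appear as false alarms.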

Availability of data

This study used publicly available datasets for the analysis. They are available from the APLAWDW repository (http://www.commsp.ee.ic.ac.uk/sap/resources/aplawdw/) and the CMU Arctic repository (http://www.festvox.org/cmu_arctic/index.html).

Notes

  1. The APLAWDW database is available at https://www.commsp.ee.ic.ac.uk/~sap/resources/aplawdw/.

  2. The COVAREP toolbox is an open-source repository of advanced speech processing algorithms; it can be obtained from https://github.com/covarep/covarep.git.

Acknowledgements

The authors would like to thank the anonymous reviewers and the editor-in-chief, M. N. S. Swamy, for their support and constructive criticism, which helped improve the quality of this article.

Author information

Corresponding author

Correspondence to Krishna Gurugubelli.

About this article

Cite this article

Gurugubelli, K., Javid, M.H., Alluri, K.N.R.K.R. et al. Toward Improving the Performance of Epoch Extraction from Telephonic Speech. Circuits Syst Signal Process 40, 2050–2064 (2021). https://doi.org/10.1007/s00034-020-01551-2

  • DOI: https://doi.org/10.1007/s00034-020-01551-2
