Abstract
An epoch is an abrupt closure event within a glottal cycle, at which significant excitation of the vocal-tract system occurs during the production of voiced speech. The state-of-the-art zero frequency filtering technique is a simple and efficient method that extracts epochs robustly from clean speech. However, it performs poorly on telephone-quality speech because spurious zero crossings in the epoch evidence lead to a high false alarm rate. Recently, the zero-phase zero frequency resonator (ZP-ZFR) was proposed as an alternative to the zero frequency filter that gives a stable implementation of the zero frequency filtering technique. In this study, a higher-order ZP-ZFR is investigated to improve the performance of zero frequency filtering for epoch extraction from telephonic speech. The proposed ZP-ZFR method is evaluated quantitatively on telephonic speech simulated from six standard databases that have simultaneous electroglottograph recordings as ground truth. Experimental results suggest that the proposed method performs significantly better than state-of-the-art methods in terms of identification rate and false alarm rate.
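For readers unfamiliar with the baseline, the following is a minimal sketch of conventional zero frequency filtering, the technique the abstract builds on. It is not the higher-order ZP-ZFR proposed in this article. The function name `zff_epochs`, its parameters, and the 15 ms trend-removal window (roughly 1 to 2 average pitch periods) are assumptions made for illustration, and the sketch assumes the speech polarity is such that glottal closures appear as negative impulse-like excitations.

```python
import numpy as np

def zff_epochs(speech, fs, trend_window_ms=15.0, trend_passes=3):
    """Illustrative conventional zero frequency filtering (hypothetical helper).

    Returns (epoch_sample_indices, zero_frequency_filtered_signal).
    """
    x = np.asarray(speech, dtype=np.float64)
    # Difference the signal to remove any DC offset or slowly varying bias.
    x = np.diff(x, prepend=x[0])
    # A cascade of two ideal zero-frequency resonators is equivalent to
    # four successive running sums (four poles at z = 1).
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # The output grows polynomially; remove the trend by repeatedly
    # subtracting the local mean over a symmetric window.
    half = int(round(trend_window_ms * 1e-3 * fs / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(trend_passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Samples near the edges are corrupted by zero padding; ignore them.
    guard = half * trend_passes
    if len(y) <= 2 * guard + 1:
        raise ValueError("signal too short for the chosen trend window")
    y[:guard] = 0.0
    y[len(y) - guard:] = 0.0
    core = y[guard:len(y) - guard]
    # Epoch candidates are the negative-to-positive zero crossings.
    rising = (core[:-1] < 0) & (core[1:] >= 0)
    epochs = np.flatnonzero(rising) + 1 + guard
    return epochs, y
```

The unbounded growth of the running sums is the reason a stable alternative such as ZP-ZFR is of interest, and on band-limited telephone speech the zero crossings of this signal can include the spurious ones that the abstract describes.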
Availability of data
This study used publicly available datasets for the analysis. The datasets are available from the APLAWDW repository: http://www.commsp.ee.ic.ac.uk/sap/resources/aplawdw/ and the CMU Arctic repository: http://www.festvox.org/cmu_arctic/index.html.
Notes
The APLAWDW database is available at https://www.commsp.ee.ic.ac.uk/~sap/resources/aplawdw/.
The COVAREP toolbox is an open-source repository of advanced speech processing algorithms and can be obtained from https://github.com/covarep/covarep.git.
Acknowledgements
The authors would like to thank the anonymous reviewers and the Editor-in-Chief, M. N. S. Swamy, for their support and constructive criticism, which helped improve the quality of this article.
Cite this article
Gurugubelli, K., Javid, M.H., Alluri, K.N.R.K.R. et al. Toward Improving the Performance of Epoch Extraction from Telephonic Speech. Circuits Syst Signal Process 40, 2050–2064 (2021). https://doi.org/10.1007/s00034-020-01551-2
DOI: https://doi.org/10.1007/s00034-020-01551-2