Abstract
In this work, we present an algorithm for voiced/unvoiced decision and pitch estimation from speech signals. Our approach is based on classifying the peaks of the autocorrelation function of the speech multi-scale product. The multi-scale product is obtained by multiplying the wavelet transform coefficients of the speech signal at three successive dyadic scales. The autocorrelation function of the multi-scale product is computed over frames of a fixed length. Experimental results show the robustness and effectiveness of our approach. Moreover, the proposed method outperforms some existing algorithms in both clean and noisy environments.
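The processing chain described in the abstract (wavelet coefficients at three dyadic scales, their point-wise product, then a framewise autocorrelation whose peaks drive the voicing decision and pitch estimate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The analysing wavelet is assumed to be the first derivative of a Gaussian with scales of 1, 2 and 4 samples. The sampling rate (8 kHz), the pitch search range (50 to 400 Hz) and the voicing threshold (0.3 of r[0]) are also hypothetical. A simple "largest autocorrelation peak above a threshold" rule stands in for the paper's peak classification. All function names are illustrative.

```python
import math


def dog_kernel(scale, support=4.0):
    """Sampled first derivative of a Gaussian with std `scale` (assumed wavelet)."""
    half = int(math.ceil(support * scale))
    return [-t / scale ** 3 * math.exp(-t * t / (2.0 * scale * scale))
            for t in range(-half, half + 1)]


def filter_same(x, h):
    """Convolve x with an odd-length kernel h, returning len(x) centred samples."""
    c = len(h) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + c - k
            if 0 <= j < len(x):
                acc += hk * x[j]
        out.append(acc)
    return out


def multiscale_product(x, scales=(1.0, 2.0, 4.0)):
    """Point-wise product of the wavelet-filtered signal at successive dyadic scales."""
    p = [1.0] * len(x)
    for s in scales:
        w = filter_same(x, dog_kernel(s))
        p = [a * b for a, b in zip(p, w)]
    return p


def autocorrelation(frame, max_lag):
    """Biased (unnormalised) autocorrelation r[k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def analyse_frame(frame, fs, f0_min=50.0, f0_max=400.0, threshold=0.3):
    """Return (voiced, f0_hz) for one frame of the multi-scale product.

    Hypothetical decision rule: the frame is voiced when the largest local
    autocorrelation peak inside the pitch-lag range exceeds threshold * r[0].
    """
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(frame) - 2)
    r = autocorrelation(frame, lag_max + 1)  # one extra lag for the peak test
    if r[0] <= 0.0:
        return False, 0.0  # silent frame
    peaks = [k for k in range(lag_min, lag_max + 1)
             if r[k - 1] < r[k] >= r[k + 1]]
    if not peaks:
        return False, 0.0
    best = max(peaks, key=lambda k: r[k])
    if r[best] / r[0] < threshold:
        return False, 0.0
    return True, fs / best


if __name__ == "__main__":
    fs = 8000
    # Idealised glottal excitation: a 200 Hz impulse train (period 40 samples).
    x = [1.0 if n % 40 == 0 else 0.0 for n in range(2048)]
    p = multiscale_product(x)
    print(analyse_frame(p[512:1024], fs))  # (True, 200.0)
```

Multiplying across scales reinforces the sharp events that persist at every scale, such as the excitation instants of the impulse train, while attenuating components that appear at only one scale. As a result, the autocorrelation of the product shows clean peaks at multiples of the pitch period. On real speech, the frame length, scales and threshold would need tuning, and the paper's own peak-classification step would replace the fixed threshold used here.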
References
Qi Y, Hunt BR. Voiced-unvoiced-silence classifications of speech using hybrid features and a network classifier. IEEE Trans Speech Audio Process. 1993;1(2):250–6.
Martin A, Charlet D, Mauuary L. Robust speech/non-speech detection using LDA applied to MFCC. IEEE Int Conf Acoust Speech Signal Process. 2001;1:237–40.
O'Shaughnessy D. Speech communications: human and machine. 2nd ed. Piscataway, NJ: IEEE Press; 1999.
Childers DG, Hahn M, Larar JN. Silent and voiced/unvoiced/mixed excitation classification of speech. IEEE Trans Acoust Speech Signal Process. 1989;37(11):1771–4.
Liao L, Gregory M. Algorithms for speech classification. IEEE Int Conf Signal Process Appl. 1999;2:623–7.
Hess W. Pitch determination of speech signals: algorithms and devices. New York: Springer; 1983.
Bagshaw PC, Hiller SM, Jack MA. Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. In: The 3rd European conference on speech communication and technology; 1993.
Talkin D. A robust algorithm for pitch tracking. In: Kleijn WB, Paliwal KK, editors. Speech coding and synthesis. Amsterdam: Elsevier; 1995. p. 495–518.
Rabiner L. On the use of autocorrelation analysis for pitch detection. IEEE Trans Acoust Speech Signal Process. 1977;25(1):24–33.
Krubsack DA, Niederjohn RJ. An autocorrelation pitch detector and voicing decision with confidence measures developed for noise-corrupted speech. IEEE Trans Signal Process. 1991;39(2):319–29.
De Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Amer. 2002;111(4):1917–30.
Boersma P. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proc Inst Phon Sci. 1993;17:97–110.
Noll AM. Cepstrum pitch determination. J Acoust Soc Amer. 1967;41(2):293–309.
Shimamura T, Takagi H. Noise-robust fundamental frequency extraction method based on exponentiated band-limited amplitude spectrum. IEEE Int Conf Midwest Symposium on Circuits and Systems. 2004;47(2):141–4.
Shahnaz C, Zhu WP, Ahmad MO. A spectro-temporal algorithm for pitch frequency estimation from noisy observations. In: IEEE international symposium on circuits and systems. Seattle, WA; 2008. p. 1704–7.
Ben Messaoud MA, Bouzid A, Ellouze N. Spectral multi-scale product analysis for pitch estimation from noisy speech signal. In: Solé-Casals J, Zaiats V, editors. Advances on non-linear speech processing, International conference on non-linear speech processing, NOLISP’09, LNAI, vol. 5933. Berlin: Springer; 2010. p. 95–102.
Ben Messaoud MA, Bouzid A, Ellouze N. A new method for pitch tracking and voicing decision based on spectral multi-scale analysis. Signal Process: An Int J. 2009;3(5):144–9.
Burrus CS, Gopinath RA, Guo H. Introduction to wavelets and wavelet transforms: a primer. Englewood Cliffs: Prentice Hall; 1998.
Mallat S. A wavelet tour of signal processing: the sparse way. 3rd ed. Burlington, VT: Academic Press; 2008.
Berman Z, Baras JS. Properties of the multiscale maxima and zero-crossings representations. IEEE Trans Signal Process. 1993;41(12):3216–31.
Kadambe S, Boudreaux-Bartels GF. Application of the wavelet transform for pitch detection of speech signals. IEEE Trans Inf Theory. 1992;38(2):917–8.
Bouzid A, Ellouze N. Voice source parameter measurement based on multi-scale analysis of electroglottographic signal. Speech Commun. 2009;51(9):782–92.
Bouzid A, Ellouze N. Open quotient measurements based on multiscale product of speech signal wavelet transform. New York: Hindawi Publishing Corp, Res Lett Signal Process; 2007. p. 1–6.
Xu Y, Weaver JB, Healy DM, Lu J. Wavelet transform domain filters: a spatially selective noise filtration technique. IEEE Trans Image Process. 1994;3(6):747–58.
Sadler BM, Swami A. Analysis of multi-scale products for step detection and estimation. IEEE Trans Inf Theory. 1999;45(3):1043–9.
Meyer G, Plante F, Ainsworth WA. A pitch extraction reference database. The 4th European conference on speech communication and technology, EUROSPEECH. Madrid, Spain; 1995. p. 837–40.
Keele Pitch Database. In: Psychology Home page-human machine perception. University of Liverpool. 1995. http://www.liv.ac.uk/Psychology/hmp/projects/pitch/speech/keele_pitch_database.html. Accessed 24 April 2010.
Joho D, Bennewitz M, Behnke S. Pitch estimation using models of voiced speech on three levels. IEEE Int Conf Acoust Speech Signal Process. 2007;4:1077–80.
Sha F, Saul LK. Real time pitch determination of one or more voices by nonnegative matrix factorization. In: Saul LK, Weiss Y, Bottou L, editors. Advances in neural information processing systems. Cambridge: MIT Press; 2005. p. 1233–40.
Sha F, Burgoyne JA, Saul LK. Multiband statistical learning for f0 estimation in speech. IEEE Int Conf Acoust Speech Signal Process. 2004;5:661–4.
Nakatani T, Irino T. Robust and accurate fundamental frequency estimation based on dominant harmonic components. J Acoust Soc Amer. 2004;116(6):3690–700.
Noisex92. In: Signal Processing Information Base (SPIB). The Signal Processing Society and the National Science Foundation. 2007. http://spib.rice.edu/spib/select_noise.html. Accessed 24 April 2010.
Acknowledgments
The authors are very grateful to Alain de Cheveigné for providing the fundamental frequency estimation software (YIN algorithm).
About this article
Ben Messaoud, M.A., Bouzid, A. & Ellouze, N. Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation. Cogn Comput 2, 151–159 (2010). https://doi.org/10.1007/s12559-010-9048-1