Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation

Cognitive Computation

Abstract

In this work, we present an algorithm for voiced/unvoiced decision and pitch estimation from speech signals. Our approach classifies the peaks of the autocorrelation of the speech multi-scale product. The multi-scale product is the product of the wavelet transform coefficients of the speech signal at three successive dyadic scales. Its autocorrelation function is calculated over frames of a specific length. The experimental results show that the approach is robust and effective, and that it outperforms some existing algorithms in both clean and noisy environments.
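
The pipeline described above (wavelet coefficients at three dyadic scales, their product, then a frame-wise autocorrelation whose peaks drive the voicing decision and the pitch estimate) can be illustrated with a minimal sketch. It is not the authors' implementation. The derivative-of-Gaussian wavelet, the scales 2, 4 and 8, the 60 to 400 Hz search range, the 0.5 voicing threshold and all function names are assumptions chosen for illustration, and the peak classification is reduced to a single threshold on the largest peak.

```python
import numpy as np

def gaussian_derivative_wavelet(scale):
    """First derivative of a Gaussian dilated by `scale` (assumed wavelet)."""
    half = int(4 * scale)
    t = np.arange(-half, half + 1, dtype=float)
    w = -t * np.exp(-t**2 / (2.0 * scale**2))
    return w / np.sum(np.abs(w))

def multiscale_product(x, scales=(2, 4, 8)):
    """Pointwise product of wavelet coefficients at three dyadic scales.

    Assumes len(x) is larger than the longest wavelet (65 samples here).
    """
    p = np.ones(len(x))
    for s in scales:
        p *= np.convolve(x, gaussian_derivative_wavelet(s), mode="same")
    return p

def frame_pitch(frame, fs, fmin=60.0, fmax=400.0, threshold=0.5):
    """Return (voiced, f0_hz) from the normalised autocorrelation of a frame."""
    frame = frame - np.mean(frame)
    # Keep the non-negative lags only; acf[0] is the frame energy.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if acf[0] <= 0.0:
        return False, 0.0
    acf = acf / acf[0]
    # Search the lag range that corresponds to plausible pitch values.
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(acf) - 1)
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    # Simplified peak classification: one threshold on the largest peak.
    voiced = bool(acf[lag] >= threshold)
    return voiced, (fs / lag if voiced else 0.0)

# Synthetic stand-in for voiced speech: an impulse train with an 80-sample
# period at fs = 8000 Hz, i.e. a 100 Hz fundamental.
fs = 8000
x = np.zeros(1024)
x[40::80] = 1.0
voiced, f0 = frame_pitch(multiscale_product(x), fs)
print(voiced, f0)
```

The product sharpens the response around each excitation instant, so the autocorrelation of the product has a clear peak at the pitch period. A real system would process overlapping frames and use a more careful peak classification than the single threshold assumed here.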



Acknowledgments

The authors are very grateful to Alain de Cheveigné for providing the fundamental frequency estimation software (YIN algorithm).

Author information

Corresponding author

Correspondence to Mohamed Anouar Ben Messaoud.


About this article

Cite this article

Ben Messaoud, M.A., Bouzid, A. & Ellouze, N. Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation. Cogn Comput 2, 151–159 (2010). https://doi.org/10.1007/s12559-010-9048-1


  • DOI: https://doi.org/10.1007/s12559-010-9048-1
