Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation

Ben Messaoud, Mohamed Anouar; Bouzid, Aïcha; Ellouze, Noureddine

doi:10.1007/s12559-010-9048-1

Published: 26 May 2010

Volume 2, pages 151–159, (2010)

Cognitive Computation

Abstract

In this work, we present an algorithm for voiced/unvoiced decision and pitch estimation from speech signals. The approach classifies the peaks of the autocorrelation of the speech multi-scale product. The multi-scale product is formed by multiplying the speech wavelet transform coefficients at three successive dyadic scales, and its autocorrelation function is computed over frames of a specific length. Experimental results show that the approach is robust and effective, and that it outperforms some existing algorithms in both clean and noisy environments.
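
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the wavelet, the frame length, the pitch search range, or the peak classification rule. The sketch therefore assumes a first-derivative-of-Gaussian wavelet at scales 2, 4 and 8, a 60-400 Hz search range, and a single threshold of 0.5 on the normalized autocorrelation peak in place of the paper's peak classification. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_derivative_wavelet(scale):
    """First derivative of a Gaussian sampled at `scale` (assumed wavelet)."""
    half_width = int(4 * scale)
    t = np.arange(-half_width, half_width + 1, dtype=float)
    psi = -t * np.exp(-t**2 / (2.0 * scale**2))
    return psi / np.sum(np.abs(psi))

def multiscale_product(signal, scales=(2, 4, 8)):
    """Pointwise product of wavelet coefficients at three dyadic scales."""
    signal = np.asarray(signal, dtype=float)
    product = np.ones(len(signal))
    for s in scales:
        product *= np.convolve(signal, gaussian_derivative_wavelet(s), mode="same")
    return product

def voicing_and_pitch(frame, fs, fmin=60.0, fmax=400.0, threshold=0.5):
    """Return (is_voiced, f0_hz) from the autocorrelation of the frame's
    multi-scale product. The threshold rule is an assumption, standing in
    for the peak classification described in the paper."""
    msp = multiscale_product(frame)
    # One-sided autocorrelation: index k holds the value at lag k.
    acf = np.correlate(msp, msp, mode="full")[len(msp) - 1:]
    if acf[0] <= 0.0:
        return False, 0.0  # silent frame, nothing to normalize
    acf = acf / acf[0]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(acf) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    if acf[lag] < threshold:
        return False, 0.0
    return True, fs / lag

# Usage: a synthetic impulse train with an 80-sample period at 8 kHz.
fs = 8000
frame = np.zeros(512)
frame[::80] = 1.0
print(voicing_and_pitch(frame, fs))
```

The frame must be longer than the widest wavelet kernel (65 samples at scale 8 here) for the "same" convolution to keep the frame length. Multiplying across scales reinforces sharp transients that persist at every scale, such as glottal closure instants, and attenuates noise that does not, which is why the autocorrelation is taken on the product rather than on the raw speech.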

Acknowledgments

The authors are very grateful to Alain de Cheveigné for providing the fundamental frequency estimation software (Yin algorithm).

Author information

Authors and Affiliations

Department of Electrical Engineering, Tunis El Manar University, ENIT, BP. 37 Le Belvédère, 1002, Tunis, Tunisia
Mohamed Anouar Ben Messaoud, Aïcha Bouzid & Noureddine Ellouze

Corresponding author

Correspondence to Mohamed Anouar Ben Messaoud.

About this article

Cite this article

Ben Messaoud, M.A., Bouzid, A. & Ellouze, N. Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation. Cogn Comput 2, 151–159 (2010). https://doi.org/10.1007/s12559-010-9048-1

Received: 30 December 2009
Accepted: 10 May 2010
Published: 26 May 2010
Issue Date: September 2010
DOI: https://doi.org/10.1007/s12559-010-9048-1
