Wavelet packet approximation of critical bands for speaker verification

International Journal of Speech Technology

Abstract

Exploiting the capabilities offered by the plethora of existing wavelets, together with the powerful set of orthonormal bases provided by wavelet packets, we construct a novel wavelet packet-based set of speech features that is optimized for the task of speaker verification. Our approach differs from previous wavelet-based work primarily in the wavelet-packet tree design, which follows the concept of critical bands, and in the particular wavelet basis function used. In comparative experiments, we investigate several alternative speech parameterizations with respect to their usefulness for differentiating among human voices. The experimental results confirm that the proposed speech features outperform Mel-Frequency Cepstral Coefficients (MFCC) and previously used wavelet features on the task of speaker verification. Compared to the wavelet packet features introduced by Farooq and Datta, the MFCC of Slaney, and the subband-based cepstral coefficients of Sarikaya et al., the proposed speech features achieved a relative reduction of the equal error rate of 15%, 15% and 8%, respectively.
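The abstract reports its gains as relative reductions of the equal error rate (EER). A minimal sketch of how such a figure is conventionally computed is given below; the function name and the EER values are hypothetical and serve only to illustrate the arithmetic, since the abstract does not state the absolute EERs.

```python
def relative_eer_reduction(baseline_eer: float, proposed_eer: float) -> float:
    """Return the relative EER reduction, in percent, of a proposed
    system with respect to a baseline system."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical EER values (in percent), chosen only to illustrate the
# calculation; they are NOT results taken from the article.
baseline_eer = 4.0
proposed_eer = 3.4

reduction = relative_eer_reduction(baseline_eer, proposed_eer)
print(f"Relative EER reduction: {reduction:.1f}%")
```

Under this convention, a relative reduction is distinct from an absolute one: in the illustrative numbers above, the absolute EER drops by 0.6 percentage points.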

References

  • Assaleh, K. T., & Mammone, R. J. (1994a). Robust cepstral features for speaker identification. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing, (ICASSP’94) (Vol. 1, pp. 129–132). Adelaide, Australia.

  • Assaleh, K. T., & Mammone, R. J. (1994b). New LP-derived features for speaker identification. IEEE Transactions on Speech and Audio Processing, 2(4), 630–638.

  • Atal, B. S. (1974). Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. Journal of the Acoustical Society of America, 55(6), 1304–1312.

  • Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. Journal of the Acoustical Society of America, 50(2), 637–655.

  • Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.

  • Davis, S. B., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4), 357–366.

  • Erzin, E., Cetin, A. E., & Yardimci, Y. (1995). Subband analysis for speech recognition in the presence of car noise. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP-95) (Vol. 1, pp. 417–420). Detroit, MI, USA.

  • Farooq, O., & Datta, S. (2001). Mel filter-like admissible wavelet packet structure for speech recognition. IEEE Signal Processing Letters, 8(7), 196–198.

  • Farooq, O., & Datta, S. (2002). Mel-scaled wavelet filter based features for noisy unvoiced phoneme recognition. In Proceedings of the 7th international conference on spoken language processing (ICSLP 2002) (pp. 1017–1020). Denver, Colorado, USA.

  • Fletcher, H. (1940). Auditory patterns. Reviews of Modern Physics, 12, 47–65.

  • Ganchev, T. (2005). Speaker recognition. Ph.D. dissertation, Dept. of Electrical and Computer Engineering, University of Patras, Greece, Nov. 2005.

  • Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2002a). Text-independent speaker verification based on Probabilistic Neural Networks. In Proceedings of the acoustics 2002 (pp. 159–166). Patras, Greece.

  • Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2002b). A speaker verification system based on Probabilistic Neural Networks. In 2002 NIST speaker recognition evaluation, results CD workshop presentations & final release of results, Vienna, Virginia, USA.

  • Ganchev, T., Siafarikas, M., & Fakotakis, N. (2004). Speaker verification based on wavelet packets. Lecture notes in computer science. Heidelberg: Springer. ISSN: 0302-9743, LNAI 3206/2004:299–306.

  • Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2005). Comparative evaluation of various MFCC implementations on the speaker verification task. In Proceedings of the 10th international conference on speech and computer, SPECOM 2005 (Vol. 1, pp. 191–194). October 17–19, 2005, Patras, Greece.

  • Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.

  • Hartigan, J. A., & Wong, M. A. (1979). A k-means clustering algorithm. Applied Statistics, 28(1), 100–108.

  • Hennebert, J., Melin, H., Genoud, D., & Petrovska-Delacretaz, D. (1996). The POLYCOST 250 Database (v1.0), COST250 report.

  • Hennebert, J., Melin, H., Petrovska, D., & Genoud, D. (2000). Polycost: a telephone-speech database for speaker recognition. Speech Communication, 31(2–3), 265–270.

  • Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis for speech. Journal of the Acoustical Society of America, 87(4), 1738–1752.

  • Long, C. J., & Datta, S. (1996). Wavelet based feature extraction for phoneme recognition. In Proceedings of the 4th international conference on spoken language processing (ICSLP-96) (Vol. 1, pp. 264–267). Philadelphia, USA.

  • Mallat, S. (1998). A wavelet tour of signal processing. San Diego: Academic Press.

  • Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th edn.). San Diego: Academic Press.

  • NIST (2001). The NIST year 2001 speaker recognition evaluation plan. National Institute of Standards and Technology of USA. Available: http://www.nist.gov/speech/tests/spk/2001/doc/2001-spkrec-evalplan-v05.9.pdf.

  • NIST (2002). The NIST year 2002 speaker recognition evaluation plan. National Institute of Standards and Technology of USA. Available: http://www.nist.gov/speech/tests/spk/2002/doc/2002-spkrec-evalplan-v60.pdf.

  • Nogueira, W., Büchner, A., Lenarz, T., & Edler, B. (2005). A Psychoacoustic “NofM”-type speech coding strategy for cochlear implants. EURASIP Journal on Applied Signal Processing—Special Issue on DSP in Hearing Aids and Cochlear Implants, 18, 3044–3059.

  • Nogueira, W., Giese, A., Edler, B., & Büchner, A. (2006). Wavelet packet filter-bank for speech processing strategies in cochlear implants. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP 2006) (Vol. 5, pp. 121–124). Toulouse, France.

  • Oppenheim, A. V. (1969). A speech analysis-synthesis system based on homomorphic filtering. Journal of the Acoustical Society of America, 45, 458–465.

  • Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 1065–1076.

  • Percival, D. B., & Walden, A. T. (2000). Wavelet methods for time series analysis. Cambridge: Cambridge University Press.

  • Polycost Bugs (1999). A list of known bugs in version 1.0 of POLYCOST database. The Polycost Web-page. Available: http://circhp.epfl.ch/polycost/polybugs.htm.

  • Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., & McGonegal, C. A. (1976). A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech and Signal Processing, 24(5), 399–418.

  • Sarikaya, R., & Hansen, J. H. L. (2000). High resolution speech feature parameterization for monophone-based stressed speech recognition. IEEE Signal Processing Letters, 7(7), 182–185.

  • Sarikaya, R., Pellom, B. L., & Hansen, J. H. L. (1998). Wavelet packet transform features with application to speaker identification. In Proceedings of the IEEE nordic signal processing symposium (NORSIG’98) (pp. 81–84). Vigsø, Denmark.

  • Siafarikas, M., Ganchev, T., & Fakotakis, N. (2004). Wavelet packets based speaker verification. In Proceedings of the ISCA speaker and language recognition workshop—Odyssey 2004 (pp. 257–264). Toledo, Spain.

  • Slaney, M. (1998). Auditory toolbox. Version 2 (Technical Report #1998-010). Interval Research Corporation.

  • Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1), 109–118.

  • Tufekci, Z., & Gowdy, J. N. (2000). Feature extraction using discrete wavelet transform for speech recognition. In Proceedings of the IEEE SoutheastCon 2000 (pp. 116–123). Nashville, Tennessee, USA.

  • Young, S. J. (1993). The HTK hidden Markov model toolkit: design and philosophy (Technical Report TR. 153). Department of Engineering, Cambridge University, UK.

  • Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands (Frequenzgruppen). Journal of the Acoustical Society of America, 33, 248–249.

Author information

Corresponding author

Correspondence to Todor Ganchev.

About this article

Cite this article

Siafarikas, M., Ganchev, T., Fakotakis, N. et al. Wavelet packet approximation of critical bands for speaker verification. Int J Speech Technol 10, 197–218 (2007). https://doi.org/10.1007/s10772-009-9028-6

  • DOI: https://doi.org/10.1007/s10772-009-9028-6
