Wavelet basis selection for enhanced speech parametrization in speaker verification

Ganchev, Todor; Siafarikas, Mihalis; Mporas, Iosif; Stoyanova, Tsenka

doi:10.1007/s10772-013-9202-8

Published: 16 June 2013

International Journal of Speech Technology, Volume 17, pages 27–36 (2014)

Todor Ganchev¹, Mihalis Siafarikas², Iosif Mporas³,⁴ & Tsenka Stoyanova²

Abstract

We study the inherent properties of nine wavelet functions and evaluate their applicability as basis functions in a speech parametrization scheme that is advantageous for speaker verification. First, the properties of the nine candidate basis functions are analysed and their advantages and disadvantages are discussed. Next, each candidate is employed in a well-proven speech parametrization scheme, and the resulting speech features are computed. Finally, these speech features are evaluated in a common experimental set-up on the speaker verification task. The experimental results, obtained on two well-known speaker recognition databases, show that the Battle-Lemarié wavelet function is the most advantageous of the functions evaluated here, since it yields the most beneficial speech descriptors. Compared to the baseline Mel-frequency cepstral coefficients (MFCC), a relative reduction of the equal error rate by 4.2% was observed on the 2001 NIST speaker recognition evaluation database, and by 2.3% on the Polycost speaker recognition database.
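
The abstract reports its gains as a relative reduction of the equal error rate (EER). As a minimal sketch of how such a figure is typically derived, the Python below estimates the EER with a simple threshold sweep and then computes the relative reduction between two systems. The function names, the sweep-based estimator, and all scores and EER values are illustrative assumptions, not the authors' evaluation code or data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Hypothetical helper: returns the mean of the false acceptance and false
    rejection rates at the threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Made-up scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, impostors))

# Made-up EER values (in percent), not taken from the article.
print(relative_reduction(5.0, 4.5))
```

Note that a relative reduction is measured against the baseline EER, so it is not the same as the absolute difference in percentage points between the two systems.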

References

Battle, G. (1987). A block spin construction of ondelettes. Part I: Lemarié functions. Communications in Mathematical Physics, 110, 601–615.
Beylkin, G., Coifman, R., & Rokhlin, V. (1991). Fast wavelet transforms and numerical algorithms. Communications on Pure and Applied Mathematics, 44, 141–183.
Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.
Erzin, E., Cetin, A. E., & Yardimci, Y. (1995). Subband analysis for speech recognition in the presence of car noise. In Proc. of the ICASSP-95 (Vol. 1, pp. 417–420).
ETSI ES 201 108, V1.1.2 (2000-4) (2000). ETSI Standard: speech processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Extended advanced front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm, April 2000, Chap. 4, pp. 8–11.
ETSI ES 202 050, V1.1.5 (2007-1) (2007). ETSI Standard: speech processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Extended advanced front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm, January 2007, Sect. 5.3, pp. 21–24.
Farooq, O., & Datta, S. (2001). Mel filter-like admissible wavelet packet structure for speech recognition. IEEE Signal Processing Letters, 8(7), 196–198.
Ganchev, T. (2005). Speaker recognition. PhD dissertation, Dept. of Electrical and Computer Engineering, University of Patras, Greece, Nov. 2005.
Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2002). A speaker verification system based on probabilistic neural networks. In 2002 NIST speaker recognition evaluation, results CD workshop presentations & final release of results, Vienna, Virginia, USA.
Guido, R. C., Vieira, L. S., Junior, S. B., Sanchez, F. L., Maciel, C. D., Fonseca, E. S., & Pereira, J. C. (2007). A neural-wavelet architecture for voice conversion. Neurocomputing, 71, 174–180.
Hennebert, J., Melin, H., Petrovska, D., & Genoud, D. (2000). Polycost: a telephone-speech database for speaker recognition. Speech Communication, 31(2–3), 265–270.
Lemarié, P. G. (1988). Ondelettes à localisation exponentielle. Journal de Mathématiques Pures et Appliquées, 67, 227–236.
Li, J., Tang, Y., Yan, Z., & Zhang, W. (2001). Uniform analytic construction of wavelet analysis filters based on sine and cosine trigonometric functions. Applied Mathematics and Mechanics, 22(5), 569–585.
Long, C. J., & Datta, S. (1996). Wavelet based feature extraction for phoneme recognition. In Proc. of the ICSLP-96 (Vol. 1, pp. 264–267).
Mallat, S. (1998). A wavelet tour of signal processing. San Diego: Academic Press.
Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th edn.). London: Academic Press.
NIST SRE Plan (2001). The NIST year 2001 speaker recognition evaluation plan. National Institute of Standards and Technology of USA. Available: http://www.nist.gov/speech/tests/spk/2001/doc/2001-spkrec-evalplan-v05.9.pdf.
NIST SRE Plan (2002). The NIST year 2002 speaker recognition evaluation plan. National Institute of Standards and Technology of USA. Available: http://www.nist.gov/speech/tests/spk/2002/doc/2002-spkrec-evalplan-v60.pdf.
Nogueira, W., Giese, A., Edler, B., & Büchner, A. (2006). Wavelet packet filterbank for speech processing strategies in cochlear implants. In Proc. of the IEEE ICASSP 2006 (Vol. 5, pp. 121–124).
Polycost Bugs (1999). A list of known bugs in version 1.0 of POLYCOST database. The Polycost web-page. Available: http://circhp.epfl.ch/polycost/polybugs.htm.
Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., & McGonegal, C. A. (1976). A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(5), 399–418.
Sarikaya, R., & Hansen, J. H. L. (2000). High resolution speech feature parameterization for monophone-based stressed speech recognition. IEEE Signal Processing Letters, 7(7), 182–185.
Sarikaya, R., Pellom, B. L., & Hansen, J. H. L. (1998). Wavelet packet transform features with application to speaker identification. In Proc. of the IEEE nordic signal processing symposium (NORSIG’98) (pp. 81–84).
Siafarikas, M., Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2007). Wavelet packet approximation of critical bands for speaker verification. International Journal of Speech Technology, 10(4), 197–218.
Slaney, M. (1998). Auditory toolbox. Version 2 (Technical Report #1998-010). Interval Research Corporation.
Tufekci, Z., & Gowdy, J. N. (2000). Feature extraction using discrete wavelet transform for speech recognition. In Proc. of the IEEE SoutheastCon 2000 (pp. 116–123).
Yan, R. (2007). Base wavelet selection criteria for non-stationary vibration analysis in bearing health diagnosis. PhD dissertation, Dept. of Mechanical & Industrial Engineering, University of Massachusetts, Amherst, USA, May 2007.

Author information

Authors and Affiliations

Department of Electronics, Technical University–Varna, 9010, Varna, Bulgaria
Todor Ganchev
Department of Electrical & Computer Engineering, University of Patras, Rion-Patras, 26500, Greece
Mihalis Siafarikas & Tsenka Stoyanova
Dept. of ECE, University of Patras, 26500, Patras, Greece
Iosif Mporas
Dept. of Mechanical Engineering, Technological Educational Institute of Patras, 26334, Patras, Greece
Iosif Mporas

Corresponding author

Correspondence to Todor Ganchev.

About this article

Cite this article

Ganchev, T., Siafarikas, M., Mporas, I. et al. Wavelet basis selection for enhanced speech parametrization in speaker verification. Int J Speech Technol 17, 27–36 (2014). https://doi.org/10.1007/s10772-013-9202-8

Received: 14 February 2013
Accepted: 28 May 2013
Published: 16 June 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10772-013-9202-8
