Frame Based Features

Schacht, Stefan; Koreman, Jacques; Lauer, Christoph; Morris, Andrew; Wu, Dalei; Klakow, Dietrich

doi:10.1007/978-3-540-74200-5_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

This chapter discusses feature extraction methods for speaker classification. We introduce linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCCs), and wavelets, and report experimental studies on the AURORA and TIMIT data. For the speaker identification task, we show that wavelets are beneficial.
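
As a rough illustration of what a frame-based feature looks like, the following sketch cuts a signal into overlapping frames and computes linear prediction coefficients for each frame with the autocorrelation method and the Levinson-Durbin recursion. It is not the chapter's implementation: the function names `frame_signal` and `lpc`, the 25 ms frame length, the 10 ms hop, the Hamming window, the order of 8, and the synthetic test signal are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion).

    Returns (a, err) with a[0] == 1, where the prediction error is
    e[n] = sum_k a[k] * x[n - k] and err is its final energy.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order (zero lag sits at index n - 1).
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Illustrative input: one second of a noisy tone at an assumed 8 kHz rate.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).standard_normal(fs)

frame_len, hop = 200, 80          # 25 ms frames, 10 ms hop (assumed values)
frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
features = np.array([lpc(f, order=8)[0][1:] for f in frames])
# features has one row of 8 LPC coefficients per frame.
```

Each row of `features` is one frame-level feature vector; MFCC or wavelet features would replace the `lpc` step while the framing stays the same.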

Author information

Authors and Affiliations

Spoken Language Systems/Phonetics, Universität des Saarlandes, Saarbrücken, Germany
Stefan Schacht, Jacques Koreman, Christoph Lauer, Andrew Morris, Dalei Wu & Dietrich Klakow

Editor information

Christian Müller

About this chapter

Cite this chapter

Schacht, S., Koreman, J., Lauer, C., Morris, A., Wu, D., Klakow, D. (2007). Frame Based Features. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science (LNAI), vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_13

DOI: https://doi.org/10.1007/978-3-540-74200-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science, Computer Science (R0)
