Abstract
This chapter discusses feature extraction methods for speaker classification. We introduce linear predictive coding, mel-frequency cepstral coefficients, and wavelets, and report experimental studies on the AURORA and TIMIT data. For the speaker identification task, we show that wavelets are beneficial.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schacht, S., Koreman, J., Lauer, C., Morris, A., Wu, D., Klakow, D. (2007). Frame Based Features. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_13
DOI: https://doi.org/10.1007/978-3-540-74200-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science (R0)