Abstract
The filter bank approach is commonly used in the feature extraction phase of speech recognition (e.g., Mel frequency cepstral coefficients). The filter bank modifies the magnitude spectrum according to physiological and psychological findings. However, since the mechanism of the human auditory system is not fully understood, the optimal filter bank parameters are not known. This work presents a method in which the filter bank, optimized for discriminability between phonemes, is derived directly from phonetically labeled speech data using Linear Discriminant Analysis. The work can be seen as further evidence that incorporating psychoacoustic findings into feature extraction can lead to better recognition performance.
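The idea of deriving spectral weighting functions from labeled data with Linear Discriminant Analysis can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the spectral representation, the corpus, or any regularization, so the function name `lda_spectral_basis`, the `ridge` parameter, and the input layout (one magnitude spectrum per row, one phoneme label per frame) are assumptions made for illustration. The sketch builds the within-class and between-class scatter matrices and solves the generalized eigenproblem S_b v = λ S_w v; the leading eigenvectors act as data-derived "filters" over frequency bins.

```python
import numpy as np


def lda_spectral_basis(spectra, labels, n_filters, ridge=1e-6):
    """Return (filters, eigenvalues) from LDA on per-frame spectra.

    spectra : array (n_frames, n_bins), e.g. magnitude spectra (assumed layout)
    labels  : array (n_frames,), phoneme label of each frame
    filters : array (n_filters, n_bins), one discriminant vector per row
    """
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    n_bins = spectra.shape[1]
    global_mean = spectra.mean(axis=0)

    s_within = np.zeros((n_bins, n_bins))
    s_between = np.zeros((n_bins, n_bins))
    for phoneme in np.unique(labels):
        frames = spectra[labels == phoneme]
        class_mean = frames.mean(axis=0)
        centered = frames - class_mean
        s_within += centered.T @ centered
        offset = (class_mean - global_mean)[:, None]
        s_between += len(frames) * (offset @ offset.T)

    # Small ridge term (an assumption of this sketch) keeps S_w invertible.
    s_within += ridge * np.trace(s_within) / n_bins * np.eye(n_bins)

    # Whiten with the Cholesky factor S_w = L L^T, so the generalized problem
    # becomes an ordinary symmetric one: (L^-1 S_b L^-T) u = lambda u.
    chol = np.linalg.cholesky(s_within)
    half = np.linalg.solve(chol, s_between)        # L^-1 S_b
    whitened = np.linalg.solve(chol, half.T)       # L^-1 S_b L^-T
    eigenvalues, vectors = np.linalg.eigh(whitened)

    # eigh returns ascending eigenvalues; keep the largest n_filters.
    order = np.argsort(eigenvalues)[::-1][:n_filters]
    # Map back to the original space: v = L^-T u.
    filters = np.linalg.solve(chol.T, vectors[:, order]).T
    return filters, eigenvalues[order]
```

Each row of `filters` can be applied to a spectrum with a dot product, in the same way a conventional filter bank channel weights the frequency bins. With C phoneme classes, at most C - 1 eigenvalues are meaningfully non-zero.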
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Burget, L., Heřmanský, H. (2001). Data Driven Design of Filter Bank for Speech Recognition. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_40
DOI: https://doi.org/10.1007/3-540-44805-5_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive