Abstract
This paper proposes modifications to the Multi-resolution RASTA (MRASTA) feature extraction technique for automatic speech recognition (ASR). By emulating asymmetries of the temporal receptive field (TRF) profiles of higher level auditory neurons, we obtain a relative improvement of more than 11.4% in word error rate on the OGI-Digits database. Experiments on the TIMIT database confirm that the proposed modifications are indeed useful.
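As a minimal sketch of how a relative improvement in word error rate is usually computed, the snippet below divides the reduction in WER by the baseline WER. The function name and the WER values are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER improvement in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    A positive result means the new system makes fewer errors.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the paper:
    # a baseline WER of 5.0% reduced to 4.4%.
    improvement = relative_wer_improvement(5.0, 4.4)
    print(f"relative improvement: {improvement:.1f}%")
```

Under this definition, a relative figure depends on the baseline: the same absolute reduction yields a larger relative improvement when the baseline WER is already low.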
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sivaram, G.S.V.S., Hermansky, H. (2008). Emulating Temporal Receptive Fields of Higher Level Auditory Neurons for ASR. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_65
DOI: https://doi.org/10.1007/978-3-540-87391-4_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science (R0)