Abstract
One of the current challenges in pattern recognition is the treatment of speech. The complexity of this task is due fundamentally to the great statistical variability among speakers and to the temporal structure of the speech signal itself. The current literature offers many discriminative methods and coding algorithms that claim slight improvements in recognition rates; these may nonetheless be considered important advances, as the field is approaching a limit that is difficult to surpass. In this experiment, the decimal digits spoken in English (from one to nine) are represented in two-dimensional self-organizing maps. This representation has been checked using data from the TIMIT database, starting from a prior coding based on Perceptual Linear Prediction (PLP) coefficients. Subsequently, a heuristic algorithm for recognition has been defined. Applying this algorithm to both a training data set and a test data set produces acceptable recognition rates, even for low-dimension maps, with the added benefit of reduced computational cost. The basic methodology used and the results obtained are presented and discussed.
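The abstract does not specify the map parameters or the recognition heuristic, so the following is only a minimal sketch of the general idea: train a two-dimensional self-organizing map on feature frames, label each map node, and recognise an utterance by voting over its frames. It assumes each frame is a plain list of floats (standing in for PLP coefficients). The function names, the linear decay schedules, and the majority-vote rule are all illustrative assumptions, not the authors' algorithm.

```python
import math
import random
from collections import Counter, defaultdict


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for pos, w in weights.items():
        dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if dist < best_dist:
            best, best_dist = pos, dist
    return best


def train_som(frames, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols map on a list of feature vectors.

    Assumption: learning rate and neighbourhood width decay linearly;
    the paper's actual schedule is not given in the abstract.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    weights = {(r, c): [rng.uniform(-1, 1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        order = list(frames)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            # Pull every node towards x, weighted by its grid distance to the winner.
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def label_nodes(weights, labelled_frames):
    """Give each node the majority label of the training frames it wins."""
    votes = defaultdict(Counter)
    for x, label in labelled_frames:
        votes[best_matching_unit(weights, x)][label] += 1
    return {pos: counts.most_common(1)[0][0] for pos, counts in votes.items()}


def recognise(weights, node_labels, utterance):
    """Hypothetical heuristic: majority vote over the frames of one utterance."""
    tally = Counter()
    for x in utterance:
        label = node_labels.get(best_matching_unit(weights, x))
        if label is not None:
            tally[label] += 1
    return tally.most_common(1)[0][0] if tally else None
```

In this sketch the cost of recognising one frame grows with the number of map nodes, which is one way to see why low-dimension maps reduce computation.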
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Díaz, F., Ferrández, J.M., Gómez, P., Rodellar, V., Nieto, V. (1997). Spoken-digit recognition using self-organizing maps with perceptual pre-processing. In: Mira, J., Moreno-Díaz, R., Cabestany, J. (eds) Biological and Artificial Computation: From Neuroscience to Technology. IWANN 1997. Lecture Notes in Computer Science, vol 1240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032580
DOI: https://doi.org/10.1007/BFb0032580
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63047-0
Online ISBN: 978-3-540-69074-0
eBook Packages: Springer Book Archive