Abstract
We recently reported the use of Kohonen's feature map as the hidden layer of an RBF network for the recognition of spoken letters [1] and the analysis of sleep EEG [2]. The feature map was shown to act as an aid to visualization during the initial period of unsupervised learning in the hidden layer. In this paper, we again explore the topology-preserving properties of Kohonen's feature map, this time for the visual interpretation of speech. It is shown that speech sounds, such as words or phonemes, may be displayed as moving trajectories on a computer screen and enhanced for ease of interpretation. A system known as the Visual Ear is introduced, in which speech from a normal speaker is displayed alongside that of a pupil learning pronunciation, enabling a visual comparison to be made between the two. The application of the Visual Ear to accelerated learning of foreign languages, or as a general speech therapy tool, is then discussed, and the limitations of the present system are highlighted.
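To make the idea of a speech sound traced as a trajectory on a feature map more concrete, the following is a minimal sketch and not the authors' implementation. It trains a small Kohonen feature map on a sequence of feature vectors and then maps each frame to its best-matching grid node, so that the sequence becomes a path across the map. The function names (`train_som`, `best_matching_unit`, `trajectory`), the 4 x 4 grid, the linear decay schedules, and the synthetic three-dimensional frames standing in for real speech features are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)),
    )


def train_som(frames, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small Kohonen feature map on equal-length feature vectors.

    The schedules used here (linear decay of learning rate and neighbourhood
    width) are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialised at random in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        order = list(frames)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def trajectory(weights, frames):
    """Map each frame to its winning node; the result is a path on the grid."""
    return [best_matching_unit(weights, x) for x in frames]


# Synthetic frames that drift smoothly through feature space (assumed data,
# standing in for per-frame speech features).
demo_frames = [[t / 19.0, 1.0 - t / 19.0, 0.5] for t in range(20)]
demo_map = train_som(demo_frames, rows=4, cols=4, epochs=10, seed=1)
demo_path = trajectory(demo_map, demo_frames)
print(demo_path)
```

In a display along the lines described in the abstract, the path for a reference speaker and the path for a pupil could be drawn on the same grid for comparison. How the original system enhances or renders the trajectories is not specified here, so the sketch leaves that out.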
References
Reynolds J, Tarassenko L. Spoken letter recognition with neural networks. Int J Neural Syst 1992; 3(3): 219–235
Roberts S, Tarassenko L. Analysis of the sleep EEG using a multilayer network with spatial organisation. IEE Proc F 1992; 139(6): 420–425
Hardcastle WJ, Gibbon FE, Jones W. Visual display of tongue palate contact: Electropalatography in the assessment and remediation of speech disorders. Br J Disorders of Commun 1991; 26: 41–74
Kohonen T. Self-Organisation and Associative Memory, Springer-Verlag, Berlin, 1988
Moody J, Darken C. Fast learning in networks of locally-tuned processing units. Neural Comput 1989; 1: 281–294
Lippmann R. An introduction to computing with neural nets. IEEE ASSP Mag 1987; 4(2): 4–22
Davis S, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust, Speech, and Signal Process 1980; 28(4): 357–366
Tattersall G, Linford P, Linggard R. Neural arrays for speech recognition. Br Telecom Tech J 1988; 6(2): 140–163
Cite this article
Reynolds, J., Tarassenko, L. Learning pronunciation with the Visual ear. Neural Comput & Applic 1, 169–175 (1993). https://doi.org/10.1007/BF01414942