Flexible vowel recognition by the generation of dynamic coherence in oscillator neural networks: speaker-independent vowel recognition

Liu, Fang; Yamaguchi, Yoko; Shimizu, Hiroshi

doi:10.1007/BF00197313

Published: June 1994

Volume 71, pages 105–114 (1994)

Biological Cybernetics

Fang Liu¹,
Yoko Yamaguchi² &
Hiroshi Shimizu³

Abstract

We propose a new model for speaker-independent vowel recognition which uses the flexibility of the dynamic linking that results from the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, which are referred to as the A-, B- and C-centers. The input signals are a time series of linear prediction (LPC) spectrum envelopes of auditory signals. At each time-window within the series, the A-center receives input signals, extracts local peaks of the spectrum envelope, i.e., formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data of Japanese vowels. The associative interaction in the B-center and the reciprocal interaction between the A- and B-centers selectively activate a vowel as a global synchronized pattern over the two centers. The C-center evaluates the synchronized activities among the three formant regions to give the selective output of the category among the five Japanese vowels. Thus, a flexible ability of dynamical linking among features is achieved over the three centers. The capability of the present system was investigated for speaker-independent recognition of Japanese vowels. The system demonstrated an ability to recognize vowels, including misleading vowels, that is very similar to that of human listeners. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimum condition of the frequency of oscillation is discussed in comparison with stimulus-dependent synchronizations observed in neurophysiological experiments on the cortex.
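
The abstract does not give the oscillator equations, so the following is only an illustrative sketch of the general idea that coupled oscillating units can lock into a coherent pattern. It swaps in the generic Kuramoto phase-oscillator model, which is not claimed to be the authors' model. The function names, the all-to-all coupling, the 40 Hz centre frequency and every parameter value are assumptions chosen for the example.

```python
import cmath
import math
import random


def coherence(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means perfect phase synchrony."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def simulate(n_units=20, coupling=20.0, freq_hz=40.0, freq_spread=0.5,
             dt=0.001, steps=5000, seed=0):
    """Euler-integrate n_units all-to-all coupled phase oscillators.

    Hypothetical stand-in model (Kuramoto), not the A-, B- and C-center network:
        d(theta_i)/dt = omega_i + coupling * r * sin(psi - theta_i)
    where r * exp(i * psi) is the mean of exp(i * theta_j) over all units.
    Returns the list of final phases in radians.
    """
    rng = random.Random(seed)
    # Natural angular frequencies (rad/s) scattered around freq_hz.
    omega = [2 * math.pi * (freq_hz + rng.uniform(-freq_spread, freq_spread))
             for _ in range(n_units)]
    # Random initial phases, so the units start out incoherent.
    theta = [rng.uniform(0, 2 * math.pi) for _ in range(n_units)]
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * t) for t in theta) / n_units
        r, psi = abs(mean_field), cmath.phase(mean_field)
        theta = [t + dt * (w + coupling * r * math.sin(psi - t))
                 for t, w in zip(theta, omega)]
    return theta


if __name__ == "__main__":
    print("coupled   r =", round(coherence(simulate(coupling=20.0)), 3))
    print("uncoupled r =", round(coherence(simulate(coupling=0.0)), 3))
```

With the coupling well above the spread of natural frequencies, the units settle into a nearly common phase and `coherence` approaches 1. With `coupling=0.0` the phases drift apart and the value stays lower. A readout stage could threshold such a coherence measure, which is loosely analogous to evaluating synchronized activity among formant regions.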

Author information

Authors and Affiliations

Faculty of Pharmaceutical Sciences, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, 113, Tokyo, Japan
Fang Liu
Department of Information Sciences, College of Science and Engineering, Tokyo Denki University, Hatoyama, 350-03, Saitama, Japan
Yoko Yamaguchi
The “Ba” Research Institute, Kanazawa Institute of Technology & International Media Research Foundation, Jingu-mae, 1-15-13, Shibuya-ku, 150, Tokyo, Japan
Hiroshi Shimizu

About this article

Cite this article

Liu, F., Yamaguchi, Y. & Shimizu, H. Flexible vowel recognition by the generation of dynamic coherence in oscillator neural networks: speaker-independent vowel recognition. Biol. Cybern. 71, 105–114 (1994). https://doi.org/10.1007/BF00197313

Received: 20 July 1993
Accepted: 22 December 1993
Issue Date: June 1994
DOI: https://doi.org/10.1007/BF00197313
