Abstract
Physiology shows that the auditory system extracts several features from the underlying signal simultaneously, giving rise to simultaneous representations of audible signals. It also shows that pattern analysis and recognition are not separate processes (unlike the engineering approach to pattern recognition, where analysis and recognition are usually separate stages). Furthermore, in the visual system, the order in which neurons fire has been observed to be crucial for fast visual recognition tasks (Rank Order Coding). The use of Rank Order Coding has also recently been hypothesized in the mammalian auditory system. In a first application, we compare a very simple speech recognition prototype that uses Rank Order Coding with a conventional Hidden Markov Model speech recognizer. We also show that the type of neuron used should be adapted to the type of phoneme to be recognized (consonants/transients or vowels/stable).
In a second application, we combine a simultaneous auditory image representation with a network of oscillatory spiking neurons to segregate and bind auditory objects for acoustic source separation. The spiking neural network is shown to perform unsupervised segmentation of auditory images (to find 'auditory' objects) and binding of the objects that belong to the same auditory source (yielding automatic sound source separation).
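As an illustration of the Rank Order Coding idea mentioned above, the following is a minimal sketch and not the prototype evaluated in the chapter. It assumes a common textbook formulation in which only the order of first spikes across channels carries information and each later spike is attenuated by a fixed modulation factor. The function names, the factor `mod = 0.9`, and the toy three-channel templates are all hypothetical.

```python
def rank_order(intensities):
    """Return channel indices ordered from strongest to weakest input.

    Assumption: a stronger input makes its channel fire earlier, so this
    list stands in for the firing order of the channels.
    """
    return sorted(range(len(intensities)), key=lambda i: -intensities[i])


def train_weights(order, mod=0.9):
    """Build a template: the channel that fired at rank r gets weight mod**r."""
    weights = [0.0] * len(order)
    for rank, channel in enumerate(order):
        weights[channel] = mod ** rank
    return weights


def activation(order, weights, mod=0.9):
    """Sum the weights of the firing channels, attenuated by mod**rank."""
    return sum(weights[ch] * mod ** rank for rank, ch in enumerate(order))


def classify(intensities, templates, mod=0.9):
    """Return the label of the template with the highest activation."""
    order = rank_order(intensities)
    return max(templates, key=lambda label: activation(order, templates[label], mod))


# Toy templates learned from one example each (hypothetical data).
templates = {
    "a": train_weights(rank_order([0.9, 0.5, 0.1])),
    "b": train_weights(rank_order([0.1, 0.5, 0.9])),
}
label = classify([0.8, 0.4, 0.2], templates)
```

Because only the firing order is used, rescaling the input (for example `[8, 4, 2]` instead of `[0.8, 0.4, 0.2]`) leaves the result unchanged in this sketch.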
This work has been funded by NSERC and Université de Sherbrooke. S. Loiselle has been funded by FQRNT of Québec for the year 2006.
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Rouat, J., Loiselle, S., Pichevar, R. (2007). Towards Neurocomputational Speech and Sound Processing. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_4
DOI: https://doi.org/10.1007/978-3-540-71505-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science (R0)