Towards Neurocomputational Speech and Sound Processing

Rouat, Jean; Loiselle, Stéphane; Pichevar, Ramin

doi:10.1007/978-3-540-71505-4_4

Jean Rouat¹, Stéphane Loiselle¹ & Ramin Pichevar¹,²

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

From physiology we learn that the auditory system extracts several features simultaneously from the underlying signal, giving rise to simultaneous representations of audible signals. We also learn that pattern analysis and recognition are not separate processes, in contrast to the engineering approach to pattern recognition, where analysis and recognition are usually separated. Furthermore, in the visual system, it has been observed that the order in which neurons fire is crucial for fast visual recognition tasks (Rank Order Coding). The use of Rank Order Coding has also recently been hypothesized in the mammalian auditory system. In a first application, we compare a very simplistic speech recognition prototype that uses Rank Order Coding with a conventional Hidden Markov Model speech recognizer. It is also shown that the type of neuron used should be adapted to the type of phoneme to be recognized (consonants/transients or vowels/stable).
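To make the idea of Rank Order Coding concrete, the following is a minimal sketch and not the prototype evaluated in the chapter. It assumes the commonly described form of the scheme, in which only the order of first spikes across channels matters and each successive spike contributes less through a geometric modulation factor. The function names, the four-channel input, the modulation value 0.5, and the two templates are all hypothetical choices made for illustration.

```python
def rank_order(intensities):
    """Return channel indices from first to last spike.

    Assumption: a stronger input makes its channel fire earlier, so the
    firing order is the channels sorted by decreasing intensity.
    """
    return sorted(range(len(intensities)), key=lambda i: -intensities[i])


def template_from(order, modulation=0.5):
    """Build weights that respond best to exactly this firing order."""
    weights = [0.0] * len(order)
    for rank, channel in enumerate(order):
        weights[channel] = modulation ** rank
    return weights


def roc_activation(order, weights, modulation=0.5):
    """Sum the weights, each attenuated by modulation**rank of its channel."""
    return sum(weights[ch] * modulation ** rank for rank, ch in enumerate(order))


def recognize(intensities, templates, modulation=0.5):
    """Return the name of the template with the highest activation."""
    order = rank_order(intensities)
    return max(
        templates,
        key=lambda name: roc_activation(order, templates[name], modulation),
    )


# Hypothetical templates: "a" expects low channels first, "b" high channels first.
TEMPLATES = {
    "a": template_from([0, 1, 2, 3]),
    "b": template_from([3, 2, 1, 0]),
}

label_low_first = recognize([0.9, 0.7, 0.4, 0.1], TEMPLATES)
label_high_first = recognize([0.2, 0.3, 0.6, 0.8], TEMPLATES)
```

Because only the ordering is used, rescaling the input (for example multiplying every intensity by 10) leaves the decision unchanged, which is one reason the code is attractive for fast recognition.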

In a second application, we combine a simultaneous auditory image representation with a network of oscillatory spiking neurons to segregate and bind auditory objects for acoustic source separation. It is shown that the spiking neural network performs unsupervised segmentation of the auditory images (to find ‘auditory’ objects) and binds the objects belonging to the same auditory source, yielding automatic sound source separation.
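The principle of binding by synchrony can be illustrated with a toy model. The sketch below is not the network used in the chapter: it replaces oscillatory spiking neurons with much simpler Kuramoto phase oscillators, and it assumes each oscillator stands for one channel described by a single hypothetical feature value. Channels with similar features excite each other and dissimilar channels inhibit each other, so similar channels end up in phase and are read out as one group. All names, coupling strengths, and feature values are assumptions chosen for illustration.

```python
import math


def wrap(angle):
    """Map an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def coupling(features, radius=1.0, excite=1.0, inhibit=-0.5):
    """Excitatory weight between similar channels, inhibitory otherwise."""
    n = len(features)
    return [
        [
            0.0 if i == j
            else (excite if abs(features[i] - features[j]) < radius else inhibit)
            for j in range(n)
        ]
        for i in range(n)
    ]


def simulate(phases, k, steps=2000, dt=0.05):
    """Euler integration of d(theta_i)/dt = sum_j k[i][j] * sin(theta_j - theta_i).

    The common natural frequency is omitted (rotating frame), since only
    phase differences matter for grouping.
    """
    phases = list(phases)
    n = len(phases)
    for _ in range(steps):
        rates = [
            sum(k[i][j] * math.sin(phases[j] - phases[i]) for j in range(n))
            for i in range(n)
        ]
        phases = [p + dt * r for p, r in zip(phases, rates)]
    return phases


def group_by_phase(phases, tolerance=0.1):
    """Put oscillators whose phases agree within the tolerance in one group."""
    groups = []
    for i, p in enumerate(phases):
        for group in groups:
            if abs(wrap(p - phases[group[0]])) < tolerance:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical channel features: two channels near 1, two channels near 5.
features = [1.0, 1.1, 5.0, 5.2]
final_phases = simulate([0.0, 1.0, 2.0, 3.0], coupling(features))
groups = group_by_phase(final_phases)
```

In this toy setting the grouping is unsupervised in the same sense as in the abstract: no labels are given, and the groups emerge only from the coupling pattern and the resulting phase relationships.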

This work has been funded by NSERC and Université de Sherbrooke. S. Loiselle has been funded by FQRNT of Québec for the year 2006.

Author information

Authors and Affiliations

Université de Sherbrooke: Jean Rouat, Stéphane Loiselle & Ramin Pichevar
Communications Research Centre, Ottawa: Ramin Pichevar

Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito

About this chapter

Cite this chapter

Rouat, J., Loiselle, S., Pichevar, R. (2007). Towards Neurocomputational Speech and Sound Processing. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_4

DOI: https://doi.org/10.1007/978-3-540-71505-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science, Computer Science (R0)
