
Towards Neurocomputational Speech and Sound Processing

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

From physiology we learn that the auditory system extracts simultaneous features from the underlying signal, giving rise to simultaneous representations of audible signals. We also learn that pattern analysis and recognition are not separate processes (in contrast to the engineering approach to pattern recognition, where analysis and recognition are usually separate stages). Furthermore, in the visual system, it has been observed that the order in which neurons fire is crucial for fast visual recognition tasks (Rank Order Coding). The use of Rank Order Coding has also recently been hypothesized in the mammalian auditory system. In a first application, we compare a very simplistic speech recognition prototype that uses Rank Order Coding with a conventional Hidden Markov Model speech recognizer. It is also shown that the type of neurons being used should be adapted to the type of phonemes (consonants/transients or vowels/stable) to be recognized.
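To make the idea of Rank Order Coding concrete, the sketch below shows one common way it is formulated: only the order in which channels fire is kept, and a stored template is scored against an input by weighting each channel with a geometric modulation factor. This is an illustration under stated assumptions, not the prototype evaluated in the chapter. The function names, the template labels, the modulation value of 0.9, and the rule that a stronger activation fires earlier are all assumptions made for the example.

```python
def rank_order(activations):
    """Return each channel's firing rank (0 = fires first).

    Assumption: a channel with a stronger activation fires earlier.
    """
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    ranks = [0] * len(activations)
    for rank, channel in enumerate(order):
        ranks[channel] = rank
    return ranks


def roc_similarity(template_ranks, input_ranks, modulation=0.9):
    """Score an input firing order against a template firing order.

    Each channel carries a weight modulation**template_rank, and its
    contribution is attenuated by modulation**input_rank. The sum is
    largest when the two orders agree.
    """
    return sum(
        (modulation ** t) * (modulation ** r)
        for t, r in zip(template_ranks, input_ranks)
    )


def classify(templates, activations, modulation=0.9):
    """Return the label of the template whose firing order best matches."""
    input_ranks = rank_order(activations)
    return max(
        templates,
        key=lambda label: roc_similarity(templates[label], input_ranks, modulation),
    )


# Hypothetical three-channel templates, for illustration only.
templates = {
    "vowel_like": rank_order([0.9, 0.5, 0.1]),
    "transient_like": rank_order([0.1, 0.5, 0.9]),
}
label = classify(templates, [0.8, 0.4, 0.2])
```

Because only ranks are compared, the input `[0.8, 0.4, 0.2]` matches the first template even though its amplitudes differ from the ones used to build that template.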

In a second application, we combine a simultaneous auditory image representation with a network of oscillatory spiking neurons to segregate and bind auditory objects for acoustical source separation. It is shown that the spiking neural network performs unsupervised auditory image segmentation (to find ‘auditory’ objects) and binds the objects belonging to the same auditory source (yielding automatic sound source separation).
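The principle of binding by synchrony can be illustrated with a much simpler model than the one used in the chapter. The sketch below replaces the oscillatory spiking neurons with Kuramoto-style phase oscillators, one per channel: channels with similar feature values are coupled and drift into phase, while uncoupled channels keep their phase offset. The feature values, the similarity tolerance, the coupling strength, and all function names are assumptions chosen for the example.

```python
import math


def simulate_binding(features, phases, similarity_tol=5.0, coupling=2.0,
                     omega=1.0, dt=0.01, steps=2000):
    """Euler-integrate phase oscillators coupled only between similar channels."""
    phases = list(phases)
    n = len(phases)
    for _ in range(steps):
        increments = []
        for i in range(n):
            # Sum the pull from every other channel with a similar feature.
            pull = sum(
                math.sin(phases[j] - phases[i])
                for j in range(n)
                if j != i and abs(features[i] - features[j]) < similarity_tol
            )
            increments.append((omega + coupling * pull) * dt)
        phases = [p + d for p, d in zip(phases, increments)]
    return phases


def phase_distance(a, b):
    """Absolute phase difference wrapped into [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def group_channels(phases, tol=0.1):
    """Group channel indices whose phases have synchronized."""
    groups = []
    for idx, p in enumerate(phases):
        for group in groups:
            if phase_distance(phases[group[0]], p) < tol:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Hypothetical per-channel features: two channels per source.
features = [100.0, 102.0, 200.0, 203.0]
initial_phases = [0.0, 1.0, 2.5, 4.0]
final_phases = simulate_binding(features, initial_phases)
sources = group_channels(final_phases)
```

In this toy setting the first two channels synchronize with each other, as do the last two, so reading out the groups of in-phase channels gives one group per source. Relaxation oscillators with a global inhibitor, as used in neural segregation models, achieve the separation between groups actively; here it simply follows from the initial phases and the absence of coupling.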

This work has been funded by NSERC and Université de Sherbrooke. S. Loiselle has been funded by FQRNT of Québec for the year 2006.




Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito


Copyright information

© 2007 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Rouat, J., Loiselle, S., Pichevar, R. (2007). Towards Neurocomputational Speech and Sound Processing. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_4


  • DOI: https://doi.org/10.1007/978-3-540-71505-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71503-0

  • Online ISBN: 978-3-540-71505-4

  • eBook Packages: Computer Science, Computer Science (R0)
