
Bio-inspired Audio-Visual Speech Recognition Towards the Zero Instruction Set Computing

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 629)

Abstract

The traditional approach to automatic speech recognition continues to push the limits of its implementation. The multimodal approach to audio-visual speech recognition, together with its neuromorphic computational modeling, is a novel data-driven paradigm that leads towards zero instruction set computing and enables proactive capabilities in audio-visual recognition systems. This paper discusses an engineering-oriented deployment of the audio-visual processing framework, proposing a bimodal speech recognition framework that processes speech utterances and lip-reading data, applying soft computing paradigms according to a bio-inspired, holistic modeling of speech.
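To make the bimodal idea concrete, the following is a minimal late-fusion sketch, not the authors' method: per-class scores from a hypothetical audio model and a hypothetical lip-reading model are combined by a weighted sum, and the highest-scoring label wins. All labels, scores, and the weight are illustrative assumptions.

```python
# Illustrative late fusion of audio and visual recognizer scores
# (hypothetical values; not the framework proposed in the paper).

def fuse_bimodal(audio_scores, visual_scores, audio_weight=0.7):
    """Weighted late fusion of per-class scores from two modalities.

    audio_scores, visual_scores: dicts mapping label -> score.
    audio_weight: relative trust in the audio modality (0..1).
    Returns the label with the highest fused score.
    """
    fused = {}
    for label in audio_scores:
        fused[label] = (audio_weight * audio_scores[label]
                        + (1.0 - audio_weight) * visual_scores[label])
    return max(fused, key=fused.get)

# Example: audio alone is nearly ambiguous between "ba" and "ga";
# the visual (lip-reading) cue disambiguates, echoing the
# McGurk-style audio-visual interplay that motivates bimodal ASR.
audio = {"ba": 0.48, "ga": 0.47, "da": 0.05}
visual = {"ba": 0.10, "ga": 0.75, "da": 0.15}
print(fuse_bimodal(audio, visual))  # -> ga
```

A fixed weighted sum is the simplest possible fusion rule; the soft computing paradigms discussed in the paper would instead learn or adapt how the two modalities are combined.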



Author information

Corresponding author: Mario Malcangi.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Malcangi, M., Quan, H. (2016). Bio-inspired Audio-Visual Speech Recognition Towards the Zero Instruction Set Computing. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_25

  • DOI: https://doi.org/10.1007/978-3-319-44188-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44187-0

  • Online ISBN: 978-3-319-44188-7
