Abstract
The traditional approach to automatic speech recognition continues to push against the limits of its implementation. The multimodal approach to audio-visual speech recognition, together with its neuromorphic computational modeling, is a novel data-driven paradigm that leads towards zero instruction set computing and enables proactive capabilities in audio-visual recognition systems. This paper discusses an engineering-oriented deployment of the audio-visual processing framework, proposing a bimodal speech recognition framework that processes speech utterances together with lip-reading data, applying soft computing paradigms according to a bio-inspired, holistic model of speech.
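The bimodal framework combines evidence from the acoustic and visual (lip-reading) channels. One common way such combination is realised in audio-visual speech recognition is decision-level (late) fusion of per-modality posteriors; the sketch below is a minimal illustration of that general idea, not the authors' specific soft-computing method, and the word set, weights, and function names are hypothetical:

```python
import numpy as np

# Hypothetical late-fusion step of a bimodal recogniser: each modality
# produces a posterior distribution over a small word set, and the fused
# score weights acoustic against visual evidence.
WORDS = ["yes", "no", "stop"]  # illustrative vocabulary

def fuse_posteriors(p_audio, p_visual, audio_weight=0.7):
    """Weighted log-linear fusion of audio and visual posteriors."""
    p_audio = np.asarray(p_audio, dtype=float)
    p_visual = np.asarray(p_visual, dtype=float)
    # Log-linear combination, then renormalisation to a distribution.
    fused = p_audio ** audio_weight * p_visual ** (1.0 - audio_weight)
    return fused / fused.sum()

def recognise(p_audio, p_visual, audio_weight=0.7):
    """Return the word whose fused posterior is highest."""
    fused = fuse_posteriors(p_audio, p_visual, audio_weight)
    return WORDS[int(np.argmax(fused))]

# Noisy audio is ambiguous between "yes" and "no"; the visual channel
# (lip reading) disambiguates in favour of "no".
print(recognise([0.48, 0.47, 0.05], [0.10, 0.85, 0.05]))  # "no"
```

This captures the motivating point of multimodal recognition: when one channel is unreliable (e.g. audio in noise, echoing the McGurk effect literature), the other can resolve the ambiguity.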
Cite this paper
Malcangi, M., Quan, H. (2016). Bio-inspired Audio-Visual Speech Recognition Towards the Zero Instruction Set Computing. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_25