Abstract
The traditional approach to automatic speech recognition continues to push against the limits of its implementation. The multimodal approach to audio-visual speech recognition, together with its neuromorphic computational modeling, is a novel data-driven paradigm that leads towards zero instruction set computing and enables proactive capabilities in audio-visual recognition systems. This paper discusses an engineering-oriented deployment of the audio-visual processing framework, proposing a bimodal speech recognition framework that processes speech utterances together with lip-reading data, applying soft computing paradigms according to a bio-inspired, holistic model of speech.
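The bimodal framework combines evidence from the acoustic and visual (lip-reading) channels. One common way such combination is realised in audio-visual speech recognition is decision-level (late) fusion of per-modality posteriors; the sketch below is a minimal illustration of that general idea, not the authors' specific soft-computing method, and the word set, weights, and function names are hypothetical:

```python
import numpy as np

# Hypothetical late-fusion step of a bimodal recogniser: each modality
# produces a posterior distribution over a small word set, and the fused
# score weights acoustic against visual evidence.
WORDS = ["yes", "no", "stop"]  # illustrative vocabulary

def fuse_posteriors(p_audio, p_visual, audio_weight=0.7):
    """Weighted log-linear fusion of audio and visual posteriors."""
    p_audio = np.asarray(p_audio, dtype=float)
    p_visual = np.asarray(p_visual, dtype=float)
    # Log-linear combination, then renormalisation to a distribution.
    fused = p_audio ** audio_weight * p_visual ** (1.0 - audio_weight)
    return fused / fused.sum()

def recognise(p_audio, p_visual, audio_weight=0.7):
    """Return the word whose fused posterior is highest."""
    fused = fuse_posteriors(p_audio, p_visual, audio_weight)
    return WORDS[int(np.argmax(fused))]

# Noisy audio is ambiguous between "yes" and "no"; the visual channel
# (lip reading) disambiguates in favour of "no".
print(recognise([0.48, 0.47, 0.05], [0.10, 0.85, 0.05]))  # "no"
```

This captures the motivating point of multimodal recognition: when one channel is unreliable (e.g. audio in noise, echoing the McGurk effect literature), the other can resolve the ambiguity.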
Cite this paper
Malcangi, M., Quan, H. (2016). Bio-inspired Audio-Visual Speech Recognition Towards the Zero Instruction Set Computing. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_25