Abstract
Processing sensory input and responding to it with appropriate actions are among the most important capabilities of the brain, which contains specific areas dedicated to auditory and visual processing. Auditory information is transduced in the cochlea, relayed through brainstem nuclei to the inferior colliculus and onwards to the auditory cortex, where it is further processed so that the eyes, the head, or both can be turned towards an object or location in response. Visual information is processed in the retina and various subsequent nuclei before reaching the visual cortex, where it likewise guides action. But how is this information integrated, and what is the effect of auditory and visual stimuli arriving at the same time or at different times? Which information is processed when, and what are the responses to multimodal stimuli? Multimodal integration is first performed in the superior colliculus, a subcortical structure in the midbrain. In this chapter we focus on this first level of multimodal integration, outline various approaches to modelling the superior colliculus, and suggest a model of multimodal integration of visual and auditory information.
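A hallmark of deep superior colliculus neurons is multisensory enhancement: spatially and temporally coincident auditory and visual stimuli evoke a stronger response than either stimulus alone, while misaligned stimuli do not. The following toy sketch (not the chapter's model; the map size, Gaussian widths, and the multiplicative cross-modal term are illustrative assumptions) shows this behaviour on a 1-D spatial map:

```python
import numpy as np

def gaussian_map(center, width, size=100):
    """Activity of a 1-D topographic map responding to a stimulus at `center`."""
    x = np.arange(size)
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def sc_response(visual_pos, auditory_pos, size=100):
    """Toy deep-SC layer: unimodal inputs plus a multiplicative cross-modal
    term, so spatially coincident stimuli are enhanced (superadditive response)
    while misaligned stimuli yield roughly unimodal activity."""
    v = gaussian_map(visual_pos, width=5.0, size=size)   # visual map input
    a = gaussian_map(auditory_pos, width=8.0, size=size) # auditory map input
    return v + a + v * a  # cross-modal product models multisensory enhancement

aligned = sc_response(50, 50).max()     # coincident stimuli: enhanced peak
misaligned = sc_response(50, 90).max()  # disparate stimuli: no enhancement
```

With coincident stimuli at position 50 the peak response is the sum of both unimodal peaks plus the enhancement term, whereas widely separated stimuli produce two roughly unimodal bumps, mirroring the enhancement/suppression pattern reported for cat superior colliculus neurons.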
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this chapter
Ravulakollu, K., Knowles, M., Liu, J., Wermter, S. (2009). Towards Computational Modelling of Neural Multimodal Integration Based on the Superior Colliculus Concept. In: Bianchini, M., Maggini, M., Scarselli, F., Jain, L.C. (eds) Innovations in Neural Information Paradigms and Applications. Studies in Computational Intelligence, vol 247. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04003-0_11
Print ISBN: 978-3-642-04002-3
Online ISBN: 978-3-642-04003-0
eBook Packages: Engineering (R0)