
Neurocomputing

Volume 70, Issues 13–15, August 2007, Pages 2234-2244

Unique association between self-occlusion and double-touching towards binding vision and touch

https://doi.org/10.1016/j.neucom.2006.02.028

Abstract

Binding, that is, finding the correspondence between sensations in different modalities such as vision and touch, is one of the most fundamental cognitive functions. Learning a multimodal representation of the body is supposed to be the first step toward binding, since the morphological constraints on sensations during self-body-observation make the binding problem tractable. In this paper, we address the issue of learning to match the foci of attention in vision and touch through self-body-observation. We propose a cross-anchoring Hebbian learning rule to uniquely associate double-touching and self-occlusion. Experiments in both computer simulation and with a real robot show the validity of the proposed method.

Introduction

Binding is one of the most fundamental cognitive functions, concerning how to find the correspondence of sensations between different modalities such as vision and touch. Both of these modalities are major sources of perceptual information not only about the external world but also about the agent's own body. The latter is closely related to the body representation, which in developmental theories or robotic models is often taken to be innate or given by the designer. However, the body representation is better conceived as an adaptable aspect of the infant's responsiveness to changes in the environment, and this adaptability can be posed as a problem for the design of robot agents. Assuming that the designer does not provide any explicit knowledge about the body representation, a robot should construct its body representation only from its uninterpreted multimodal sensory data. In this process, binding plays a significant role.

The binding problem concerns the agent's capability to integrate information of different attributes [20]. Although there are already some models of binding, for example based on attention [9] or firing in synchrony [19], [14], they have focused on binding between visual attributes. It is therefore still not clear how to bind different sensor modalities, although there is growing evidence of connections and interplay between modalities such as vision and touch [13]. Receptive fields for touch and vision are often stimulated simultaneously but in response to different physical phenomena, since the foci of attention in these modalities are often different; for example, a robot does not always look at the region it is touching. Therefore, to bind different modalities, the robot should be able to correctly match the foci of attention in the different modalities even though they may have multiple possible correspondences to each other.

In this study, we suppose that learning a multimodal representation of one's own body should be the first step toward binding, because the physical and spatio-temporal constraints on simultaneous sensation of body parts in different modalities can be utilized to find the relationship among them. Hereafter, we call this the “morphological constraint”. Building a robot that can acquire a multimodal representation of its own body is an interesting enterprise not only from the viewpoint of establishing design principles for an intelligent robot but also for understanding the process by which humans acquire their body representations [1]. There is some previous robotics work on the acquisition of body representation (e.g. [7], [15]). However, that work avoided dealing with this binding problem by assuming that the agent observes only matched sensations in different modalities. The same is true of previous work on category learning with multiple modalities [4].

In addition, it seems important to relate other modalities such as proprioception to the cognitive development of visual perception, as shown in studies of kittens [6], [5]. There are several studies on the cognitive developmental process related to an infant's self-body representation [2], [21], [18], [11], [10]. Although they have pointed out the importance of inter-modal correlation or contingency between proprioception and other modalities, including exteroceptive ones such as vision [2], [21], [11], audition [12], or touch [10], [22], there has been less focus on the relationship among the exteroceptive modalities themselves. Although it has been reported that multisensory neurons in the superior colliculus of kittens are responsive to multimodal stimuli from birth [17], how such neurons develop in the prenatal period remains unclear. From the viewpoint of a constructivist approach, therefore, it seems an important starting point for the designer not to provide a robot with any a priori knowledge about its sensing structure (sensor configuration). Rather, the robot should acquire the capacity to discriminate between different sensor modalities such as vision and touch.

In this study, as a preliminary stage toward solving the binding problem, we focus on how a robot can learn to look at one of its body parts when it detects a collision on it. Yoshikawa et al. have proposed a method for a robot to learn a multimodal representation of the body surface through double-touching, that is, touching its body with its own body part [24]. It is based on the fact that double-touching causes self-occlusion, that is, visibly covering part of one's body with one's own body part. Although this method did not take into account the multiple self-occlusions caused by the physical volume of the body, which leave the binding problem formidable, it still seems reasonable to utilize the fact that double-touching co-occurs with self-occlusion. In this paper, we address the issue of learning to match the foci of attention in vision and touch through self-body-observation, and propose a method to learn the relationship between double-touching and self-occlusion based on the morphological constraint. In the proposed method, mismatched responses in these modalities can be discarded through a process of unique association, in which corresponding pairs of subsets of the different attributes become exclusively connected to each other. This is achieved by a bi-directional process of finding corresponding pairs that we call cross-anchoring: a connection-learning process whose learning rate varies depending on the current connections.
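To make the notion of unique association concrete, the following is a minimal sketch in Python (our illustration, not the authors' implementation; the matrix W and the mutual-argmax readout are assumptions). It shows how a one-to-one binding between double-touching and self-occlusion quanta can be read off a learned connection matrix by keeping only pairs of nodes that are each other's strongest connection in both directions.

```python
import numpy as np

def unique_association(W):
    """Pairs (touch node, occlusion node) that are each other's strongest connection."""
    touch_to_occ = W.argmax(axis=1)   # best self-occlusion node for each double-touch node
    occ_to_touch = W.argmax(axis=0)   # best double-touch node for each self-occlusion node
    return [(i, j) for i, j in enumerate(touch_to_occ) if occ_to_touch[j] == i]

# Toy example: only mutually strongest connections survive as bindings.
W = np.array([[0.2, 0.9, 0.1],
              [0.8, 0.3, 0.3],
              [0.4, 0.4, 0.5]])
print(unique_association(W))   # [(0, 1), (1, 0), (2, 2)]
```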

In the rest of the paper, we first explain the problem definition for binding between different modalities. Then, we discuss a possible developmental course towards binding and a basic idea for utilizing the morphological constraint of a human-like robot body to perform binding. After introducing the cross-anchoring Hebbian learning rule to perform unique association, we show preliminary computer simulations of robots that have one- and two-degree-of-freedom (hereafter df) arms and a camera head. We then conduct an experiment with a real robot to test whether the proposed learning rule works. Finally, we discuss the implications of the results.

Section snippets

The binding problem between different modalities

In order to propose a model of the human binding mechanism based on a constructivist approach, we should start from the assumption that the designer does not give any a priori knowledge about the robot's body representation, but that the robot can discriminate between different sensor modalities such as vision and touch. Therefore, the problem for the robot is how to associate these different sensations to build its body representation, which is needed to accomplish tasks such as collision avoidance and

A basic idea

As the previous work [24] suggests, it seems reasonable to utilize the fact that double-touching co-occurs with self-occlusion, although that work did not take the physical volume of the body into account, which leaves the binding problem formidable. The following explains the basic idea of how the robot can correctly match double-touching and self-occlusion despite this binding problem. First, we introduce our assumptions regarding what kinds of cognitive competences it should possess and

Cross-anchoring Hebbian learning rule

In this section, we introduce the cross-anchoring Hebbian learning rule as an implementation of a learning rule with the anchoring mechanism. The architecture consists of two layers called the double-touching layer and the self-occlusion layer (see Fig. 5). In the double-touching layer, there are $N_t$ nodes, each of which is responsible for a certain set of arm postures $\Theta_i$ $(i = 1, \ldots, N_t)$, which are assumed to be quantized in advance. When the posture of the arm is $\theta \in \mathbb{R}^m$, the activation of the $i$th
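The snippet above ends before the update equations, so the following Python sketch only illustrates the general shape of such an architecture under our own assumptions: Gaussian activations around the quantized representative postures, and a Hebbian update whose effective learning rate is gated by how strongly each node is already anchored to a partner. This gating is our assumed realization of cross-anchoring; the exact form in the paper may differ, and the constants are placeholders.

```python
import numpy as np

def node_activation(x, centers, width=0.2):
    """Assumed Gaussian activation of each node around its representative
    (quantized) value; the paper quantizes postures in advance."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

class CrossAnchoringHebb:
    """Sketch of a two-layer associator between double-touching and
    self-occlusion nodes with an anchoring-gated Hebbian update."""

    def __init__(self, n_touch, n_occ, sigma_t=0.1, sigma_o=0.1, eta=0.05):
        self.W = np.zeros((n_touch, n_occ))  # connection weights
        self.sigma_t = sigma_t               # degree of anchoring, touch side (assumed form)
        self.sigma_o = sigma_o               # degree of anchoring, occlusion side (assumed form)
        self.eta = eta                       # base learning rate

    def update(self, a_touch, a_occ):
        """One Hebbian step for activation vectors a_touch (N_t,) and a_occ (N_o,)."""
        # Anchoring: the rate for pair (i, j) is large when w_ij is already close
        # to the strongest connection of node i and of node j, so a dominant pair
        # keeps learning while competing pairs are suppressed.
        row_max = self.W.max(axis=1, keepdims=True)
        col_max = self.W.max(axis=0, keepdims=True)
        anchor = (np.exp(-(row_max - self.W) / self.sigma_t)
                  * np.exp(-(col_max - self.W) / self.sigma_o))
        self.W += self.eta * anchor * np.outer(a_touch, a_occ)
        # Simple normalization to keep the weights bounded.
        self.W /= self.W.sum(axis=1, keepdims=True) + 1e-9
```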

Simulation

In preliminary experiments, we used computer simulation to test whether the cross-anchoring Hebbian learning rule would enable a robot to solve the binding problem. First, we examined a robot with a single df to show how learning proceeds. Then we examined a robot with more dfs.
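As a rough, assumption-laden illustration of such a simulation loop (reusing the CrossAnchoringHebb and unique_association sketches above; the number of quanta, the hidden correspondence, and the way spurious self-occlusions are injected are our own choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                  # number of posture / visual quanta (assumed)
true_map = rng.permutation(N)          # hidden touch -> occlusion correspondence
model = CrossAnchoringHebb(N, N)       # from the sketch above

for _ in range(5000):
    i = rng.integers(N)                # random representative posture (double-touch node i)
    a_touch = np.eye(N)[i]
    a_occ = np.eye(N)[true_map[i]].astype(float)
    a_occ[rng.integers(N)] = 1.0       # a spurious self-occlusion from the arm's volume
    model.update(a_touch, a_occ)

pairs = unique_association(model.W)    # read out the learned binding
correct = sum(true_map[i] == j for i, j in pairs)
print(correct, "of", N, "quanta correctly and uniquely bound")
```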

Experiment with a real robot

In the simulations so far, we dealt with an ideal situation where the robot's exploration was simplified to random selection of the posture from only the representative quanta. As a result, the agent experienced only representative pairs of double-touching and self-occlusion. In the real world, on the other hand, there is much uncertainty. Experiencing the same pair of these quanta may yield different results of self-occlusion detection, partly because each quantum of

Discussion

Finally, we present further analysis and some possible improvements of the proposed model, and suggest future interdisciplinary collaboration concerning the topic addressed in this paper.

There are parameters in the proposed learning rule that control the degree of anchoring. To examine the sensitivity of the binding performance to the parameters $\sigma_t$ and $\sigma_o$, we let the simulated 2-D robot learn with various parameter values. Fig. 15 illustrates the distribution of the averaged mismatched
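A sketch of what such a sensitivity sweep could look like, again reusing the toy simulation above (the grid of $\sigma_t$, $\sigma_o$ values and the mismatch measure are our assumptions, not the paper's exact protocol):

```python
import numpy as np

def mismatch_rate(sigma_t, sigma_o, trials=5, steps=3000, N=8, seed=0):
    """Average fraction of touch quanta not uniquely bound to their true partner."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(trials):
        true_map = rng.permutation(N)
        model = CrossAnchoringHebb(N, N, sigma_t=sigma_t, sigma_o=sigma_o)
        for _ in range(steps):
            i = rng.integers(N)
            a_touch = np.eye(N)[i]
            a_occ = np.eye(N)[true_map[i]].astype(float)
            a_occ[rng.integers(N)] = 1.0     # spurious self-occlusion
            model.update(a_touch, a_occ)
        bound = dict(unique_association(model.W))
        wrong = sum(bound.get(i) != true_map[i] for i in range(N))
        rates.append(wrong / N)
    return float(np.mean(rates))

for s_t in (0.01, 0.1, 1.0):
    for s_o in (0.01, 0.1, 1.0):
        print(f"sigma_t={s_t:<5} sigma_o={s_o:<5} mismatch={mismatch_rate(s_t, s_o):.2f}")
```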

Conclusion

In this paper, we addressed how to solve the binding problem in different modalities for body representation. We proposed a method called cross-anchoring Hebbian learning rule to perform binding by virtue of the morphological constraints of a human-like configuration in perceiving the self. By computer simulations and an experiment with a real robot, we showed that the robot can bind its tactile and visual sensations. Towards understanding the process of binding vision and touch on general

Acknowledgments

This research was supported by the Advanced and Innovational Research Program in Life Sciences of the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government and by a Research Fellowship for Young Scientists from the Japan Society for the Promotion of Science. The authors would like to thank the special issue editor Dr. Gedeon Deák and the anonymous reviewers for their very valuable comments and suggestions for revising the paper.


References (26)

  • P. Rochat et al., Differential rooting response by neonates: evidence for an early sense of self, Early Dev. Parenting (1997)
  • P. Rochat et al., Spatial determinants in the perception of self-produced leg movements in three- to five-month-old infants, Dev. Psychol. (1995)
  • P. Rochat et al., Emerging self-exploration by two-month-old infants, Dev. Sci. (1999)

Yuichiro Yoshikawa received the Ph.D. degree in Engineering from Osaka University, Japan, in 2005. From April 2003 to March 2005, he was a Research Fellow of the Japan Society for the Promotion of Science. Since April 2005, he has been a researcher at the Intelligent Robotics and Communication Laboratories, Advanced Telecommunications Research Institute International.

Koh Hosoda received the Ph.D. degree in Mechanical Engineering from Kyoto University, Japan, in 1993. From 1993 to 1997, he was a Research Associate in the Department of Mechanical Engineering for Computer-Controlled Machinery, Osaka University. Since 1997, he has been an Associate Professor in the Department of Adaptive Machine Systems, Osaka University. In 1998, he was a visiting researcher at the AI Lab, Department of Computer Science, University of Zurich.

Minoru Asada received the B.E., M.E., and Ph.D. degrees in Control Engineering from Osaka University, Osaka, Japan, in 1977, 1979, and 1982, respectively. From 1982 to 1988, he was a Research Associate in Control Engineering, Osaka University. In April 1989, he became an Associate Professor of Mechanical Engineering for Computer-Controlled Machinery, Osaka University, and in April 1995 he became a Professor in the same department. Since April 1997, he has been a Professor in the Department of Adaptive Machine Systems at the same university. From August 1986 to October 1987, he was a Visiting Researcher at the Center for Automation Research, University of Maryland, College Park, MD. He has been Research Director of the JST ERATO Asada Synergistic Intelligence Project since September 2005.
