Towards selective attention: generating image features by learning a visuo-motor map

https://doi.org/10.1016/j.robot.2003.09.001

Abstract

Robots require a form of visual attention to perform a wide range of tasks effectively. Existing approaches specify in advance the image features and attention control scheme required for a given robot to perform a specific task. However, to cope with different tasks in a dynamic environment, a robot should be able to construct its own attentional mechanisms. This paper presents a method that a robot can use to generate image features by learning a visuo-motor map. The robot constructs the visuo-motor map from training data, and the map constrains both the generation of image features and the estimation of state vectors. The resulting image features and state vectors are highly task-oriented. The learned mechanism is attentional in the sense that it determines what information to select from the image to perform a task. We validate the proposed method in robot experiments on indoor navigation and scoring soccer goals.

Introduction

Through billions of years of evolution, biological systems have acquired their organs and strategies to survive in hostile environments. Visual attention can be regarded as a combination of such organs and strategies: vision captures a huge amount of data about the external world, and attentional mechanisms extract information necessary for the system to achieve the mission at hand. This capability is desirable for artificial systems, and it has remained one of the most formidable issues in robotics and AI for many years.

Human beings can readily exploit attentional mechanisms in various kinds of situations, and much research focuses on the early visual processing of human beings [8], [16], [19], [20]. Some research applies Shannon’s information theory to the observed image to select the focus of attention in the view [14]. The main emphasis of this work is the analysis of human visual processing and the explanation of our own attentional mechanisms.

Some computer vision researchers have focused on the viewpoint selection (i.e., where to look) problem [1], [13], aiming to disambiguate the description of an observed image that is obtained by matching the image against a model database. The selection criterion is based on the statistics of the image data. The resulting actions (gaze control) are intended to obtain better observations for object recognition, but they are not directly related to the physical actions needed to accomplish a given task beyond recognition itself.

Some robotics researchers have focused on the attention problem in robot vision. Thrun [15] and Vlassis et al. [17] used probabilistic methods to extract, from the observed images, image features correlated with a mobile robot's self-localization. Kröse and Bunschoten [7] determined the robot's heading, and hence the camera direction, by minimizing the conditional entropy of the robot position given the observations. These methods realize task-relevant visual attention, but they are not related to any physical actions.
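To make the cited criterion concrete: the conditional entropy H(X | O) of the robot position X given the observation O measures how uncertain the position remains after observing, and the approach in [7] selects the camera direction that minimizes it. Below is a minimal sketch of that quantity over a discretized joint distribution; the discrete encoding and variable names are assumptions of this sketch, not the cited authors' code.

```python
import numpy as np

def conditional_entropy(p_xo):
    """H(X | O) = -sum_{x,o} p(x, o) * log2 p(x | o).

    p_xo[x, o] is a discrete joint distribution over robot position X
    and observation O whose entries sum to 1 (illustrative encoding).
    """
    p_xo = np.asarray(p_xo, dtype=float)
    p_o = p_xo.sum(axis=0)                                  # marginal p(o)
    # p(x | o) = p(x, o) / p(o); guard the columns where p(o) = 0.
    p_x_given_o = np.divide(p_xo, p_o,
                            out=np.zeros_like(p_xo), where=p_o > 0)
    with np.errstate(divide="ignore"):
        logp = np.where(p_x_given_o > 0, np.log2(p_x_given_o), 0.0)
    return float(-(p_xo * logp).sum())
```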

As mentioned above, there are many methods to construct attentional mechanisms. However, existing methods do not generally take the robot’s (or human’s) physical actions into consideration.

Meanwhile, in biological systems, the feature-extracting cells in the visual cortex develop depending on visual and motor experiences. The functions of these cells are not innate but are acquired adaptively through visual experience in early development after birth [2], [5]. Furthermore, self-produced movement, with its concurrent visual feedback, is necessary for the development of visuo-motor coordination [4].

In light of these neurophysiological studies, the visual organs of a robot should develop depending on its visual and motor experiences. Linsker [9], [10] showed that an orientation-selective cell emerged in an artificial multilayered network using modified Hebbian learning. This result shows that the artificial system can learn a visual function similar to the one found in the brain. However, this is a closed system with respect to visual experiences.

In this paper, we focus on extracting image features as an attentional mechanism and propose a method that generates image features (e.g., edges or color regions) by learning a visuo-motor map from the experience the robot gathers while performing a task. The visuo-motor map is constructed from training data and constrains both image feature generation and the estimation of the state vectors used for action selection. That is, the state space is constructed so that the correlation between a state and the given instruction is maximized. The resulting image features and state vectors are task-oriented. The method is applied to indoor navigation and soccer-shooting tasks.
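This preview omits the exact formulation, so the following is only a sketch of the stated objective: learn a linear projection W from filtered image features to states such that the correlation between the states and the instructed motor commands is maximized. Linear canonical correlation analysis (CCA) is one standard way to realize such an objective; the function below is an illustrative implementation under that assumption, not the paper's algorithm.

```python
import numpy as np

def fit_state_projection(F, U, n_states=2, eps=1e-6):
    """Fit W so that states s = W f correlate maximally with commands u.

    F : (N, d) filtered image feature vectors, one per training frame
    U : (N, m) instructed motor commands for the same frames
    Returns W of shape (n_states, d); requires n_states <= min(d, m).
    Standard linear CCA via whitening, used as a stand-in for the
    paper's correlation-maximizing construction.
    """
    Fc = F - F.mean(axis=0)          # center both variable sets
    Uc = U - U.mean(axis=0)
    n = len(F)
    Cff = Fc.T @ Fc / n + eps * np.eye(F.shape[1])  # regularized covariances
    Cuu = Uc.T @ Uc / n + eps * np.eye(U.shape[1])
    Cfu = Fc.T @ Uc / n
    Lf = np.linalg.cholesky(Cff)     # whitening factors
    Lu = np.linalg.cholesky(Cuu)
    M = np.linalg.solve(Lf, Cfu) @ np.linalg.inv(Lu).T
    Uw, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.linalg.solve(Lf.T, Uw[:, :n_states]).T
```

A state is then obtained as s = W f; how the paper maps states to actions is not shown in this preview.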

Some existing methods construct visual state spaces through task execution (e.g., [6], [12]). These methods can construct task-oriented state vectors, but they do not address image features. The proposed method constructs both a task-oriented visual state space and the image features useful for selective attention.

The remainder of the paper is organized as follows. First, we describe the basic idea of image feature generation along with the learning formulation. We use the projection matrix from the extracted image feature to the state vector to determine the optimal action. Next, we give experimental results to show the validity of the proposed method. Finally, we conclude with a discussion on the attentional mechanism suggested by the current results.

Section snippets

The basic idea

In the visual cortex, there are many kinds of cells that extract basic features such as edges from retinal signals. Higher level processes, such as recognition, are performed according to the cell’s responses (bottom–up signals) and memory or appetite (top–down signals). That is, various kinds of features are extracted from an input image in a bottom–up process, and the necessary features are selected according to the task in a top–down process. In our model shown in Fig. 1, we decompose the
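The snippet above is truncated, but the decomposition it describes, a bottom-up stage that applies many candidate feature extractors and a top-down stage that selects among their responses, can be sketched as follows. The particular kernels (two oriented edge detectors and a smoothing filter) are illustrative stand-ins, not the filters the proposed method generates.

```python
import numpy as np
from scipy.ndimage import convolve

# Candidate bottom-up filters; the kernels are illustrative only.
FILTER_BANK = {
    "edge_h": np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),
    "edge_v": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    "smooth": np.full((3, 3), 1.0 / 9.0),
}

def bottom_up_features(image):
    """Apply every candidate filter to the observed image (bottom-up).

    A top-down process would then weight or select the responses that
    matter for the current task, e.g., via a learned projection matrix.
    """
    return {name: convolve(image, kernel, mode="nearest")
            for name, kernel in FILTER_BANK.items()}
```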

Tasks and assumptions

We applied the proposed method to an indoor navigation task with the Nomad mobile robot, Fig. 3(a), and a ball shooting task with a soccer robot, Fig. 3(b). Although the mobile robot shown in Fig. 3(a) is equipped with stereo cameras, we use only the left camera image. The soccer robot shown in Fig. 3(b) is equipped with a single camera directed ahead. Each robot must move along the given path to the destination using the camera image. The size of the observed image Io is 64×54 pixels and the
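To make the observation setup concrete, the sketch below turns a raw camera frame into a 64×54 observed image vector. The image size is taken from the text above; the grayscale conversion, intensity scaling, and row-major flattening are assumptions of this sketch, not details given in the paper.

```python
import cv2
import numpy as np

IMG_W, IMG_H = 64, 54  # size of the observed image I_o (from the text)

def observe(frame):
    """Convert a raw (left-)camera frame into the observed image vector.

    Grayscale conversion, INTER_AREA downsampling, and [0, 1] scaling
    are illustrative preprocessing choices, not the paper's.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (IMG_W, IMG_H), interpolation=cv2.INTER_AREA)
    return small.astype(np.float64).ravel() / 255.0  # length 64 * 54 = 3456
```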

Discussion and future work

We proposed a method that generates an image feature and learns a projection matrix from the filtered image to the state; the matrix indicates which part of the view is important, that is, it performs gaze selection through the visuo-motor map. The generated image features are appropriate for the task and environment, and the acquired projection matrices yield appropriate gaze selection for them. To show this, we illustrate the absolute values of W acquired in the model with Fs of task 1 and Fc of
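The snippet above ends mid-sentence, but the visualization it refers to can be sketched: assuming each row of W weights the pixels of a flattened filtered image, reshaping its absolute values back to image shape shows where large weights, i.e., the attended regions, concentrate. The dimensions and plotting choices below are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_attention(W, img_h=54, img_w=64):
    """Show |W| as attention maps, one per state component.

    Assumes each row of W maps the flattened (img_h x img_w) filtered
    image to one state component; large absolute weights mark the image
    regions that most influence the state, i.e., the selected gaze.
    """
    n = len(W)
    for i, row in enumerate(np.abs(np.asarray(W, dtype=float))):
        plt.subplot(1, n, i + 1)
        plt.imshow(row.reshape(img_h, img_w), cmap="hot")
        plt.title(f"|W| row {i}")
        plt.axis("off")
    plt.tight_layout()
    plt.show()
```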

Conclusion

In this paper, we have proposed a method in which a robot learns, through its own experience, the image feature and state vector that are effective for the given tasks. The brains of mammals, including humans, are thought to develop not only through perception but also through the interaction between bodily movements and the surrounding environment. Our results suggest an analogy between generating image features in this way and the development of feature-extracting cells in the visual cortex.

References (20)

  • A. Treisman et al., A feature integration theory of attention, Cognitive Psychology (1980).
  • T. Arbel, F.P. Ferrie, Viewpoint selection by navigation through entropy maps, in: Proceedings of the Seventh…
  • C. Blakemore et al., Development of the brain depends on the visual environment, Nature (1970).
  • D.J. Felleman et al., Distributed hierarchical processing in the primate cerebral cortex, Cerebral Cortex (1991).
  • R. Held et al., Movement-produced stimulation in the development of visually guided behavior, Journal of Comparative and Physiological Psychology (1963).
  • H.V.B. Hirsch et al., Visual experience modifies distribution of horizontally and vertically oriented receptive fields in cats, Science (1970).
  • H. Ishiguro, M. Kamiharako, T. Ishida, State space construction by attention control, in: Proceedings of the 16th…
  • B.J.A. Kröse, R. Bunschoten, Probabilistic localization by appearance models and active vision, in: Proceedings of the…
  • P. Laar et al., Task-dependent learning of attention, Neural Networks (1997).
  • R. Linsker, From basic network principles to neural architecture: emergence of spatial-opponent cells, Proceedings of…


Takashi Minato received his B.E. and M.E. degrees in mechanical engineering from Osaka University in 1996 and 1998, respectively. He was a researcher with CREST, JST, from December 2001, and has been a Research Associate in the Department of Adaptive Machine Systems, Osaka University, since September 2002. He is a member of the Robotics Society of Japan.

Minoru Asada received his B.E., M.E., and Ph.D. degrees in control engineering from Osaka University, Osaka, Japan, in 1977, 1979, and 1982, respectively. From 1982 to 1988, he was a Research Associate in Control Engineering, Osaka University, Toyonaka, Osaka, Japan. In April 1989, he became an Associate Professor of Mechanical Engineering for Computer-controlled Machinery, Osaka University, Suita, Osaka, Japan, and in April 1995 he became a Professor in the same department. Since April 1997, he has been a Professor in the Department of Adaptive Machine Systems at the same university. From August 1986 to October 1987, he was a visiting researcher at the Center for Automation Research, University of Maryland, College Park, MD.

He received the 1992 Best Paper Award of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '92) and the 1996 Best Paper Award of the Robotics Society of Japan (RSJ). He was General Chair of the 1996 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96). His team, together with the USC team, was the first champion of the middle-size league at the first RoboCup, held in conjunction with IJCAI-97. In 2001, he received a Commendation from the Minister of Education. Since 2002, he has been President of the International RoboCup Federation.
