Elsevier

Neurocomputing

Volume 69, Issues 4–6, January 2006, Pages 537-558

Biologically motivated vergence control system using human-like selective attention model

https://doi.org/10.1016/j.neucom.2004.12.012

Abstract

We propose a new human-like vergence control method for an active stereo vision system. The proposed system uses a selective attention model to localize an interesting area in each camera. The selected object area in the master camera is compared with that in the slave camera to determine whether the two cameras have found the same landmark. If they have, the active vision system with two cameras converges on that landmark. Using the motor encoder information, the system can then estimate depth automatically. Computer simulation and experimental results show that the proposed vergence control method is effective for implementing a human-like active stereo vision system.

Introduction

When humans search a natural scene, the left and right eyes converge on an interesting area through the action of the brain and the eyeballs. The attention mechanism that guides this convergence is divided into two processes. In top-down (volitional) processing, the human visual system determines salient locations through perceptual processing such as understanding and recognition. In bottom-up (image-based) processing, on the other hand, the human visual system determines salient locations from features based on basic information in the input image, such as intensity, color, and orientation. Bottom-up processing is a primitive selective attention function of the human visual system, since humans selectively attend to salient areas according to the various stimuli in the input scene [1]. If a human-like vergence function that takes this attention process into account can be applied to an active stereo vision system, an efficient and intelligent vision system can be developed.

Vergence stereo systems have been studied by many researchers. It is known that the two major sensory drives for vergence and accommodation are disparity and blur [2], [3]. Krotkov organized the stereo system into stages of waking up the cameras, gross focusing, orienting the cameras, and obtaining depth information [4]. Abbott and Ahuja proposed surface reconstruction by dynamic integration of focus, camera vergence and stereo disparity [5]. These approaches give good results under specific conditions, but they are difficult to use in real environments because region extraction based on intensity information is very sensitive to luminance changes. To mimic the human vision system, Yamato implemented a layered control system for a stereo vision head with a vergence control function. This system searched for the most similar region based on the sum of absolute differences (SAD) for tracking, and its vergence module used a minimum SAD search for each pixel to obtain figure-ground separation in 3D space [6]. However, such systems may perform poorly when the camera moves relative to the background, because camera motion introduces considerable noise into the SAD. Peng et al. developed an active vision system that selectively captures information about an object of a specific color [7]. However, this system considered only color information for selective attention; thus, it operates only for objects of a specific color, and luminance changes deteriorate its performance. Bernardino et al. implemented a vergence control stereo system using log-polar images [8], but this work considers only intensity information. Batista et al. developed a vergence control stereo system using retinal optical flow disparity and target depth velocity [9]. Because it relies on optical flow, however, this system converges mainly on moving objects.
Thus, this system considers only retinal motion information and does not consider other features of retinal operation, such as intensity, edge and symmetry. Moreover, all of these approaches require a heavy computational load for vergence control. Therefore, a new method is needed that not only sufficiently reflects image information such as color, intensity and edge but also reduces the computational load during vergence control. Conradt et al. proposed a stereo vision system using a biologically inspired saliency map (SM) [10]. They detected landmarks in both images through interaction between the feature detectors and the SM, and obtained the landmarks' direction and distance. They used intensity, color, circles of different radii, and horizontal, vertical and diagonal edges as features. However, they did not address the occlusion problem. Moreover, their model does not fully reflect the brain's visual signal processing mechanism, because it considers only the roles of hippocampal neurons that respond mainly to depth information.
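The SAD-based search attributed to Yamato [6] is described only at a high level. As a minimal sketch, assuming grayscale images stored as nested Python lists and an exhaustive search over all window positions (the functions `sad` and `best_match` are hypothetical illustrations, not code from [6]), minimum-SAD template matching can be written as:

```python
def sad(patch, template):
    """Sum of absolute differences (SAD) between two equally sized 2-D patches."""
    return sum(
        abs(p - t)
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )


def best_match(image, template):
    """Exhaustive minimum-SAD search.

    Returns ((row, col), score) for the top-left corner of the window that
    best matches the template; ties keep the first window in raster order.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_score = None, float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = sad(window, template)
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

For a 4x4 image `[[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 9, 0], [0, 0, 0, 0]]` and the template `[[9, 8], [7, 9]]`, the search returns `((1, 1), 0)`. Adding a uniform brightness offset of 5 to every image pixel moves the minimum to `((0, 0), 14)` in this toy case, which illustrates why matching on raw intensity is sensitive to luminance change.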

In this paper, we propose a biologically motivated vergence control method for an active stereo vision system that not only addresses the occlusion problem but also reduces the computational load by focusing only on an interesting object. The proposed method reflects biological stereo visual signal processing from the retina to the visual cortex. It is well known that the retina performs preprocessing such as cone opponent coding and edge detection, and that the extracted information is delivered to the visual cortex through the lateral geniculate nucleus (LGN). Symmetry is also an important feature for determining salient objects, and it is related to the functions of the LGN and the primary visual cortex [11], [12]. Our bottom-up SM model reflects the preprocessing of cells in the retina and the LGN, including the on-center off-surround mechanism, before redundancy reduction in the visual cortex. The SM is finally constructed by integrating the feature maps with independent component analysis (ICA), which we regard as the best way to achieve redundancy reduction [13], [14], [15]. Using the bottom-up SM model, we obtain a sequence of salient areas according to the visual stimuli. However, the bottom-up SM model may select unwanted areas and generate unreasonable scan paths, because it generates the salient sequence based only on primitive features such as intensity, edge, color and symmetry. Humans, on the other hand, can learn and memorize the characteristics of unwanted areas and inhibit or reinforce attention to such areas in subsequent visual search. Thus, to implement a human-like vergence control system, we use a new selective attention model that combines a truly bottom-up process with an interactive process, so that unwanted areas can be skipped and/or desired areas attended to in subsequent visual search.
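The bottom-up pipeline described above can be illustrated with a deliberately simplified sketch. It assumes feature maps already given as nested Python lists, models the on-center off-surround stage as max(center - mean(surround), 0) over a 3x3 window, and, as a hypothetical stand-in for the ICA-based integration used in the paper, combines the normalized maps by plain summation. The scan path is then produced by repeatedly picking the most salient pixel and masking its neighborhood, an inhibition-of-return step. All function names are illustrative assumptions, not the authors' code.

```python
def center_surround(fmap, radius=1):
    """On-center/off-surround response max(center - mean(surround), 0).

    The surround is the (2*radius+1) x (2*radius+1) window around each pixel,
    excluding the pixel itself and clipped at the image border.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            surround = [
                fmap[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
                if (yy, xx) != (y, x)
            ]
            out[y][x] = max(fmap[y][x] - sum(surround) / len(surround), 0.0)
    return out


def normalize(m):
    """Scale a map so that its maximum is 1 (an all-zero map stays zero)."""
    peak = max(max(row) for row in m)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in m]


def saliency_map(feature_maps, radius=1):
    """Sum of normalized center-surround maps (simple stand-in for ICA integration)."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    total = [[0.0] * w for _ in range(h)]
    for fmap in feature_maps:
        cs = normalize(center_surround(fmap, radius))
        for y in range(h):
            for x in range(w):
                total[y][x] += cs[y][x]
    return normalize(total)


def scan_path(smap, n, ior_radius=1):
    """Return up to n (row, col) fixations, masking each visited neighborhood (IOR)."""
    s = [row[:] for row in smap]
    h, w = len(s), len(s[0])
    path = []
    for _ in range(n):
        y, x = max(((r, c) for r in range(h) for c in range(w)),
                   key=lambda p: s[p[0]][p[1]])
        if s[y][x] <= 0:
            break  # nothing salient is left
        path.append((y, x))
        for yy in range(max(0, y - ior_radius), min(h, y + ior_radius + 1)):
            for xx in range(max(0, x - ior_radius), min(w, x + ior_radius + 1)):
                s[yy][xx] = 0.0  # inhibition of return
    return path
```

With a single 5x5 intensity map that is zero except for a value of 9 at (1, 1) and 5 at (3, 3), `scan_path(saliency_map([intensity]), 3)` visits (1, 1) first and (3, 3) second, then stops because nothing salient remains.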
To implement the trainable selective attention model, we use the bottom-up SM model in conjunction with the fuzzy adaptive resonance theory (Fuzzy ART) network. It is well known that the Fuzzy ART model maintains the plasticity required to learn new patterns while preventing the modification of previously learned patterns [16]. Thus, the characteristics of the unwanted and desired salient areas selected by the bottom-up SM model are used as input data to the Fuzzy ART model, which learns and generalizes the features of unwanted and desired areas in natural scenes. In the training process, the Fuzzy ART network learns uninteresting areas and desired areas that are selected interactively by a human supervisor, which differs from the conventional Fuzzy ART network. In test mode, the vigilance parameter of the Fuzzy ART network determines whether a new input area is interesting, because the network memorizes the characteristics of the unwanted or desired salient areas. If the match between the new input and a learned category exceeds the vigilance threshold, the Fuzzy ART network for inhibition suppresses the selected area in the bottom-up SM so that it is ignored in subsequent visual search; likewise, the Fuzzy ART network for reinforcement modifies the scan path so that the selected area becomes the most salient area.
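The following is a minimal sketch of the trainable part, assuming the standard Fuzzy ART formulation of Carpenter, Grossberg and Rosen (complement coding, choice function, vigilance test, and fast learning with beta = 1) and a hypothetical four-element feature vector in [0, 1] for each salient area. One network is trained on areas a supervisor marks as unwanted and another on areas marked as desired. The class `FuzzyART`, the function `attention_decision`, and the rule of checking inhibition before reinforcement are illustrative assumptions, not the authors' implementation.

```python
class FuzzyART:
    """Minimal Fuzzy ART with complement coding (Carpenter, Grossberg and Rosen, 1991)."""

    def __init__(self, rho=0.9, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance parameter
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.w = []         # one weight vector per committed category

    @staticmethod
    def _complement_code(a):
        # I = [a, 1 - a]; inputs are assumed to lie in [0, 1]
        return list(a) + [1.0 - v for v in a]

    @staticmethod
    def _fuzzy_and(x, y):
        return [min(p, q) for p, q in zip(x, y)]

    def _match(self, I, w):
        # |I ^ w| / |I|, compared with the vigilance parameter
        return sum(self._fuzzy_and(I, w)) / sum(I)

    def _choice(self, I, w):
        # T_j = |I ^ w_j| / (alpha + |w_j|)
        return sum(self._fuzzy_and(I, w)) / (self.alpha + sum(w))

    def train(self, a):
        """Learn one pattern; return the index of the resonating category."""
        I = self._complement_code(a)
        order = sorted(range(len(self.w)),
                       key=lambda j: self._choice(I, self.w[j]), reverse=True)
        for j in order:
            if self._match(I, self.w[j]) >= self.rho:  # resonance
                fa = self._fuzzy_and(I, self.w[j])
                self.w[j] = [self.beta * m + (1.0 - self.beta) * old
                             for m, old in zip(fa, self.w[j])]
                return j
            # mismatch reset: try the next-best category
        self.w.append(I)  # no category passed vigilance: commit a new one
        return len(self.w) - 1

    def recognizes(self, a):
        """True if some learned category passes the vigilance test for a."""
        I = self._complement_code(a)
        return any(self._match(I, w) >= self.rho for w in self.w)


def attention_decision(features, inhibit_net, reinforce_net):
    """Hypothetical rule: inhibition is checked before reinforcement."""
    if inhibit_net.recognizes(features):
        return "inhibit"    # suppress this area in the bottom-up SM
    if reinforce_net.recognizes(features):
        return "reinforce"  # promote this area to the top of the scan path
    return "bottom-up"      # keep the bottom-up saliency ordering
```

For example, after `inhibit.train([0.9, 0.8, 0.1, 0.2])` and `reinforce.train([0.1, 0.2, 0.9, 0.8])` with `rho=0.9`, a nearby area `[0.88, 0.82, 0.12, 0.2]` is classified as `"inhibit"`, while `[0.5, 0.5, 0.5, 0.5]` matches neither network and keeps its bottom-up rank.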

Using the proposed trainable selective attention model, we obtain a desired salient object in each camera for the vergence control system and compare the attention regions of the two cameras. If the difference between the attention regions' values is sufficiently small, we regard the attention regions as the same landmark in both cameras and use it as the vergence target. The motors then move each camera toward its landmark point, which produces vergence. Once the disparity between the two vergence points has been minimized through the attention region comparison, the depth estimation algorithm is applied at the vergence point. To prevent the converged region from being attended repeatedly, it is masked by an inhibition of return (IOR) function [15]. The vision system then continues to search for a new convergence region using the same procedure. The practical purpose of the proposed system is to obtain depth information for robot vision with a small computational load, by focusing only on objects of interest learned through the training process. The depth information from the developed system is intended for obstacle avoidance in robotic systems.
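The verification and depth steps above can be sketched under simplifying assumptions. The paper's own comparison criterion and its eye-gaze-matching depth algorithm are not reproduced in this excerpt, so the code below substitutes a hypothetical mean-absolute-difference threshold for deciding whether the two attention regions show the same landmark, and standard triangulation for two cameras separated by a baseline b that pan inward by angles theta_L and theta_R, giving z = b / (tan theta_L + tan theta_R). The encoder conversion assumes a hypothetical calibration constant `counts_per_rev` (encoder counts per camera revolution).

```python
import math


def encoder_to_angle(counts, counts_per_rev):
    """Convert encoder counts to a pan angle in radians (hypothetical calibration)."""
    return 2.0 * math.pi * counts / counts_per_rev


def same_landmark(left_features, right_features, threshold):
    """Hypothetical test: mean absolute feature difference below a threshold."""
    diff = sum(abs(a - b) for a, b in zip(left_features, right_features))
    return diff / len(left_features) < threshold


def depth_from_vergence(baseline, theta_left, theta_right):
    """Distance z from the camera baseline to the vergence point.

    Cameras sit at x = -b/2 and x = +b/2 with optical axes along z at zero pan;
    each pan angle is measured from the z axis, positive when turning inward.
    Then tan(theta_left) = (x + b/2) / z and tan(theta_right) = (b/2 - x) / z,
    so z = b / (tan(theta_left) + tan(theta_right)).
    """
    denom = math.tan(theta_left) + math.tan(theta_right)
    if denom <= 0.0:
        raise ValueError("optical axes do not converge in front of the cameras")
    return baseline / denom
```

For a 0.2 m baseline with both cameras turned inward by atan(0.1), the vergence point lies 1.0 m in front of the baseline; turning the left camera by atan(0.3) and the right by atan(0.1) instead places it 0.5 m away and 0.05 m toward the right camera.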

In Section 2, we briefly discuss the selective attention model. In Section 3, we explain the landmark selection algorithm for each camera, the verification of the landmarks, and depth estimation using eye gaze matching. Section 4 describes the hardware setup, and Section 5 presents the computer simulation and experimental results. Conclusions are given in Section 6.


Selective attention model

Fig. 1 shows the biological visual pathway from the retina to the visual cortex through the LGN for bottom-up processing, which extends to the extrastriate cortex and the prefrontal cortex for top-down processing. To implement a human-like visual attention function, we consider the bottom-up SM model and the top-down trainable attention model. In our approach, we reflect the functions of the retinal cells, the LGN and the visual cortex for bottom-up processing, and dorsolateral

Selection and verification of landmarks

During an infant's development, binocular disparity by binocular fixation is decomposed into three different mechanisms: alignment of the eyes, convergence, and sensory binocularity [22]. Accordingly, single-eye alignment should be the first factor considered for convergence, which requires binocular fixation. To accomplish single-eye alignment, we use successive attention regions obtained by the trainable selective attention model in each camera image. Most of the stereo vision

Hardware setup

Fig. 10(a) shows the block diagram of the developed active stereo vision system. Fig. 10(b) shows a picture of the implemented active stereo vision system with five degrees of freedom. Fig. 10(c) shows the implemented motor driving circuit and the DSP board.

We use two CCD cameras as image sensors; the two images are captured by the MIL image grabber and transferred to the IBM PC at 30 frames per second. The SM model, which is implemented on the IBM PC, generates a target point

Computer simulation and experimental results

Fig. 11 shows the result of the top-down trainable model using Fuzzy ART. Fig. 11(a) shows the source input image. The bottom-up SM model generates the scan path for the input image as shown in Fig. 11(b). Fig. 11(c) shows the scan path generated after training the Fuzzy ART model to inhibit the 4th and 5th salient areas in the bottom-up scan path image, Fig. 11(b). Fig. 11(d) shows the scan path after training the Fuzzy ART model to reinforce the 2nd salient area in the bottom-up scan path image as

Conclusion

We proposed a new biologically motivated vergence control method for an active stereo vision system that mimics human-like visual selective attention. We used a trainable selective attention model that determines areas of interest through a top-down inhibition and reinforcement mechanism, implemented by the Fuzzy ART training model, in conjunction with the bottom-up processing model. In the system, we proposed a landmark selection method using the trainable selective attention model and the IOR

Acknowledgement

This research was supported by the Brain Science & Engineering Research Program of the Ministry of Korea Science and Technology and Grant No. R05-2003-000-11399-0 from the Basic Research Program of the Korea Science & Engineering Foundation.


References (22)

  • A.J. Bell et al., The independent components of natural scenes are edge filters, Vision Res. (1997)
  • S.J. Park et al., Saliency map model with adaptive masking based on independent component analysis, Neurocomputing (2002)
  • V. Navalpakkam et al., A goal oriented attention guidance model, BMCV 2002, Lecture Notes in Computer Science (2002)
  • V.V. Krishnan et al., A heuristic model of the human vergence eye movement system, IEEE Trans. Biomed. Eng. (1977)
  • G.K. Hung et al., Static behavior of accommodation and vergence: computer simulation of an interactive dual-feedback system, IEEE Trans. Biomed. Eng. (1980)
  • E. Krotkov, Exploratory visual sensing for determining spatial layout with an agile stereo camera system, University of...
  • A.L. Abbott et al., Surface reconstruction by dynamic integration of focus, camera vergence, and stereo, IEEE International Conference on Computer Vision (1988)
  • J. Yamato, A layered control system for stereo vision head with vergence, IEEE International Conference on Systems, Man, and Cybernetics (1999)
  • J. Peng et al., An active vision system for mobile robots, IEEE International Conference on Systems, Man, and Cybernetics (2000)
  • A. Bernardino et al., Vergence control for robotic heads using log-polar images, IEEE/RSJ International Conference on Intelligent Robots and Systems (1996)
  • J. Batista, P. Peixoto, H. Araujo, A focusing-by-vergence system controlled by retinal motion disparity, IEEE...

Sang-Bok Choi received the Ph.D. degree from the Department of Sensor Engineering, Kyungpook National University, Taegu, Korea, in 2004. He is now working as CTO at COMEDO Co. His research interests include biologically motivated active vision systems, intelligent medical systems, intelligent sensor systems, pattern recognition techniques, fingerprint recognition systems, and embedded systems.

Bum-Soo Jung received the B.Eng. and M.Eng. degrees from the School of Electronic and Electrical Engineering, Kyungpook National University, Taegu, Korea, in 2002 and 2004, respectively. He is now working as a research engineer at LG Innotek Co., Ltd. His research interests include biologically motivated stereo vision systems and vergence control.

Sang-Woo Ban is currently a Ph.D. candidate in the School of Electronic and Electrical Engineering, Kyungpook National University, Taegu, Korea. His research interests include brain science and engineering, intelligent sensor systems, neural networks, pattern recognition techniques, and biologically motivated active vision systems.

Hirotaka Niitsuma received the Ph.D. degree in information science from Nara Institute of Science and Technology, Japan, in 1999. He worked as a postdoctoral researcher at Kyungpook National University, Taegu, Korea, from September 2004 to February 2005. Since March 2005, he has been a postdoctoral research scientist at the National Institute of Advanced Industrial Science and Technology, Japan. His research interests include image processing, machine learning, neural networks, and data mining.

Minho Lee graduated from Korea Advanced Institute of Science and Technology in 1995 and is currently a professor in the School of Electronic and Electrical Engineering, Kyungpook National University, Taegu, Korea. His research interests include active vision systems based on human eye movements, selective attention, neural networks, independent component analysis, active noise control, and intelligent sensor systems. (Home page: http://abr.knu.ac.kr)
