Neurocomputing

Volume 72, Issues 10–12, June 2009, Pages 2198-2208

A network of integrate and fire neurons for visual selection

https://doi.org/10.1016/j.neucom.2008.10.024

Abstract

Biological systems can readily capture the salient object(s) in a given scene, but this remains a difficult task for artificial vision systems. In this paper, a visual selection mechanism based on a network of integrate-and-fire neurons is proposed. The model not only discriminates objects in a given visual scene, but also delivers the focus of attention to the salient object. Moreover, it processes a combination of relevant features of the input scene, such as intensity, color, and orientation, together with their contrasts. In comparison to other visual selection approaches, this model presents several interesting features. It can direct attention to objects with complex forms, including linearly non-separable ones. Moreover, computer simulations show that the model produces results similar to those observed in natural vision systems.

Introduction

Visual attention is an efficient mechanism that biological systems have developed to reduce the large amount of incoming visual information [1]. It is related to the individual's capacity to discriminate one significant stimulus among others. This process appears to optimize search by selecting a number of candidate images and feature subsets which can be used in tasks such as recognition [2], and it may enhance the signal produced by a stimulus [3]. It is also responsible for breaking down complex tasks into a series of small, localized computational tasks [4]. According to Tsotsos et al. [5], intermediate and higher visual processes seem to select part of the sensory information received from the world and use just these selected data in further processing. Visual attention is also responsible for reducing the combinatorial explosion resulting from the large amount of incoming sensory information [6], [7], [8].

Visual attention is generated by a combination of information from the retina and early visual cortical areas (called bottom-up, or scene-dependent, attention) as well as by feedback signals from areas outside the visual cortex (called top-down, or task-dependent, attention) [9], [10]. Bottom-up attention is a feedforward process driven by simple features extracted from the input image, such as intensity, motion, stereo disparity, color, and orientation [9]. Top-down attention is responsible for modulating the competition among all stimuli within the visual input; it entails a short-term memory that keeps information about an object's location, which is used as a target of attention to influence earlier visual processes [11].

Most bottom-up visual attention models are related to the concept of a saliency map [9]. In those models, the first stage of processing decomposes the input image into a set of feature maps; a saliency map is then generated by combining those feature maps. The saliency map is a topographical map which represents, by a scalar quantity, the salience of every point of the visual input [9], [10]. The main purpose of the saliency map is to guide a selection mechanism that delivers the focus of attention to a specific region of the image.
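
As a concrete illustration of this pipeline, the sketch below combines two toy feature maps into a saliency map and reads off the most salient location. The min-max normalization and simple averaging are illustrative assumptions, not the combination rule of any particular model.

```python
import numpy as np

def normalize(m):
    """Scale a feature map to [0, 1]; a flat map becomes all zeros."""
    m = m.astype(float)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def saliency_map(feature_maps):
    """Combine feature maps into one topographical saliency map by
    averaging their normalized versions (a simple illustrative scheme)."""
    return np.mean([normalize(m) for m in feature_maps], axis=0)

# Toy 5x5 scene: one conspicuous spot shared by two feature maps.
intensity = np.zeros((5, 5)); intensity[2, 3] = 1.0
color = np.zeros((5, 5));     color[2, 3] = 0.5
sal = saliency_map([intensity, color])
focus = tuple(int(x) for x in np.unravel_index(np.argmax(sal), sal.shape))
print(focus)  # (2, 3): the most salient location guides attention
```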

There are basically two approaches to computational visual attention models: location based and object based [12]. The majority of visual attention models are implemented using winner-take-all (WTA) mechanisms, which are compatible with location-based theory. In this case, a single neuron is activated, so attention is directed to a point or a small area rather than to a whole object or component. For example, in the model proposed in [10], when a neuron receives the focus of attention, a circle with a fixed radius around it is taken as the attended region of the visual input. Object-based theories, in contrast, consider objects the basic units of perception, acting as wholes in a competitive process for attention [1], [13], [14], [15]. The model proposed in this paper is object based: visual attention is delivered to the salient object or component. To be compatible with object-based theories of visual attention, a visual selection model must have an embedded segmentation function.
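
The difference between the two approaches can be sketched as follows: a location-based WTA returns a single winning pixel, while an object-based selection extends the winner to the whole object containing it. The threshold-based flood fill below is a hypothetical stand-in for a real segmentation mechanism, used only to make the contrast concrete.

```python
import numpy as np
from collections import deque

def wta_location(sal):
    """Location-based WTA: a single winning position (one pixel)."""
    return tuple(int(x) for x in np.unravel_index(np.argmax(sal), sal.shape))

def wta_object(sal, thresh=0.5):
    """Object-based selection: flood-fill the above-threshold region
    containing the WTA winner, so a whole object wins, not one pixel."""
    seed = wta_location(sal)
    mask = np.zeros_like(sal, dtype=bool)
    q = deque([seed]); mask[seed] = True
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < sal.shape[0] and 0 <= nc < sal.shape[1]
                    and not mask[nr, nc] and sal[nr, nc] >= thresh):
                mask[nr, nc] = True
                q.append((nr, nc))
    return mask

sal = np.zeros((4, 6))
sal[1, 1:4] = [0.6, 0.9, 0.7]        # a 3-pixel "object"
print(wta_location(sal))              # (1, 2): single winning pixel
print(int(wta_object(sal).sum()))     # 3: the whole object is selected
```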

von der Malsburg [16] proposed temporal correlation as a representational framework. This theory suggests that objects are represented by the temporal correlation of the firing activities of spatially distributed neurons coding different features of an object. A natural way to encode temporal correlation is through the synchronization of oscillators, where each oscillator encodes some feature of an object [12], [17], [18]. Inspired by biological findings and von der Malsburg's correlation theory, Wang and his collaborators developed the oscillatory correlation theory for scene segmentation [17], [19], [20], [21], which can be described by the following rule: neurons which process different features of the same object are synchronized, while neurons which code different objects are desynchronized. Two basic mechanisms work simultaneously in each oscillatory correlation model: synchronization, which groups neurons into objects, and desynchronization, which distinguishes one group of synchronized neurons (an object) from another. Oscillatory correlation theory has been extended and successfully applied to various tasks of scene analysis, such as image segmentation, motion determination, auditory signal segregation, and perception ([12] and references therein).

Recent biological findings have shown that synchronization among neural oscillators is closely related to several cognitive processes, such as memory, attention, visual perception, and consciousness [22], [23]. Synchronization of action potentials in different frequency bands also plays an important role in memory formation [24]. According to [25], rhythms in the mammalian brain serve three main computational roles: first, information can be represented by neuronal oscillations; second, oscillations and synchronization regulate the flow of information; and third, synchronization helps the storage and retrieval of information. Visual attention can act at different levels of neural activity, for example by increasing the spiking rate of single neurons and by inducing synchronization among neurons [26]. Neurobiological discoveries have demonstrated that visual attention is strongly linked to synchronization among neurons: biological experiments have shown that attention increases the coherence among neurons responding to the same stimulus, suggesting that synchronization is an important mechanism for visual selection [26], [27], [28], [29]. Although our intention is not to explain biological phenomena, the evidence mentioned above shows that our model has a certain level of biological plausibility.

In this paper, we propose an oscillatory correlation model for visual selection built on a network of integrate-and-fire (I&F) neurons with cooperative short-range connections and competitive long-range connections. As the system runs, each group of neurons representing an object in the visual input synchronizes through the cooperative connections among neighboring neurons. At the same time, a competition mechanism is introduced by the long-range connections: the firing frequencies of neurons representing the salient object are increased, while those of neurons representing background objects are decreased. As a result, the neurons representing the salient object keep firing, while the other neurons slow down their firing activities until they become silent. Finally, the salient object is highlighted. Following the characteristics of networks of I&F neurons, in our model the firing frequencies are controlled through the external input and the coupling term. The rhythmic dynamics observed in brain activity are thus incorporated in the proposed model, enhancing its biological plausibility.
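
A minimal 1-D sketch of these dynamics is given below: two groups of pulse-coupled integrate-and-fire neurons receive different drives, short-range pulses are excitatory and long-range pulses inhibitory, and the more strongly driven (salient) group keeps firing while it silences the other. All parameter values (threshold, pulse strengths, neighborhood radius, drives) are illustrative assumptions, not the paper's equations.

```python
import numpy as np

# 1-D chain of 8 I&F neurons. Neurons 0-2 form the "salient" object
# (strong drive), neurons 5-7 a background object (weaker drive),
# and neurons 3-4 receive no stimulus.
N, dt, T = 8, 0.01, 30.0
THETA, EXC, INH = 1.0, 0.1, 0.2        # threshold, pulse strengths
I = np.array([1.5, 1.5, 1.5, 0.0, 0.0, 1.1, 1.1, 1.1])  # external drive
v = np.array([0.0, 0.3, 0.6, 0.0, 0.0, 0.0, 0.2, 0.4])  # initial potentials
spikes = np.zeros(N, dtype=int)

for _ in range(int(T / dt)):
    v += dt * (-v + I)                  # leaky integration: dv/dt = -v + I
    fired = list(np.where(v >= THETA)[0])
    done = []
    while fired:                        # deliver pulses, allowing cascades
        i = fired.pop()
        if i in done:
            continue
        done.append(i)
        spikes[i] += 1
        v[i] = 0.0                      # reset the firing neuron
        for j in range(N):
            if j == i:
                continue
            if abs(i - j) <= 2:
                v[j] += EXC             # short-range excitation (cooperation)
            else:
                v[j] = max(0.0, v[j] - INH)  # long-range inhibition (competition)
        fired += list(np.where(v >= THETA)[0])

print(spikes)  # the salient group (neurons 0-2) fires repeatedly;
               # the inhibited background group stays silent
```

Note how the excitatory pulses pull the staggered neurons 0-2 into synchronized volleys, while each of their spikes knocks the weakly driven group below threshold, so the background object never fires.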

Another feature of the model is that it considers a combination of several visual attributes, such as intensity and the contrasts of colors and orientations. These features were selected because they are among the most relevant cues used by the visual system to guide the search for a visual target. Wolfe and Horowitz [3] claim that color, motion, orientation, and size are the attributes that undoubtedly guide the deployment of attention. Moreover, biological findings show that the perceptual system may encode the contrast of features rather than their absolute levels [30], and that search is easier when the target-distracter difference (contrast) is larger [3].
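
Contrast coding of a feature map can be sketched as the absolute difference between each pixel and the mean of its 8-neighborhood. This center-surround scheme is an illustrative assumption rather than the exact operator used in the model, but it shows why a lone target among uniform distracters stands out.

```python
import numpy as np

def local_contrast(fmap):
    """Feature contrast at each pixel: |pixel - mean of its
    8-neighborhood| (zero-padded at the borders)."""
    p = np.pad(fmap.astype(float), 1)
    out = np.zeros_like(fmap, dtype=float)
    H, W = fmap.shape
    for r in range(H):
        for c in range(W):
            win = p[r:r+3, c:c+3]
            nbr = (win.sum() - win[1, 1]) / 8.0   # neighborhood mean
            out[r, c] = abs(win[1, 1] - nbr)
    return out

f = np.zeros((5, 5)); f[2, 2] = 1.0   # a lone "target" among distracters
c = local_contrast(f)
print(c[2, 2])  # 1.0: maximal target-distracter contrast at the target
```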

Several oscillator networks have been proposed to realize the WTA function; see, for example, [31]. In [31], the winner is the single neuron with the highest input value; that model uses a global inhibitory neuron to prevent non-winning neurons from firing, or a global excitatory neuron to detect fast coincidences. It could be extended to perform visual attention tasks, as claimed by the authors, but it would still require an additional segmentation mechanism to distinguish one object from others. In our model, instead of coding the input value directly, each neuron codes local feature contrasts, which is biologically plausible. Moreover, our model needs no global coordinating neuron; it uses both excitatory and inhibitory connections among neurons, the former to synchronize neurons representing the same object and the latter to inhibit neurons corresponding to non-salient objects. Usually there is not a single winning neuron but a group of winning neurons, which represents the salient object. Through this feature, our model can highlight and segment the salient object at the same time.

In our model, the neurons self-organize through synchronization and form a pattern that represents a specific input. In other words, the coupling weights between neurons are directly determined by the external stimulation (input pattern). The external stimulation evokes oscillation and synchronization among the neurons representing the same object, and the salient object is obtained by adaptively changing the frequency of each neuron. This kind of modeling is interesting because it is supported by neurophysiological experiments [32], [33].

The rest of the paper is organized as follows. In Section 2, networks of I&F neurons are revisited and the synchronization process among these neurons is presented. Section 3 is devoted to the model description. Section 4 presents computer simulation results, and Section 5 concludes the paper.

Section snippets

Synchronization analysis of I&F neurons

Networks of I&F neurons have been extensively used to build computational neuroscience models and have been applied to several computing tasks, such as image segmentation [20] and clustering [34]. These networks have also been investigated theoretically by several authors, mainly regarding the ability of neurons to synchronize under several connection topologies, such as globally coupled networks [35], [36], locally coupled networks [37], [38], [39], and even

Model description

The model presented in this paper is a 2D network of I&F neurons with two connection types: excitatory short-range connections and inhibitory long-range connections. Excitatory connections provide a cooperative mechanism that synchronizes each group of neurons representing a coherent object. Inhibitory connections, on the other hand, not only desynchronize different groups of neurons (segments of the input image) but also inhibit background objects, permitting the salient

Computer simulations

In this section, we present computer simulation results to check the robustness of our model as a visual selection mechanism.

Given a color image as input, the intensity and local orientations are extracted from the three color components R, G, and B. All features are normalized to [0,1]. The intensity is obtained as I=(R+G+B)/3. The local orientations are obtained by applying a Laplacian filter followed by four spatial masks, defined by Eqs. (28), (29), (30), (31),
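
A sketch of this feature-extraction stage is given below. The Laplacian kernel and the four line-detection masks for 0°, 45°, 90°, and 135° are standard choices used here as stand-ins for Eqs. (28)-(31), which are not reproduced in this excerpt.

```python
import numpy as np

def conv2(img, k):
    """'Same' 2-D convolution with zero padding (small helper)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(p[r:r+kh, c:c+kw] * k[::-1, ::-1])
    return out

# Toy 4x4 RGB image with components in [0, 1].
rgb = np.random.default_rng(0).random((4, 4, 3))
R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
intensity = (R + G + B) / 3.0          # I = (R + G + B) / 3

laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)
edges = conv2(intensity, laplacian)    # Laplacian filter stage

# Hypothetical line-detection masks, one per orientation.
masks = {
    0:   np.array([[-1, -1, -1], [ 2, 2,  2], [-1, -1, -1]], dtype=float),
    45:  np.array([[-1, -1,  2], [-1, 2, -1], [ 2, -1, -1]], dtype=float),
    90:  np.array([[-1,  2, -1], [-1, 2, -1], [-1,  2, -1]], dtype=float),
    135: np.array([[ 2, -1, -1], [-1, 2, -1], [-1, -1,  2]], dtype=float),
}
orientations = {a: conv2(edges, m) for a, m in masks.items()}
print(intensity.shape, orientations[0].shape)  # (4, 4) (4, 4)
```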

Conclusions

This paper presents a visual attention mechanism realized by a network of I&F neurons. The contrasts of a combination of features of an input image, such as intensity, color, and orientation, are used to stimulate the corresponding neurons in the network. The local connections among neighboring neurons serve to synchronize the neurons representing a coherent object in a given input image, while the long-range coupling terms select the salient object by inhibiting distracters,

Acknowledgments

This work is supported by the São Paulo State Research Foundation (FAPESP) and the Brazilian National Research Council (CNPq).

References (42)

  • W. Wang et al., Fast computation with neural oscillators, Neurocomputing (2006)
  • Y. Kuramoto, Collective synchronization of pulse-coupled oscillators and excitable units, Physica D (1991)
  • J. Theeuwes et al., Parallel search for a conjunction of contrast polarity and shape, Vision Research (1994)
  • R. Desimone et al., Neural mechanisms of selective visual attention, Annual Review of Neuroscience (1995)
  • J.M. Wolfe et al., What attributes guide the deployment of visual attention and how do they do it?, Nature Reviews Neuroscience (2004)
  • F. Shic et al., A behavioral analysis of computational models of visual attention, International Journal of Computer Vision (2007)
  • J.K. Tsotsos, On the relative complexity of active vs. passive visual search, International Journal of Computer Vision (1992)
  • L. Itti et al., Computational modelling of visual attention, Nature Reviews Neuroscience (2001)
  • L. Itti et al., A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)
  • D.L. Wang, The time dimension for scene analysis, IEEE Transactions on Neural Networks (2005)
  • P.R. Roelfsema et al., Object-based attention in the primary visual cortex of the macaque monkey, Nature (1998)

    Marcos G. Quiles received the B.S. degree from the State University of Londrina, Brazil, and the M.S. degree from the University of São Paulo, Brazil, in 2003 and 2004, respectively, both in Computer Science. He is currently a Ph.D. candidate in Computer Science at the University of São Paulo. From January 2008 to July 2008, he was a Visiting Scholar in the Department of Computer Science and Engineering, the Ohio State University, USA. His current research interests include computer vision, complex networks, and machine learning.

    Liang Zhao received the B.S. degree in 1988 from Wuhan University, China, and both the M.Sc. and the Ph.D. degrees from Aeronautic Institute of Technology, Brazil, in 1996 and 1998, respectively, all in computer science. From 1988 to 1993, he worked as a software engineer. In 1999, he was a postdoctoral fellow at the National Institute for Space Research, Brazil. Dr. Zhao joined the University of São Paulo, Brazil, in 2000, where he is currently an associate professor in the Department of Computer Science and Statistics. From 2003 to 2004, he was a visiting researcher in the Department of Mathematics, Arizona State University, USA. His research interests include artificial neural networks, nonlinear dynamical systems, complex networks, bioinformatics, and pattern recognition.

Fabricio A. Breve received the bachelor's degree from the Methodist University of Piracicaba, Brazil, in 2001 and the master's degree from the Federal University of São Carlos, Brazil, in 2006, both in computer science. From 2006 to 2007, he was an adjunct professor at the São Paulo State University "Júlio de Mesquita Filho," Rio Claro, Brazil, and at the Engineering School of Piracicaba, Brazil. He is currently a Ph.D. student in computer science at the University of São Paulo, Brazil. His research interests include pattern recognition, image processing, complex networks, and data clustering.

Roseli A.F. Romero received her Ph.D. degree in electrical engineering from the University of Campinas, Brazil, in 1993. She has been an associate professor in the Department of Computer Science at the University of São Paulo since 1988. From 1996 to 1998, she was a Visiting Scientist at Carnegie Mellon's Robot Learning Lab, USA. Her research interests include artificial neural networks, machine learning techniques, fuzzy logic, robot learning, and computational vision. She has been a reviewer for several important journals, such as IEEE Transactions on Neural Networks, ASME Journal, Asian Journal of Control, Control and Cybernetics, IJCIA, and JBSMSE, as well as for several international and national conferences in her area. She has organized special sessions at important conferences, such as IJCNN'05, IJCNN'06, and IJCNN'07, and important events in Brazil, such as IBERAMIA/SBIA/SBRN'06. She has supervised 15 Master's dissertations and 5 Ph.D. theses. Dr. Romero is a member of the International Neural Networks Society (INNS) and the Brazilian Computer Society (SBC).
