Categorization and decision-making in a neurobiologically plausible spiking network using a STDP-like learning rule
Introduction
Object recognition in monkeys has traditionally been associated with an anatomically distinct pathway termed the “what” (or ventral) visual stream (Ungerleider & Haxby, 1994), which consists of at least V1, V2, V4, and various regions in the inferior and anterior temporal cortices (e.g., TEO, TE1–TE3, TEa, and TEm) (Rolls, 2012, Rolls and Deco, 2002). Along this pathway, the stimulus features to which neurons respond become progressively more complex (Rolls, 2012, Ungerleider and Haxby, 1994), ranging from simple stimuli with small receptive fields, such as oriented bars in V1 (Hubel & Wiesel, 1965), to relatively large and more abstract objects, such as faces in the inferotemporal cortex (IT) (Bruce, Desimone, & Gross, 1981). These empirical observations have led to a number of classic studies modeling the ventral stream as a hierarchical feed-forward network, such as the Neocognitron (Fukushima, 1980), HMAX (Riesenhuber & Poggio, 1999), or VisNet (Rolls, 2012, Wallis and Rolls, 1997)—although it should be noted that the notion of a strictly hierarchical or feed-forward network has been questioned by recent anatomical studies that assign a more important functional role to bi-directional and non-hierarchical connections (Markov et al., 2012, Markov et al., 2011). Inspired by these classic models, a variety of more conventional machine learning algorithms have emerged that demonstrate extraordinary performance in certain recognition tasks, such as convolutional neural networks (CNNs) in handwriting recognition (Ciresan et al., 2011, LeCun et al., 1998, Simard et al., 2003), or adaptive boosting in face recognition (Viola & Jones, 2001). Although CNNs implement a biologically inspired network topology, they typically rely on error backpropagation (gradient descent), which has been criticized as biologically unrealistic because it involves variables that cannot be computed locally (Rolls & Deco, 2002).
Part of the challenge is thus to discover how comparably hard problems can be solved by more biologically plausible networks relying on local learning rules that operate on the abstraction level of a synapse.
A potential candidate for such a mechanism is spike-timing-dependent plasticity (STDP) (Bi and Poo, 2001, Sjöström et al., 2001, Song et al., 2000), a paradigm in which synaptic weights are modulated according to the relative timing, and hence the degree of causality, of pre- and postsynaptic spikes. Many different variants of STDP seem to exist in the brain, and many different models have emerged over the years to explain them (Morrison, Diesmann, & Gerstner, 2008). In an effort to implement STDP-like learning rules using only information locally available at the synapse, without algorithmically storing spike timings, several models have proposed pairing presynaptic spiking with postsynaptic voltage, determining the weight change from either the temporal change of postsynaptic voltage (Porr, Saudargiene, & Worgotter, 2004), a piece-wise linear function approximating postsynaptic voltage (Gorchetchnikov, Versace, & Hasselmo, 2005), or postsynaptic calcium concentration (Brader et al., 2007, Graupner and Brunel, 2012). Networks with STDP have been shown to learn precise spike times through supervised learning (Legenstein et al., 2005, Pfister et al., 2006), to implement reinforcement learning (Florian, 2007, Izhikevich, 2007b, O’Brien and Srinivasa, 2013), to develop localized receptive fields (Clopath, Busing, Vasilaki, & Gerstner, 2010), and to classify highly correlated patterns of neuronal activity (Brader et al., 2007).
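The core of a pair-based STDP rule can be illustrated with a short sketch: a presynaptic spike that precedes a postsynaptic spike (a causal pairing) potentiates the synapse, while the reverse order depresses it. The exponential window and its parameters below are generic illustrations, not the specific values used in the present model:

```python
import math

# Illustrative pair-based STDP window; amplitudes and time constants
# are placeholders, not parameters of the network described here.
A_PLUS, A_MINUS = 0.01, 0.012     # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # time constants (ms)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # pre before post: causal pairing -> potentiate
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    elif dt < 0:  # post before pre: anti-causal pairing -> depress
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0

# A causal pairing strengthens the synapse; an anti-causal one weakens it,
# with the magnitude decaying as the spikes move further apart in time.
dw_causal = stdp_dw(10.0, 15.0)       # positive
dw_anticausal = stdp_dw(15.0, 10.0)   # negative
```

Rules like the one used in the model replace the explicit spike-time bookkeeping above with locally available quantities (e.g., postsynaptic voltage or calcium), but the causal/anti-causal asymmetry is the same.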
Once an internal representation of a visual object is built in the brain, the question remains how this memory can be retrieved from the system in order to make a perceptual decision. One proposed general mechanism involves the temporal integration and comparison of the outputs of different pools of sensory neurons in order to compute a decision variable (Heekeren, Marrett, Bandettini, & Ungerleider, 2004). This temporal integration might be performed in one of several regions, such as the dorsolateral prefrontal cortex (dlPFC) (Heekeren et al., 2004, Kim and Shadlen, 1999), the lateral intraparietal area (LIP) (Shadlen & Newsome, 2001), the superior colliculus (SC) (Horwitz & Newsome, 1999), the frontal eye fields (FEF) (Schall, 2002, Schall and Thompson, 1999, Thompson et al., 1996), or the intraparietal sulcus (IPS) (Colby & Goldberg, 1999), which all cooperate to translate the accumulated evidence into an action (Heekeren et al., 2008, Rorie and Newsome, 2005). Neuronal activity in integrator areas gradually increases and then remains elevated until a response is given, with the rate of increase being slower during more difficult trials. A successful approach to explaining these kinds of neurophysiological data has been by means of drift–diffusion or race models (Bogacz et al., 2006, Schall and Thompson, 1999, Smith and Ratcliff, 2004), in which noisy sensory information is integrated over time until a decision threshold is reached.
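The accumulator idea can be sketched as a minimal race model: each alternative integrates noisy evidence over time, and the first accumulator to reach a fixed threshold determines both the choice and the reaction time. The drift, noise, and threshold values below are illustrative only:

```python
import random

def race_model(drifts, threshold=1.0, noise=0.1, dt=1.0,
               max_steps=10000, rng=None):
    """Race model sketch: each accumulator integrates its noisy drift;
    the first to reach threshold yields (choice, reaction time in steps)."""
    rng = rng or random.Random(0)
    x = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        for i, mu in enumerate(drifts):
            x[i] += mu * dt + noise * rng.gauss(0.0, 1.0)
            if x[i] >= threshold:
                return i, step
    return None, max_steps  # no decision within the deadline

# Stronger evidence for alternative 0 makes it the likely, faster winner.
choice, rt = race_model([0.02, 0.005])
```

With a weaker drift (a harder trial), the threshold is reached later, mirroring the slower build-up of integrator activity observed during difficult trials.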
Here we present a large-scale model of a hierarchical spiking neural network (SNN) that integrates a low-level memory encoding mechanism with a higher-level decision process to perform a visual classification task in real-time. The model consists of Izhikevich neurons and conductance-based synapses for realistic approximation of neuronal dynamics (Dayan and Abbott, 2001, Izhikevich, 2003, Izhikevich et al., 2004), a STDP synaptic learning rule with additional synaptic dynamics for memory encoding (Brader et al., 2007), and an accumulator model for memory retrieval and categorization (Smith & Ratcliff, 2004). Grayscale input images were fed through a feed-forward network consisting of V1 and V2, which then projected to a layer of downstream classifier neurons through plastic synapses that implement the STDP-like learning rule mentioned above. Population responses of these classifier neurons were then integrated over time to make a perceptual decision about the presented stimulus. The full network, which comprised 71,026 neurons and approximately 133 million synapses, ran in real-time on a single off-the-shelf graphics processing unit (GPU).
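The dynamics of a single Izhikevich neuron, the unit used throughout the network, reduce to two coupled difference equations plus a spike-and-reset rule. The sketch below uses the standard regular-spiking parameterization from Izhikevich (2003); the constant input current stands in for the model's conductance-based synaptic input:

```python
def izhikevich(I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0, T=1000.0, dt=1.0):
    """Simulate an Izhikevich neuron (regular-spiking parameters) driven
    by a constant current I; returns the list of spike times in ms."""
    v, u = -65.0, b * -65.0   # membrane potential and recovery variable
    spikes = []
    for step in range(int(T / dt)):
        # Forward-Euler update of the two Izhikevich (2003) equations.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:          # spike cutoff: record, then reset
            spikes.append(step * dt)
            v, u = c, u + d
    return spikes

# With sufficient drive the regular-spiking neuron fires tonically;
# without input it settles to rest and stays silent.
n_driven = len(izhikevich(I=10.0))
n_silent = len(izhikevich(I=0.0))
```

The full network couples ~71,000 such units through conductance-based synapses rather than a fixed current, but the per-neuron update is this cheap, which is what makes real-time GPU execution feasible.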
In order to evaluate the feasibility of our model, we applied it to the extensively studied MNIST database of handwritten digits (LeCun et al., 1998). Due to the large variability within a given class of digits and the high level of correlation between members of different classes, the database provides stimuli whose categorization spans a wide range of difficulty levels, and as such it is well-suited as a first benchmark for our model. However, it should be noted that MNIST does not pose many of the challenges of biological vision, such as distractors, occluders, or translation invariance. Moreover, all the images are static and isolated in their receptive field. The network achieved 92% correct classifications, which is comparable to other SNN approaches (Brader et al., 2007, Querlioz et al., 2011) and to simple machine learning algorithms (such as linear classifiers, k-Nearest Neighbor classifiers, and simple artificial neural networks; LeCun et al., 1998), but not to state-of-the-art models, whose performance is close to 99.8% (Ciresan et al., 2011, Niu and Suen, 2012).
Additionally, our network produces reaction time (RT) distributions that are comparable to the behavioral RT distributions reported in psychophysical experiments. For example, we show that when the network makes an error, its RT is significantly longer than when it makes a correct class prediction, and that RTs do not decrease once the target stimulus has become familiar, which has also been observed in a rapid categorization study (Fabre-Thorpe, Richard, & Thorpe, 1998).
Although the present model does not reach the performance of specialized classification systems (Ciresan et al., 2011, Niu and Suen, 2012), it represents a first step towards the construction of a general-purpose, neurobiologically inspired model of visual recognition and perceptual decision-making. The model includes many neurobiologically inspired details not found in the algorithms described above. The present network was constructed on a publicly available SNN simulator that uses design principles, data structures, and process flows compatible with general-purpose neuromorphic computer chips, and that allows for real-time execution on off-the-shelf GPUs (Richert, Nageswaran, Dutt, & Krichmar, 2011); its neuron model, synapse model, and address-event representation (AER) are compatible with recent neuromorphic hardware (Srinivasa & Cruz-Albrecht, 2012). Because of the scalability of our approach, the current model can readily be extended to an efficient neuromorphic implementation that supports the simulation of more generalized object recognition and decision-making regions found in the brain. Ultimately, understanding the neural mechanisms that mediate perceptual decision-making based on sensory evidence will further our understanding of how the brain makes the more complex decisions we encounter in everyday life (Lieberman, 2007), and could shed light on phenomena such as the misperception of objects in neuropsychiatric disorders like schizophrenia (Persaud and Cutting, 1991, Summerfield et al., 2006).
Methods
We performed all simulations in a large-scale SNN simulator which allows execution on both generic x86 central processing units (CPUs) and standard off-the-shelf GPUs (Richert et al., 2011). The simulator provides a PyNN-like environment (PyNN is a common programming interface developed by the neuronal simulation community) in C/C++ and is publicly available at http://www.socsci.uci.edu/~jkrichma/Richert-FrontNeuroinf-SourceCode.zip. The simulator’s API allows for details and parameters to
Results
We addressed the question of how many training samples are needed for good classification by varying the size of the training set between ten patterns (one per digit) and 2000 patterns (200 per digit). The testing set always consisted of 1000 patterns the network had not seen before. We ran a total of four experiments, where each experiment featured a different number of training samples, and each experiment was run 100 times. The number of training cycles was adjusted such that overall 2000
Discussion
The main contributions of the present study are as follows. First, we modified the original model (Brader et al., 2007) to be more biologically plausible, most notably by (i) implementing a SNN using Izhikevich spiking neurons and conductance-based synapses, (ii) implementing the different dynamics seen in excitatory and inhibitory neurons, (iii) incorporating a pre-processing stage that approximates the spatiotemporal tuning properties of simple and complex cells in the primary visual cortex,
Conclusion
We have presented experimental results from a neurobiologically plausible spiking network that is able to rapidly categorize highly correlated patterns of neural activity. Our approach demonstrates how a STDP-like learning rule (previously described in Brader et al. (2007)) can be utilized to store object information in a SNN, and how a simple decision-making paradigm is able to retrieve this memory in a way that allows the network to generalize to a large number of MNIST exemplars.
Acknowledgments
We thank four anonymous reviewers for their feedback, which has greatly improved the manuscript. This work was supported by the Defense Advanced Research Projects Agency (DARPA) subcontract 801888-BS.
References (74)
Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences (1999)
A model of STDP based on spatially and temporally local information: derivation and combination with gated decay. Neural Networks (2005)
Pull–push neuromodulation of LTP and LTD enables bidirectional experience-induced synaptic scaling in visual cortex. Neuron (2012)
A novel hybrid CNN-SVM classifier for recognizing handwritten digits. Pattern Recognition (2012)
A general mechanism for decision-making in the human brain? Trends in Cognitive Sciences (2005)
A model of neuronal responses in visual area MT. Vision Research (1998)
Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron (2001)
Psychology and neurobiology of simple decisions. Trends in Neurosciences (2004)
‘What’ and ‘where’ in the human brain. Current Opinion in Neurobiology (1994)
Invariant face and object recognition in the visual system. Progress in Neurobiology (1997)