Neurocomputing, Volume 149, Part C, 3 February 2015, Pages 1270-1279
A Bayesian model for canonical circuits in the neocortex for parallelized and incremental learning of symbol representations

https://doi.org/10.1016/j.neucom.2014.09.002

Abstract

We present a Bayesian model for parallelized canonical circuits in the neocortex, which can partition a cognitive context into orthogonal symbol representations. The model can learn from infinite sensory streams, updating itself with every new instance and without having to keep any instance older than the last one seen per symbol. The inherently incremental and parallel qualities of the model allow it to scale to any number of symbols as they appear in the sensory stream, and to transparently follow non-stationary distributions for existing symbols. These qualities are made possible in part by a novel Bayesian inference method, which runs Metropolis-Hastings incrementally on a data stream and significantly outperforms particle filters in a Bayesian neural network application.

Introduction

We present a model for a hypothetical functional unit of the neocortex and its relationship with proximal peers that share the same cognitive context. Our model is not one of the neocortex at large, which would require a network of cognitive contexts, but we hope it offers a building block toward that objective. Our approach is to first identify key computational aspects of the neocortex, and then build the model upon those assumptions. We demonstrate how the resulting model can orthogonalize a cognitive context by developing representations for its cognitive symbols. The model is evaluated on popular machine learning datasets.

Our assumptions are detailed in Section 2. The principal assumption is the existence of an elementary functional unit in the neocortex, identified as a canonical circuit. Next, we assume that each canonical circuit develops to represent a particular cognitive symbol, by learning towards data associated with its symbol and by learning away from the data of all other symbols. Another assumption is that each canonical circuit must execute concurrently with all other canonical circuits, in complete task parallelism. Next, we assume that neocortical computation is analogous to Bayesian inference, and we approach this aspect through Marr's three levels of analysis. Lastly, we assume that canonical circuits must be inherently incremental, learning from only a few examples, from infinite data streams, and without having to store old data or use multiple epochs. Existing work in neocortical computational modeling covers the previously listed assumptions to a certain extent, either individually or in subsets; however, we are not aware of any work that covers all of the assumptions jointly. One of our contributions is that we identify the state of each aspect in current neuroscience, propose correlations between them, and propose a model that puts them all in a common framework.

In our model, each cognitive symbol is represented by a canonical circuit in the form of an independent Bayesian neural network. Each of these networks updates through its own Bayesian inference process. The inference processes of all cognitive symbols are coupled in an inhibitory way, so that each pursues uniqueness and the overall result is an orthogonalization of the cognitive context that describes the data stream. The model starts blank and adds a canonical circuit for each new symbol as it appears in the stream. For example, a cognitive context could be "direction of motion," with symbols such as "up," "down," and "right." Fig. 1 shows a simplified visualization of how canonical circuits orthogonalize a cognitive context. Due to the task parallelism, the model runs in constant time regardless of how many canonical circuits become involved, and the canonical circuits can be distributed to different processors or machines across a network.
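To make this architecture concrete, the following is a minimal structural sketch in Python, under our own simplifying assumptions rather than the paper's implementation: the names SymbolCircuit and CCON are ours, each circuit is reduced to a streaming diagonal-Gaussian estimate with exponential forgetting in place of the paper's Bayesian neural network, and the inhibitory coupling between circuits is omitted.

    import numpy as np

    class SymbolCircuit:
        # Hypothetical stand-in for one canonical circuit: a diagonal
        # Gaussian tracked from the stream. The paper instead uses a
        # Bayesian neural network updated by Incremental Metropolis-Hastings.
        def __init__(self, dim):
            self.mean = np.zeros(dim)
            self.var = np.ones(dim)
            self.n = 0

        def update(self, x, min_rate=0.02):
            # Running average early on, exponential forgetting later, so
            # the estimate can follow a non-stationary symbol. Only the
            # newest instance is used; no history is stored.
            self.n += 1
            rate = max(1.0 / self.n, min_rate)
            delta = x - self.mean
            self.mean += rate * delta
            self.var += rate * (delta * (x - self.mean) - self.var)

        def log_score(self, x):
            var = self.var + 1e-3  # variance floor for numerical stability
            return -0.5 * np.sum((x - self.mean) ** 2 / var + np.log(var))

    class CCON:
        # Sketch of one cognitive context: an independent circuit per
        # symbol, added lazily as new symbols appear in the stream.
        def __init__(self, dim):
            self.dim = dim
            self.circuits = {}  # symbol -> SymbolCircuit

        def learn(self, x, symbol):
            # Each circuit updates independently, so updates could be
            # dispatched one-per-processor (task parallelism).
            if symbol not in self.circuits:
                self.circuits[symbol] = SymbolCircuit(self.dim)
            self.circuits[symbol].update(x)

        def classify(self, x):
            # The circuit that scores the instance highest claims it.
            return max(self.circuits, key=lambda s: self.circuits[s].log_score(x))

Because learn touches only the circuit of the arriving symbol, adding a new symbol is just adding a new entry to the dictionary, which is what lets the sketch grow with the stream.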

In order to meet the requirements for incremental learning, we developed a novel Bayesian inference method that runs Metropolis-Hastings (MH) on a data stream. The method makes it possible for a single data instance to suffice in forming a useful representation of a symbol, and for each symbol to update efficiently with each new data instance. No data instances before the last one seen per symbol have to be kept for subsequent updates; we also discuss how, in an optimal model, not even the latest instance would be required. We call our method Incremental Metropolis-Hastings (IMH). IMH recurrently re-uses the last posterior as a new prior. Priors and posteriors are represented as non-parametric probability distributions, utilized through Monte Carlo samples or kernel density estimators. The inference therefore does not suffer from the limitations of point-based approximations such as Maximum-a-Posteriori.
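The following is a compact sketch of one such update, again under our own assumptions rather than the paper's exact algorithm: random-walk proposals, a Gaussian kernel density estimate as the recycled prior, and the first half of the chain discarded as burn-in. The function name imh_step is ours.

    import numpy as np
    from scipy.stats import gaussian_kde

    def imh_step(prev_samples, log_likelihood, n_steps=500, step=0.1, rng=None):
        # One Incremental Metropolis-Hastings update: the previous
        # posterior samples (shape: n_samples x dim) act as a
        # non-parametric prior via a KDE, and log_likelihood scores a
        # parameter vector against only the newest data instance.
        rng = np.random.default_rng() if rng is None else rng
        prior = gaussian_kde(prev_samples.T)  # last posterior -> new prior

        def log_target(w):
            return prior.logpdf(w)[0] + log_likelihood(w)

        w = prev_samples[rng.integers(len(prev_samples))]  # start at a prior sample
        lp = log_target(w)
        chain = []
        for _ in range(n_steps):
            w_prop = w + step * rng.standard_normal(w.shape)  # random-walk proposal
            lp_prop = log_target(w_prop)
            if np.log(rng.random()) < lp_prop - lp:  # MH accept/reject
                w, lp = w_prop, lp_prop
            chain.append(w)
        return np.array(chain[n_steps // 2:])  # keep the post-burn-in half

In the model, each symbol's circuit would run such an update with the likelihood of its own newest instance, which is why nothing older than that instance needs to be stored; the inhibitory coupling between symbols described above is omitted from this sketch.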

From a purely computational perspective, we contribute a Bayesian classification model that is capable of supervised learning of an unlimited number of symbols (classes) from an infinite data stream, and that has simple parameterization. Because of its incremental learning qualities, the model is unique in handling concept drift transparently, i.e. it inherently supports non-stationary class distributions. We show that it matches the performance of state-of-the-art incremental learning methods. IMH is also a computational contribution in itself: we show how it vastly outperforms particle filters for incremental Bayesian inference, at least with a neural network model.
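As an illustration of the drift handling, here is a toy usage of the CCON sketch above; the stream and the numbers are our own invention, not the paper's evaluation.

    # Toy non-stationary stream: symbol "a" drifts from mean 0 to mean 3,
    # while symbol "b" stays at mean -2 (reusing the CCON sketch above).
    rng = np.random.default_rng(0)
    model = CCON(dim=2)
    for t in range(2000):
        drift = 3.0 * t / 2000
        model.learn(rng.normal(drift, 0.5, size=2), "a")
        model.learn(rng.normal(-2.0, 0.5, size=2), "b")
    print(model.classify(np.array([2.9, 2.9])))  # "a": the drift was followed

Because each update folds in only the newest instance and forgets exponentially, the drifted class is tracked without any explicit drift-detection machinery.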

In the following section we present the background for our model, structured as a literature review of the principal assumptions, each identified as a subsection. We attempt to relate them by expressing them in a shared terminology, which allows for a unified perspective upon which we build the model. In the third section we describe our model in detail. The fourth section reviews existing and related models. In the fifth section we present the evaluation results, after which we finish with a Conclusions section.

Section snippets

Canonical circuits in the neocortex

The idea of elementary circuits as functional modules in the neocortex was hypothesized as early as 1938 [1], though it remains an open question [2]. A prominent hypothesis of this type is the columnar view of the neocortex, based on functional identification of neural circuits perpendicular to the pial surface [3], [4], [5], as well as a repeating template of neural distribution and connectivity found in such circuits [6], [7], [8]. In the columnar hypothesis, the smallest circuit is called a minicolumn…

Overview and definitions

This work presents two independent but complementary contributions. The principal contribution is a model for incremental and Bayesian orthogonalization of a cognitive context, where each resulting cognitive symbol is adopted by a canonical circuit, and where all canonical circuits execute in parallel. The model describes how canonical circuits relate in a cognitive context, and how cognitive symbols develop while inhibiting each other. We refer to this model as Cognitive Context…

Biomorphic perspective

Computational models of the neocortex vary greatly in their objectives and biological inspiration. Many of them do not pursue a generic function but are specialized models, for example modeling ocular dominance with a 2D grid over which simple Hebbian-type relationships are simulated [25]. Models which are more generic cover only a subset of our five principal assumptions. Even if we ignore these assumptions, existing models usually present either no evaluation or one done on…

Evaluation

The objective of the evaluation is to see how CCON works as a supervised classifier on data streams. Part of the evaluation is also to compare IMH against particle filters used as inference methods in CCON. A limitation of our work is that we do not evaluate on datasets more closely related to biological behavior; we are partially prevented from doing so because that would require a network of cognitive contexts, which is part of the future work.

We evaluate CCON in classification tasks on data…

Conclusion

We started by identifying five assumptions about neocortical computation: the existence of canonical circuits, their relationship to cognitive symbols and contexts, Bayesian aspects, parallelism, and incremental requirements. Following these requirements, we built our CCON model for a single cognitive context, which is continually orthogonalized into symbols by canonical circuits, can follow the evolution of non-stationary symbols, and can grow the number of symbols dynamically as the context…

References (40)

  • A.M. Bastos et al., Canonical microcircuits for predictive coding, Neuron (2012)
  • S. Harnad, The symbol grounding problem, Phys. D: Nonlinear Phenom. (1990)
  • R.L. de Nó, Architecture and structure of the cerebral cortex
  • G.J. Rinkus, A cortical sparse distributed coding model linking mini- and macrocolumn-scale functionality, Front. Neuroanat. (2010)
  • V.B. Mountcastle, The columnar organization of the neocortex, Brain (1997)
  • D.P. Buxhoeveden et al., The minicolumn hypothesis in neuroscience, Brain (2002)
  • J. Szentagothai, The Ferrier lecture, 1977: the neuron network of the cerebral cortex: a functional interpretation, ...
  • S. Haeusler et al., A statistical analysis of information-processing properties of lamina-specific cortical microcircuit models, Cereb. Cortex (2007)
  • H. Markram et al., Interneurons of the neocortical inhibitory system, Nat. Rev. Neurosci. (2004)
  • N.M. da Costa et al., Whose cortical column would that be?, Front. Neuroanat. (2010)
  • J. DeFelipe et al., The neocortical column, Front. Neuroanat. (2012)
  • J.B. Tenenbaum et al., How to grow a mind: statistics, structure, and abstraction, Science (2011)
  • K.J. Friston et al., Free-energy and the brain, Synthese (2007)
  • D. George et al., Towards a mathematical theory of cortical micro-circuits, PLoS Comput. Biol. (2009)
  • A. Gelman et al., Bayesian Data Analysis (2004)
  • C.M. Bishop, Pattern Recognition and Machine Learning (2006)
  • D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (1982)
  • E. Vul, N.D. Goodman, T.L. Griffiths, J.B. Tenenbaum, One and done? Optimal decisions from very few samples, in: ...
  • A.N. Sanborn, T.L. Griffiths, D.J. Navarro, A more rational model of categorization, in: Proceedings of the 28th Annual ...

Martin Dimkovski is a Computer Science Ph.D. student at York University in Toronto, Canada, in Dr. Aijun An's team. He is an Elia Scholar, an award given at York University for achievements in liberal education and interdisciplinary studies. Martin holds an M.Sc. in Computer Science, an M.Sc. in Information Technology, and a B.Sc. in Computer Science. His research interests are in artificial intelligence and knowledge technology, in particular related to modeling biological intelligence.

Aijun An is a Professor of Computer Science at York University, Toronto, Canada. She received her Ph.D. degree in computer science from the University of Regina in 1997, and held research positions at the University of Waterloo from 1997 to 2001. She joined York University in 2001. Her research area is data mining. She has published widely in premier journals and conference proceedings on various topics of data mining, including classification, clustering, data stream mining, transitional and diverging pattern mining, high utility pattern mining, sentiment and emotion analysis from text, topic detection, keyword search on graphs, social network analysis, and bioinformatics.
