Neurocomputing

Volume 171, 1 January 2016, Pages 1099-1107

A neural model that implements probabilistic topics

https://doi.org/10.1016/j.neucom.2015.07.061

Abstract

We present a neural network model that can execute some of the procedures used in the information sciences literature. In particular, we offer a simplified notion of topic and show how to implement it using neural networks based on the Kronecker tensor product. We show that the topic-detecting mechanism is related to Naive Bayes statistical classifiers, and that it is able to disambiguate the meaning of polysemous words. We evaluate our network in a text categorization task, obtaining performance levels comparable to Naive Bayes classifiers, as expected. Hence, we propose a simple, scalable neural model capable of dealing with machine learning tasks while retaining biological plausibility and probabilistic transparency.

Introduction

Data-driven approaches to natural language processing rely on the extraction of meaningful knowledge from large amounts of data. Many of these techniques have also been applied as models of human cognition, in particular as models of lexical, semantic and pragmatic knowledge. In recent years, tools such as Latent Semantic Analysis (LSA) [5], HAL [25], [24], generative topic models [4], [7], and more recently BEAGLE [16] have been developed and have proven successful in modeling several cognitive activities [15], [10].

The fact that these tools match some capabilities of human cognition constitutes a good opportunity to understand what is required to implement complex data processing abilities such as those seen in humans [2]. Nevertheless, the fact that some methods are good at processing language-related material does not mean that they necessarily operate in the same way as brains do. Given the specificity of the problems these tools tackle, their capabilities might be limited to particular applications [46], and so despite the impressive success of recent machine learning applications there is still much to understand about how the brain solves similar challenges. Clearly, many of these algorithms cannot be considered brain-like procedures because of their weak biological plausibility. Moreover, the hardware used to run these algorithms differs from the brain in many respects, in the sense that many of them would run extremely slowly in the brain, making them unfeasible as neurally plausible models of human cognition.

What we propose is to use the insights gained from successful information sciences methods and explore their possible implementation in a neural-based framework. As our efforts are directed toward understanding the neural implementation of cognitive functions, we have been studying the relationship between these methods and neural network models [50], [31], [34]. In particular, we have shown that extensions (see below) of associative matrix models overlap both in capacities and formal properties with LSA. Our approach is related to the work of Serrano et al. [43], [44] on cognitive models of reading, although we use neural network models instead of spreading activation on semantic networks.

A recent advance in this field has been the advent of deep learning methods, a powerful approach (or set of approaches) to machine learning and natural language processing. It is structured around coupled hierarchical layers (or modules) that, via different learning procedures, extract and store structured information. In these computational procedures, information reaches higher levels of abstraction through the connections between these layers (or modules) [3]. Many deep learning procedures use Artificial Neural Networks (ANNs), which acquire knowledge by means of unsupervised algorithms or learning algorithms reflecting the self-organization of sensory inputs (involved in language), together with supervised learning algorithms linking the hierarchical modules [47]. Usually the ANNs are modules containing hidden layers, and the supervised learning is based on gradient descent algorithms (e.g., backpropagation (BP)). A related influential approach was developed by Hinton and collaborators (see, for instance, [45]) using multi-modular systems with an unsupervised learning phase carried out by Deep Boltzmann Machines.

In the present work we describe a modular device that processes contexts using the Kronecker product between context vectors and key patterns. This kind of formalism can be thought of as another building block for deep learning approaches, in which layers are conceived as modules that can be connected with other modules in order to build a deep learning hierarchy. We recently described how this modeling approach can be related to factual fMRI data obtained from the brain alterations associated with language impairments in schizophrenia [48]. In addition, from the computational point of view, the use of the Kronecker product can have some advantages. On the one hand, in each module, any multi-layer perceptron trainable by algorithms of the BP type can be replaced by a one-layer network expanded by the extra dimensions produced by the Kronecker product and trainable with the Widrow–Hoff algorithm, another gradient descent procedure that usually exhibits more rapid convergence (see for instance [38], [49]). On the other hand, the factorization implied by the Kronecker product allows, in many cases, a significant reduction in computational complexity [14], [13]. There have been some recent attempts to use tensor operations (such as the Kronecker product) in deep learning architectures (see [12]), which adds to the relevance of this type of model for machine learning.
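
As a rough illustration of this last point, the sketch below (toy dimensions and randomly generated data, not the implementation used in this work) trains a single-layer linear associator on Kronecker-expanded input–context pairs with the Widrow–Hoff (LMS) rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def kron_expand(x, c):
    """Kronecker-expanded representation of an input vector x and a context vector c."""
    return np.kron(x, c)

# Toy dimensions and data, purely illustrative.
n_in, n_ctx, n_out, n_samples = 5, 4, 3, 200
X = rng.normal(size=(n_samples, n_in))
C = rng.normal(size=(n_samples, n_ctx))
W_target = rng.normal(size=(n_out, n_in * n_ctx))          # mapping to be learned
Y = np.stack([W_target @ kron_expand(x, c) for x, c in zip(X, C)])

# Widrow-Hoff (delta) rule on the expanded inputs: W <- W + eta * (y - W z) z^T.
W = np.zeros_like(W_target)
eta = 0.01
for _ in range(50):                                         # a few passes over the data
    for x, c, y in zip(X, C, Y):
        z = kron_expand(x, c)
        W += eta * np.outer(y - W @ z, z)

print("mean squared weight error:", np.mean((W - W_target) ** 2))
```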

A further interest in using the Kronecker product in neural modeling emerges from the fact that it belongs to a set of powerful matrix operations with remarkable algebraic properties [23]. These properties allow the construction of mathematical theories able to describe the dynamics of complex hierarchical neural networks. One example of this kind of mathematical construction shows how the neural computation of order relations in natural language processing, usually coded in simple propositions, can be understood as a three-level hierarchical device that transports concrete, conceptualized factual data embedded in a particular contextual query (e.g., Is this cat larger than this dog?) towards abstract neural modules capable of providing the correct answer [32], [33].
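
One algebraic property doing much of the work in such constructions is the mixed-product rule, (A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By): an operator acting on a composite pattern factorizes into operators acting on each component. The fragment below is only a numerical check of this identity on arbitrary random matrices and vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(2, 5))
x = rng.normal(size=4)
y = rng.normal(size=5)

# A composite operator applied to a composite pattern...
lhs = np.kron(A, B) @ np.kron(x, y)
# ...equals each operator applied to its own component, composed again by the Kronecker product.
rhs = np.kron(A @ x, B @ y)
print(np.allclose(lhs, rhs))  # True
```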

When we pursue our objective of drawing inspiration for neural models from machine learning techniques, the many available techniques pose something of a challenge, since most of them have something of value to add, but the connections between them are seldom made explicit. We want to move away from this isolation of approaches toward unification in a powerful theory. To this end, here we explore the connection between probabilistic models and neural network models in the context of topic detection. The link between neural models, inference and statistics has been recognized since the beginning of current neural modeling efforts [36], [30], and given the relationship between LSA and the other methods mentioned above, it should come as no surprise that neural models are related to probabilistic topic models. Nevertheless, there are several recent developments that call for renewed interest in connecting both worlds. First, the last few years have seen a tremendous development of probabilistic models and associated methods. Second, the amount of data that can be used to test these methods has increased. In a more theoretical vein, there are ongoing debates in the literature that oppose probabilistic models and neural network models [41], [29], [8], [28].

In the present work we propose a basic neural architecture displaying a probabilistic topic identification capacity. In the second section we discuss the notion of topic. Then we present the model and show how it can be used to achieve word sense disambiguation. In the third section we show how our model is related to probabilistic models. In the fourth section we submit our model to the stringent tests of text categorization benchmarks, showing its potential as a viable implementation of the information sciences approaches to topic detection.

The main results are that one version of our model implements a well-defined statistical procedure and that, despite its simplicity, it has a reasonable capacity to categorize texts. It is not our goal to match current state-of-the-art machine learning algorithms, which belong to a highly developed and dynamic research field. However, the discovery of neurally plausible procedures can open promising new research avenues and provide building blocks for future models capable of exploiting the potential of the human brain to decode highly structured linguistic patterns.

Section snippets

The notion of topic

Before describing our neural model of topic detection, let us discuss what we mean by topic. Although different disciplines refer to different variants of the concept of topic, we assume that most share the notion that a topic is a brief description of a certain domain of knowledge. Within this broad definition there surely co-exist many conceptions. For instance, for probabilistic topic models [4] a topic is a probability distribution over words, whereas for vector space methods it is a

The neural model of topic detection

To instantiate the notion of topic mentioned in the previous section, we will use context-dependent matrix memory modules (CDMM), in particular a recurrent variant that has been used to model language phenomena [49].

In a CDMM, a memory module receives two vectors, the input and the context, and associates an output to the Kronecker product of these two vectors. The Kronecker product of matrices A = [a_ij] and B = [b_ij] is defined as A ⊗ B = [a_ij B], and can be applied to matrices of any dimensions. It
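
The following minimal sketch (an assumed, simplified formulation with toy orthonormal keys, not the authors' code) illustrates this association scheme: output vectors are superimposed as outer products with Kronecker-product keys, so the same input retrieves different outputs under different contexts.

```python
import numpy as np

rng = np.random.default_rng(2)
dim_f, dim_c, dim_g = 6, 4, 3          # input, context and output dimensions (toy values)

# Three (input, context, output) triples; orthonormal keys make retrieval exact.
f_keys = np.linalg.qr(rng.normal(size=(dim_f, 3)))[0].T
c_keys = np.linalg.qr(rng.normal(size=(dim_c, 3)))[0].T
g_outs = rng.normal(size=(3, dim_g))

# Memory matrix: superposition of outputs associated with Kronecker-product keys.
M = np.zeros((dim_g, dim_f * dim_c))
for f, c, g in zip(f_keys, c_keys, g_outs):
    M += np.outer(g, np.kron(f, c))

# Retrieval: the same input paired with a different context evokes a different output.
print(np.allclose(M @ np.kron(f_keys[0], c_keys[0]), g_outs[0]))   # True
print(np.allclose(M @ np.kron(f_keys[0], c_keys[1]), g_outs[0]))   # False
```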

Topic selector performance: text categorization

To better assess the model's capability to recognize topics, we tested the topic selector module on a text categorization task. In the field of information retrieval, text categorization is concerned with the assignment of predefined categories to novel texts [52], and usually relies on supervised methods and data collections that include separate document sets for training and testing. In this context, it can be used as a reference task for evaluating topic selector performance.
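
For orientation, the sketch below illustrates the general shape of such a reference task using a conventional bag-of-words multinomial Naive Bayes classifier; the tiny corpus is invented for the example, and the classifier is a stand-in for the kind of baseline the topic selector is compared against, not the topic selector itself:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

# Invented toy documents with predefined category labels.
train_docs = [
    "the patient showed elevated blood pressure",
    "treatment reduced tumour size in the trial",
    "the market rallied after the earnings report",
    "investors sold shares amid inflation fears",
]
train_labels = ["medicine", "medicine", "finance", "finance"]

test_docs = ["the new drug lowered blood pressure", "shares fell as the market reacted"]
test_labels = ["medicine", "finance"]

# Bag-of-words representation fitted on the training set only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# Train on the training documents, evaluate on the held-out test documents.
clf = MultinomialNB().fit(X_train, train_labels)
pred = clf.predict(X_test)
print(pred, f1_score(test_labels, pred, average="micro"))
```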

Discussion

The main objective of this work is to connect one particular type of neural network with probabilistic models. We have been working for many years on context-dependent memory modules based on the Kronecker product. This type of network allows for a clear probabilistic interpretation. To demonstrate the viability of the proposed model as part of a language processing network, we show how some text-processing tasks can be implemented in this model. In that sense, our network should be

Acknowledgements

J.C.V.L. and E.M. acknowledge partial financial support from PEDECIBA and from Grant id2012/492 CSIC-UdelaR. A.C. was supported by a fellowship from PEDECIBA and ANII.


References (53)

  • J. Ignacio Serrano et al.

    Dealing with written language semantics by a connectionist model of cognitive reading

    Neurocomputing

    (2009)
  • J.C. Valle-Lisboa et al.

    A modular approach to language production: models and facts

    Cortex

    (2014)
  • J.C. Valle-Lisboa et al.

    Elman topology with sigma-pi units: an application to the modeling of verbal hallucinations in schizophrenia

    Neural Netw.

    (2005)
  • J.C. Valle-Lisboa et al.

    The uncovering of hidden structures by latent semantic analysis

    Inf. Sci.

    (2007)
  • C. Apté et al.

    Automated learning of decision rules for text categorization

    ACM Trans. Inf. Syst.

    (1994)
  • Y. Bengio et al.

    Representation learning: a review and new perspectives

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • D.M. Blei et al.

    Latent Dirichlet Allocation

    J. Mach. Learn. Res.

    (2003)
  • S. Deerwester et al.

    Indexing by latent semantic analysis

    J. Am. Soc. Inf. Sci.

    (1990)
  • S. Deneve

    Bayesian spiking neurons I: inference

    Neural Comput.

    (2008)
  • T.L. Griffiths et al.

    Finding scientific topics

    Proc. Natl. Acad. Sci. USA

    (2004)
  • W. Hersh, C. Buckley, T.J. Leone, D. Hickam, OHSUMED: an interactive retrieval evaluation and new large test collection...
  • B. Hutchinson et al.

    Tensor deep stacking networks

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • R. Grosse, J. Martens, Optimizing neural networks with Kronecker-factored approximate curvature, arXiv preprint...
  • A.K. Jain

    Fundamentals of Digital Image Processing

    (1989)
  • M.N. Jones et al.

    Representing word meaning and order information in a composite holographic lexicon

    Psychol. Rev.

    (2007)
  • W. Kintsch, On the notions of theme and topic in psychological process models of text comprehension, in: M. Louwerse,...
Álvaro Cabana obtained his PhD in Biology in 2014 from Universidad de la República, and is currently Assistant Professor at the School of Psychology, Universidad de la República. He is currently interested in computational approaches to cognitive phenomena and the neurophysiology of language processing.

Eduardo Mizraji is Professor of Biophysics at the Universidad de la República in Montevideo, Uruguay. He obtained an MD degree from Universidad de la República and a DEA in Applied Mathematics from Université Paris V. His research interests include artificial neural networks and information processing in biological systems. Over the past decades he has developed an algebraic formalism for logical operators inspired by distributed memory matrix models.

Juan Valle-Lisboa has a degree in Biochemistry (Faculty of Sciences, UDELAR), a Master in Biophysics (PEDECIBA) and a PhD in Biological Sciences (PEDECIBA). His main focus is on the use of neural network models and the study of their strengths and weaknesses as models of cognitive activities. He is interested in applications of network models of language processing in physiological and pathological conditions. He is currently trying to understand lexical acquisition and how lexical knowledge is represented in the brain.
