A neural model that implements probabilistic topics
Introduction
Data-driven approaches to natural language processing rely on the extraction of meaningful knowledge from large amounts of data. Many of these techniques have also been applied as models of human cognition, in particular as models of lexical, semantic and pragmatic knowledge. In recent years, tools like Latent Semantic Analysis (LSA) [5], HAL [25], [24], generative topic models [4], [7], and more recently BEAGLE [16] have been developed and have proven successful in modeling several cognitive activities [15], [10].
The fact that these tools match some capabilities of human cognition offers a good opportunity to understand what is required to implement complex data-processing abilities such as those seen in humans [2]. Nevertheless, the fact that some methods are good at processing language-related material does not mean that they operate in the same way brains do. Given the specificity of the problems these tools tackle, their capacities might be limited to particular applications [46], so despite the impressive success of recent machine learning applications there is still much to understand about how the brain solves similar challenges. Indeed, many of these algorithms cannot be considered brain-like procedures because of their weak biological plausibility. Moreover, the hardware used to run these algorithms differs from the brain in many respects; many of them would run extremely slowly in the brain, making them unfeasible as neurally plausible models of human cognition.
What we propose is to use the insights gained from successful information-science methods and to explore their possible implementation in a neural framework. As our efforts are directed at understanding the neural implementation of cognitive functions, we have been studying the relationship between these methods and neural network models [50], [31], [34]. In particular, we have shown that extensions (see below) of associative matrix models overlap with LSA both in capacities and in formal properties. Our approach is related to the work of Serrano et al. [43], [44] on cognitive models of reading, although we use neural network models instead of spreading activation on semantic networks.
A recent advance in this field has been the advent of deep learning methods, a powerful approach (or set of approaches) to machine learning and natural language processing. Deep learning is structured around coupled hierarchical layers (or modules) that, via different learning procedures, extract and store structured information. In these computational procedures, information reaches higher levels of abstraction through the connections between these layers (or modules) [3]. Many deep learning procedures use Artificial Neural Networks (ANNs), which acquire knowledge by means of unsupervised algorithms, that is, learning algorithms reflecting the self-organization of sensory inputs (including those involved in language), together with supervised learning algorithms linking the hierarchical modules [47]. Usually the ANNs are modules containing hidden layers, and the supervised learning is based on gradient descent algorithms (e.g., backpropagation (BP)). A related influential approach was developed by Hinton and collaborators (see, for instance, [45]) using multi-modular systems with an unsupervised learning phase carried out by Deep Boltzmann Machines.
In the present work we describe a modular device that processes contexts using the Kronecker product between context vectors and key patterns. This formalism can be thought of as another building block for deep learning approaches, in which layers are conceived as modules that can be connected with other modules to build a deep learning hierarchy. We recently described how this modeling approach can be related to actual fMRI data on the brain alterations associated with language impairments in schizophrenia [48]. In addition, from the computational point of view, the use of the Kronecker product can have some advantages. On the one hand, in each module, any multilayer perceptron trainable by algorithms of the BP type can be replaced by a one-layer network expanded by the extra dimensions produced by the Kronecker product and trainable with the Widrow–Hoff algorithm, another gradient descent procedure that usually exhibits faster convergence (see for instance [38], [49]). On the other hand, the factorization implied by the Kronecker product allows, in many cases, a significant reduction in computational complexity [14], [13]. There have also been recent attempts to use tensor operations (such as the Kronecker product) in deep learning architectures (see [12]), which adds to the relevance of this type of model for machine learning.
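The replacement of a multilayer perceptron by a one-layer network on a Kronecker-expanded input, trained with the Widrow–Hoff (LMS) rule, can be sketched as follows. This is a minimal illustration with hypothetical dimensions and a single random association, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input f (n), context c (m), output g (p).
n, m, p = 4, 3, 2
f = rng.standard_normal(n)
c = rng.standard_normal(m)
g_target = rng.standard_normal(p)

# One-layer memory W acting on the Kronecker-expanded input c ⊗ f,
# standing in for a multilayer perceptron applied to (c, f).
x = np.kron(c, f)            # expanded input, dimension m*n
W = np.zeros((p, m * n))

# Widrow-Hoff (LMS) rule: W <- W + eta * (g - W x) x^T
eta = 0.5 / (x @ x)          # normalized step size for stable convergence
for _ in range(60):
    err = g_target - W @ x
    W += eta * np.outer(err, x)

print(np.allclose(W @ x, g_target))  # the memory reproduces the association
```

With the normalized step size, the error on this single association shrinks geometrically at each iteration, illustrating the rapid convergence mentioned above.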
Further interest in using the Kronecker product in neural modeling stems from the fact that it belongs to a set of powerful matrix operations with remarkable algebraic properties [23]. These properties allow the construction of mathematical theories able to describe the dynamics of complex hierarchical neural networks. One example of this kind of construction shows how the neural computation of order relations in natural language processing, usually coded in simple propositions, can be understood as a three-level hierarchical device that transports concrete, conceptualized factual data embedded in a particular contextual query (e.g., Is this cat larger than this dog?) toward abstract neural modules capable of providing the correct answer [32], [33].
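One of the algebraic properties that makes such hierarchical constructions tractable is the mixed-product property, (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), which lets compositions of modules factor into compositions within each factor space. It can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random matrices with compatible shapes for both sides of the identity.
A, B = rng.standard_normal((2, 3)), rng.standard_normal((4, 5))
C, D = rng.standard_normal((3, 2)), rng.standard_normal((5, 4))

# Mixed-product property: (A ⊗ B)(C ⊗ D) = (A C) ⊗ (B D)
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
print(np.allclose(lhs, rhs))  # True
```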
In pursuing our objective of grounding neural models in machine learning techniques, the many available techniques pose something of a challenge: most have something of value to add, but the connections between them are seldom made explicit. We want to move past this isolation of approaches toward unification in a powerful theory. To this end, here we explore the connection between probabilistic models and neural network models in the context of topic detection. The link between neural models and inference and statistics has been recognized since the beginning of current neural modeling efforts [36], [30], and given the relationship between LSA and the other methods mentioned above, it should come as no surprise that neural models are related to probabilistic topic models. Nevertheless, several recent developments call for renewed interest in connecting both worlds. First, the last few years have seen a tremendous development of probabilistic models and associated methods. Second, the amount of data that can be used to test these methods has increased. In a more theoretical vein, there are ongoing debates in the literature that pit probabilistic models against neural network models [41], [29], [8], [28].
In the present work we propose a basic neural architecture displaying a probabilistic topic identification capacity. In the second section we discuss the notion of topic; we then present the model and show how it can be used to achieve word sense disambiguation. In the third section we show how our model is related to probabilistic models. In the fourth section we submit our model to the stringent tests of text categorization benchmarks, showing its potential as a viable implementation of information-science approaches to topic detection.
Our main results are that one version of our model implements a well-defined statistical procedure and that, despite its simplicity, the model has a reasonable capacity to categorize texts. It is not our goal to match current state-of-the-art machine learning algorithms, a highly developed and dynamic research field. However, the discovery of neurally plausible procedures can open promising new research avenues and provide building blocks for future models capable of exploiting the potential of the human brain to decode highly structured linguistic patterns.
Section snippets
The notion of topic
Before describing our neural model of topic detection, let us discuss what we mean by topic. Although different disciplines refer to different variants of the concept of topic, we assume that most share the notion that a topic is a brief description of a certain domain of knowledge. Within this broad definition there surely co-exist many conceptions. For instance, for probabilistic topic models [4] a topic is a probability distribution over words, whereas for vector space methods it is a
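The probabilistic-topic-model view (a topic as a probability distribution over words) can be illustrated with a toy sketch. The vocabulary, topics, and mixing weights below are hypothetical and serve only to show the generative idea:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy illustration: a topic is a probability distribution over words.
vocab = ["neuron", "synapse", "bank", "loan", "interest"]
topic_neuro   = np.array([0.45, 0.40, 0.05, 0.05, 0.05])
topic_finance = np.array([0.05, 0.05, 0.30, 0.30, 0.30])

# Generating a short document: mix the topics, then sample words.
mix = 0.7 * topic_neuro + 0.3 * topic_finance
words = rng.choice(vocab, size=10, p=mix)
print(words)
```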
The neural model of topic detection
To instantiate the notion of topic mentioned in the previous section, we will use context-dependent matrix memory modules (CDMM), in particular a recurrent variant that has been used to model language phenomena [49].
In a CDMM, a memory module receives two vectors, the input and the context, and the module associates an output with the Kronecker product of the two. The Kronecker product A ⊗ B of an m × n matrix A and a p × q matrix B is the mp × nq block matrix whose (i, j) block is a_ij B, and it can be applied to matrices (including vectors) of any dimensions. It
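The block structure of the Kronecker product can be checked numerically with NumPy's `np.kron` on a small example:

```python
import numpy as np

# The Kronecker product of an m×n matrix A and a p×q matrix B is the
# mp×nq block matrix whose (i, j) block is a_ij * B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
K = np.kron(A, B)
print(K.shape)   # (4, 4)
print(K)
```

The top-left 2 × 2 block of `K` equals a_11 · B, the next block along the row equals a_12 · B, and so on.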
Topic selector performance: text categorization
To better assess the capability of the model to recognize topics properly, we tested the topic selector module on a text categorization task. In the field of information retrieval, text categorization is concerned with the assignment of predefined categories to novel texts [52], and usually deals with supervised methods and data collections that include separate document sets for training and testing. In this context, it can be used as a reference task for evaluating topic selector performance
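As a schematic of the text categorization setting (not the paper's model), a minimal nearest-centroid categorizer over bag-of-words vectors can be sketched; the vocabulary, training documents, and category names below are hypothetical:

```python
import numpy as np

# Toy nearest-centroid text categorizer: documents are bag-of-words
# count vectors, each predefined category is summarized by the
# normalized centroid of its training documents, and a novel document
# is assigned to the category with the most similar centroid.
vocab = ["neuron", "spike", "brain", "stock", "market", "price"]

train = {
    "neuro":   [np.array([3, 2, 1, 0, 0, 0]), np.array([1, 3, 2, 0, 0, 0])],
    "finance": [np.array([0, 0, 0, 2, 3, 1]), np.array([0, 0, 1, 3, 1, 2])],
}

def unit(v):
    return v / np.linalg.norm(v)

centroids = {cat: unit(sum(docs)) for cat, docs in train.items()}

def categorize(doc):
    d = unit(doc)
    # Cosine similarity against each category centroid.
    return max(centroids, key=lambda cat: centroids[cat] @ d)

test_doc = np.array([0, 1, 0, 2, 2, 1])   # mostly finance words
print(categorize(test_doc))               # finance
```

Benchmark evaluations use the same train/test separation, only with standard document collections and many categories instead of this two-category toy.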
Discussion
The main objective of this work is to connect one particular type of neural network and probabilistic models. We have been working for many years in context dependent memory modules based on the Kronecker product. This type of network allows for a clear probabilistic interpretation. To demonstrate the viability of the proposed model as a part of a language processing network we show how some text processing related tasks can be implemented in this model. In that sense, our network should be
Acknowledgements
J.C.V.L. and E.M. acknowledge partial financial support from PEDECIBA and from Grant id2012/492 CSIC-UdelaR. A.C. was supported by a fellowship from PEDECIBA and ANII.
References (53)
- et al., Structured cognition and neural systems: from rats to language, Neurosci. Biobehav. Rev. (2012)
- et al., Probabilistic models of cognition: exploring representations and inductive biases, Trends Cogn. Sci. (2010)
- Neural models of normal and abnormal behavior: what do schizophrenia, parkinsonism, attention deficit disorder, and depression have in common?, Prog. Brain Res. (1999)
- et al., Activating event knowledge, Cognition (2009)
- et al., High-dimensional semantic space accounts of priming, J. Mem. Lang. (2006)
- Rethinking eliminative connectionism, Cognit. Psychol. (1998)
- Neither size fits all: comment on McClelland et al. and Griffiths et al., Trends Cognit. Sci. (2010)
- et al., Letting structure emerge: connectionist and dynamical systems approaches to cognition, Trends Cognit. Sci. (2010)
- et al., Memories in context, BioSystems (1999)
- Deep learning in neural networks: an overview, Neural Netw. (2015)
- Dealing with written language semantics by a connectionist model of cognitive reading, Neurocomputing
- A modular approach to language production: models and facts, Cortex
- Elman topology with sigma-pi units: an application to the modeling of verbal hallucinations in schizophrenia, Neural Netw.
- The uncovering of hidden structures by latent semantic analysis, Inf. Sci.
- Automated learning of decision rules for text categorization, ACM Trans. Inf. Syst.
- Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell.
- Latent Dirichlet Allocation, J. Mach. Learn. Res.
- Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci.
- Bayesian spiking neurons I: inference, Neural Comput.
- Finding scientific topics, Proc. Natl. Acad. Sci. USA
- Tensor deep stacking networks, IEEE Trans. Pattern Anal. Mach. Intell.
- Fundamentals of Digital Image Processing
- Representing word meaning and order information in a composite holographic lexicon, Psychol. Rev.
Cited by (4)

- Neural network forecasting of news feeds, Expert Systems with Applications (2021)
- Rapid field identification of CITES timber species by deep learning, Trees, Forests and People (2020)
- Multiplicative processing in the modeling of cognitive activities in large neural networks, Biophysical Reviews (2023)
- Recurrent neural networks with continuous learning in problems of news streams multifunctional processing, Informatics and Automation (2022)
Álvaro Cabana obtained his PhD in Biology in 2014 from Universidad de la República, and is currently Assistant Professor at School of Psychology, Universidad de la República. He is currently interested in computational approaches to cognitive phenomena and the neurophysiology of language processing.
Eduardo Mizraji is Professor of Biophysics at the Universidad de la República in Montevideo, Uruguay. He obtained a MD degree from Universidad de la República and a DEA in Applied Mathematics from Université Paris V. His research interests include artificial neural networks and information processing in biological systems. Over the past decades he developed an algebraic formalism for logical operators inspired by distributed memory matrix models.
Juan Valle-Lisboa has a degree in Biochemistry (Faculty of Sciences-UDELAR), a Master in Biophysics (PEDECIBA) and a PhD in Biological Sciences (PEDECIBA). His main focus is on the use of Neural Network models and the study of their strengths and weaknesses as models of cognitive activities. He is interested in applications of network models of language processing in physiological and pathological conditions. He is currently trying to understand lexical acquisition and how lexical knowledge is represented in the Brain.