Neurocomputing

Volume 456, 7 October 2021, Pages 407-420
Encoding-based memory for recurrent neural networks

https://doi.org/10.1016/j.neucom.2021.04.051

Abstract

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study memorization from the point of view of the design and training of recurrent neural networks. We study how to maximize the short-term memory of recurrent units, an objective difficult to achieve using backpropagation. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences. Additionally, we provide a specialized training algorithm that initializes the memory to efficiently encode the hidden activations of the network. Experimental results on synthetic and real-world datasets show that the chosen encoding mechanism is superior to static encodings such as orthogonal models and the delay line. The method also outperforms RNN and LSTM units trained using stochastic gradient descent. Experiments on symbolic music modeling show that the training algorithm specialized for the memorization component improves the final performance compared to stochastic gradient descent.

Introduction

Sequential data are ubiquitous and constitute the primary and most effective representation for several information domains, including music, speech, text, and video. Recurrent neural networks are possibly the most popular learning models for sequential data, with impactful applications in natural language processing [1], speech recognition [2], and time series analysis [3]. Nonetheless, the adaptive processing of sequential data still poses significant challenges, mainly due to its dynamical nature, which requires maintaining an appropriate short-term memory of the elements of the sequence to succeed in downstream tasks.

When it comes to recurrent neural models, such challenges materialize in the exploding and vanishing gradient problem [4], which affects the capability of recurrent models to learn long-term dependencies between elements of a sequence. Subsequent studies [5] have further deepened the understanding of these gradient issues from a geometric and dynamical-system perspective, promoting a whole body of research on attempts to solve the problem with gated architectures [6], [7], [8] or through orthogonal models [9], [10]. Despite their differences, both approaches attempt to circumvent the instability of the gradient by attacking the underlying numerical problem, either through an architectural solution or by leveraging spectral properties of the weight matrices. A second line of research focuses on characterizing the short-term memory capacity of recurrent networks to gain insights that allow surpassing the limitations of dynamical memories in recurrent models [11]. A notable attempt in this sense is the reservoir computing design paradigm [12], [13], [14], which advocates the use of untrained dynamic memories with well-defined guarantees on their short-term memory capacity, again leveraging spectral considerations on recurrent weight matrices.

In this paper, we propose a novel approach to address memorization challenges in recurrent neural models, which puts forward a third way between the random encoding of the reservoir paradigm and the vanishing-gradient-prone approach of fully-trained recurrent neural networks [4]. The objective is to train memorization units to maximize their short-term memory capacity, an objective that is difficult to achieve with backpropagation. Our contribution builds on an original intuition concerning the nature of supervised sequential tasks. We claim that solving sequential tasks entails addressing two separate subproblems: (i) the extraction of an input representation that is informative and effective for successfully addressing the target task; (ii) the memorization of relevant and task-effective information using limited resources [12], [15]. In the remainder of this paper, we refer to the short-term memory and memorization component simply as memory, for brevity. Notice that this separation is not strictly necessary to train recurrent models; however, we believe that it may help devise novel and more effective models and algorithmic solutions. In fact, we show how this intuition can be mapped to a new design principle for recurrent neural networks which explicitly considers the conceptual separation of sequential problems into two subtasks:

  • the functional subtask, that is, the extraction of task-relevant features, and therefore the mapping from the previous input subsequence and the current timestep into a set of abstract, task-dependent features;

  • the memorization subtask, which is responsible for updating the internal state of the model, and therefore for memorizing the new task-relevant features.

Throughout this work, we describe how this conceptual separation principle can be translated into a practical architectural pattern for recurrent models, which simplifies their design and allows defining specialised training algorithms to optimize their memorization abilities. Fig. 1 provides a first high-level overview of our architectural design: note how, under our separation principle, only the memorization component is recurrent (the recurrence, however, still affects the functional component indirectly). The paper further provides a concretisation of the proposed architectural design by introducing a novel recurrent neural network, dubbed Linear Memory Network (LMN), that addresses the memorization subtask through a simple short-term memory component based on a linear autoencoder for sequences (LAES) [16]. The linear coding provided by the LAES allows the sequence of features to be encoded efficiently, improving the short-term memory capacity of the model. We show how such a simple memory can provide a natural solution to the memorization problem, in which the autoencoder is trained to memorize the entire sequence of task-relevant features, while the functional component leverages an expressive non-linear model to realize the input-output mapping. In addition, we propose a novel training algorithm that builds on our reference architecture by attempting to solve the two subproblems separately, in different stages. Finally, the proposed training approach integrates the two components through an end-to-end fine-tuning phase, whose aim is to bring in the advantage of the parallel associative information processing needed to optimize the final solution. The last contribution of this work is an empirical characterization of the effectiveness of the LAES memory, along with a comparative analysis against related approaches from the literature on the design of dynamical memories for recurrent neural models.
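To illustrate the kind of memory a LAES provides, the sketch below shows the linear encoding and decoding recursions in plain NumPy. This is a minimal, hypothetical example: the matrices A, B, C, and D are untrained placeholders, and the exact closed-form construction of the autoencoder used in the paper (described in Section 2.3) is not reproduced here.

```python
import numpy as np

def laes_encode(xs, A, B):
    """Run the linear recurrence m_t = A x_t + B m_{t-1} over a sequence.

    xs has shape (T, input_size); the final memory state m_T is returned.
    """
    m = np.zeros(A.shape[0])
    for x in xs:
        m = A @ x + B @ m
    return m

def laes_decode(m_T, C, D, length):
    """Unroll a linear decoder that recovers (x_t, m_{t-1}) from m_t.

    C maps a memory state to the reconstructed input x_t, while D maps it
    to the reconstructed previous memory state. The sequence is rebuilt
    backwards, so it is reversed before being returned.
    """
    xs, m = [], m_T
    for _ in range(length):
        xs.append(C @ m)   # reconstruct the most recent input absorbed into m
        m = D @ m          # step the memory back by one position
    return np.stack(xs[::-1])

# Toy usage with random (untrained) matrices, purely to show the shapes involved.
rng = np.random.default_rng(0)
T, input_size, memory_size = 5, 3, 16
A = rng.normal(size=(memory_size, input_size))
B = rng.normal(size=(memory_size, memory_size))
C = rng.normal(size=(input_size, memory_size))
D = rng.normal(size=(memory_size, memory_size))
xs = rng.normal(size=(T, input_size))
m_T = laes_encode(xs, A, B)
xs_hat = laes_decode(m_T, C, D, T)  # meaningful only once A, B, C, D are trained
```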

This paper builds on a previous conference paper [17], which introduced the LMN concept in a limited empirical setting. This work extends the original paper by:

  • defining the conceptual framework based on the separation between the short-term memory and functional components and leveraging it to discuss short-term memory capacity and learnability of different recurrent architectures;

  • extending the analysis of the LMN memory component and the formalization of its training algorithms, characterizing the LAES memorization capabilities in comparison with recent relevant works from the literature;

  • providing an empirical characterization of the architectural bias of different recurrent autoencoders, with insights on how they allocate their short-term memory capacity by preferring recent or distant elements of the sequence. To the best of our knowledge, this paper is the first to provide an analysis of the short-term memory capacity of adaptive recurrent autoencoders.

The paper is organized as follows: Section 2 introduces background material, including the notation and the linear autoencoder for sequences (LAES). Section 2.4 reviews RNN architectures and their ability to solve the memorization and functional subtasks. Section 3 describes the LMN architecture. Section 3.2 presents the specialized training algorithm used to train the memorization component of the LMN. Section 4 shows the experimental results of the autoencoders on the reconstruction of synthetic and real-world data and a comparison of recurrent architectures on symbolic music modeling. Finally, Section 5 draws the conclusions and highlights possible avenues for future work.

Section snippets

Background

This section presents background material. Section 2.1 illustrates the notation used throughout the paper. Section 2.2 introduces recurrent neural networks. Finally, Section 2.3 describes the Linear Autoencoder for Sequences, which is a fundamental component of the proposed model.

Linear memory network

As discussed in Section 1, to solve a sequential task we need to solve a memorization and a functional subtask. To impose this separation at an architectural level, we define two separate components. Formally, the functional component F and the memorization component M are two functions defined as follows:

F : Input × Memory → Hidden
M : Hidden × Memory → Memory,

where Input, Hidden, and Memory represent the input, hidden, and memory space, respectively. Under this setting, given a fixed memorization component,
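As a concrete illustration of these signatures, the following sketch implements one step of such a two-component cell in NumPy: a non-linear functional map F producing hidden features from the current input and the previous memory, and a linear memorization map M updating the memory from those features and the previous memory. The weight names, the tanh non-linearity, and the purely linear memory update are illustrative assumptions consistent with the definitions above, not a verbatim reproduction of the paper's equations.

```python
import numpy as np

def lmn_step(x_t, m_prev, W_xh, W_mh, W_hm, W_mm):
    """One step of a functional/memorization split cell.

    Functional component F: (input, memory) -> hidden, non-linear.
    Memorization component M: (hidden, memory) -> memory, linear.
    """
    h_t = np.tanh(W_xh @ x_t + W_mh @ m_prev)  # F: extract task-relevant features
    m_t = W_hm @ h_t + W_mm @ m_prev           # M: linearly update the memory state
    return h_t, m_t

# Toy usage: run the cell over a random sequence to show how the pieces connect.
rng = np.random.default_rng(0)
input_size, hidden_size, memory_size = 4, 8, 16
W_xh = rng.normal(size=(hidden_size, input_size))
W_mh = rng.normal(size=(hidden_size, memory_size))
W_hm = rng.normal(size=(memory_size, hidden_size))
W_mm = rng.normal(size=(memory_size, memory_size))

m = np.zeros(memory_size)
for x in rng.normal(size=(10, input_size)):  # a toy sequence of length 10
    h, m = lmn_step(x, m, W_xh, W_mh, W_hm, W_mm)
# h can feed a task-specific readout, while m carries the sequence history forward.
```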

Experiments

In this section, we evaluate the proposed model and its components, including the memorization module and the specialized training algorithm for the memory. First, we show an empirical study of the short-term memory capacity of several autoencoding models, including both static and adaptive encodings, on synthetic and real-world autoencoding tasks. These experiments show that the LAES is superior to alternative autoencoding models and justify the choice of the LAES

Conclusion

In this paper, we proposed an approach to devise a novel architecture and training algorithm, in the context of sequence processing, hinging on the conceptual separation of sequential processing into two different subtasks, the functional and memorization subtasks. Specifically, by focusing on the memorization subtask, which is the recurrent part of the problem, we proposed a novel recurrent neural network model that separates the two subtasks into two different components. Consequently, we

CRediT authorship contribution statement

Antonio Carta: Validation, Writing - original draft, Methodology. Alessandro Sperduti: Writing - original draft. Davide Bacciu: Writing - original draft, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work has been partially supported by the University of Padova, Department of Mathematics, DEEPer project and by the Italian Ministry of Education, University, and Research (MIUR) under project SIR 2014 LIST-IT (grant n. RBSI14STDE).

Antonio Carta graduated from the University of Pisa in 2017. He was an intern at the CERN Openlab in 2017, sponsored by Intel, where he studied deep learning models for high-energy physics data processing. Currently he is a PhD student at the University of Pisa working on novel recurrent neural networks and studying the short-term memory of recurrent models. His main research interests are recurrent neural networks and continual learning.

References (59)

  • R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training recurrent neural networks, in: International...
  • S. Hochreiter et al., Long short-term memory, Neural Computation (1997)
  • K. Greff et al., LSTM: A search space odyssey, IEEE Transactions on Neural Networks and Learning Systems (2017)
  • J. Chung et al., Empirical evaluation of gated recurrent neural networks on sequence modeling, CoRR abs/1412.3 (2014)
  • M. Arjovsky, A. Shah, Y. Bengio, Unitary evolution recurrent neural networks, in: ICML,...
  • E. Vorontsov, C. Trabelsi, S. Kadoury, C. Pal, On orthogonality and learning recurrent networks with long term...
  • M. Hermans, B. Schrauwen, Training and analysing deep recurrent neural networks, in: Advances in Neural Information...
  • H. Jaeger (2001)
  • P. Tino et al., Markovian architectural bias of recurrent neural networks, IEEE Transactions on Neural Networks (2004)
  • P. Tino, A. Rodan, Short term memory in input-driven linear dynamical systems, Neurocomputing 112 (2013) 58–63....
  • A. Sperduti, Efficient computation of recursive principal component analysis for structured input, in: Machine...
  • D. Bacciu, A. Carta, A. Sperduti, Linear memory networks, in: ICANN,...
  • Y. Bengio et al., Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks (1994)
  • I. Sutskever et al., Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems (2014)
  • A. Sperduti, Exact Solutions for Recursive Principal Components Analysis of Sequences and Trees, in: Artificial Neural...
  • L. Pasa et al., Pre-training of Recurrent Neural Networks via Linear Autoencoders, Advances in Neural Information Processing Systems (2014)
  • F. Cummins, F.A. Gers, J. Schmidhuber, Learning to Forget: Continual Prediction with LSTM, Neural Computation 2 (June...
  • J. Wang et al., Recurrent neural networks with auxiliary memory units, IEEE Transactions on Neural Networks and Learning Systems (2018)
  • S. Wisdom, T. Powers, J.R. Hershey, J.L. Roux, L. Atlas, Full-Capacity Unitary Recurrent Neural Networks, in: NIPS,...
Alessandro Sperduti received his PhD in 1993 from the University of Pisa, Italy. He is a Full Professor at the Department of Mathematics of the University of Padova. Previously, he was associate professor (1998–2002) and assistant professor (1995–1998) at the Department of Computer Science of the University of Pisa. His research interests are mainly in Neural Networks, Kernel Methods, and Process Mining. He was the recipient of the 2000 AI*IA (Italian Association for Artificial Intelligence) “MARCO SOMALVICO” Young Researcher Award. He has been an invited plenary speaker at Neural Networks conferences. Prof. Sperduti has served as AC in major AI conferences and is currently on the editorial board of the journals Theoretical Computer Science (Section C), Natural Computing, and Neural Networks. He has been a member of the European Neural Networks Society (ENNS) Executive Committee, chair of the DMTC of IEEE CIS for the years 2009 and 2010, chair of the NNTC for the years 2011 and 2012, chair of the IEEE CIS Student Games-Based Competition Committee for the years 2013 and 2014, and chair of the Continuous Education Committee of the IEEE Computational Intelligence Society for the year 2015. He is a senior member of the IEEE. Prof. Sperduti is the author of more than 220 publications in refereed journals, conferences, and book chapters.

Davide Bacciu has a Ph.D. in Computer Science and Engineering from the IMT Lucca Institute for Advanced Studies. He was a visiting researcher at the Neural Computation Research Group, Liverpool John Moores University, in 2007–2008 and at the Cognitive Robotic Systems laboratory, Orebro University, in 2012. He joined the Computer Science Department of the University of Pisa as Assistant Professor in 2014, and he is currently Associate Professor in the same Department. His research interests include machine learning for structured data, Bayesian learning, deep learning, reservoir computing, distributed and embedded learning systems. He received the 2009 E.R. Caianiello Award for the best Italian Ph.D. thesis on neural networks. He is the Secretary of the Italian Association for Artificial Intelligence, a member of the IEEE Technical Committee, and an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems.
