
Neurocomputing

Volume 379, 28 February 2020, Pages 172-181

Progressive Operational Perceptrons with Memory

https://doi.org/10.1016/j.neucom.2019.10.079

Abstract

Generalized Operational Perceptron (GOP) was proposed to generalize the linear neuron model used in the traditional Multilayer Perceptron (MLP) by mimicking the synaptic connections of biological neurons, which exhibit nonlinear neurochemical behaviours. Previously, Progressive Operational Perceptron (POP) was proposed to train a multilayer network of GOPs which is formed layer-wise in a progressive manner. While achieving superior learning performance over other types of networks, POP has a high computational complexity. In this work, we propose POPfast, an improved variant of POP that significantly reduces the computational complexity of POP, thus accelerating the training time of GOP networks. In addition, we also propose major architectural modifications of POPfast that can augment the progressive learning process of POP by incorporating an information-preserving, linear projection path from the input to the output layer at each progressive step. The proposed extensions can be interpreted as a mechanism that provides the network with direct information extracted from the previously learned layers, hence the term “memory”. This allows the network to learn deeper architectures and better data representations. An extensive set of experiments in human action, object, facial identity and scene recognition problems demonstrates that the proposed algorithms can train GOP networks much faster than POP while achieving better performance compared to the original POP and other related algorithms.

Introduction

Given a data set, a learning problem can be translated into the task of searching for a suitable transformation or mapping of the input data to some domain with specific characteristics. In discriminative learning, data in the target domain should be separable among the different classes of input, while in generative learning, data in the target domain should match some specific characteristics (e.g. a given distribution). In the biological learning system of mammals, the transformation is carried out by a set of neurons, each of which conducts electrical signals over three distinct operations: modification of the input signal via the synaptic connections in the dendrites; pooling of the modified input signals in the soma; and sending pulses when the pooled potential exceeds a limit at the axon hillock [1]. Biological learning systems are generally built from a diverse set of neurons which perform various neuronal activities. For example, it has been shown that there are approximately 55 different types of neurons performing low-level visual sensing in the mammalian retina [2].

In order to solve learning problems with machines, Artificial Neural Networks (ANNs) were designed to simulate the biological learning system, with artificial neurons as the core component. The most typical neuron model is based on the McCulloch-Pitts perceptron [3], hereafter simply referred to as the perceptron, which loosely mimics the behavior of biological neurons by scaling the input signals, summing over all scaled inputs, and applying a thresholding step. Mathematically, the activity of a perceptron corresponds to a linear transformation followed by an element-wise nonlinear function. Despite its simplicity, most of the existing state-of-the-art architectures in different application domains [4], [5], [6], [7] rely on this additive/affine perceptron model. This is due to the fact that the linear transformation is expressed via matrix multiplication, which has several highly optimized implementations. While being efficient in terms of computation, the traditional perceptron model might not be optimal in terms of representation. In fact, the idea of enhancing the expressiveness of neural networks via more complex neuron models or activation functions has gradually attracted more attention [8], [9], [10], [11]. In order to better simulate biological neurons in the mammalian nervous system, the authors in [1] proposed a generalized perceptron model, known as Generalized Operational Perceptron (GOP), which admits a broader range of neuronal activities through three distinct sets of operations: nodal, pooling and activation operations. The schematic operation of GOP is illustrated in Fig. 1.

As shown in Fig. 1, a GOP first applies a nodal operator $\psi_i^{l+1}$ to each individual output signal from the previous layer using adjustable synaptic weights $w_{ki}^{l+1}$ ($k = 1, \dots, N_l$). The operated output signals are pooled to a scalar by the pooling operator $\rho_i^{l+1}$, after which the bias term $b_i^{l+1}$ is added. The activation operator $f_i^{l+1}$ determines the magnitude of the activating signal that the GOP sends to the next layer. By having the ability to select different nodal, pooling and activation operators from a library of operators, each GOP encapsulates a wide range of neural activities. For example, the traditional perceptron can be formed by selecting multiplication as the nodal operator, summation as the pooling operator and sigmoid or ReLU as the activation operator. In our work, the term operator set, which refers to one specific choice of nodal, pooling and activation operator, represents a particular neuronal activity of a GOP. A sample library of operators is shown in Table 1. Mathematically, the activities performed by the $i$-th GOP in layer $l+1$ can be described by the following equations:

$$z_{ki}^{l+1} = \psi_i^{l+1}(y_k^l, w_{ki}^{l+1})$$
$$x_i^{l+1} = \rho_i^{l+1}(z_{1i}^{l+1}, \dots, z_{N_l i}^{l+1}) + b_i^{l+1}$$
$$y_i^{l+1} = f_i^{l+1}(x_i^{l+1})$$
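
To make these operations concrete, the following is a minimal NumPy sketch of a single GOP forward pass. The operator subset and the function gop_forward are illustrative assumptions in the spirit of Table 1, not the exact library or implementation used in the paper:

    import numpy as np

    # Illustrative operator subset in the spirit of Table 1 (assumed, not the full library).
    NODAL = {
        "multiplication": lambda y, w: w * y,
        "exponential":    lambda y, w: np.exp(w * y) - 1.0,
        "sinusoid":       lambda y, w: np.sin(w * y),
    }
    POOL = {
        "summation": lambda z: z.sum(),
        "median":    lambda z: np.median(z),
        "maximum":   lambda z: z.max(),
    }
    ACTIVATION = {
        "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
        "tanh":    np.tanh,
        "relu":    lambda x: np.maximum(x, 0.0),
    }

    def gop_forward(y_prev, w, b, nodal, pool, act):
        """One GOP: nodal operation on each input, pooling to a scalar plus bias, then activation."""
        z = NODAL[nodal](y_prev, w)        # z_k = psi(y_k, w_k)
        x = POOL[pool](z) + b              # x = rho(z_1, ..., z_N) + b
        return ACTIVATION[act](x)          # y = f(x)

    # With multiplication/summation/sigmoid the GOP reduces to the classical perceptron.
    y_prev = np.array([0.2, -0.5, 0.8])
    w = np.array([0.1, 0.4, -0.3])
    print(gop_forward(y_prev, w, 0.05, "multiplication", "summation", "sigmoid"))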

Multiple GOPs can be combined to form a multilayer network, hereafter called a GOP network. Since each GOP involves a library of operators, training a GOP network poses a much more challenging problem compared to standard MLP networks: not only the synaptic weights and the biases must be optimized but also the choice of the operator set of each neuron. In [1], the authors proposed Progressive Operational Perceptron (POP), a specific configuration of GOP network in which each layer is progressively trained, given a pre-defined network template. To make the search for operator sets tractable, POP constrains all GOPs within the same layer to share the same operator set, and the evaluation of each operator set is performed through stochastic optimization, i.e., the Back Propagation (BP) algorithm. Recently, the authors in [12] proposed a new learning algorithm that aims at efficiency and compactness by constructing heterogeneous multilayer networks of GOPs using a randomization process during the search procedure.
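
To illustrate the search problem this creates, the sketch below outlines a POP-style, layer-wise operator set search. The helper train_and_evaluate is a hypothetical placeholder that only returns a pseudo-random score; in the actual algorithm each candidate layer is trained with BP and scored, as described above (cf. the publicly available implementation [13]):

    import numpy as np
    from itertools import product

    def train_and_evaluate(op_set, train_data, val_data):
        """Hypothetical stand-in: train a candidate layer whose GOPs all use op_set
        with back-propagation and return a validation score. Here it only returns
        a deterministic pseudo-random score so the sketch runs end to end."""
        rng = np.random.default_rng(abs(hash(op_set)) % (2**32))
        return rng.random()

    def search_layer_operator_set(nodal_ops, pool_ops, act_ops, train_data, val_data):
        """Exhaustive search over the operator library for one progressive layer:
        every GOP in the layer shares one operator set, and the best-scoring set is kept."""
        best_set, best_score = None, -np.inf
        for op_set in product(nodal_ops, pool_ops, act_ops):
            score = train_and_evaluate(op_set, train_data, val_data)
            if score > best_score:
                best_set, best_score = op_set, score
        return best_set, best_score

    best_set, best_score = search_layer_operator_set(
        ["multiplication", "exponential", "sinusoid"],
        ["summation", "median", "maximum"],
        ["sigmoid", "tanh", "relu"],
        train_data=None, val_data=None,
    )
    print(best_set, round(best_score, 3))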

In this study, we aim to improve the performance of POPs by making several modifications. In particular, we incorporate a linear output layer relaxation that reduces the training complexity: only one iteration over the library of operator sets is required, instead of four as in the original POP trained with the two-pass GIS algorithm. In addition, we propose two memory schemes that aim to augment the progressive learning procedure in POP by incorporating an additional linear path that preserves information extracted from previous layers. The contributions of our work can be summarized as follows:

  • We propose POPfast, a simplified version of POP, which only requires one iteration over the library of operator sets compared to four iterations as in POP. Our experimental results demonstrate that POPfast performs similarly to POP while being faster.

  • Based on POPfast, we propose two memory schemes that give the network direct access to the information of previously learned layers at each progressive step. For each memory scheme, we evaluate two types of information-preserving linear transformations to extract the information synthesized by the previous layers. Extensive experiments were conducted to demonstrate the performance improvements of POPfast augmented with memory. In addition, the importance of the memory path is analyzed empirically.

  • We make our implementation of all evaluated algorithms publicly available to facilitate future research, including parallel implementation for both single and multiple machines [13].

The remainder of the paper is organized as follows: In Section 2, we review POP and other related progressive algorithms for ANN training. Section 3 starts with the description of POPfast and continues with the description of the proposed memory schemes. In Section 4, we describe the details of our experimental setup, followed by a quantitative analysis of the experimental results. Finally, conclusions are drawn in Section 5.

Related work

This section reviews the Progressive Operational Perceptron (POP), a particular type of GOP network with progressive formation. In addition, other related progressive learning algorithms that were evaluated in our work are also briefly presented.

Proposed algorithms

In this section, we start by describing POPfast, an extension we propose to reduce the training complexity of POP trained with the two-pass GIS algorithm. We then describe our motivation for augmenting POPfast with memory. The two memory extensions are then described and discussed in detail.

Experiments

In this section, we detail our empirical evaluation and analysis of the proposed POPfast, POPmem-H and POPmem-O with respect to POP and three other related algorithms: BLS, S-ELM, and PLN. PCA and LDA were employed as the information-preserving, linear projection G in our memory proposals. The corresponding algorithms are denoted as POPmem-H-PCA, POPmem-H-LDA, POPmem-O-PCA, and POPmem-O-LDA.
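
As a rough illustration of how such a projection G can be combined with the learned representation, the sketch below compresses the previous layers' representation with PCA and concatenates it with the current hidden output before the next progressive step. This is a simplified assumption for illustration, not the exact POPmem-H or POPmem-O formulation, and it assumes scikit-learn's PCA:

    import numpy as np
    from sklearn.decomposition import PCA

    def augment_with_memory(H_prev, H_current, n_components=32):
        """Compress the previously learned representation H_prev with an
        information-preserving linear projection (PCA here) and concatenate
        it with the current hidden output H_current."""
        G = PCA(n_components=min(n_components, H_prev.shape[1]))
        memory = G.fit_transform(H_prev)              # linear summary of earlier layers
        return np.concatenate([H_current, memory], axis=1)

    # Toy usage with random matrices standing in for learned representations.
    H_prev = np.random.randn(100, 64)   # representation produced by previously trained layers
    H_curr = np.random.randn(100, 48)   # output of the newly trained GOP layer
    print(augment_with_memory(H_prev, H_curr).shape)  # (100, 80)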

Information related to the datasets, experimental protocol and implementation will be given first, followed by a quantitative analysis of the results.

Conclusions

In this paper, we proposed POPfast, an efficient algorithm that accelerates the training time of the original POP algorithm while achieving competitive performance in a variety of classification problems. Since learning with GOPs involves operator set evaluation, our work contributes an efficient search procedure for future works that employ GOPs, enabling us to tackle more complex and larger datasets, as illustrated in our experiments. Based on the accelerated search procedure, we propose two memory schemes that augment the progressive learning process by preserving information extracted from previously learned layers.

Declaration of Competing Interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.


References (33)

  • W.S. McCulloch et al., A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys. (1943)
  • Z.C. Lipton, J. Berkowitz, C. Elkan, A critical review of recurrent neural networks for sequence learning, ...
  • F. Fan, G. Wang, Universal approximation with quadratic deep networks, arXiv:1808.00098 ...
  • D.T. Tran et al., Heterogeneous multilayer generalized operational perceptron, IEEE Trans. Neural Netw. Learn. Syst. (2019)
  • D.T. Tran et al., PyGOP: a Python library for generalized operational perceptron algorithms, Knowl. Based Syst. (2019)
  • D.T. Tran et al., Learning to rank: a progressive neural network learning approach, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019)

Dat Thanh Tran received the B.A.Sc. degree in Automation Engineering from Häme University of Applied Sciences, and the M.Sc. degree in Data Engineering and Machine Learning from Tampere University, in 2017 and 2019, respectively. He has been a Research Assistant in the Multimedia Research Group led by Professor Moncef Gabbouj at Tampere University since 2015. His current research interests include statistical pattern recognition and machine learning, especially machine learning models with efficient computation and novel neural architectures.

Serkan Kiranyaz is a Professor at Qatar University, Doha, Qatar. He has published 2 books, 5 book chapters, more than 50 papers in high-impact journals, and more than 100 papers in international conferences. He has made contributions to evolutionary optimization, machine learning, bio-signal analysis, and computer vision with applications to recognition, classification, and signal processing. Prof. Kiranyaz has co-authored papers that were nominated for or received the “Best Paper Award” at ICIP 2013, ICPR 2014, ICIP 2015 and IEEE TSP 2018. He had the most popular articles in 2010 and 2016, and the most cited article in 2018, in IEEE Transactions on Biomedical Engineering. During 2010–2015 he authored the 4th most-cited article of the Neural Networks journal. His research team won the 2nd and 1st places in the PhysioNet Grand Challenges 2016 and 2017, among 48 and 75 international teams, respectively.

Moncef Gabbouj received the B.S. degree in electrical engineering from Oklahoma State University, Stillwater, OK, USA, in 1985, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1986 and 1989, respectively. He was a Professor with the Academy of Finland, Helsinki, Finland, from 2011 to 2015. He is currently a Professor of signal processing with the Department of Signal Processing, Tampere University of Technology, Tampere, Finland. He has guided 45 Ph.D. students and published 700 papers. His current research interests include big data analytics, multimedia content-based analysis, indexing and retrieval, artificial intelligence, machine learning, pattern recognition, nonlinear signal and image processing and analysis, voice conversion, and video processing and coding. He is a member of the Academia Europaea and the Finnish Academy of Science and Letters. He is the past Chairman of the IEEE Circuits and Systems Technical Committee on Digital Signal Processing and a Committee Member of the IEEE Fourier Award for Signal Processing. He served as an Associate Editor and a Guest Editor for many IEEE and international journals. He served as a Distinguished Lecturer for the IEEE Circuits and Systems Society. He has organized several tutorials and special sessions for major IEEE conferences and the European Signal Processing Conference.

Alexandros Iosifidis is currently an Associate Professor with Aarhus University, Aarhus, Denmark. He has (co-)authored 50 journal papers, 72 conference papers, and four book chapters in topics of his expertise. His work has attracted over 1300 citations, with an h-index of 18 (Publish and Perish). His current research interests include the areas of machine learning, pattern recognition, computer vision, and computational finance. His work has received many awards, including the H.C. Ørsted Forskerspirer 2018 prize for research excellence, the Academy of Finland Postdoc Fellowship, and best (student) paper awards at IPTA 2016 and VCIP 2017. He was listed in the best works of IPTA 2016, IJCCI 2014, and the IEEE SSCI 2013. He served as an Officer of the Finnish IEEE Signal Processing/Circuits and Systems Chapter from 2016 to 2018. He is currently serving as an Associate Editor for Neurocomputing and the IEEE ACCESS journals, an Area Editor for the Signal Processing: Image Communication journal, and an Area Chair for the IEEE International Conference on Image Processing 2018.
