Elsevier

Neurocomputing

Volume 275, 31 January 2018, Pages 2864-2879

The memory degradation based online sequential extreme learning machine

https://doi.org/10.1016/j.neucom.2017.11.030

Highlights

  • To improve the accuracy and the generalization of FOS-ELM, the MDOS-ELM is proposed.

  • A self-adaptive memory factor is applied to adjust the weights of the old and new samples.

  • The self-adaptive memory factor is determined by two elements.

  • One is the similarity between the new and the old samples.

  • The other is the prediction errors of the current training samples on the previous model.

Abstract

In online learning, the contribution of old samples to a model decreases as time passes, and old samples gradually become invalid. Although the Online Sequential Extreme Learning Machine (OS-ELM) avoids repetitive training on old samples, invalid samples are still used, which hinders the accuracy of an OS-ELM model. The Online Sequential Extreme Learning Machine with Forgetting Mechanism (FOS-ELM) discards invalid samples in a timely manner, but it does not consider the differences among valid samples, which limits its accuracy and generalization. To solve this issue, the Memory Degradation Based OS-ELM (MDOS-ELM) is proposed in this paper. The MDOS-ELM adjusts the weights of the old and new samples in real time through a self-adaptive memory factor, and simultaneously discards invalid samples. The self-adaptive memory factor is determined by two elements. One is the similarity between the new and old samples, and the other is the prediction errors of the current training samples on the previous model. The performance of the proposed MDOS-ELM is validated on both regression and classification datasets, which include an artificial dataset and twenty-two real-world datasets. The results demonstrate that the MDOS-ELM model outperforms the OS-ELM and FOS-ELM models in accuracy and generalization.

Introduction

The Single-hidden Layer Feed-forward Neural network (SLFN) with a nonlinear activation function is a powerful learning algorithm that can approximate any nonlinear continuous function [1]. Since it was proposed, the SLFN has received wide attention in both theory and practice [2], [3], [4], [5]. Unfortunately, optimizing an SLFN model requires iteratively adjusting its parameters, which deprives the SLFN of fast learning ability.

To address this problem, the Extreme Learning Machine (ELM) was proposed by Huang et al. [6], [7], [8] on the basis of Moore-Penrose generalized inverse theory. Compared with the traditional SLFN algorithm, the ELM only needs a reasonable number of hidden-layer nodes and randomly assigned hidden-layer parameters; the output weights are then obtained by the least-squares method. The whole learning process of the ELM is carried out in a single step. That is, the ELM does not iteratively update all the parameters of the SLFN, which gives it the advantage of fast learning.
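
As a concrete illustration, here is a minimal sketch of batch ELM training in Python/NumPy; the function and variable names are ours, not the paper's. The hidden layer is assigned at random and the output weights come from a single least-squares (Moore-Penrose pseudoinverse) solve.

    import numpy as np

    def elm_train(X, T, L, seed=0):
        # X: (N, n) inputs; T: (N, m) targets; L: number of hidden nodes.
        rng = np.random.default_rng(seed)
        A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))  # random input weights a_l
        b = rng.uniform(-1.0, 1.0, size=L)                # random biases b_l
        H = 1.0 / (1.0 + np.exp(-(X @ A + b)))            # sigmoid hidden-layer output
        beta = np.linalg.pinv(H) @ T                      # one-step least-squares solution
        return A, b, beta

    def elm_predict(X, A, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
        return H @ beta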

However, the ELM has the following disadvantages. First, the ELM considers only empirical risk minimization and is therefore prone to over-fitting. To solve this issue, the Regularized ELM (RELM) [9] was proposed by Deng et al., based on both structural and empirical risk minimization, but the RELM cannot cope with non-Gaussian noise [10]. Second, the ELM cannot handle the curse of dimensionality well. To address this, kernel functions were introduced into the ELM to map samples into a higher-dimensional feature space [11], [12], [13], but it is hard to select an optimal kernel function for a specific application. Third, it is hard to set a suitable number of hidden-layer nodes: if the number is too large, training the ELM model is time-consuming; conversely, if it is too small, the model is not accurate enough. To address this problem, [14], [15], [16], [17], [18] proposed strategies that gradually add or prune hidden-layer nodes to optimize the structure of an ELM model, but the pruning and growing operations are themselves time-consuming. Fourth, the ELM is limited in extracting deep relationships from the data of complicated practical problems. To deal with this, the ELM is usually combined with deep learning algorithms [19], leading to the multilayer ELM [20] and the deep ELM [21]. Additionally, many revisions of the ELM have been presented and applied in various fields, such as wind forecasting [22], [23], image processing [24], [25], face recognition [26], solar radiation prediction [27], medical diagnosis [28], [29], pathological brain detection [30], [31], [32], and time series prediction [33].
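
For the first shortcoming, a hedged one-line sketch of the ridge-style solution used by regularized variants such as the RELM of [9] (the exact formulation in [9] may differ in detail; H and T are the hidden-layer output and target matrices from the sketch above, and C > 0 trades off structural against empirical risk):

    # beta = (H^T H + I/C)^{-1} H^T T: shrink the output weights instead of
    # fitting the training data exactly, which reduces over-fitting.
    beta = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)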

The traditional ELM and most of its revisions are more suitable for offline learning. Prior to offline learning, a certain number of samples are accumulated and used to train a model, and the model is then applied to predict new samples. However, in many real applications, such as price forecasting [34], weather forecasting [35], and robotic online controllers [36], the samples are updated in real time. If the ELM is used directly for online learning, then whenever a chunk of new samples arrives, the ELM merges the old and new samples into a single training set and re-learns it from scratch. As samples accumulate and old samples are learned repeatedly, the learning speed of the ELM gradually declines; in the worst case, the ELM can no longer keep up with the online stream.

In order to avoid the repeated learning of old samples, the Online Sequential ELM (OS-ELM) was proposed by Liang et al. [37], which learns new samples chunk by chunk as they arrive. Using only the output weights of the previous OS-ELM model and the new samples, the OS-ELM updates the output weights and thereby improves the speed of online learning. On the basis of the OS-ELM, the fuzzy OS-ELM based on a fuzzy inference system was proposed by Rong et al. [38], and the OS-ELM with kernels was proposed by Scardapane et al. [39]. The OS-ELM and its improved algorithms randomly assign the initial parameters of the hidden layer, which makes the models unstable. To deal with this issue, the ensemble of OS-ELMs (EOS-ELMs) was proposed by Lan et al. [40], which combines several independent OS-ELM sub-models. Although the EOS-ELMs algorithm is highly stable, it does not markedly improve accuracy or generalization, because it ignores the timeliness of samples.
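
The recursive update at the heart of the OS-ELM can be sketched as follows (a minimal NumPy rendering of the recursion derived in [37]; the names are ours, and the initialization assumes the first chunk gives H0 full column rank):

    def os_elm_init(H0, T0):
        # Initialization chunk: P = (H0^T H0)^{-1}, beta = P H0^T T0.
        P = np.linalg.inv(H0.T @ H0)
        beta = P @ H0.T @ T0
        return P, beta

    def os_elm_update(P, beta, H, T):
        # One sequential step: only the new chunk (H, T) is visited,
        # never the old samples.
        S = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
        P = P - P @ H.T @ S @ H @ P
        beta = beta + P @ H.T @ (T - H @ beta)
        return P, beta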

The so-called timeliness of samples means that, as time elapses, old samples gradually become invalid in online learning. Using invalid samples in online learning usually degrades the accuracy and generalization of the EOS-ELMs model. To address this issue, the OS-ELM with Forgetting Mechanism (FOS-ELM) was proposed by Zhao et al. [41], in which invalid samples are discarded in a timely manner by the forgetting mechanism. However, the FOS-ELM ignores the differences among valid samples. In an online learning process, the importance of a valid sample to the model is related to its age [42]: new samples are usually more important than old ones and should be given greater weights, and vice versa.
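
A hedged sketch of the forgetting idea follows; this direct sliding-window form is for clarity only, as [41] derives an equivalent recursion that avoids re-solving from scratch (cf. the Sherman-Morrison-Woodbury formula cited in the references):

    from collections import deque

    def fos_elm_step(K, b, window, s, H, T, eps=1e-3):
        # K = sum of H_i^T H_i and b = sum of H_i^T T_i over the s most
        # recent chunks; start with K = 0, b = 0, window = deque().
        K = K + H.T @ H
        b = b + H.T @ T
        window.append((H, T))
        if len(window) > s:                      # chunk past its validity period
            H_old, T_old = window.popleft()
            K = K - H_old.T @ H_old              # subtract the expired chunk
            b = b - H_old.T @ T_old
        beta = np.linalg.solve(K + eps * np.eye(K.shape[0]), b)
        return K, b, beta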

In order to fully utilize the differences among valid samples and to further improve the accuracy and generalization of the FOS-ELM model, the Memory Degradation based OS-ELM (MDOS-ELM) is proposed in this paper. On the one hand, a self-adaptive memory factor is introduced in the MDOS-ELM, which dynamically adjusts the weights of the old and new valid samples during online learning. The self-adaptive memory factor is determined by two elements. One is the similarity between the input variables of the new and old samples, and the other is the prediction errors of the current training samples on the previous model. The former takes the point of view of the input variables, and the latter that of the output variable. There are many methods for evaluating the similarity of samples, such as the Euclidean distance, cosine similarity, the Pearson correlation coefficient [43], and the Robust-Huber Similarity Measure (RHSM) [44]. The first three are common methods but are too sensitive to outliers; thanks to the Huber function [45], the last one is more robust to outliers.
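
To make the similarity options concrete, the following sketch compares cosine and Pearson similarity with an illustrative Huber-style measure. The RHSM of [44] is more involved; huber_sim below only illustrates how the Huber function damps outliers, and the threshold delta is a user choice:

    def cosine_sim(x, y):
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

    def pearson_sim(x, y):
        # Pearson correlation is cosine similarity of the centered vectors.
        return cosine_sim(x - x.mean(), y - y.mean())

    def huber_sim(x, y, delta=1.0):
        # Huber loss grows linearly (not quadratically) past delta, so a
        # single outlying coordinate cannot dominate the measure.
        d = np.abs(x - y)
        loss = np.where(d <= delta, 0.5 * d**2, delta * (d - 0.5 * delta))
        return 1.0 / (1.0 + loss.mean())     # map mean loss to (0, 1]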

On the other hand, the MDOS-ELM retains the FOS-ELM mechanism of discarding invalid samples in a timely manner. The performance of the MDOS-ELM is verified on nine regression datasets and fourteen classification datasets, and its accuracy and generalization are compared with those of the OS-ELM and the FOS-ELM.

The remainder of this paper is organized as follows. Section 2 introduces the ELM, the OS-ELM, and the FOS-ELM algorithms. In Section 3, the MDOS-ELM algorithm is proposed and the self-adaptive memory factor is discussed. In Section 4, nine regression datasets and fourteen classification datasets are adopted in experiments to verify the performance of the proposed MDOS-ELM, which is compared with that of the OS-ELM and the FOS-ELM. Section 5 concludes the paper.

Section snippets

The ELM algorithm

The learning set is $S=\{(x_i,t_i)\mid x_i\in\mathbb{R}^n,\; t_i\in\mathbb{R},\; i=1,2,\ldots,N\}$, where $t$ is the output variable and $x=[x_1,x_2,\ldots,x_n]$ is the input vector. $N$ is the number of samples and $n$ is the dimension of the input variables. A unified SLFN model with $L$ nodes in the hidden layer can be expressed as follows [5]: $$f_L(x)=\sum_{l=1}^{L}\beta_l\,G(a_l,b_l,x),\qquad x\in\mathbb{R}^n,$$ where $\beta_l$ is the output weight of the $l$th node of the hidden layer. $G(\cdot)$ is the activation function of the hidden-layer nodes, and it can be 'RBF', 'Sigmoid', 'Sine', or 'hardlim'
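
A sketch of how the hidden-layer output matrix H (with H[i, l] = G(a_l, b_l, x_i)) might be computed for the activation choices just listed; the names and the RBF parameterization G = exp(-b_l * ||x - a_l||^2) are our assumptions:

    import numpy as np

    def hidden_output(X, A, b, kind="sigmoid"):
        # X: (N, n) inputs; A: (n, L) node parameters a_l; b: (L,) parameters b_l.
        Z = X @ A + b                            # additive nodes: a_l . x + b_l
        if kind == "sigmoid":
            return 1.0 / (1.0 + np.exp(-Z))
        if kind == "sine":
            return np.sin(Z)
        if kind == "hardlim":
            return (Z >= 0).astype(float)
        if kind == "rbf":                        # RBF nodes: centers are columns of A
            sq = ((X[:, None, :] - A.T[None, :, :]) ** 2).sum(axis=2)
            return np.exp(-b * sq)
        raise ValueError(kind)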

The MDOS-ELM

During the whole online learning process, the old and new samples are treated equally by the OS-ELM and the FOS-ELM. However, differences exist between the old and new samples: the new samples reflect the characteristics and trends of the current data better than the old ones. Additionally, each sample chunk has a period of validity, and a chunk becomes invalid once its age exceeds that period.

Considering the differences and timeliness of the samples, the MDOS-ELM
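
Although the snippet above is cut off, the abstract and Section 1 make the mechanism clear enough for a hedged sketch: a memory factor lambda in (0, 1] down-weights the accumulated old-sample statistics before each new chunk is absorbed. The rule for lambda below (forget more when similarity is low and the previous model's error on the new chunk is high) is only illustrative; the paper's exact formula is not reproduced here, and the FOS-ELM-style discarding of expired chunks is omitted for brevity:

    def mdos_elm_step(K, b, beta, H, T, sim, eps=1e-3):
        # sim: similarity between the new and old samples, in [0, 1].
        err = np.sqrt(np.mean((H @ beta - T) ** 2))  # new chunk's RMSE on the old model
        lam = sim / (1.0 + err)                      # illustrative memory factor only
        K = lam * K + H.T @ H                        # old statistics decay by lam
        b = lam * b + H.T @ T
        beta = np.linalg.solve(K + eps * np.eye(K.shape[0]), b)
        return K, b, beta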

Experiments

In order to verify the performance of the proposed MDOS-ELM, nine regression datasets and fourteen classification datasets were employed in the experiments. These datasets included an artificial dataset and twenty-two real-world datasets. The real-world datasets were collected from the UCI machine learning repository, the Delve Datasets, and the StatLib-Datasets Archive: Twonorm was collected from the Delve Datasets and Bodyfat from the StatLib-Datasets Archive. The

Conclusions

The main contribution of this work is that the proposed MDOS-ELM significantly improves the accuracy and generalization of the FOS-ELM. The MDOS-ELM successfully addresses two problems of online learning: the first is the timeliness of the training samples, and the second is the differences among the valid samples. In the MDOS-ELM, the self-adaptive memory factor is introduced to dynamically adjust the weights of valid samples. The self-adaptive memory factor is determined by two

Acknowledgments

This work is supported by the National Natural Science Foundation of China (nos. 61702070, 61425002, 61672121, 61672051, 61572093, 61402066, 61402067, 61370005, 31370778), Program for Changjiang Scholars and Innovative Research Team in University (no. IRT_15R07), the Program for Liaoning Innovative Research Team in University (no. LT2015002), the Basic Research Program of the Key Lab in Liaoning Province Educational Department (nos. LZ2014049, LZ2015004), Scientific Research Fund of Liaoning

References (49)

  • A.R. Lima et al.

Forecasting daily streamflow using online sequential extreme learning machines

    J. Hydrol.

    (2016)
  • Y. Lan et al.

Ensemble of online sequential extreme learning machine

    Neurocomputing

    (2009)
  • J. Zhao et al.

    Online sequential extreme learning machine with forgetting mechanism

    Neurocomputing

    (2012)
  • F.A.A. Souza et al.

    Review of soft sensor methods for regression applications

    Chemom. Intell. Lab. Syst.

    (2016)
  • A. Ghaffari et al.

    Robust Huber similarity measure for image registration in the presence of spatially-varying intensity distortion

    Signal Process.

    (2015)
  • C.Y. Deng

    A generalization of the Sherman–Morrison–Woodbury formula

    Appl. Math. Lett.

    (2011)
  • G.B. Huang

    Learning capability and storage capacity of two-hidden-layer feedforward networks

    IEEE Trans. Neural Networks

    (2003)
  • F. Gruau et al.

    Adding learning to the cellular development of neural networks evolution and the Baldwin effect

    Evol. Comput.

    (1993)
  • T. Chen et al.

    Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function networks

    IEEE Trans. Neural Networks

    (1995)
  • G.B. Huang et al.

    Universal approximation using incremental constructive feedforward networks with random hidden nodes

    IEEE Trans. Neural Networks

    (2006)
  • G.B. Huang et al.

Extreme learning machines: a survey

    Int. J. Mach. Learn. Cybern.

    (2011)
  • G.B. Huang et al.

Extreme learning machine: a new learning scheme of feedforward neural networks

  • W. Deng et al.

    Regularized extreme learning machine

  • X. Lu et al.

    Probabilistic regularized extreme learning machine for robust modeling of noise data

    IEEE Trans. Cybern.

    (2017)

    Quan-Yi Zou is currently working towards the M.Sc. degree in computer science & technology from Dalian University, Dalian, People's Republic of China. His research interests include machine learning and neural networks.

    Xiao-Jun Wang received the B.S. degree in automation from Dalian Ocean University, Dalian, China, in 2009. She received the M.Sc. and Ph.D. degrees in control theory & control engineering from Northeastern University, Shenyang, China, in 2011 and 2016, respectively. Her current research interests include machine learning, modeling, and artificial intelligence.

    Chang-Jun Zhou received the Ph.D. degree from Dalian University of Technology, Dalian, in 2008. Currently, he is a Professor at Dalian University, Dalian, People's Republic of China. He was appointed a Liaoning Distinguished Professor in 2014, and his research interests include intelligence computing and pattern recognition.

    Qiang Zhang received the Ph.D. degree from Xidian University, Xi'an, in 2002. Currently, he is a Professor at Dalian University, Dalian. He is a recipient of the National Science Fund for Distinguished Young Scholars, and his research interests include intelligence computing, neural networks, DNA computing, and computer animation.
