
Neurocomputing

Volume 194, 19 June 2016, Pages 34-44

A Gaussian mixture framework for incremental nonparametric regression with topology learning neural networks

https://doi.org/10.1016/j.neucom.2016.02.008

Abstract

Incremental learning is important for memory-critical systems, especially as the growth of information technology pushes memory and storage costs to their limits. Despite the great amount of effort devoted to incremental classification paradigms and algorithms, regression has received far less attention. In this paper, an incremental regression framework that can model both linear and nonlinear relationships between response variables and explanatory variables is proposed. A three-layer feed-forward neural network structure is devised in which the weights of the hidden layer are trained by topology learning neural networks. A Gaussian mixture weighted integrator synthesizes the outputs of the hidden layer into smoothed predictions. Two strategies for learning the hidden-layer parameters are explored, one based on Growing Neural Gas (GNG) and the other on the single-layered Self-Organizing Incremental Neural Network (SOINN). The GNG strategy is more robust and flexible, while the single-layered SOINN strategy is less sensitive to parameter settings. Experiments are carried out on an artificial dataset and six UCI datasets. The artificial-dataset experiments show that the proposed method gives smoother predictions than K-nearest-neighbor (KNN) and the regression tree. Compared to the parametric method Support Vector Regression (SVR), the proposed method has a significant advantage when learning from data with multiple underlying models. Incremental methods including Passive-Aggressive regression, Online Sequential Extreme Learning Machine, Self-Organizing Maps and Incremental K-means are compared with the proposed method on the UCI datasets, and the results show that the proposed method outperforms them on most datasets.

Introduction

With the amount of data growing rapidly in today's social and industrial life, it is beneficial for data mining applications in these areas to be more space efficient. Not only can space-efficient algorithms reduce the cost of data storage, they are also important for memory-critical systems such as embedded systems and autonomous robots. There has been a great amount of research into incremental and online data mining techniques that reduce the memory required in learning. However, that research mostly focuses on classification and clustering; incremental regression has not received enough attention.

Conventional non-incremental regression methods are divided into two categories by their approaches to nonlinear prediction, namely parametric and nonparametric methods. Parametric methods assume that the model that generated the data has an analytical form. For example, in Support Vector Regression (SVR) [1], the nonlinear model is assumed to be a polynomial or a radial basis function, and SVR learning tunes the parameters of this model to minimize the error on the training data. In nonparametric regression methods such as K-nearest-neighbor (KNN), the data are not generalized by an analytical model but are instead represented by a subset of the data.

A regression method learns a model Y = f(X), where X denotes the explanatory variables and Y the response variables. Incremental learning methods process the data sequentially as (X(1), Y(1)), (X(2), Y(2)), …, (X(t−1), Y(t−1)), (X(t), Y(t)), … At each step, the model is updated with the input (X(t), Y(t)) as f_{t−1} → f_t. In some incremental learning strategies, the inputs are stored in a buffer of size k and the model is updated every k steps as f_{t−k} → f_t. We refer to a regression method as strictly incremental if and only if k = 1. Like their non-incremental counterparts, incremental regression methods fall into two main categories, namely parametric and nonparametric methods.
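As an illustration only (not from the paper), the two update regimes above can be sketched as a single buffered learner in Python; with k = 1 it is strictly incremental. The class name, method names and the update callback are hypothetical.

```python
import numpy as np

class BufferedIncrementalRegressor:
    """Hypothetical wrapper around a model update routine.

    Inputs are stored in a buffer of size k and the model is refitted every k
    steps (f_{t-k} -> f_t); with k = 1 the learner is strictly incremental
    (f_{t-1} -> f_t).
    """

    def __init__(self, update_model, k=1):
        self.update_model = update_model  # callable taking a list of (x, y) pairs
        self.k = k
        self.buffer = []

    def observe(self, x, y):
        self.buffer.append((np.asarray(x, dtype=float), np.asarray(y, dtype=float)))
        if len(self.buffer) >= self.k:
            self.update_model(self.buffer)  # model update step: f_{t-k} -> f_t
            self.buffer.clear()
```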

Incremental parametric regression methods are often implemented as stochastic approximations of their non-incremental counterparts. Nonlinear prediction is a major challenge when adapting parametric regression methods to incremental learning. Research such as passive-aggressive regression [2] focuses on linear regression. In parametric methods such as online SVR [3], tuning the kernel parameters, which is important for accuracy, is nearly impossible, since retraining is not allowed when learning is strictly incremental. In [4], [5], the nonlinear problem is rendered linear by random features [6]. However, the random feature technique makes those methods inefficient on high-dimensional prediction problems. Another approach that uses random feature mapping is the Online Sequential Extreme Learning Machine (OS-ELM) [7]. The main problem of OS-ELM is its instability, which can be remedied by ensembles [8]; however, ensembling reduces the benefit of incremental learning.
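To make the random feature idea of [6] concrete, the following sketch maps inputs through random Fourier features so that an RBF-kernel problem becomes approximately linear in the mapped space and can be handled by any linear incremental learner. The dimension D and bandwidth gamma are illustrative choices, not values from the cited works.

```python
import numpy as np

def random_fourier_features(X, D=200, gamma=1.0, seed=0):
    """Map X (n, d) to z(X) (n, D) so that z(x) . z(x') approximates exp(-gamma * ||x - x'||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # frequencies drawn from the kernel's spectrum
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# A linear incremental regressor (e.g. with passive-aggressive updates) can then be
# trained on the pairs (z(X(t)), Y(t)) instead of the original nonlinear problem.
```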

Incremental nonparametric regression methods assume neither linearity nor a pre-defined nonlinear model. There are mainly two types. One is decision tree methods such as [9]. The other is KNN used in combination with incremental clustering methods such as the self-organizing map (SOM) [10]. In decision tree methods, the data space is divided into subspaces so that the nonlinear prediction problem becomes linear in each subspace. This subspace-division approach becomes less efficient as the data grow in sample size and dimension. Besides, regression trees cannot give predictions as smooth as those of parametric methods. KNN has to balance accuracy against efficiency, since the larger the subset selected for knowledge representation, the less efficient the algorithm becomes. Moreover, KNN generalizes poorly when the distribution of the data for prediction differs from that of the training data. Another way to exploit incremental clustering for regression is clusterwise regression [11]. The drawbacks of this framework are the same as those of regression trees, because it is also implemented in a subspace-division manner. In summary, compared to parametric regression, nonparametric methods such as decision trees and incremental-clustering-based regression suffer from generalization issues.

Incremental clustering methods are adapted for regression as mentioned above. There have been new advances in incremental clustering since the self-organizing maps (SOM) used in [10], [11]. One is data stream clustering, such as data stream K-means [12] and K-medians [13]. In these methods the clustering objective is to find the low-density areas that separate the dense areas where data items are more crowded. Another category is topology-preserving and topology learning methods. SOM is a topology-preserving method that learns not only a vector quantization, which is similar to the objective of conventional clustering methods, but also the topology relationships of the cluster centers. One advantage of such topology-preserving learning is that the data distribution can be represented more accurately [14], [15]. SOM is limited in its clustering ability, because it needs a pre-defined topology structure, which may contradict the true distribution. Growing neural gas (GNG) [16], on the contrary, can perform topology learning incrementally. The drawback of GNG is that there is no limit on the growth of neurons, even when the incoming data contain no new information. This drawback is remedied in the Self-Organizing Incremental Neural Network (SOINN) [17]. The original SOINN has a two-layered structure, which was reduced to a single-layered structure in later works such as [18], [19]. SOINN's incremental topology learning depends on the applicability of Delaunay triangulation construction from the data, so a limitation arises when the training data are highly concentrated on some dimensions. Theoretically, GNG has no such limitation. As a result, GNG and SOINN each have their own advantages.

In this paper, we propose an incremental nonparametric regression framework in which topology learning neural networks perform the nonparametric distribution learning and a Gaussian mixture regression model gives smoothed predictions. Two different approaches, GNG regression (GNGR) and single-layered SOINN regression (SOINNR), are explored. Our main contributions are listed as follows:

1. An incremental nonparametric regression framework based on topology learning neural networks is proposed.

2. A two-step regression mechanism is proposed. First, the joint density of the explanatory and response variables is represented by the clustering results of the topology learning neural networks. Second, this joint density is used in a Gaussian mixture regression model to accomplish the regression task. Moreover, derivations are made to construct the regression function directly from the clustering results of the topology learning neural networks (a minimal illustrative sketch of the resulting prediction step is given after this list).

3. Two different approaches, GNGR and SOINNR, are proposed. Experimental results confirm that GNGR is more scalable, while SOINNR is less sensitive to training parameter settings.

4. Compared to parametric regression, the parameters of the proposed method can be reset without retraining the model. Compared to nonparametric regression, the proposed framework is capable of smoothed prediction and thus generalizes better.
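Below is a minimal Python sketch of the prediction step referred to in contribution 2. It is not the paper's implementation: the neuron weights {W_i} (learned on concatenated [X; Y] vectors) and hit counts {C_i} are assumed to be supplied by a topology learning neural network, the isotropic Gaussian component is an assumed form, and the function and parameter names (gmr_predict, sigma) are hypothetical.

```python
import numpy as np

def gmr_predict(x, W, C, dx, sigma=0.1):
    """Gaussian-mixture-weighted prediction from topology learning results (illustrative).

    x     : query explanatory vector, shape (dx,)
    W     : neuron weights learned on concatenated [X; Y] vectors, shape (m, dx + dy)
    C     : number of times each neuron was selected as the winner, shape (m,)
    dx    : number of explanatory dimensions
    sigma : smoothing parameter (assumed isotropic Gaussian components)
    """
    Wx, Wy = W[:, :dx], W[:, dx:]                        # explanatory / response parts of each neuron
    logk = -np.sum((Wx - x) ** 2, axis=1) / (2.0 * sigma ** 2)
    resp = C * np.exp(logk - logk.max())                 # unnormalized responsibilities (numerically stable)
    resp /= resp.sum()
    return resp @ Wy                                     # smoothed estimate of the response variables
```

Note that in this sketch sigma is a prediction-time parameter: it can be changed without retraining, because the stored neuron weights and counts are untouched, which is consistent with contribution 4.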

Comparison experiments are carried out on an artificial dataset and six UCI datasets. The experiment on the artificial dataset shows that the proposed framework gives smoother predictions than KNN and the regression tree. Experiments on the UCI datasets show that the proposed framework outperforms KNN in accuracy and performs better than the existing incremental methods on most of the datasets.

The rest of the paper is organized as follows. Section 2 introduces the algorithms of GNG and the single-layered SOINN, which are preliminaries for the later sections. Section 3 details the proposed framework and its algorithms. Section 4 presents the experimental results, and Section 5 concludes the paper.


Topology learning neural networks

Assume a data set {X} with data points X(1), X(2), X(3), …, X(i) ∈ R^d. The learning task of GNG [16] and the single-layered SOINN [19] is, after a single-pass scan of the dataset, to represent the data by neurons i with weights W_i ∈ R^d. Their learning objective is formally defined in most of the literature as a minimization of the reconstruction error [16]

\[
\sum_{t=1}^{|\{X\}|} \sum_{i \in N} \omega_i \left\| X(t) - W_i \right\|^2, \tag{1}
\]

where N is the set of neurons and

\[
\omega_i =
\begin{cases}
1, & \text{if the nearest neuron to input } X(t) \text{ is } i,\\
0, & \text{otherwise.}
\end{cases}
\]

The minimization goal stated in Eq. (1) is not so different …
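As a concrete reading of Eq. (1), the following numpy sketch (ours, not from the paper) computes the reconstruction error of a set of neuron weights; the indicator ω_i simply selects, for each input X(t), its nearest neuron.

```python
import numpy as np

def reconstruction_error(X, W):
    """Eq. (1): sum of squared distances from each input X(t) to its nearest neuron weight.

    X : data points, shape (n, d)
    W : neuron weights, shape (m, d)
    """
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # (n, m) pairwise squared distances
    return d2.min(axis=1).sum()                              # omega picks the nearest neuron for each X(t)
```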

From topology learning to incremental nonparametric regression

In this section we investigate a new perspective on interpreting the topology learning results for regression. Assume that X and Y are the random variables involved in the regression problem, and Y is the response variable. Our task is similar to KNN in that the weights of the clustering centers are used to construct a regression model Ŷ = f(X | {W_i}), where Ŷ denotes the estimated response variables. Topology learning results, including the weights of the neurons {W_i} and the number of times {C_i} each neuron is selected as the …
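The rest of the derivation is truncated in this snippet. Under a standard Gaussian mixture regression form, with isotropic components centered at the neuron weights and mixing proportions proportional to the hit counts C_i (our assumption, matching the sketch given after the contribution list), the regression function would read

\[
\hat{Y}(X) = \sum_{i} \pi_i(X)\, W_i^{Y}, \qquad
\pi_i(X) = \frac{C_i\, \mathcal{N}\!\left(X;\, W_i^{X}, \sigma^2 I\right)}{\sum_{j} C_j\, \mathcal{N}\!\left(X;\, W_j^{X}, \sigma^2 I\right)},
\]

where \(W_i^{X}\) and \(W_i^{Y}\) denote the explanatory and response components of the neuron weight \(W_i\), and \(\sigma\) is the smoothing parameter.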

Experiments

The experiments are first carried out on an artificial dataset to illustrate the smoothed prediction ability of the proposed method. Then six UCI [23] datasets are used to evaluate the effect of the smoothing parameter and the number of neurons. Comparisons of prediction accuracy with conventional methods are carried out on the six UCI datasets. The details of the UCI datasets are listed in Table 1. All the accuracy values in this paper are mean squared errors (MSE), and all data including the response …

Conclusions

In this paper, we solve nonlinear regression and incremental regression problems in one framework. Incremental learning is handled by the incremental vector quantization abilities of GNG and the single-layered SOINN. Smoothed nonlinear prediction is then given by a Gaussian mixture regression model. The local minima difficulties of conventional neural networks are avoided by the stable distribution learning of GNG and the single-layered SOINN. Experimental results show that the proposed method is an …

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61301148 and 61272061), the Fundamental Research Funds for the Central Universities of China, and the Hunan Natural Science Foundation of China.


References (29)

  • A. Gijsberts, G. Metta, Incremental learning of robot dynamics using random features, in: 2011 IEEE International...
  • A. Rahimi, B. Recht, Random features for large-scale kernel machines, in: Advances in Neural Information Processing...
  • N.-Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. (2006)
  • L.A. Silva, E. Del-Moral-Hernandez, A SOM combined with KNN for classification task, in: The 2011 International Joint...

Zhiyang Xiang received the M.E. degree in computer science from Northwest A&F University, China. He is currently pursuing a Ph.D. degree at Hunan University, China. His research interests include neural network algorithms and their applications in information security.

Zhu Xiao received the M.E. and Ph.D. degrees in signal processing from Xidian University, China. He is currently teaching and doing research at Hunan University, China. His primary research interests include wireless communications; his research interests also include pattern recognition algorithms.

Dong Wang received the M.E. and Ph.D. degrees in computer science from Hunan University, China. He is a Ph.D. supervisor and a supervisor of overseas graduate students in the College of Computer Science and Electronics Engineering, Hunan University. His main research interests are computer networks and vehicular multimedia networks.

Xiaohong Li received the M.E. and Ph.D. degrees in computer science from Hunan University, China. He is currently teaching and doing research at Hunan University. His main research interests are WSN topology control and scientific visualization.
