A Gaussian mixture framework for incremental nonparametric regression with topology learning neural networks
Introduction
With the amount of data growing rapidly in today's social and industrial life, it is beneficial for data mining applications in these areas to be more space efficient. Space efficient algorithms not only reduce the cost of data storage, but are also important for memory critical systems such as embedded systems and autonomous robots. There has been a great amount of research into incremental and online data mining techniques that reduce the memory required for learning. However, this research has mostly focused on classification and clustering; incremental regression has received comparatively little attention.
Conventional non-incremental regression methods are divided into two categories by their approaches to nonlinear prediction, namely parametric and nonparametric methods. Parametric methods assume that the model generating the data has a known analytical form. For example, in Support Vector Regression (SVR) [1], the nonlinear model is assumed to be a polynomial or a radial basis function, and learning tunes the parameters of the model to minimize the error on the training data. In nonparametric regression methods like K-nearest-neighbor (KNN), the data are not generalized by an analytical model but are instead represented by a subset of the data.
A regression method learns a model $Y = f(X)$, where $X$ denotes the explanatory variables and $Y$ the response variables. Incremental learning methods process the data in a sequential manner $(x_1, y_1), (x_2, y_2), \dots, (x_t, y_t), \dots$ In each step, the model is updated by the input as $M_t = U(M_{t-1}, (x_t, y_t))$. In some incremental learning strategies, the inputs are stored in a buffer of size $k$ and the model is updated every $k$ steps as $M_t = U(M_{t-k}, (x_{t-k+1}, y_{t-k+1}), \dots, (x_t, y_t))$. We refer to a regression method as strictly incremental if and only if $k = 1$. Like their non-incremental counterparts, incremental regression methods fall into two main categories, namely parametric and nonparametric methods.
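The strictly incremental setting above can be sketched in code: the model at each step is produced from the previous model and the single new sample pair, with no past samples stored. The class below is a minimal illustration of this interface (the trivial running-mean model and all names are ours, not the paper's):

```python
class IncrementalRegressor:
    """Sketch of a strictly incremental (k = 1) learner: the model at
    step t is built from the model at step t-1 and one new pair (x, y)."""

    def __init__(self):
        self.n = 0
        self.mean_y = 0.0  # trivial illustrative model: running mean of y

    def update(self, x, y):
        # One incremental step; no past samples are retained.
        self.n += 1
        self.mean_y += (y - self.mean_y) / self.n

    def predict(self, x):
        return self.mean_y


model = IncrementalRegressor()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    model.update(x, y)
print(model.predict(1.5))  # 3.0, the running mean of the responses
```

A buffered strategy (k > 1) would instead collect k pairs and call a batch update every k steps.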
Incremental parametric regression methods are often implemented as stochastic approximations of their non-incremental counterparts. Nonlinear prediction is a major challenge when adapting parametric regression methods for incremental learning. Work such as passive aggressive regression [2] focuses on linear regression. In parametric methods such as online SVR [3], tuning the kernel parameters, which is important for accuracy, is nearly impossible, since retraining is not allowed when the training is strictly incremental. In [4], [5], the nonlinear problem is rendered linear by random features [6]. However, random feature techniques make these methods inefficient on high dimensional prediction problems. Another approach using random feature mapping is the Online Sequential Extreme Learning Machine (OS-ELM) [7]. The main problem of OS-ELM is stability, which can be remedied by ensembles [8]; however, ensembling reduces the benefit of incremental learning.
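As a hedged sketch of the random-feature idea referenced above: the input is mapped through random Fourier features approximating an RBF kernel, after which an ordinary incremental linear update suffices (plain SGD here, an illustrative choice; the feature dimension `D`, bandwidth `gamma`, and learning rate are assumed parameters, not values from [4], [5]):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 2, 100, 1.0            # input dim, feature dim, RBF bandwidth
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    # Random Fourier features approximating exp(-gamma * ||x - x'||^2)
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

w = np.zeros(D)                      # linear model over the feature space
lr = 0.1
for _ in range(2000):                # stream of samples from y = sin(x0) + x1
    x = rng.uniform(-1, 1, size=d)
    y = np.sin(x[0]) + x[1]
    z = rff(x)
    w += lr * (y - w @ z) * z        # incremental (SGD) linear update

x_test = np.array([0.5, 0.2])
print(w @ rff(x_test))               # approximately sin(0.5) + 0.2
```

The inefficiency noted in the text shows up here: a high input dimension typically demands a much larger feature dimension `D` for an accurate kernel approximation.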
Incremental nonparametric regression methods assume neither linearity nor a pre-defined nonlinear model. There are mainly two types: decision tree methods such as [9], and KNN used in combination with incremental clustering methods such as the self-organizing map (SOM) [10]. In decision tree methods, the data space is divided into subspaces such that the nonlinear prediction problem becomes linear in each subspace. This subspace division approach scales poorly as the data grow in sample size and dimension. Besides, regression trees cannot give predictions as smooth as those of parametric methods. KNN has to balance accuracy and efficiency, since the larger the subset selected for knowledge representation, the less efficient the algorithm becomes. Moreover, KNN generalizes poorly when the distribution of the data for prediction differs from that of the training data. Another way to use incremental clustering for regression is clusterwise regression [11]. Its drawbacks are the same as those of regression trees, because it is likewise implemented by subspace division. In summary, compared with parametric regression, nonparametric methods such as decision trees and incremental clustering based regression suffer from generalization issues.
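For reference, a minimal brute-force KNN regressor of the kind discussed above (illustrative only): the prediction is the mean response of the k stored samples nearest to the query, which makes both memory and query cost grow with the stored subset.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Brute-force KNN regression: average the responses of the
    k training points closest to x in Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]
    return float(np.mean(y_train[idx]))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_predict(X, y, np.array([1.2]), k=2))  # mean of y at x=1 and x=2 -> 2.5
```

Note the prediction is piecewise constant in x, which is the lack of smoothness the text contrasts with parametric methods.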
Incremental clustering methods are adapted for regression as mentioned above. There have been new advances in incremental clustering since the self-organizing map (SOM) used in [10], [11]. One is data stream clustering, such as data stream K-means [12] and K-medians [13], where the clustering objective is to find the low density areas that separate the dense areas where data items are crowded. Another category is topology preserving and topology learning methods. SOM is a topology preserving method that learns not only a vector quantization, which is similar to the objective of conventional clustering methods, but also the topological relationships of the clustering centers. One advantage of such topology preserving learning is that the data distribution can be represented more accurately [14], [15]. SOM is limited in its clustering ability because it needs a pre-defined topology structure, which may contradict the true distribution. Growing neural gas (GNG) [16], on the contrary, can perform topology learning incrementally. The drawback of GNG is that there is no limit on the growth of neurons, even when there is no new information in the incoming data. This drawback is remedied in the Self-Organizing Incremental Neural Network (SOINN) [17]. The original SOINN has a two-layer structure, which is reduced to a single layer in later works such as [18], [19]. SOINN's incremental topology learning depends on the applicability of Delaunay triangulation construction from the data, so it is limited when the training data are highly concentrated on some dimensions. Theoretically GNG has no such limitation. As a result, GNG and SOINN each have their own advantages.
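A heavily simplified sketch of the GNG-style update at the core of these topology learning networks (fixed neuron count here; neuron insertion/removal and many details of [16] are omitted): on each sample, the two nearest neurons are connected by an edge, the winner and its topological neighbors move toward the sample, and stale edges age out.

```python
import numpy as np

rng = np.random.default_rng(1)
neurons = rng.uniform(0, 1, size=(10, 2))   # neuron weight vectors
edges = {}                                   # (i, j) -> age, with i < j
eps_w, eps_n, max_age = 0.1, 0.01, 50        # winner/neighbor rates, edge lifetime

def adapt(x):
    d = np.linalg.norm(neurons - x, axis=1)
    s1, s2 = np.argsort(d)[:2]               # two nearest neurons
    edges[tuple(sorted((s1, s2)))] = 0       # connect/refresh the winner pair
    for e in list(edges):                    # age edges incident to the winner
        if s1 in e:
            edges[e] += 1
            if edges[e] > max_age:
                del edges[e]
    neurons[s1] += eps_w * (x - neurons[s1])  # move winner toward the sample
    for i, j in edges:                        # move topological neighbors too
        if s1 == i:
            neurons[j] += eps_n * (x - neurons[j])
        elif s1 == j:
            neurons[i] += eps_n * (x - neurons[i])

for _ in range(500):                          # stream of samples on a ring
    t = rng.uniform(0, 2 * np.pi)
    adapt(np.array([0.5 + 0.4 * np.cos(t), 0.5 + 0.4 * np.sin(t)]))
print(len(edges))  # edges now connect neurons that are neighbors on the ring
```

The surviving edge set is the learned topology; in full GNG, accumulated error statistics also trigger the insertion of new neurons, which is the unbounded growth the text mentions.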
In this paper, we propose an incremental nonparametric regression framework, with the topology learning neural networks as the solution for nonparametric distribution learning, and a Gaussian mixture regression framework for giving smoothed predictions. Two different approaches, GNG regression (GNGR) and the single layered SOINN regression (SOINNR), are explored. Our main contributions are listed as follows:
- 1.
An incremental nonparametric regression framework based on topology learning neural networks is proposed.
- 2.
A two-step regression mechanism is proposed. First, the joint density of the explanatory and response variables is represented by the clustering results of topology learning neural networks. Second, the joint density function is used in a Gaussian mixture regression model to accomplish the regression task. Moreover, deductions are made to construct the regression function directly from the clustering results of topology learning neural networks.
- 3.
Two different approaches, GNGR and SOINNR, are proposed. Experimental results confirm that GNGR is more scalable, while SOINNR is less sensitive to training parameter settings.
- 4.
Compared with parametric regression, the parameters of the proposed method can be reset without retraining the model. Compared with nonparametric regression, the proposed framework is capable of smoothed prediction and thus generalizes better.
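The two-step mechanism of contribution 2 can be sketched as follows. This is an illustrative simplification with isotropic components, not the paper's exact derivation: neuron weights act as Gaussian means over the joint (x, y) space, win counts act as mixture priors, and the bandwidth `sigma` plays the role of the smooth parameter.

```python
import numpy as np

def gmr_predict(nodes_x, nodes_y, counts, x, sigma=0.3):
    """Gaussian mixture regression with isotropic components: each neuron
    contributes a Gaussian centred at its weight vector, and the prediction
    is the responsibility-weighted mean of the neurons' response parts."""
    d2 = np.sum((nodes_x - x) ** 2, axis=1)
    w = counts * np.exp(-d2 / (2 * sigma ** 2))  # unnormalised responsibilities
    w = w / np.sum(w)
    return float(w @ nodes_y)

# neuron weights as if learned from y = x^2 (illustrative values)
nodes_x = np.array([[0.0], [1.0], [2.0], [3.0]])
nodes_y = np.array([0.0, 1.0, 4.0, 9.0])
counts = np.array([5.0, 5.0, 5.0, 5.0])          # times each neuron won
print(gmr_predict(nodes_x, nodes_y, counts, np.array([1.5])))  # close to 2.5
```

Unlike KNN, the prediction varies smoothly with x, and `sigma` can be changed at prediction time without retraining, which is the parameter-reset property claimed in contribution 4.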
Comparison experiments are carried out on an artificial dataset and six UCI datasets. The experiment on the artificial dataset shows that the proposed framework gives smoother predictions than KNN and the regression tree. Experiments on the UCI datasets show that the proposed framework outperforms KNN in accuracy and performs better than the existing incremental methods on most of the datasets.
The rest of the paper is organized as follows. Section 2 introduces the algorithms of GNG and single layered SOINN, which are preliminaries for the later sections. Section 3 details our proposed framework and algorithms. Section 4 presents the experimental results, and Section 5 concludes.
Section snippets
Topology learning neural networks
Assume a data set $\{x_1, \dots, x_n\}$ with $n$ data points. The learning task of GNG [16] and single layered SOINN [19] is, after a single-pass scan of the dataset, to represent the data by neurons $i$ with weights $w_i$. Their learning objective is formally defined in most literature as a minimization of the reconstruction error [16]
$$E = \sum_{t=1}^{n} \left\| x_t - w_{c(x_t)} \right\|^2, \qquad (1)$$
where $N$ is the set of neurons and $c(x) = \arg\min_{i \in N} \| x - w_i \|$ is the winning neuron of $x$. The minimization goal stated in Eq. (1) is not so different
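In the standard vector-quantization form assumed here, the reconstruction error of Eq. (1) is the summed squared distance of each sample to its nearest neuron; a direct computation:

```python
import numpy as np

def reconstruction_error(X, W):
    """Sum over samples of the squared distance to the nearest neuron
    weight vector (the Eq. (1)-style quantisation error)."""
    # pairwise squared distances: samples (rows) versus neurons (columns)
    d2 = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)
    return float(np.sum(np.min(d2, axis=1)))

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
W = np.array([[0.5, 0.0], [5.0, 5.0]])           # two neuron weights
print(reconstruction_error(X, W))  # 0.25 + 0.25 + 0.0 = 0.5
```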
From topology learning to incremental nonparametric regression
In this section we investigate a new perspective for interpreting the topology learning results for regression. Assume that X and Y are the random variables involved in the regression problem, with Y the response variables. Our task is similar to KNN, where the weights of clustering centers are used to construct a regression model $\hat{Y} = f(X)$, where $\hat{Y}$ denotes the estimated response variables. Topology learning results including weights of neurons and the times each neuron is selected as the
Experiments
The experiments are first carried out on an artificial dataset to illustrate the smoothed prediction ability of the proposed method. Then six UCI [23] datasets are used to evaluate the effects of the smooth parameter and the number of neurons. Comparisons of prediction accuracy with conventional methods are carried out on the six UCI datasets. The details of the UCI datasets are listed in Table 1. All accuracy values in this paper are mean squared errors (MSE), and all data including the response
Conclusions
In this paper, we solve the nonlinear regression and incremental regression problems in one framework. Incremental learning is handled by the incremental vector quantization abilities of GNG and single layered SOINN, and smoothed nonlinear prediction is given by a Gaussian mixture regression model. The local minima difficulties of conventional neural networks are avoided by the stable distribution learning of GNG and single layered SOINN. Experimental results show that the proposed method is an
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61301148 and 61272061), the Fundamental Research Funds for the Central Universities of China, and the Hunan Natural Science Foundation of China.
Zhiyang Xiang received the M.E. degree in computer science from Northwest A & F University, China. He is currently pursuing the Ph.D. degree at Hunan University, China. His research interests include neural network algorithms and their applications in information security.
References (29)
- et al., Ensemble of online sequential extreme learning machine, Neurocomputing (2009)
- Post-pruning in decision tree induction using multiple performance measures, Comput. Oper. Res. (2007)
- et al., A fast and recursive algorithm for clustering large datasets with K-medians, Comput. Stat. Data Anal. (2012)
- et al., An incremental network for on-line unsupervised classification and topology learning, Neural Netw. (2006)
- et al., An enhanced self-organizing incremental neural network for online unsupervised learning, Neural Netw. (2007)
- Modeling of strength of high-performance concrete using artificial neural networks, Cem. Concr. Res. (1998)
- et al., A tutorial on support vector regression, Stat. Comput. (2004)
- et al., Online passive-aggressive algorithms, J. Mach. Learn. Res. (2006)
- et al., Online SVR training by solving the primal optimization problem, J. Signal Process. Syst. (2011)
- A.B. Goldberg, X. Zhu, A. Furger, J.-M. Xu, OASIS: online active semi-supervised learning, In: Proceedings of the...
- A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw.
Zhu Xiao received the M.E. and Ph.D. degrees in signal processing from Xidian University, China. He currently teaches and conducts research at Hunan University, China. His primary research interests include wireless communications; his research interests also include pattern recognition algorithms.
Dong Wang received the M.E. and Ph.D. degrees in computer science from Hunan University, China. He is a Ph.D. supervisor and a supervisor of overseas graduate students in the College of Computer Science and Electronics Engineering, Hunan University. His main research interests are computer networks and vehicular multimedia networks.
Xiaohong Li received the M.E. and Ph.D. degrees in computer science from Hunan University, China, where he currently teaches and conducts research. His main research interests are WSN topology control and scientific visualization.