
Neurocomputing

Volume 194, 19 June 2016, Pages 34-44

A Gaussian mixture framework for incremental nonparametric regression with topology learning neural networks

https://doi.org/10.1016/j.neucom.2016.02.008

Abstract

Incremental learning is important for memory-critical systems, especially as the growth of information technology pushes memory and storage costs to their limits. Despite the great amount of effort devoted to incremental classification paradigms and algorithms, regression has received far less attention. In this paper, an incremental regression framework that can model both linear and nonlinear relationships between response variables and explanatory variables is proposed. A three-layer feed-forward neural network structure is devised in which the weights of the hidden layer are trained by topology learning neural networks. A Gaussian mixture weighted integrator synthesizes the outputs of the hidden layer into smoothed predictions. Two strategies for learning the hidden-layer parameters are explored, one based on Growing Neural Gas (GNG) and the other on the single-layered Self-Organizing Incremental Neural Network (SOINN). The GNG strategy is more robust and flexible, while the single-layered SOINN strategy is less sensitive to parameter settings. Experiments are carried out on an artificial dataset and six UCI datasets. The artificial-dataset experiments show that the proposed method gives smoother predictions than K-nearest-neighbor (KNN) and the regression tree. Compared to the parametric method Support Vector Regression (SVR), the proposed method has a significant advantage when learning from data with multiple underlying models. Incremental methods including Passive-Aggressive regression, Online Sequential Extreme Learning Machine, Self-Organizing Maps and Incremental K-means are compared with the proposed method on the UCI datasets, and the results show that the proposed method outperforms them on most datasets.

Introduction

With the amount of data growing rapidly in today's social and industrial life, it is beneficial for data mining applications in these areas to be more space efficient. Not only can space-efficient algorithms reduce the cost of data storage, they are also important for memory-critical systems such as embedded systems and autonomous robots. There has been a great amount of research into incremental and online data mining techniques that reduce the memory required in learning. However, that research mostly focuses on classification and clustering; incremental regression has not received enough attention.

Conventional non-incremental regression methods are divided into two categories by their approaches to nonlinear prediction, namely parametric and nonparametric methods. Parametric methods assume that the model that generated the data has an analytical form. For example, in Support Vector Regression (SVR) [1], the nonlinear model is assumed to be a polynomial or a radial basis function, and SVR learning tunes the parameters of this model to minimize the error on the training data. In nonparametric regression methods such as K-nearest-neighbor (KNN), the data are not generalized by an analytical model but are instead represented by a subset of the data.

A regression method learns a model Y = f(X), where X denotes the explanatory variables and Y the response variables. Incremental learning methods process the data sequentially as (X(1), Y(1)), (X(2), Y(2)), …, (X(t−1), Y(t−1)), (X(t), Y(t)), … At each step, the model is updated with the input (X(t), Y(t)) as f_{t−1} → f_t. In some incremental learning strategies, the inputs are stored in a buffer of size k and the model is updated every k steps as f_{t−k} → f_t. We refer to a regression method as strictly incremental if and only if k = 1. Like their non-incremental counterparts, incremental regression methods fall into two main categories, namely parametric and nonparametric methods.
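As an illustration only (not from the paper), the two update regimes above can be sketched as a single buffered learner in Python; with k = 1 it is strictly incremental. The class name, method names and the update callback are hypothetical.

```python
import numpy as np

class BufferedIncrementalRegressor:
    """Hypothetical wrapper around a model update routine.

    Inputs are stored in a buffer of size k and the model is refitted every k
    steps (f_{t-k} -> f_t); with k = 1 the learner is strictly incremental
    (f_{t-1} -> f_t).
    """

    def __init__(self, update_model, k=1):
        self.update_model = update_model  # callable taking a list of (x, y) pairs
        self.k = k
        self.buffer = []

    def observe(self, x, y):
        self.buffer.append((np.asarray(x, dtype=float), np.asarray(y, dtype=float)))
        if len(self.buffer) >= self.k:
            self.update_model(self.buffer)  # model update step: f_{t-k} -> f_t
            self.buffer.clear()
```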

Incremental parametric regression methods are often implemented as stochastic approximations of their non-incremental counterparts. Nonlinear prediction is a major challenge when adapting parametric regression methods to incremental learning. Research such as passive-aggressive regression [2] focuses on linear regression. In parametric methods such as online SVR [3], tuning the kernel parameters, which is important for accuracy, is nearly impossible, since retraining is not allowed when learning is strictly incremental. In [4], [5], the nonlinear problem is rendered linear by random features [6]. However, the random feature technique makes those methods inefficient on high-dimensional prediction problems. Another approach that uses random feature mapping is the Online Sequential Extreme Learning Machine (OS-ELM) [7]. The main problem of OS-ELM is its instability, which can be remedied by ensembles [8]; however, ensembling reduces the benefit of incremental learning.
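To make the random feature idea of [6] concrete, the following sketch maps inputs through random Fourier features so that an RBF-kernel problem becomes approximately linear in the mapped space and can be handled by any linear incremental learner. The dimension D and bandwidth gamma are illustrative choices, not values from the cited works.

```python
import numpy as np

def random_fourier_features(X, D=200, gamma=1.0, seed=0):
    """Map X (n, d) to z(X) (n, D) so that z(x) . z(x') approximates exp(-gamma * ||x - x'||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # frequencies drawn from the kernel's spectrum
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# A linear incremental regressor (e.g. with passive-aggressive updates) can then be
# trained on the pairs (z(X(t)), Y(t)) instead of the original nonlinear problem.
```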

Incremental nonparametric regression methods assume neither linearity nor a pre-defined nonlinear model. There are mainly two types. One is decision tree methods such as [9]. The other is KNN used in combination with incremental clustering methods such as the self-organizing map (SOM) [10]. In decision tree methods, the data space is divided into subspaces so that the nonlinear prediction problem becomes linear in each subspace. This subspace-division approach becomes less efficient as the data grow in sample size and dimension. Besides, regression trees cannot give predictions as smooth as those of parametric methods. KNN has to balance accuracy against efficiency, since the larger the subset selected for knowledge representation, the less efficient the algorithm becomes. Moreover, KNN generalizes poorly when the distribution of the data for prediction differs from that of the training data. Another way to exploit incremental clustering for regression is clusterwise regression [11]. The drawbacks of this framework are the same as those of regression trees, because it is also implemented in a subspace-division manner. In summary, compared to parametric regression, nonparametric methods such as decision trees and incremental-clustering-based regression suffer from generalization issues.

Incremental clustering methods are adapted for regression as mentioned above. There have been new advances in incremental clustering since the self-organizing maps (SOM) used in [10], [11]. One is data stream clustering, such as data stream K-means [12] and K-medians [13]. In these methods the clustering objective is to find the low-density areas that separate the dense areas where data items are more crowded. Another category is topology-preserving and topology learning methods. SOM is a topology-preserving method that learns not only a vector quantization, which is similar to the objective of conventional clustering methods, but also the topology relationships of the cluster centers. One advantage of such topology-preserving learning is that the data distribution can be represented more accurately [14], [15]. SOM is limited in its clustering ability, because it needs a pre-defined topology structure, which may contradict the true distribution. Growing neural gas (GNG) [16], on the contrary, can perform topology learning incrementally. The drawback of GNG is that there is no limit on the growth of neurons, even when the incoming data contain no new information. This drawback is remedied in the Self-Organizing Incremental Neural Network (SOINN) [17]. The original SOINN has a two-layered structure, which was reduced to a single-layered structure in later works such as [18], [19]. SOINN's incremental topology learning depends on the applicability of Delaunay triangulation construction from the data, so a limitation arises when the training data are highly concentrated on some dimensions. Theoretically, GNG has no such limitation. As a result, GNG and SOINN each have their own advantages.

In this paper, we propose an incremental nonparametric regression framework in which topology learning neural networks perform the nonparametric distribution learning and a Gaussian mixture regression model gives smoothed predictions. Two different approaches, GNG regression (GNGR) and single-layered SOINN regression (SOINNR), are explored. Our main contributions are listed as follows:

1. An incremental nonparametric regression framework based on topology learning neural networks is proposed.

2. A two-step regression mechanism is proposed. First, the joint density of the explanatory and response variables is represented by the clustering results of the topology learning neural networks. Second, this joint density is used in a Gaussian mixture regression model to accomplish the regression task. Moreover, derivations are made to construct the regression function directly from the clustering results of the topology learning neural networks (a minimal illustrative sketch of the resulting prediction step is given after this list).

3. Two different approaches, GNGR and SOINNR, are proposed. Experimental results confirm that GNGR is more scalable, while SOINNR is less sensitive to training parameter settings.

4. Compared to parametric regression, the parameters of the proposed method can be reset without retraining the model. Compared to nonparametric regression, the proposed framework is capable of smoothed prediction and thus generalizes better.
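Below is a minimal Python sketch of the prediction step referred to in contribution 2. It is not the paper's implementation: the neuron weights {W_i} (learned on concatenated [X; Y] vectors) and hit counts {C_i} are assumed to be supplied by a topology learning neural network, the isotropic Gaussian component is an assumed form, and the function and parameter names (gmr_predict, sigma) are hypothetical.

```python
import numpy as np

def gmr_predict(x, W, C, dx, sigma=0.1):
    """Gaussian-mixture-weighted prediction from topology learning results (illustrative).

    x     : query explanatory vector, shape (dx,)
    W     : neuron weights learned on concatenated [X; Y] vectors, shape (m, dx + dy)
    C     : number of times each neuron was selected as the winner, shape (m,)
    dx    : number of explanatory dimensions
    sigma : smoothing parameter (assumed isotropic Gaussian components)
    """
    Wx, Wy = W[:, :dx], W[:, dx:]                        # explanatory / response parts of each neuron
    logk = -np.sum((Wx - x) ** 2, axis=1) / (2.0 * sigma ** 2)
    resp = C * np.exp(logk - logk.max())                 # unnormalized responsibilities (numerically stable)
    resp /= resp.sum()
    return resp @ Wy                                     # smoothed estimate of the response variables
```

Note that in this sketch sigma is a prediction-time parameter: it can be changed without retraining, because the stored neuron weights and counts are untouched, which is consistent with contribution 4.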

Comparison experiments are carried out on an artificial dataset and six UCI datasets. The experiment on the artificial dataset shows that the proposed framework gives smoother predictions than KNN and the regression tree. Experiments on the UCI datasets show that the proposed framework outperforms KNN in accuracy and performs better than the existing incremental methods on most of the datasets.

The rest of the paper is organized as follows. Section 2 introduces the algorithms of GNG and the single-layered SOINN, which are preliminaries for the later sections. Section 3 details the proposed framework and its algorithms. Section 4 presents the experimental results, and Section 5 concludes the paper.


Topology learning neural networks

Assume a data set {X} with data points X(1), X(2), X(3), …, X(i) ∈ R^d. The learning task of GNG [16] and the single-layered SOINN [19] is, after a single-pass scan of the dataset, to represent the data by neurons i with weights W_i ∈ R^d. Their learning objective is formally defined in most of the literature as a minimization of the reconstruction error [16]

\[
\sum_{t=1}^{|\{X\}|} \sum_{i \in N} \omega_i \left\| X(t) - W_i \right\|^2, \tag{1}
\]

where N is the set of neurons and

\[
\omega_i =
\begin{cases}
1, & \text{if the nearest neuron to input } X(t) \text{ is } i,\\
0, & \text{otherwise.}
\end{cases}
\]

The minimization goal stated in Eq. (1) is not so different …
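As a concrete reading of Eq. (1), the following numpy sketch (ours, not from the paper) computes the reconstruction error of a set of neuron weights; the indicator ω_i simply selects, for each input X(t), its nearest neuron.

```python
import numpy as np

def reconstruction_error(X, W):
    """Eq. (1): sum of squared distances from each input X(t) to its nearest neuron weight.

    X : data points, shape (n, d)
    W : neuron weights, shape (m, d)
    """
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # (n, m) pairwise squared distances
    return d2.min(axis=1).sum()                              # omega picks the nearest neuron for each X(t)
```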

From topology learning to incremental nonparametric regression

In this section we investigate a new perspective on interpreting the topology learning results for regression. Assume that X and Y are the random variables involved in the regression problem, and Y is the response variable. Our task is similar to KNN in that the weights of the clustering centers are used to construct a regression model Ŷ = f(X | {W_i}), where Ŷ denotes the estimated response variables. Topology learning results, including the weights of the neurons {W_i} and the number of times {C_i} each neuron is selected as the …
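The rest of the derivation is truncated in this snippet. Under a standard Gaussian mixture regression form, with isotropic components centered at the neuron weights and mixing proportions proportional to the hit counts C_i (our assumption, matching the sketch given after the contribution list), the regression function would read

\[
\hat{Y}(X) = \sum_{i} \pi_i(X)\, W_i^{Y}, \qquad
\pi_i(X) = \frac{C_i\, \mathcal{N}\!\left(X;\, W_i^{X}, \sigma^2 I\right)}{\sum_{j} C_j\, \mathcal{N}\!\left(X;\, W_j^{X}, \sigma^2 I\right)},
\]

where \(W_i^{X}\) and \(W_i^{Y}\) denote the explanatory and response components of the neuron weight \(W_i\), and \(\sigma\) is the smoothing parameter.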

Experiments

The experiments are first carried out on an artificial dataset to illustrate the smoothed prediction ability of the proposed method. Then six UCI [23] datasets are used to evaluate the effect of the smoothing parameter and the number of neurons. Comparisons of prediction accuracy with conventional methods are carried out on the six UCI datasets. The details of the UCI datasets are listed in Table 1. All the accuracy values in this paper are mean squared errors (MSE), and all data including the response …

Conclusions

In this paper, we solve nonlinear regression and incremental regression problems in one framework. Incremental learning is handled by the incremental vector quantization abilities of GNG and the single-layered SOINN. Smoothed nonlinear prediction is then given by a Gaussian mixture regression model. The local minima difficulties of conventional neural networks are avoided by the stable distribution learning of GNG and the single-layered SOINN. Experimental results show that the proposed method is an …

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61301148 and 61272061), the Fundamental Research Funds for the Central Universities of China, and the Hunan Natural Science Foundation of China.


References (29)

  • A. Gijsberts, G. Metta, Incremental learning of robot dynamics using random features, in: 2011 IEEE International...
  • A. Rahimi, B. Recht, Random features for large-scale kernel machines, in: Advances in Neural Information Processing...
  • N.-Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. (2006)
  • L.A. Silva, E. Del-Moral-Hernandez, A SOM combined with KNN for classification task, in: The 2011 International Joint...

Zhiyang Xiang received the M.E. degree in computer science from Northwest A&F University, China. He is currently pursuing a Ph.D. degree at Hunan University, China. His research interests include neural network algorithms and their applications in information security.

Zhu Xiao received the M.E. and Ph.D. degrees in signal processing from Xidian University, China. He is currently teaching and doing research at Hunan University, China. His primary research interests include wireless communications; his research interests also include pattern recognition algorithms.

Dong Wang received the M.E. and Ph.D. degrees in computer science from Hunan University, China. He is a Ph.D. supervisor and a supervisor of overseas graduate students in the College of Computer Science and Electronics Engineering, Hunan University. His main research interests are computer networks and vehicular multimedia networks.

Xiaohong Li received the M.E. and Ph.D. degrees in computer science from Hunan University, China. He is currently teaching and doing research at Hunan University. His main research interests are WSN topology control and scientific visualization.
