Reducing samples for accelerating multikernel semiparametric support vector regression

https://doi.org/10.1016/j.eswa.2009.12.058

Abstract

In this paper, a reducing-samples strategy is used in place of classical ν-support vector regression (ν-SVR), i.e. single-kernel ν-SVR, to select the training samples for the admissible functions, thereby curtailing the computational complexity. The proposed multikernel learning algorithm, reducing-samples-based multikernel semiparametric support vector regression (RS-MSSVR), has an advantage over single-kernel support vector regression (classical ε-SVR) in regression accuracy. Meanwhile, in comparison with multikernel semiparametric support vector regression (MSSVR), the algorithm has lower computational complexity with comparable generalization performance. Finally, the efficacy and feasibility of RS-MSSVR are corroborated by experiments on synthetic and real-world benchmark data sets.

Section snippets

Motivation

The support vector machine (SVM) proposed by Vapnik and his group (Burges, 1998, Cristianini and Shawe-Taylor, 2000, Schölkopf and Smola, 2002, Vapnik, 1995) has a solid theoretical foundation, the structural risk minimization (SRM) principle, which minimizes an upper bound on the generalization error consisting of the training error and a confidence interval. As a state-of-the-art tool, SVM was first presented to cope with binary classification problems, and then it was extended to…
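For context, a standard form of the SRM bound alluded to here, not reproduced in this snippet, is the VC bound of Vapnik (1995; see also Burges, 1998): with probability at least $1-\eta$, a function class of VC dimension $h$ trained on $N$ samples satisfies

$$R(f) \;\le\; R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}},$$

where $R(f)$ is the generalization error, $R_{\mathrm{emp}}(f)$ the training error, and the square-root term the confidence interval that SRM trades off against the training error.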

Multikernel semiparametric ε-SVR

Considering the training set $\{(\mathbf{x}_i, d_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{n}$ is the input variable and $d_i \in \mathbb{R}$ is the corresponding output variable, with the ε-insensitive loss function we can get the following model:

$$\begin{aligned}
\min_{\mathbf{w}_s,\,\mathbf{b},\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}}\quad & \frac{1}{2}\mathbf{w}_s^{T}\mathbf{w}_s + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right)\\
\text{s.t.}\quad & d_i - \mathbf{w}_s \cdot \varphi_s(\mathbf{x}_i) - \sum_{p=1}^{B} b_p\,\phi_p(\mathbf{x}_i) \le \varepsilon + \xi_i,\\
& \mathbf{w}_s \cdot \varphi_s(\mathbf{x}_i) + \sum_{p=1}^{B} b_p\,\phi_p(\mathbf{x}_i) - d_i \le \varepsilon + \xi_i^{*},\\
& \xi_i,\ \xi_i^{*} \ge 0,\quad i = 1, \ldots, N,
\end{aligned}$$

where $\varepsilon > 0$ is the width of the tolerance band, $C > 0$ is the user-selected regularization parameter, $\mathbf{w}_s$ represents the model complexity, and $\varphi_s(\cdot)$ is usually a nonlinear mapping which is induced from the…
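As a concrete illustration of this model, the sketch below solves the semiparametric ε-SVR primal as a convex QP in Python with CVXPY, rather than the authors' MATLAB active-set solver. It assumes the representer-theorem expansion $\mathbf{w}_s \cdot \varphi_s(\mathbf{x}) = \sum_j \alpha_j k_s(\mathbf{x}_j, \mathbf{x})$ with a single RBF kernel $k_s$; the function names, the RBF choice, and the jitter term are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the semiparametric eps-SVR model above, solved with CVXPY.
# Assumptions (not from the paper): representer expansion w_s.phi_s(x) =
# sum_j alpha_j k_s(x_j, x), an RBF single kernel k_s, and a small jitter on K
# so the quadratic term is numerically PSD.
import numpy as np
import cvxpy as cp

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def semiparametric_svr(X, d, admissible, C=10.0, eps=0.1, gamma=1.0):
    """X: (N, n) inputs; d: (N,) targets; admissible: list of callables
    phi_p mapping an (N, n) array to an (N,) array of basis values."""
    N = X.shape[0]
    K = rbf_kernel(X, gamma)                                # single kernel k_s
    Phi = np.column_stack([phi(X) for phi in admissible])   # (N, B) basis matrix
    alpha = cp.Variable(N)                                  # kernel expansion weights
    b = cp.Variable(Phi.shape[1])                           # admissible-function weights
    xi = cp.Variable(N, nonneg=True)                        # slack xi_i
    xs = cp.Variable(N, nonneg=True)                        # slack xi_i^*
    f = K @ alpha + Phi @ b                                 # outputs at training points
    reg = 0.5 * cp.quad_form(alpha, K + 1e-6 * np.eye(N))   # (1/2) w_s^T w_s in alpha
    prob = cp.Problem(
        cp.Minimize(reg + C * cp.sum(xi + xs)),
        [d - f <= eps + xi, f - d <= eps + xs],             # eps-insensitive tube
    )
    prob.solve()
    return alpha.value, b.value

# Toy usage: a sine with a linear trend; constant and linear admissible functions.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
d = np.sin(X[:, 0]) + 0.5 * X[:, 0] + 0.05 * rng.standard_normal(60)
alpha, b = semiparametric_svr(X, d, [lambda Z: np.ones(len(Z)), lambda Z: Z[:, 0]])
```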

Reducing samples based MSSVR

Based on the experimental observation that support vectors tend to take extreme target values among their k-nearest neighbors, Guo and Zhang (2007) proposed a reducing-samples strategy to alleviate the training burden of SVR while achieving generalization performance comparable to that of SVR trained on the full training set. Under some conditions, they gave four propositions to support this viewpoint mathematically. In addition, plenty of synthetic and real-world experiments…
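As we read Guo and Zhang's heuristic, a point is retained when its target is extremal among the targets of its k nearest neighbors. The sketch below implements that reading in plain NumPy; the function name, the choice of Euclidean distance, and the tie handling are illustrative assumptions, and the published rule (and its four propositions) may add conditions.

```python
# Minimal sketch of the reducing-samples heuristic: keep a training point only
# if its target value is the maximum or minimum among the targets of its k
# nearest neighbours (the point itself included). Illustrative reading of
# Guo and Zhang (2007); the published rule may differ in details.
import numpy as np

def reduce_samples(X, d, k=5):
    """Return the indices of the retained training samples."""
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diffs ** 2, axis=-1)          # pairwise squared Euclidean distances
    keep = []
    for i in range(len(X)):
        nbrs = np.argsort(d2[i])[: k + 1]     # the point itself plus k nearest neighbours
        if d[i] >= d[nbrs].max() or d[i] <= d[nbrs].min():
            keep.append(i)                    # extreme target -> likely support vector
    return np.asarray(keep)

# Usage: train any SVR (e.g. the semiparametric model above) on X[idx], d[idx]:
# idx = reduce_samples(X, d, k=5)
```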

Experiments

To validate the effectiveness and feasibility of Algorithm 3, in this section we conduct experiments on synthetic and real-world benchmark data sets. All experiments are performed on a personal computer with an AMD 3200+ (2.01 GHz) processor, 512 MB of memory, and the Windows XP operating system, in a MATLAB 7.1 environment. For all algorithms, the quadratic programs are solved using the active-set method in the MATLAB toolbox. Moreover, to conveniently compare with different…

Conclusions

In the real world, many systems exhibit different data trends in different regions. In this situation, the commonly used single-kernel learning algorithms sometimes do not achieve satisfactory results. Hence, it is necessary to develop multikernel learning algorithms. Nguyen and Tay (2008) proposed a multikernel semiparametric support vector regression to cope with systems having a complicated structure, viz. different data trends in different regions. Compared with the single-kernel…

Acknowledgment

This research was supported by the National Natural Science Foundation of China under Grant No. 50576033.

References (27)

  • Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines.
  • Joachims, T. (1999). Making large-scale SVM learning practical. In Advances in kernel methods-support vector machine....
  • Kim, D., et al. (2006). ε-tube based pattern selection for support vector machines. Lecture Notes in Computer Science.
Cited by (8)

    • Selecting rows and columns for training support vector regression models with large retail datasets

      2013, European Journal of Operational Research
      Citation Excerpt:

      However, business data sets with hundreds of input variables are very sparse due to the curse of dimensionality, and the distance definition affects the choice of training data. Zhao et al. (2010) choose observations within the multiple kernel support vector regression framework by extending Guo and Zhang's ideas: they use the Euclidean distance between two observations or the cosine of the angle between the vectors, while noting that the measurement function can be defined in different ways. Kim and Cho (2006) create bootstrap samples to estimate the regression function, and retain those observations with a high probability of falling within the ε-tube as the final training data set.

    • Sparse multikernel support vector regression machines trained by active learning

      2012, Expert Systems with Applications
      Citation Excerpt:

      Step 9: If the stopping criterion is not met, go to Step 3. In Zhao et al. (2010), a limited training data set is also considered for multikernel support vector regression. A reducing samples strategy is utilised to select training samples for admissible functions.

    • New validation methods for improving standard and multi-parametric support vector regression training time

      2012, Expert Systems with Applications
      Citation Excerpt:

      In spite of this, the time for training an SVMr model can be very high because the SVMr performance heavily depends on the choice of several hyper-parameters necessary to define the optimization problem and the final SVMr model. There are different approaches focused on reducing this hard computation time of the SVMr model: in Guo and Zhang (2007) a method based on reducing the number of samples included in the SVMr training is proposed, and in Zhao, Sun, and Zou (2010) a similar idea is applied to multi-parametric kernel SVMr. In Zhao and Sun (2010) a different methodology is applied, based on approximating the SVMr solution instead of solving the optimization problem in an exact way.

    • Heuristic sample reduction based support vector regression method

      2016, 2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016
    • An activity improved robust adaptive normal SVR optimization algorithm

      2014, Journal of Computational Information Systems