
Neurocomputing

Volume 134, 25 June 2014, Pages 173-180

Sparse Ridgelet Kernel Regressor and its online sequential extreme learning

https://doi.org/10.1016/j.neucom.2012.12.066

Abstract

In this paper, a Sparse Ridgelet Kernel Regressor (SRKR) is constructed by combining ridgelet theory and sparse representation theory with the kernel trick. By using dimensionally non-separable ridgelet kernels, SRKR is capable of processing high-dimensional data more efficiently. Considering the advantages of sequential learning over batch learning in problems where data arrive constantly and where batch learning is expensive, we exploit the new kernel method in an online setting using a sequential extreme learning scheme. An online learning algorithm, named the Online Sequential Extreme Learning Algorithm (OS-ELA), is employed to rapidly produce a sequence of estimations. OS-ELA learns the training data one by one or chunk by chunk (with fixed or varying size) and discards them as soon as the training procedure for those data is completed, keeping the memory bounded in online learning. An evolutionary scheme is also incorporated to obtain a ‘good’ sparse regressor. Experiments are conducted on several nonlinear time-series prediction problems in which the examples become available one by one. Comparisons with counterpart methods demonstrate the efficiency and superiority of the proposed regressor.

Introduction

Regression is widely used for prediction and forecasting, where it serves to identify which of the independent variables are related to the dependent variable and to explore the forms of these relationships [1]. Numerous regression techniques have been developed, falling into parametric and nonparametric regression [2], [3]. Parametric regression methods define an unknown function in terms of a finite number of unknown parameters that are estimated from observed data, such as linear regression and ordinary least squares [4], while nonparametric regression allows the regression function to lie in a specified set of functions that may be infinite-dimensional, such as neural networks and support vector machines [5].
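To make the distinction concrete (this sketch is illustrative and not from the paper; the data-generating function and the bandwidth h are invented), the snippet below fits the same noisy sample with a two-parameter linear least-squares model and with a Nadaraya–Watson kernel smoother, a classic nonparametric estimator whose fit is driven directly by the data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)  # noisy observations

# Parametric: linear regression with two parameters (slope, intercept),
# estimated by ordinary least squares.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
y_linear = slope * x + intercept

# Nonparametric: Nadaraya-Watson kernel smoother; no fixed parameter
# vector, the regression function lives in a function space.
h = 0.05  # illustrative bandwidth
W = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * h ** 2))
y_kernel = (W @ y) / W.sum(axis=1)

print("linear MSE :", np.mean((y - y_linear) ** 2))
print("kernel MSE :", np.mean((y - y_kernel) ** 2))
```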

Although many nonparametric regression approaches have been developed, two difficulties stand in the way of their success in practical applications. Firstly, in nonparametric regression, determining a function from a limited number of examples is a well-known ill-posed problem, and in order to obtain a unique solution we must regularize it by imposing assumptions on the function, such as the smoothness of the approximated functions. However, in practical applications such as financial series prediction, microarray gene expression analysis, electric grid monitoring, and the identification and control of industrial systems, most systems are multi-input multi-output (MIMO) systems that must be described by a complex mapping function with singularities (or spatial inhomogeneities). Though the smoothness assumption is sometimes testable when a large amount of data is available, regression methods relying on it give misleading results when approximating such singular functions, especially with a small number of observations. Secondly, incremental learning that accumulates knowledge from data sets given at different times is desirable in many practical systems, because the observed data usually become available gradually. For example, in the electric grid monitoring application where synchrophasors are used to monitor the conditions of the transmission lines and data samples are collected every 0.05 s [6], decisions may be required within a few minutes. Our work addresses these two issues by constructing a ridgelet kernel regressor and investigating its online sequential learning scheme.

Learning machines based on the kernel trick are among the best (and many believe the best) “off-the-shelf” supervised learning methods. Over the last 10 years, kernel methods based on reproducing kernel Hilbert spaces (RKHS) have demonstrated great success in function approximation. Pioneering work was done by Aronszajn [7], Aizerman et al. [8], Kimeldorf and Wahba [9], [10], and Duttweiler and Kailath [11]. These methods exploit an implicit nonlinear mapping, induced by a kernel function, from the finite-dimensional data space to a very high- or even infinite-dimensional Hilbert space called the feature space [12]. The regression task is “transferred” to the feature space as soon as the kernel function is defined; the inner product in the feature space is then given simply by an evaluation of the kernel function in the data space, so the calculations can be carried out without direct reference to the nonlinear mapping of the input vectors [13], [14]. In this paper, a Sparse Ridgelet Kernel Regressor (SRKR) for multidimensional function approximation is proposed, motivated by multiscale geometric analysis (MGA) [15], [16], [17], [18] and kernel machines. Considering the singularities of the functions to be approximated and the sparseness requirement on solutions, we construct a sparse ridgelet kernel function based on previous work [19]. By using a finite number of ridgelet kernels, SRKR can accurately capture straight-line and curvilinear singularities and thus process high-dimensional data more efficiently [15], [16].
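As a minimal sketch of this mechanism (generic kernel ridge regression with a Gaussian kernel standing in for the paper's ridgelet kernel), note that both fitting and prediction below touch only kernel evaluations k(x, x') in the data space; the feature map itself is never formed:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=0.5):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)): an inner product in an
    # implicit, infinite-dimensional feature space.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit(X, y, lam=1e-3):
    # Solve (K + lam*I) alpha = y; regression in the feature space is done
    # entirely through the Gram matrix K.
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new):
    # f(x) = sum_i alpha_i k(x, x_i)
    return gaussian_kernel(X_new, X_train) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (60, 2))
y = np.sin(3 * X[:, 0]) * np.cos(3 * X[:, 1])      # a toy 2-D target
alpha = fit(X, y)
print(predict(X, alpha, X[:3]), y[:3])             # fitted vs true values
```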

Though kernel methods have achieved success in batch settings [20], and batch algorithms can also be adapted by utilizing a sliding buffer [21], it is more desirable to have a truly online algorithm when the data arrive sequentially. However, extending batch learning of kernel regressors to the online setting is not easy, mainly for the following three reasons [22], [23], [24], [25], [26], [27]: (1) the regression function keeps updating in online learning, which makes the support set grow unboundedly, so the amount of memory required to store the online hypothesis may increase without bound as the algorithm progresses; (2) as a direct consequence of (1), the complexity of the online training algorithm increases linearly with the number of observed data; and (3) the online learning algorithm for a kernel regressor is liable to overfitting and memory explosion.

To attack these problems, some online kernel regression approaches keep the memory bounded by discarding some of the instances, assuming an upper bound on the complexity of the regressor [28], [29], [30], [31], [32], [33], [34], [35]. In such online regression, once the size of the support set reaches the budget, an instance from the support set that meets some criterion is removed and replaced by the new instance. The first algorithm to overcome the unlimited growth of the support set was proposed by Crammer et al. [29] and refined by Weston et al. [30]. A similar strategy is also used in NORMA [31] and SILK [32]. The first online algorithm to have a fixed memory budget and a relative mistake bound is the Forgetron algorithm [33]. A stochastic algorithm that on average achieves performance similar to the Forgetron, with a similar mistake bound, was proposed by Cesa-Bianchi et al. [34]. Recent work by Langford et al. [35] proposed a parameter that trades accuracy for sparseness in the weights of online learning algorithms. An accurate online SVR (AOSVR) technique [15] has also been proposed to accurately update the SVR parameters each time a new sample is added to the training set. However, in these methods the solutions are usually not very sparse, and the number of support vectors is strongly correlated with the sample size. This drawback prompts other formulations that produce sparser kernel models.
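A minimal sketch of the shared budget idea follows; oldest-first eviction is used as a deliberately simple stand-in for the removal criteria of the algorithms cited above, and all hyperparameters are illustrative:

```python
import numpy as np

def rbf(x, z, sigma=0.5):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

class BudgetedOnlineKernelRegressor:
    """Online kernel regressor whose support set never exceeds `budget`:
    when full, an old instance is evicted to make room for the new one."""

    def __init__(self, budget=50, eta=0.2, sigma=0.5):
        self.budget, self.eta, self.sigma = budget, eta, sigma
        self.support, self.weights = [], []

    def predict(self, x):
        return sum(w * rbf(x, z, self.sigma)
                   for w, z in zip(self.weights, self.support))

    def update(self, x, y):
        err = y - self.predict(x)   # stochastic-gradient step on squared loss
        if len(self.support) >= self.budget:
            self.support.pop(0)     # oldest-first eviction keeps memory bounded
            self.weights.pop(0)
        self.support.append(x)
        self.weights.append(self.eta * err)
        return err
```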

In this paper, in order to achieve rapid online learning of SRKR, kernel regression under the extreme learning machine (ELM) framework is investigated [36], and an Online Sequential Extreme Learning Algorithm (OS-ELA) [37], [38], [39], [40] is proposed to produce a sequence of estimations in SRKR. OS-ELA learns the training data one by one or chunk by chunk (with fixed or varying size) and discards them as soon as the training procedure for those data is completed, keeping the memory bounded in online learning. Because the kernel parameters in SRKR are generated randomly, an evolutionary scheme is also incorporated to obtain a ‘good’ sparse regressor. In summary, the proposed regressor is characterized by (1) employing ridgelet kernels to approximate quite a wide range of singular functions in a sparser formulation; (2) developing an Online Sequential Extreme Learning Algorithm whose solution is guaranteed to be bounded; and (3) incorporating an evolutionary scheme to select a “good” sparse regressor. Experiments are conducted on several nonlinear time-series prediction problems in which the examples become available one by one. Comparisons show the method's efficiency and superiority over its counterparts.
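The update equations of OS-ELA are given in Section 3. As a hedged sketch of the style of recursion involved, the snippet below implements the standard OS-ELM least-squares update [37] for the output weights beta, assuming H holds the (ridgelet) kernel activations of the current chunk; the ridge constant `lam` is an illustrative regularizer:

```python
import numpy as np

def os_elm_init(H0, T0, lam=1e-3):
    # Initial batch solution: beta = (H0'H0 + lam*I)^-1 H0' T0.
    P = np.linalg.inv(H0.T @ H0 + lam * np.eye(H0.shape[1]))
    return P, P @ H0.T @ T0

def os_elm_update(P, beta, H, T):
    # Recursive least-squares update for one chunk (H, T); once applied,
    # the chunk can be discarded, so memory stays bounded.
    S = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ S @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta
```

The cost of each update depends only on the chunk size and the number of kernels, not on how many examples have already been processed.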

The rest of this paper is organized as follows. In Section 2, we discuss the sparse ridgelet kernel regression approach. In Section 3, the evolution-aided Online Sequential Extreme Learning Algorithm is introduced. In Section 4, simulation experiments are conducted to illustrate its efficiency and superiority over its counterparts. Finally, conclusions are drawn in Section 5.

Section snippets

Sparse Ridgelet Kernel Regressor

Various kinds of wavelets have been used to construct multiscale kernels for regression [41], [42]. For high-dimensional functions, tensor-product wavelet kernels are constructed. It is well known that the number of wavelet basis functions in L^2(R^d) is exponential with respect to the input dimensionality d; moreover, wavelets fail to capture the geometrical regularity in multidimensional data [43]. The ridgelet provides an efficient way to analyze higher-dimensional singularities and proves to be…
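The excerpt is cut off before the kernel construction (see Section 2 of the full text). As background, the sketch below evaluates a single ridgelet atom, a 1-D wavelet profile ψ ridden along a direction u, which is what lets ridgelets align with linear and curvilinear singularities; the Mexican-hat profile is an assumed admissible wavelet, not necessarily the paper's choice:

```python
import numpy as np

def ridgelet(X, u, a, b):
    # psi_{a,u,b}(x) = a^(-1/2) * psi((u . x - b) / a); constant along
    # hyperplanes orthogonal to u, wavelet-like across them.
    t = (X @ u - b) / a
    psi = (1 - t ** 2) * np.exp(-t ** 2 / 2)  # Mexican-hat profile (assumed)
    return psi / np.sqrt(a)

# A ridgelet in R^2 oriented along the 45-degree direction.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
X = np.random.default_rng(2).uniform(-1, 1, (5, 2))
print(ridgelet(X, u, a=0.5, b=0.1))
```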

OS-ELA for Sparse Ridgelet Kernel Regressor

As mentioned in the introduction, extending batch learning of a kernel regressor to sequential training requires a large amount of memory and suffers from high training complexity. So in this section, we are interested in online extreme learning of the above Sparse Ridgelet Kernel Regressor, where the examples become available one by one and are learned under the extreme learning framework. The online kernel regression problem is defined as a weighted sum of kernel…
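The excerpt breaks off above; consistent with the abstract, the standard form of such a regressor is a finite weighted sum of kernel evaluations,

f(x) = Σ_{i=1}^{N} β_i k(x, x_i),

where k is the ridgelet kernel, the output weights β_i are the quantities updated sequentially by OS-ELA, and N is kept bounded so that memory does not grow with the stream.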

Simulation experiments

In this section, experiments are conducted to investigate the performance of the proposed method, including the prediction of artificial data and real power-quality event data. All the simulations are carried out in the MATLAB 6.1 environment running on a Pentium 4 with a 3.06 GHz CPU. We first use the forecasting of a typical nonlinear time series, the Mackey–Glass time series, to test the proposed method; this series is frequently referenced in chaos theory and is often used to test the learning and…
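The excerpt does not reproduce the series settings; as a reproducible stand-in, the sketch below integrates the Mackey–Glass delay differential equation with the constants commonly used in prediction benchmarks (a = 0.2, b = 0.1, tau = 17), which are assumptions rather than the paper's reported configuration:

```python
import numpy as np

def mackey_glass(n, tau=17, a=0.2, b=0.1, dt=1.0, x0=1.2):
    # Euler integration of dx/dt = a*x(t-tau) / (1 + x(t-tau)^10) - b*x(t).
    # tau = 17 puts the system in the chaotic regime.
    hist = int(tau / dt)                  # samples needed for the delay term
    x = np.full(n + hist, x0)
    for t in range(hist, n + hist - 1):
        xd = x[t - hist]                  # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (a * xd / (1 + xd ** 10) - b * x[t])
    return x[hist:]

series = mackey_glass(1000)
# A common embedding: predict x(t+6) from x(t), x(t-6), x(t-12), x(t-18).
```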

Conclusion

In this paper, a Sparse Ridgelet Kernel Regressor for multidimensional function approximation is proposed, motivated by multiscale geometric analysis theory and kernel machine theory. Owing to the non-separable form of the ridgelet kernels in high-dimensional space, the new regressor can approximate multidimensional functions more efficiently. An online learning algorithm, named the evolution-aided Online Sequential Extreme Learning Algorithm, is employed to rapidly produce a…

Acknowledgment

The authors would like to thank the anonymous reviewers for their very constructive comments. This work was supported by the National Basic Research Program of China (973 Program) under Grant no. 2013CB329402, the National Science Foundation of China under Grant nos. 61072108, 61173090, 61271290, 61272282, NCET-10-0668, and 9140A24070412DZ0101, Huawei Innovation Research Program IRP2013-01-09, and the National Research Foundation for the Doctoral Program of Higher Education of China under Grant…


References (52)

  • W. Härdle

    Applied Nonparametric Regression

    (1990)
  • T. Hastie et al.

    The Elements of Statistical Learning

    (2001)
  • G.G. Lorentz et al.

    Constructive Approximation: Advanced Problems

    (1996)
  • C.M. Bishop

    Neural Networks for Pattern Recognition

    (1995)
  • J.N. Bank, O.A. Omitaomu, S.J. Fernandez, Y. Liu, Visualization and classification of power system frequency data...
  • N. Aronszajn

    Theory of reproducing kernels

    Trans. Am. Math. Soc.

    (1950)
  • M.A. Aizerman et al.

    The method of potential functions for the problem of restoring the characteristic of a function converter from randomly observed points

    Autom. Remote Control

    (1964)
  • G. Wahba

    Spline Models for Observational Data

    (1990)
  • D.L. Duttweiler et al.

    An RKHS approach to detection and estimation theory: Some parameter estimation problems (Part V)

    IEEE Trans. Inf. Theory

    (1973)
  • B. Schölkopf et al.

    Advances in Kernel Methods

    (1999)
  • B. Schölkopf et al.

    Learning With Kernels

    (2002)
  • S. Theodoridis et al.

    Pattern Recognition

    (2006)
  • J. Ma et al.

    Accurate on-line support vector regression

    Neural Comput.

    (2003)
  • E.J. Candès, Ridgelets and Their Derivatives: Representation of Images with Edges, Curves and Surfaces,...
  • E.J. Candès

    Ridgelets and the Representation of Mutilated Sobolev Functions

    SIAM J. Math. Anal.

    (2002)
  • J.L. Starck et al.

    The curvelet transform for image denoising

    IEEE Trans. Image Process.

    (2002)

Shuyuan Yang received the B.S. degree in Electrical Engineering from Xidian University, Xi’an, China, in 2000, and the M.S. and Ph.D. degrees in Circuit and System from Xidian University, Xi’an, China, in 2003 and 2005, respectively. She is currently a postdoctoral fellow at the Institute of Intelligent Information Processing, Xidian University, China. Her main current research interests are intelligent signal processing, machine learning and image processing.

Lixia Yang received the B.S. degree in Mathematics Education from Ningxia University, Yinchuan, China, in 2004, and the M.S. degree in Applied Mathematics from Ningxia University, Yinchuan, China, in 2007. She is currently pursuing the Ph.D. degree at the Institute of Intelligent Information Processing, Xidian University, China. Her main current research interests are machine learning and image processing.

Min Wang received the B.S. degree in automatic control from Xidian University, Xi’an, China, in 2000, and the M.S. and Ph.D. degrees in Signal and Information Processing from Xidian University, Xi’an, China, in 2003 and 2005, respectively. He is currently an instructor in the National Laboratory of Radar Signal Processing, Xidian University. His research interests include radar signal processing, statistical signal processing and impulse radio.

Licheng Jiao received the B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982, and the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively. He is currently a Professor and the Dean of the School of Electronic Engineering at Xidian University. His research interests include neural networks, data mining, nonlinear intelligent signal processing, and communication.
