Neurocomputing

Volume 311, 15 October 2018, Pages 41-50

Applying norm concepts for solving interval support vector machine

https://doi.org/10.1016/j.neucom.2018.05.046

Abstract

In this paper, a new variation of the Support Vector Machine (SVM) is proposed, named the Interval Support Vector Machine (ISVM). The uncertainty of input data is one of the main challenges in handling real-world problems. Uncertainty may result from the presence of random variables, incorrect or imperfect data, approximations used in place of measurements, or incomparability of data that depends strongly on different measurement or observation conditions. Interval numbers can generally be used to represent such real data. In this version, an SVM with interval samples is presented, and a reformulation of the SVM performs the classification of real samples. Here, the ISVM can be recast as an interval quadratic programming problem, which itself can be converted into a pair of so-called two-level mathematical programs based on norm concepts. The two-level mathematical programs are formed to find the upper and lower bounds of the objective values of the interval quadratic program. Numerical experiments are carried out on various datasets (normal, noisy, and interval-valued) to illustrate the performance of our approach.

Introduction

Support vector machines (SVMs) were introduced by Vapnik in 1995 [1] within the area of statistical learning theory. SVMs are very popular and powerful learning systems. Over the years, a wide variety of numerical optimization algorithms have been proposed for SVM learning [2], [3], [4], [5]. However, these traditional algorithms scale poorly in practice, since the computing time required for a solution depends greatly on the dimension and structure of the problem and on the complexity of the solving method. The property that distinguishes SVMs from other nonparametric techniques is that they are based on structural risk minimization [1], [6], [7]. Pattern recognition methods try to minimize the misclassification error on the training set; SVMs instead minimize the structural risk, that is, the probability of misclassifying a previously unseen sample drawn randomly from a fixed but unknown probability distribution [8]. Many researchers have focused on sparse learning for SVMs. To this end, the L2-norm, L1-norm, L0-norm, or even the L∞-norm have been used as regularization terms in SVMs [9], [10]. Recently, neural networks and deep learning have been trained to achieve the best solutions to many problems in image recognition, speech recognition, natural language processing, and classification. Most of these models use the softmax activation function to minimize the loss function, but using SVMs (especially linear ones) in combination with deep learning has usually improved performance [11].

The learning of models from imprecise data, such as interval data, has gained increasing interest in recent years [1], [12], [13], [14]. Sometimes the exact value of a variable is hidden on purpose for confidentiality reasons. In such cases, intervals are considered disjunctive sets which represent incomplete information. Learning a model from imprecise and uncertain data also requires the extension of the corresponding learning algorithms. Unfortunately, this is often done without clarifying the actual meaning of an observation, although representations of interval sets can be interpreted in different ways [15]. Applications of interval data arise in regression analysis, time series, hypothesis testing, and principal component analysis. There is some literature presenting algorithms and models for learning from interval-valued data [16], [17], [18], [19]. Fig. 1 shows an example of interval-valued data. Each box represents the interval values of one sample. The classes, denoted * and +, are disjoint and are marked at the center of each data point. The point "·" denotes an arbitrary point within each box. de Souza and De Carvalho introduced clustering methods for interval data based on the dynamic cluster algorithm [18]. de Carvalho et al. presented a partitioning clustering method for objects described by interval data [19]. Furthermore, L2-norm and L∞-norm SVMs are used in [17] to construct classification algorithms based on different forms of SVMs that deal with interval-valued training data.

The hard margin SVM optimization problem can be formulated as follows [1]:

$$\min \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, l,$$

where $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$ is a set of $l$ training samples, $x_i \in \mathbb{R}^m$ is an $m$-dimensional sample in the input space, $y_i \in \{-1, +1\}$ is the class label of $x_i$, and $b$ is the bias, which is a scalar. Fig. 2(a) shows the concept of the hard margin SVM problem in the linear case. Moreover, the soft margin linearly separable case of SVM is defined as follows and is shown in Fig. 2(b):

$$\min \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.} \quad y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, l.$$

The inputs of the soft margin SVM are the training data and a constant value $C$. The system calculates proper slack variables $\xi_i$ and determines the separating hyperplane; $\xi_i$ is the training error corresponding to the data sample $x_i$. The quantity $\xi_i$ is considered the "penalty" for any data point $x_i$ that either lies within the margin on the correct side of the hyperplane ($\xi_i \le 1$) or on the wrong side of the hyperplane ($\xi_i > 1$). Increasing the values of the slack variables helps to reduce the effect of noisy support vectors. SVMs find the optimal separating hyperplane with minimal classification error. Let $w^*$ and $b^*$ denote the optimal values of the weight vector and bias, respectively. The hyperplane can be represented as $w^{*T}x + b^* = 0$, where $w^* = [w_1^*, w_2^*, \ldots, w_m^*]^T$ and $x_i = [x_{1i}, x_{2i}, \ldots, x_{mi}]^T$; $w^*$ is the normal vector of the hyperplane and $b^*$ is the bias. The main innovation of the proposed method is to model noisy samples and parameters in the SVM using interval numbers. For this purpose, the data points and the penalty coefficient are considered interval-valued. We present the ISVM and reformulate it as an interval quadratic optimization problem. This problem is then divided into two normal (non-interval) quadratic programming problems. By solving these two quadratic programs, two hyperplanes are obtained which together form the solution range of the problem. Some classifier rules for assigning data points to classes are given. The experiments show that the proposed method is effective on all kinds of datasets: interval-valued, noisy, and normal (non-interval).
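As a concrete illustration (not the paper's own code), the soft margin problem above can be solved through its standard dual quadratic program, $\min_\alpha \frac{1}{2}\|H\alpha\|_2^2 - \mathbf{1}^T\alpha$ subject to $y^T\alpha = 0$ and $0 \le \alpha \le C$, where the columns of $H$ are $y_i x_i$. The following minimal Python sketch uses the cvxpy modeling library; the function name and tolerances are our own choices.

    import numpy as np
    import cvxpy as cp

    def soft_margin_svm_dual(X, y, C):
        """Solve the soft-margin SVM dual QP. X: (l, m) samples, y in {-1, +1}."""
        H = (y[:, None] * X).T                      # columns of H are y_i * x_i
        l = X.shape[0]
        alpha = cp.Variable(l)
        objective = cp.Minimize(0.5 * cp.sum_squares(H @ alpha) - cp.sum(alpha))
        constraints = [y @ alpha == 0, alpha >= 0, alpha <= C]
        cp.Problem(objective, constraints).solve()
        a = alpha.value
        w = H @ a                                   # w = sum_i alpha_i y_i x_i
        on_margin = (a > 1e-6) & (a < C - 1e-6)     # unbounded support vectors
        b = float(np.mean(y[on_margin] - X[on_margin] @ w))  # recover the bias
        return w, b

A point $x$ is then classified by $\text{sign}(w \cdot x + b)$.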

The rest of the paper is organized as follows. In Section 2, the ISVM is presented, and it is shown that solving the ISVM is equivalent to solving an interval quadratic program. In Section 3, the required concepts of vector and matrix norms are reviewed. Section 4 introduces the proposed method: the interval quadratic program is reformulated into two normal (non-interval) quadratic optimization programs. In Section 5, the parameter setting is given, and the normal and interval forms of some parameters are discussed. Section 6 evaluates the solutions; some classifier rules for decision makers are also given there. Numerical experiments are reported in Section 7. Finally, the paper is concluded in Section 8.


Interval Support Vector Machine (ISVM)

In this section, the ISVM is introduced. In fact, SVM (1) is reformulated as an ISVM, which is then reduced to an interval quadratic optimization problem. A tilde, "$\tilde{\ }$", on a quantity indicates that the quantity is interval-valued (a number, vector, or matrix); quantities without a tilde are real (numbers, vectors, or matrices). We say that a vector $\tilde{x} \in \tilde{\mathbb{R}}^n$ if $x_i^L \le \tilde{x}_i \le x_i^U$ for all $i = 1, 2, \ldots, n$, and a matrix $\tilde{A} \in \tilde{\mathbb{R}}^{m \times n}$ if $a_{ij}^L \le \tilde{a}_{ij} \le a_{ij}^U$ for all $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. In the whole paper, ISVM is introduced …
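As a small illustration of this notation (our own sketch, not from the paper), an interval number, vector, or matrix can be stored as a pair of element-wise lower and upper bounds; the class and its names below are hypothetical.

    import numpy as np

    class Interval:
        """An interval number/vector/matrix, stored as element-wise bounds [lo, hi]."""
        def __init__(self, lo, hi):
            self.lo = np.asarray(lo, dtype=float)
            self.hi = np.asarray(hi, dtype=float)
            assert self.lo.shape == self.hi.shape and np.all(self.lo <= self.hi)

        def contains(self, a):
            """A real quantity a lies in the interval iff lo <= a <= hi element-wise."""
            a = np.asarray(a, dtype=float)
            return bool(np.all((self.lo <= a) & (a <= self.hi)))

    x_tilde = Interval([1.0, 2.0], [1.5, 2.5])   # an interval vector with n = 2
    print(x_tilde.contains([1.2, 2.4]))          # True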

Concepts of vectors and matrix norms

The proposed approach is based on concepts of vectors and matrix norms. For this purpose some concepts of vectors and matrix norms are expressed in this section.

Definition 3.1

Let $x$ be an $n$-vector in $\mathbb{C}^n$. Then a vector norm, denoted by $\|x\|$, is a real continuous function of the components of $x$ with the following properties:

  • 1. $\|x\| \ge 0$, for all $x \in \mathbb{C}^n$.

  • 2. $\|x\| = 0$ if and only if $x = 0$.

  • 3. $\|\alpha x\| = |\alpha| \, \|x\|$, for all $x \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$.

  • 4. $\|x + y\| \le \|x\| + \|y\|$, for all $x, y \in \mathbb{C}^n$.

Also, the popular vector norms are as follows:

$$\|x\|_1 = \sum_{i=1}^{n}|x_i|, \qquad \|x\|_2 = \sqrt{x^T x} = \Big(\sum_{i=1}^{n}|x_i|^2\Big)^{1/2}, \; \ldots$$
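For instance, these norms can be evaluated directly with numpy (a minimal sketch on a concrete vector):

    import numpy as np

    x = np.array([3.0, -4.0, 1.0])
    norm1 = np.abs(x).sum()        # ||x||_1 = sum_i |x_i|
    norm2 = np.sqrt(x @ x)         # ||x||_2 = sqrt(x^T x)
    assert np.isclose(norm1, np.linalg.norm(x, 1))
    assert np.isclose(norm2, np.linalg.norm(x, 2))

    # The triangle inequality (property 4) holds for any pair of vectors:
    y = np.array([1.0, 1.0, -2.0])
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)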

Description of method

According to the previous section, the interval programming problem (4) can be transformed into the following matrix norm form:

$$\min \; \tilde{Z} = \frac{1}{2}\|\tilde{H}\alpha\|_2^2 - f^T\alpha \quad \text{s.t.} \quad y^T\alpha = 0, \quad 0 \le \alpha \le \tilde{C}. \qquad (5)$$

To obtain the interval bound of the objective values, it suffices to obtain the lower bound ($Z_1$) and the upper bound ($Z_2$) of the objective values of problem (5). To this end, we consider two regions: the largest feasible region ($S^U$) and the smallest feasible region ($S^L$) of problem (5), defined as

$$S^U = \{\alpha : y^T\alpha = 0, \; 0 \le \alpha \le C^U\}, \; \ldots$$
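The key observation is that minimizing over a larger feasible set can only decrease the optimal value, so solving problem (5) over $S^U$ yields the lower bound $Z_1$, and solving it over $S^L$ yields the upper bound $Z_2$. The sketch below illustrates this pair of bounding QPs in cvxpy for a fixed realization of $\tilde{H}$; the norm-based treatment of the interval matrix $\tilde{H}$ used in the paper is omitted, and the function name is our own.

    import cvxpy as cp

    def objective_bounds(H, f, y, C_L, C_U):
        """Solve problem (5) over the smallest (C_L) and largest (C_U) feasible regions."""
        def solve_qp(C):
            alpha = cp.Variable(H.shape[1])
            obj = cp.Minimize(0.5 * cp.sum_squares(H @ alpha) - f @ alpha)
            cons = [y @ alpha == 0, alpha >= 0, alpha <= C]
            prob = cp.Problem(obj, cons)
            prob.solve()
            return prob.value
        Z2 = solve_qp(C_L)   # smaller feasible set S^L -> upper bound of the minimum
        Z1 = solve_qp(C_U)   # larger feasible set S^U  -> lower bound of the minimum
        return Z1, Z2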

Interval parameters toward normal parameters in SVM and parameters setting

In this section, we give a detailed description of the effect of the interval-valued penalty coefficient and of the input datasets.

SVM is a powerful tool for solving classification problems but still has some limitations [22]. It is well known that SVM generalization performance (estimation accuracy) depends on a fine tuning of the parameter C. The parameter C is a regularization parameter which controls the trade-off between the model complexity and the training error ξ (the second term) in …
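In the single-valued setting, C is commonly tuned by cross-validation; a standard scikit-learn sketch is shown below (X_train and y_train are assumed to be given, and the paper instead works with an interval [C^L, C^U] rather than a single tuned value).

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Grid-search the regularization parameter C with 5-fold cross-validation.
    param_grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}
    search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
    search.fit(X_train, y_train)     # X_train, y_train: assumed training data
    print(search.best_params_)       # the C with the best cross-validated accuracy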

Classifier rules

In this section, some rules to classify data are given. In fact, we provide classifier rules that assign data to the appropriate classes. Three decision regions exist, and they are shown in Fig. 5. The classifier rules divide samples into three categories: class 1, class 2, and outlier data. Here, because the samples are intervals, we find an area containing the optimal hyperplane instead of a single fixed hyperplane, as sketched below. The achieved area is determined by two hyperplanes h…
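A minimal sketch of such a two-hyperplane rule follows. It is our illustrative reconstruction, assuming the two bounding hyperplanes (w_L, b_L) and (w_U, b_U) come from the lower- and upper-bound problems; it is not necessarily the paper's exact rule set.

    import numpy as np

    def classify(x, w_L, b_L, w_U, b_U):
        """Assign x using the two bounding hyperplanes; disagreement marks an outlier."""
        s_L = np.sign(w_L @ x + b_L)   # side of the lower hyperplane
        s_U = np.sign(w_U @ x + b_U)   # side of the upper hyperplane
        if s_L > 0 and s_U > 0:
            return 1                   # class 1: positive side of both hyperplanes
        if s_L < 0 and s_U < 0:
            return -1                  # class 2: negative side of both hyperplanes
        return 0                       # between the hyperplanes: outlier region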

Numerical experiments

Here, we investigate the performance of the proposed approach and compare it with other well-known algorithms using synthetic data and standard UCI benchmark datasets (normal and noisy). The proposed method has been examined in several respects. The purpose of Example 1 is to demonstrate the geometric behavior of the proposed method on synthetic data. Example 2 illustrates the efficiency of our proposed method in comparison with traditional algorithms on different datasets. Examples 3 and 4 …

Conclusions

In this paper, some norm concepts were applied to solve the ISVM. We investigated the ISVM problem and reformulated it into two quadratic optimization problems. We showed that two hyperplanes exist to classify interval data, where the lower and upper hyperplanes are obtained from the solutions of the quadratic optimization problems. We also gave some classifier rules for decision makers to classify data appropriately. In contrast to the existing methods for solving this problem in the normal …


References (25)

  • S.K. Shevade et al., Improvements to the SMO algorithm for SVM regression, IEEE Trans. Neural Netw. (2000).

  • B. Schölkopf et al., New support vector algorithms, Neural Comput. (2000).

Mojtaba Baymani is an Associate Professor of Applied Mathematics at the Department of Computer and Mathematics, Quchan University of Technology, Quchan, Iran. His principal research interests include neural networks, optimization, partial differential equations, and support vector machines. He received the B.S. degree in mathematics teaching from Shahid Chamran University of Ahvaz, Khuzestan, Iran, in 2000, the M.A. in applied mathematics from Hakim Sabzevari University, Sabzevar, Razavi Khorasan, Iran, in 2003, and the Ph.D. in applied mathematics from the Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran, in 2010.

Nima Salehi Moghaddami is currently a Ph.D. candidate in Artificial Intelligence and Robotics at Ferdowsi University of Mashhad (FUM). His research focuses on online learning, deep learning, kernel methods, optimization, and low-rank approximation. He received the B.S. degree in Computer Hardware Engineering from the Sadjad Institute of Higher Education, Mashhad, Iran, in 2008, where he was recognized as the top student, and then received the M.Sc. in Artificial Intelligence from Ferdowsi University of Mashhad, Iran, in 2011.

Amin Mansoori received the B.S. degree in applied mathematics from Ferdowsi University of Mashhad, Mashhad, Iran, in 2012, and the M.S. degree in applied mathematics (optimal control and optimization) from Ferdowsi University of Mashhad, Mashhad, Iran, in 2014. He is currently working toward the Ph.D. degree in applied mathematics (optimal control and optimization) at Ferdowsi University of Mashhad. His research interests include mathematical modelling, optimization, optimal control, fuzzy mathematics, and neural networks.
