Applying norm concepts for solving interval support vector machine
Introduction
Support vector machines (SVMs) were introduced by Vapnik in 1995 [1] within the framework of statistical learning theory. SVMs are popular and powerful learning systems. Over the years, a wide variety of numerical optimization algorithms have been proposed for SVM learning [2], [3], [4], [5]. However, these traditional algorithms are often impractical on digital computers, because the computing time required for a solution depends strongly on the dimension and structure of the problem and on the complexity of the solving method. The property that distinguishes SVMs from other nonparametric techniques is that they are based on structural risk minimization [1], [6], [7]. Most pattern recognition methods try to minimize the misclassification error on the training set, whereas SVMs minimize the structural risk, that is, the probability of misclassifying a previously unseen sample drawn randomly from a fixed but unknown probability distribution [8]. Many researchers have focused on sparse learning for SVMs; to this end, the L2-norm, L1-norm, L0-norm, and even the L∞-norm have been used as regularization terms in SVMs [9], [10]. Recently, neural networks and deep learning have been trained to achieve the best solutions to many problems in image recognition, speech recognition, natural language processing, and classification. Most of these models use the softmax activation function to minimize the loss function, but combining SVMs (especially linear ones) with deep learning has usually improved performance [11].
The learning of models from imprecise data, such as interval data, has gained increasing interest in recent years [1], [12], [13], [14]. Sometimes the exact value of a variable is hidden on purpose for confidentiality reasons. In such cases, intervals are treated as disjunctive sets that represent incomplete information. Learning models from imprecise and uncertain data also requires extending the corresponding learning algorithms. Unfortunately, this is often done without clarifying the actual meaning of an observation, although representations of interval sets can be interpreted in different ways [15]. Interval data arise in applications such as regression analysis, time series, hypothesis testing, and principal component analysis. Several algorithms and models for learning from interval-valued data have been presented in the literature [16], [17], [18], [19]. Fig. 1 shows an example of interval-valued data. Each box shows the interval values of one sample. The two classes are disjoint and are denoted * and + at the center of each datum; the point “·” marks an arbitrary point inside each box. de Souza and De Carvalho introduced clustering methods for interval data based on the dynamic cluster algorithm [18]. de Carvalho et al. presented a partitioning clustering method for objects described by interval data [19]. Furthermore, L2-norm and L∞-norm SVMs are used in [17] to construct classification algorithms, based on different forms of SVMs, for interval-valued training data.
The hard margin SVM optimization problem can be formulated as follows [1]:

min over w, b:  (1/2)‖w‖²  subject to  yᵢ(wᵀxᵢ + b) ≥ 1,  i = 1, …, l,   (1)

where {(xᵢ, yᵢ)}, i = 1, …, l, is a set of l training samples, xᵢ ∈ ℝᵐ is an m-dimensional sample in the input space, yᵢ ∈ {−1, +1} is the class label of xᵢ, and b is the bias, a scalar. Fig. 2(a) shows the concept of the hard margin SVM problem in the linear case. Moreover, the soft margin case of linearly separable SVM, shown in Fig. 2(b), is defined as follows:

min over w, b, ξ:  (1/2)‖w‖² + C Σᵢ ξᵢ  subject to  yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0,  i = 1, …, l.

The inputs of the soft margin SVM system are the training data and a constant value C. The system computes suitable slack variables ξᵢ and determines the separating hyperplane. ξᵢ is the training error corresponding to the data sample xᵢ, and the quantity Cξᵢ is the “penalty” for any data point xᵢ that either lies within the margin on the correct side of the hyperplane (ξᵢ ≤ 1) or on the wrong side of the hyperplane (ξᵢ > 1). Increasing the values of the slack variables helps reduce the effect of noisy support vectors. SVMs find the optimal separating hyperplane with the minimal classification error. Let w and b denote the optimal values of the weight vector and bias, respectively. The hyperplane can then be represented as wᵀx + b = 0, where w is the normal vector of the hyperplane and b is the bias. The main innovation of the proposed method is to model noisy samples and parameters in the SVM by interval numbers. For this purpose, the data points and the penalty coefficient are taken to be interval-valued. We present ISVM and reformulate it as an interval quadratic optimization problem. This problem is then split into two ordinary (non-interval) quadratic programming problems. Solving these two quadratic programs yields two hyperplanes that together form the solution range of the problem. Classifier rules for assigning data points to classes are given. The experiments show that the proposed method is effective for interval-valued, noisy, and normal (non-interval) datasets.
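The soft margin objective above can be minimized numerically in several ways. As a plain-Python illustration (not the optimization method used in the paper), the following sketch applies sub-gradient descent to (1/2)‖w‖² + C Σᵢ max(0, 1 − yᵢ(wᵀxᵢ + b)); the function names and toy data are our own.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Sub-gradient descent on the soft-margin objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b))."""
    m = len(X[0])
    w = [0.0] * m
    b = 0.0
    for _ in range(epochs):
        gw = list(w)            # gradient of the regularizer 0.5*||w||^2 is w itself
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:      # hinge loss active: slack xi_i = 1 - margin > 0
                for j in range(m):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

On linearly separable data the learned hyperplane separates the two classes, and the slack variables ξᵢ = max(0, 1 − yᵢ(wᵀxᵢ + b)) shrink toward zero as training proceeds.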
The rest of the paper is organized as follows. In Section 2, ISVM is presented, and it is shown that solving ISVM is equivalent to solving an interval quadratic program. In Section 3, the required concepts of vector and matrix norms are reviewed. Section 4 introduces the proposed method: the interval quadratic program is reformulated as two normal (non-interval) quadratic optimization programs. In Section 5, the parameter setting is given, and the normal and interval forms of some parameters are discussed. Section 6 evaluates the solutions and gives classifier rules for decision makers. Numerical experiments are given in Section 7. Finally, the paper is concluded in Section 8.
Interval Support Vector Machine (ISVM)
In this section, ISVM is introduced. In fact, SVM (1) is reformulated as ISVM and then reduced to an interval quadratic optimization problem. A superscript “∼” on a quantity indicates that the quantity is interval-valued (a number, vector, or matrix); quantities without this superscript are real (numbers, vectors, or matrices). We say that x̃ is an interval vector if each component x̃ᵢ is a closed interval [xᵢᴸ, xᵢᵁ] for all i, and that Ã is an interval matrix if each entry Ãᵢⱼ is a closed interval [aᵢⱼᴸ, aᵢⱼᵁ] for all i, j. In the whole paper, ISVM is introduced
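The interval quantities used above can be made concrete with a minimal sketch, assuming the standard interval-arithmetic rules for addition and scaling (the class and its methods are illustrative, not part of the paper):

```python
class Interval:
    """A closed interval [lo, hi], modeling one component of an
    interval-valued quantity such as an interval vector x~."""
    def __init__(self, lo, hi):
        assert lo <= hi, "lower endpoint must not exceed upper endpoint"
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, c):
        # scaling by a real c flips the endpoints when c < 0
        a, b = c * self.lo, c * self.hi
        return Interval(min(a, b), max(a, b))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# An interval vector is simply a list of Interval components.
x_tilde = [Interval(1.0, 2.0), Interval(-0.5, 0.5)]
```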
Concepts of vectors and matrix norms
The proposed approach is based on concepts of vectors and matrix norms. For this purpose some concepts of vectors and matrix norms are expressed in this section.
Definition 3.1 Let x be an n-vector in ℝⁿ. Then a vector norm, denoted ‖x‖, is a real continuous function of the components of x with the following properties:

- ‖x‖ ≥ 0 for all x ∈ ℝⁿ, and ‖x‖ = 0 if and only if x = 0.
- ‖αx‖ = |α| ‖x‖ for all α ∈ ℝ and x ∈ ℝⁿ.
- ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ ℝⁿ.
Also, the popular vector norms are: ‖x‖₁ = Σᵢ |xᵢ|, ‖x‖₂ = (Σᵢ xᵢ²)^(1/2), and ‖x‖∞ = maxᵢ |xᵢ|.
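These three norms are straightforward to compute; a short sketch (function names are our own):

```python
import math

def norm1(x):
    """L1 norm: sum of absolute components."""
    return sum(abs(v) for v in x)

def norm2(x):
    """L2 (Euclidean) norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))

def norm_inf(x):
    """L-infinity norm: largest absolute component."""
    return max(abs(v) for v in x)
```

For example, for x = (3, −4) one gets ‖x‖₁ = 7, ‖x‖₂ = 5, and ‖x‖∞ = 4, and the triangle inequality of Definition 3.1 can be checked numerically for any pair of vectors.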
Description of method
According to the last section, the interval programming problem (4) can be transformed into the matrix norm form (5). To obtain the interval bounds of the objective values, it suffices to obtain the lower bound (Z1) and the upper bound (Z2) of the objective values of problem (5). To this end, we consider two regions: the largest feasible region (SU) and the least feasible region (SL) of problem (5), defined as below:
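One part of the bounding argument can be illustrated directly: since the slack variables satisfy ξᵢ ≥ 0, the soft margin objective is nondecreasing in the penalty C, so when C ranges over an interval [Cᴸ, Cᵁ] the objective endpoints are attained at the endpoints of C. The sketch below covers only this interval-C part; the paper's full construction also varies the data over SU and SL. Function names are our own.

```python
def objective(w, xi, C):
    """Soft-margin objective 0.5*||w||^2 + C * sum(xi)."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(xi)

def objective_bounds(w, xi, C_lo, C_hi):
    """With all xi >= 0 the objective is nondecreasing in C, so an
    interval penalty C in [C_lo, C_hi] yields the bounds (Z1, Z2)."""
    return objective(w, xi, C_lo), objective(w, xi, C_hi)
```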
Interval parameters toward normal parameters in SVM and parameters setting
In this section, we give a detailed description of the effect of an interval-valued penalty coefficient and of the input datasets.
SVM is a powerful tool for solving classification problems but still has some limitations [22]. It is well known that SVM generalization performance (estimation accuracy) depends on a fine tuning of the parameter C. The parameter C is a regularization parameter that controls the trade-off between model complexity and the training error ξ (the second term) in
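The trade-off governed by C has a closed form in a deliberately tiny case: one 1-D sample x = 1 with label y = +1, minimizing (1/2)w² + C·max(0, 1 − w). Setting the derivative to zero on the active branch gives w* = min(C, 1), so a small C tolerates a large training error ξ* = 1 − C, while a large C drives ξ* to zero. This one-sample illustration is our own, not from the paper.

```python
def best_w_1d(C):
    """Minimizer of 0.5*w^2 + C*max(0, 1 - w) for a single 1-D
    sample (x = 1, y = +1).  On the active hinge branch the optimality
    condition is w - C = 0, capped at w = 1 where the hinge vanishes.
    Returns (w*, xi*) with xi* = max(0, 1 - w*) the training error."""
    w = min(C, 1.0)
    return w, max(0.0, 1.0 - w)
```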
Classifier rules
In this section, rules to classify data are given; that is, we provide classifier rules that assign data to the appropriate classes. Three decision regions exist, as shown in Fig. 5: the classifier rules divide samples into three categories, namely class 1, class 2, and outlier data. Because the samples are interval-valued, we find a region containing the optimal hyperplane instead of a single fixed hyperplane. The achieved region is determined by two hyperplanes
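A natural decision rule over two hyperplanes can be sketched as follows; the specific criterion below (agree on both hyperplanes, otherwise flag the point) is an assumption for illustration, not necessarily the paper's exact rule.

```python
def classify(x, w1, b1, w2, b2):
    """Classify x with the two hyperplanes (w1, b1) and (w2, b2) that
    bound the solution region.  Returns 1 or 2 when both hyperplanes
    agree, and 0 for the in-between (outlier/uncertain) region."""
    s1 = sum(wj * xj for wj, xj in zip(w1, x)) + b1
    s2 = sum(wj * xj for wj, xj in zip(w2, x)) + b2
    if s1 > 0 and s2 > 0:
        return 1        # on the positive side of both hyperplanes
    if s1 < 0 and s2 < 0:
        return 2        # on the negative side of both hyperplanes
    return 0            # between the two hyperplanes: outlier data
```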
Numerical experiments
Here, we investigate the performance of the proposed approach and compare it with other well-known algorithms using synthetic data and standard UCI benchmark datasets (normal and noisy). The proposed method has been examined from several aspects. The purpose of Example 1 is to demonstrate the geometric behavior of the proposed method on synthetic data. Example 2 illustrates the efficiency of the proposed method in comparison with traditional algorithms on different datasets. Examples 3 and 4
Conclusions
In this paper, norm concepts were applied to solve ISVM. We investigated the ISVM problem and reformulated it as two quadratic optimization problems. We showed that two hyperplanes exist to classify interval data, where the lower and upper hyperplanes are obtained from the solutions of these quadratic optimization problems. We also gave classifier rules for decision makers to classify data appropriately. In contrast to the existing methods to solve this problem in the normal
References (25)
- et al., Sparse learning for support vector classification, Pattern Recognit. Lett. (2010)
- et al., An incremental support vector machine-trained TS-type fuzzy system for online classification problems, Fuzzy Sets Syst. (2011)
- et al., Robust support vector machine-trained fuzzy system, Neural Netw. (2014)
- Learning from imprecise and fuzzy observations: data disambiguation through generalized loss minimization, Int. J. Approx. Reason. (2014)
- et al., Support vector machines for interval discriminant analysis, Neurocomputing (2008)
- et al., Binary classification SVM-based algorithms with interval-valued training data using triangular and Epanechnikov kernels, Neural Netw. (2016)
- et al., Clustering of interval data based on city-block distances, Pattern Recognit. Lett. (2004)
- et al., A robust least squares support vector machine for regression and classification with noise, Neurocomputing (2014)
- Statistical Learning Theory (1998)
- New support vector algorithms with parametric insensitive/margin model, Neural Netw. (2009)
- Improvements to the SMO algorithm for SVM regression, IEEE Trans. Neural Netw.
- New support vector algorithms, Neural Comput.
Mojtaba Baymani is an Associate Professor of Applied Mathematics at the Department of Computer and Mathematics, Quchan University of Technology, Quchan, Iran. His principal research interests include neural networks, optimization, partial differential equations, and support vector machines. He received the B.S. degree in mathematics teaching from Shahid Chamran University of Ahvaz, Khuzestan, Iran, in 2000, the M.A. in applied mathematics from Hakim Sabzevari University, Sabzevar, Razavi Khorasan, Iran, in 2003, and the Ph.D. in applied mathematics from the Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran, in 2010.
Nima Salehi Moghaddami is currently a Ph.D. candidate in Artificial Intelligence and Robotics at Ferdowsi University of Mashhad (FUM). His research focuses on online learning, deep learning, kernel methods, optimization, and low-rank approximation. He received the B.S. degree in Computer Hardware Engineering from Sadjad Institute of Higher Education, Mashhad, Iran, in 2008, where he was recognized as the top student, and then received the M.Sc. in Artificial Intelligence from Ferdowsi University of Mashhad, Iran, in 2011.
Amin Mansoori received the B.S. degree in applied mathematics from Ferdowsi University of Mashhad, Mashhad, Iran, in 2012, and the M.S. degree in applied mathematics (optimal control and optimization) from Ferdowsi University of Mashhad, Mashhad, Iran, in 2014. He is currently working toward the Ph.D. degree in applied mathematics (optimal control and optimization) at Ferdowsi University of Mashhad. His research interests include mathematical modelling, optimization, optimal control, fuzzy mathematics, and neural networks.