Least squares twin support vector hypersphere (LS-TSVH) for pattern recognition

https://doi.org/10.1016/j.eswa.2010.05.045

Abstract

The twin support vector hypersphere (TSVH) is a novel and efficient pattern recognition tool: it determines a pair of hyperspheres by solving two related SVM-type problems, each smaller than the problem arising in a classical SVM. In this paper we formulate a least squares version of this classifier, termed the least squares twin support vector hypersphere (LS-TSVH). This formulation leads to an extremely simple and fast algorithm for generating a binary classifier based on a pair of hyperspheres. Owing to the equality-type constraints in the formulation, the solution follows from solving two sets of nonlinear equations instead of the two dual quadratic programming problems (QPPs) of TSVH. We show that the two sets of nonlinear equations can be solved using the well-known Newton downhill algorithm. The effectiveness of the proposed LS-TSVH is demonstrated by experimental results on several artificial and benchmark datasets.

Introduction

Support vector machine (SVM) is an excellent kernel-based tool for binary data classification (Burges, 1998; Christianini & Shawe-Taylor, 2002; Vapnik, 1995, 1998). This learning strategy, introduced by Vapnik (1995, 1998), is a principled and very powerful machine learning method. Within a few years of its introduction, SVM had already outperformed most other systems in a wide variety of applications, spanning research areas that range from pattern recognition (Osuna, Freund, & Girosi, 1997a), text categorization (Joachims, Ndellec, & Rouveriol, 1998), and biomedicine (Brown, Grundy, & Lin, 1997) to brain–computer interfaces (Ebrahimi, Garcia, & Vesin, 2003) and financial applications (Ince & Trafalis, 2002).

The theory of SVM is based on the structural risk minimization (SRM) principle (Burges, 1998; Vapnik, 1995, 1998). Generally, the hyperplane is obtained by solving a quadratic programming problem (QPP). One of the main challenges in the classical SVM is that it requires a large training time on huge databases, since it has to optimize a computationally expensive cost function. The performance of a trained SVM classifier also depends on the optimal parameter set, which is usually found by cross-validation on a tuning set; the large training time of SVM likewise prevents one from locating the optimal parameter set on a very fine grid of parameters over a large span. To remove these drawbacks, various algorithms and versions of SVM with comparable classification ability have been reported, including the chunking algorithm (Cortes & Vapnik, 1995), the decomposition method (Osuna, Freund, & Girosi, 1997b), the sequential minimal optimization (SMO) approach (Keerthi et al., 2001; Platt, 1999), geometric algorithms (Keerthi et al., 2000; Mavroforakis & Theodoridis, 2007; Tao et al., 2008), and least squares SVM (LS-SVM) (Suykens & Vandewalle, 1999; Suykens et al., 1999).

All the above classifiers discriminate a pattern by determining in which half-space it lies. Recently, Jayadeva, Khemchandani, and Chandra (2007) proposed the twin support vector machine (TSVM) classifier for binary data classification, in the spirit of the generalized eigenvalue proximal support vector machine (GEPSVM) (Mangasarian & Wild, 2006). The formulation of TSVM is very similar to that of the classical SVM, except that it aims to generate two non-parallel planes such that each plane is closer to one class and as far as possible from the other, as sketched below. TSVM has become a popular machine learning method because of its low computational complexity, and several extensions have followed (Ghorai et al., 2009; Kumar & Gopal, 2008, 2009). However, TSVM also requires inverting a matrix of size (l + 1) × (l + 1) twice, in addition to solving the two QPPs.
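For orientation, the first of TSVM's two primal problems in the linear case takes, up to notation, the following form (a sketch of the formulation of Jayadeva et al. (2007), not a restatement from this paper; the rows of A and B collect the patterns of the two classes, e_1 and e_2 are vectors of ones, and q holds the slack variables):

$$\min_{w_1,\,b_1,\,q}\ \frac{1}{2}\|Aw_1+e_1b_1\|^2+c_1e_2^{\top}q\quad\text{s.t.}\ \ -(Bw_1+e_2b_1)+q\ \ge\ e_2,\ \ q\ \ge\ 0,$$

with the second problem obtained by interchanging the roles of the two classes.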

Recently, we proposed a new hypersphere classifier, termed the twin support vector hypersphere (TSVH) (Peng, submitted for publication). TSVH aims at generating two hyperspheres in the feature space such that each hypersphere contains as many samples of one class as possible and is as far as possible from the other class. Similar to TSVM, TSVH solves two smaller-sized QPPs instead of one large QPP as in the classical SVM. However, the formulation of TSVH is quite different from that of TSVM: the matrix inversions appearing in the objective functions of TSVM's dual QPPs are avoided, which keeps the training cost low. Moreover, unlike TSVM, TSVH has a uniform formulation for the linear and nonlinear cases. These two differences mean that TSVH not only runs faster than TSVM but also admits a more concise implementation.
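Working backwards from the least squares modification introduced in Section 3, which replaces the 1-norm slacks and inequality constraints of (16), (17) by squared slacks and equalities, the primal problem of TSVH for the positive-class hypersphere can be sketched as follows (this reconstruction is our assumption based on those descriptions, not a quotation of (16); φ denotes the feature map, c_+ and R_+ the center and radius, and I_+, I_- the index sets of the two classes):

$$\min_{c_+,\,R_+,\,\xi}\ \frac{1}{2}\sum_{i\in I_+}\|\varphi(x_i)-c_+\|^2-\nu_1R_+^2+C_1\sum_{j\in I_-}\xi_j\quad\text{s.t.}\ \ \|\varphi(x_j)-c_+\|^2\ \ge\ R_+^2-\xi_j,\ \ \xi_j\ \ge\ 0,\ \ j\in I_-.$$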

In the spirit of LS-SVM (Suykens & Vandewalle, 1999; Suykens et al., 1999), in this paper we formulate a least squares version of TSVH for classification problems, termed the least squares twin support vector hypersphere (LS-TSVH). We first consider the primal QPPs of TSVH in a least squares sense and solve them with equality constraints instead of the inequalities of TSVH. As a result, the solution of LS-TSVH follows directly from solving two sets of nonlinear equations, as opposed to the two dual QPPs of TSVH. We then show that the two sets of nonlinear equations can be solved using the well-known Newton downhill method. Computational comparisons of LS-TSVH against LS-TSVM (Kumar & Gopal, 2009), LS-SVM (Suykens & Vandewalle, 1999), TSVH (Peng, submitted for publication), TSVM (Jayadeva et al., 2007), and SVM, in terms of classification accuracy and computing time, have been made on several artificial and benchmark datasets; the results indicate that the algorithm handles large datasets accurately and quickly.
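Since the Newton downhill method is central to the proposed solver, a minimal generic sketch of the iteration may be helpful; the names (newton_downhill, F, J) and the toy system are illustrative assumptions, and the code shows the generic damped Newton scheme rather than the paper's specific equations for LS-TSVH:

    import numpy as np

    def newton_downhill(F, J, x0, tol=1e-8, max_iter=100):
        # Solve F(x) = 0 by Newton's method with a downhill (damping) test.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            Fx = F(x)
            if np.linalg.norm(Fx) < tol:          # converged
                break
            step = np.linalg.solve(J(x), -Fx)     # full Newton step
            lam = 1.0
            # Downhill test: halve the step until the residual norm decreases.
            while np.linalg.norm(F(x + lam * step)) >= np.linalg.norm(Fx) and lam > 1e-10:
                lam *= 0.5
            x = x + lam * step
        return x

    # Toy system: x0^2 + x1 = 3 and x0 + x1^2 = 5, with root (1, 2).
    F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])
    J = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])
    print(newton_downhill(F, J, [1.0, 1.0]))      # approx. [1. 2.]

The downhill safeguard halves the Newton step until the residual norm decreases, which protects the iteration from diverging when the starting point is far from a root.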

The paper is organized as follows: Section 2 briefly introduces SVM, TSVM, and our TSVH. Section 3 first proposes the least squares twin support vector hypersphere and then presents a fast learning algorithm for LS-TSVH based on the well-known Newton downhill method. Section 4 presents experimental results, and Section 5 concludes the paper.

Section snippets

Support vector machine

As a state-of-the-art machine learning algorithm, SVM is based on the guaranteed risk bounds of statistical learning theory (Vapnik, 1995, 1998), known as the SRM principle. Compared to other methods, SVM has shown excellent performance in pattern recognition tasks. In the simplest binary pattern recognition tasks, SVM uses a linear separating hyperplane to create a classifier with maximal margin. Consider a binary classification problem with data set D = {(x1, y1), …, (xl, yl)}, where xi
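For reference, the standard soft-margin primal problem underlying this construction (a textbook statement, not a snippet from this paper) is

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2+C\sum_{i=1}^{l}\xi_i\quad\text{s.t.}\ \ y_i(w^{\top}x_i+b)\ \ge\ 1-\xi_i,\ \ \xi_i\ \ge\ 0,\ \ i=1,\dots,l,$$

whose dual is the QPP referred to throughout Section 1.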

Least squares twin support vector hypersphere

In this section we introduce a least squares version of the TSVH classifier, using the same idea as LS-SVM (Suykens & Vandewalle, 1999; Suykens et al., 1999), by formulating the classification problem as:

$$\min_{c_+,\,R_+,\,\xi}\ \frac{1}{2}\sum_{i\in I_+}\|\varphi(x_i)-c_+\|^2-\nu_1R_+^2+C_1\sum_{j\in I_-}\xi_j^2\quad\text{s.t.}\ \ \|\varphi(x_j)-c_+\|^2=R_+^2-\xi_j,\ \ j\in I_-,\tag{24}$$

$$\min_{c_-,\,R_-,\,\xi}\ \frac{1}{2}\sum_{j\in I_-}\|\varphi(x_j)-c_-\|^2-\nu_2R_-^2+C_2\sum_{i\in I_+}\xi_i^2\quad\text{s.t.}\ \ \|\varphi(x_i)-c_-\|^2=R_-^2-\xi_i,\ \ i\in I_+.\tag{25}$$

Here the pair of QPPs (24), (25) use the square of the 2-norm of the slack variables in the objective functions instead of the 1-norm used in (16), (17), which makes the
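To see why the equality constraints turn (24), (25) into nonlinear equations, note that each slack can be eliminated directly; under our reading of (24), substituting ξ_j = R_+^2 − ‖φ(x_j) − c_+‖^2 gives the unconstrained problem

$$\min_{c_+,\,R_+}\ \frac{1}{2}\sum_{i\in I_+}\|\varphi(x_i)-c_+\|^2-\nu_1R_+^2+C_1\sum_{j\in I_-}\bigl(R_+^2-\|\varphi(x_j)-c_+\|^2\bigr)^2,$$

and setting the gradient with respect to c_+ and R_+^2 to zero yields the set of nonlinear equations solved by the Newton downhill iteration of Section 3; (25) is handled symmetrically.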

Experimental results

To test the performance of our LS-TSVH, we report results in terms of accuracy and execution time on several artificial and publicly available benchmark data sets from the UCI Repository (Blake & Merz, 1998), which are commonly used in testing machine learning algorithms. All the classification methods are implemented in MATLAB 6.5 (MATLAB, 1994) on Windows XP, running on a PC with an Intel P4 processor (2.4 GHz) and 1 GB of RAM. We compare the performances of SVM (Vapnik,

Conclusion and further work

The recently proposed twin support vector hypersphere (TSVH) is a novel and efficient classifier. It determines a pair of hyperspheres by solving two related SVM-type problems, each smaller than the problem arising in a classical SVM. The formulation of TSVH is quite different from that of TSVM: the matrix inversions in the objective functions of TSVM's dual QPPs are avoided. In this paper, in the spirit of LS-SVM, we have formulated a novel least squares TSVH (LS-TSVH). LS-TSVH is an extremely

Acknowledgements

This work has been partly supported by the Shanghai Leading Academic Discipline Project (No. S30405), and the Natural Science Foundation of Shanghai Normal University (No. SK200937).

References

  • Ghorai, S., et al. (2009). Nonparallel plane proximal classifier. Signal Processing.
  • Kumar, M. A., et al. (2008). Application of smoothing technique on twin support vector machines. Pattern Recognition Letters.
  • Tao, Q., et al. (2008). A general soft method for learning SVM classifiers with L1-norm penalty. Pattern Recognition.
  • Blake, C. I., & Merz, C. J. (1998). UCI repository for machine learning databases....
  • Brown, M. P. S., et al. (1997). Knowledge-based analysis of microarray gene expression data by using support vector machine. Proceedings of the National Academy of Sciences of the United States of America.
  • Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery.
  • Christianini, V., et al. (2002). An introduction to support vector machines.
  • Cortes, C., et al. (1995). Support vector networks. Machine Learning.
  • Ebrahimi, T., et al. (2003). Joint time-frequency-space classification of EEG in a brain–computer interface application. Journal on Applied Signal Processing.
  • Ince, H., & Trafalis, T. B. (2002). Support vector machine for regression and applications to financial forecasting. In...
  • Jayadeva, et al. (2007). Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Joachims, T., Ndellec, C., & Rouveriol, C. (1998). Text categorization with support vector machines: Learning with many...
  • Keerthi, S. S., et al. (2000). A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Transactions on Neural Networks.