Construct a robust least squares support vector machine based on Lp-norm and L∞-norm
Introduction
In data mining and machine learning, the Support Vector Machine (SVM) (Cortes and Vapnik, 1995) has become a popular classification technique. SVM has been successfully applied to various practical problems, such as text categorization (Joachims et al., 1998, Ke et al., 2014), face detection (Tao et al., 2016), gene identification (Guyon et al., 2002), image segmentation (Chen and Wang, 2005), financial regression (Lin et al., 2006) and time series analysis (Muller et al., 1999). This technique obtains an optimal separating hyperplane by maximizing the margin between two parallel support planes while minimizing the classification error on the training samples (Vapnik, 1998). However, finding this plane requires solving a quadratic programming problem (QPP) with inequality constraints, which consumes considerable computational time and storage space. To reduce the time complexity and storage requirements, on the one hand, algorithms such as the sequential minimal optimization (SMO) algorithm (Platt, 1999), SVMlight (Joachims, 1999) and Libsvm (Chang and Lin, 2011) were designed. Specifically, the SMO algorithm divides the large QPP into a series of smallest-possible QPPs, which avoids excessive time consumption and can handle large-scale datasets. On the other hand, the least squares Support Vector Machine (LSSVM) (Suykens and Vandewalle, 1999) and the twin Support Vector Machine (TWSVM) (Shao et al., 2014) were proposed. LSSVM obtains good classification performance on many problems by solving a system of linear equations instead of a QPP (Mall and Suykens, 2015, Gao et al., 2016). Even so, it may suffer from two disadvantages. (i) It may ignore edge points because the L2-norm is used to measure the classification error. In some practical problems, edge points are precisely the important samples that must be classified correctly, for example in anomaly detection (Rolf, 2006, Ju et al., 2020).
(ii) It becomes ill-conditioned or singular when the sample size is much smaller than the number of features; that is, LSSVM is not suitable for the small sample size (SSS) problem. The aim of this paper is to propose a modified LSSVM classifier that addresses both issues.
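For context, the attraction of LSSVM is that training reduces to a single linear system. The following minimal NumPy sketch illustrates the classical Suykens and Vandewalle (1999) formulation with an RBF kernel on synthetic data; it is an illustration of the baseline, not the modified classifier proposed in this paper, and the hyperparameter values are arbitrary:

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """Train a classical LSSVM classifier by solving the linear system
        [ 0      y^T           ] [ b     ]   [ 0   ]
        [ y   Omega + I/gamma  ] [ alpha ] = [ 1_n ]
    where Omega_ij = y_i * y_j * K(x_i, x_j), instead of a QPP."""
    n = len(y)
    sq = np.sum(X ** 2, axis=1)
    # RBF kernel matrix K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma ** 2))
    Omega = (y[:, None] * y[None, :]) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]

    def predict(Xt):
        sqt = np.sum(Xt ** 2, axis=1)
        Kt = np.exp(-(sqt[:, None] + sq[None, :] - 2 * Xt @ X.T) / (2 * sigma ** 2))
        return np.sign(Kt @ (alpha * y) + b)

    return predict

# Two well-separated Gaussian blobs as a toy binary problem
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
predict = lssvm_train(X, y)
acc = np.mean(predict(X) == y)
```

Solving one (n+1)-by-(n+1) system replaces the QPP, but the squared (L2-norm) error term and the dense system matrix are exactly the sources of the two disadvantages discussed above.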
In general, feature selection is a good way to avoid the SSS problem. For example, the L1-norm SVM (Mangasarian, 2006), formulated as a linear program (LP), accomplishes prediction and feature selection simultaneously. As an improvement of the L1-norm SVM, the Lp-norm SVM (Kloft et al., 2011) has been presented. Experimental results showed that the Lp-norm not only makes the classifier more suitable for selecting features but also improves classification performance (Jawanpuri et al., 2014). Inspired by the Lp-norm SVM, Lp-norm LSSVM models were proposed (Shao et al., 2018, Lu et al., 2013). The Lp-norm LSSVM solves the least squares problem in the primal space and achieves feature selection and prediction effectively. However, it also introduces a new issue: its algorithm suffers from high time complexity and storage requirements.
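The mechanism by which a sparsity-inducing norm performs feature selection can be shown on a small least squares problem. The sketch below uses proximal gradient descent (ISTA) with an L1 penalty, whose soft-thresholding step drives the weights of irrelevant features to exactly zero; it is an illustrative stand-in for the general idea, not the Lp-norm LSSVM of Shao et al. (2018), and the penalty value is arbitrary:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrinks each coefficient toward zero
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_least_squares(X, y, lam=20.0, n_iter=500):
    """Minimize 0.5 * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

# 2 informative features plus 8 pure-noise features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
w = l1_least_squares(X, y, lam=20.0)
selected = np.flatnonzero(np.abs(w) > 1e-6)   # indices of the retained features
```

The noise features receive exactly zero weight, so prediction and feature selection happen in one optimization, which is the property the norm-based SVM variants above exploit.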
In order to retain the feature selection ability brought by the Lp-norm, while avoiding high computational cost and the neglect of edge points, we combine the Lp-norm and the L∞-norm in the optimization model of LSSVM and construct a robust classification approach, named Lp-L∞-LSSVM for short. Lp-L∞-LSSVM has four merits:
- 1.
The new classifier avoids overfitting on SSS problems because the Lp-norm on the structural risk performs feature selection.
- 2.
Lp-L∞-LSSVM can effectively detect edge points and improves model robustness, especially in the presence of non-independent and identically distributed (non-i.i.d.) samples.
- 3.
Since the primal optimization model is non-smooth and non-differentiable, we transform it into a QPP and design an efficient SMO-like iterative algorithm, which ensures convergence to the optimal solution at a low computational cost.
- 4.
Last but not least, our method can be extended to other learning areas and further enriches norm-based LSSVM theory; this is work we plan to pursue in the future.
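The pairwise-update strategy behind SMO-style solvers, referred to in merit 3, can be illustrated on the classical SVM dual. The sketch below is a simplified SMO in the spirit of Platt (1999) for a linear-kernel SVM on synthetic data; it demonstrates the general technique of solving two-variable sub-QPPs analytically, not the algorithm proposed in this paper:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    """Simplified SMO for the linear SVM dual: repeatedly pick a pair
    (alpha_i, alpha_j), solve the two-variable sub-QPP analytically, and
    clip to the box [0, C] so that sum_i alpha_i * y_i = 0 is preserved."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                       # linear kernel matrix
    alpha = np.zeros(n)
    b = 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.choice([k for k in range(n) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                if y[i] != y[j]:
                    lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                if lo == hi:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]   # second derivative along the pair
                if eta >= 0:
                    continue
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, lo, hi)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Update the bias from whichever multiplier is strictly inside the box
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] - y[j] * (alpha[j] - aj_old) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = X.T @ (alpha * y)
    return w, b

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.5, (15, 2)), rng.normal(2, 0.5, (15, 2))])
y = np.concatenate([-np.ones(15), np.ones(15)])
w, b = smo_train(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

Because each step touches only two multipliers, memory stays O(n) beyond the kernel entries actually used, which is why SMO-style schemes scale to large datasets.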
This paper is organized as follows. Section 2 briefly reviews the background, including norm-based distances and norm-based LSSVMs. Section 3 proposes our Lp-L∞-LSSVM approach together with its optimization model, algorithm, theoretical analysis and feature selection. Section 4 describes the experiments and results on real-world datasets and a practical application: fault detection of the railway turnout. Finally, we conclude our work in Section 5.
Norms for vectors
Norm is a basic concept in mathematics (Johnson, 1991, Kahneman and Miller, 1986). In functional analysis, a norm ‖·‖ on a real or complex vector space V satisfies the following four axioms for all x, y ∈ V and all c ∈ R:
- (1)
‖x‖ ≥ 0 (Nonnegativity)
- (2)
‖x‖ = 0 if and only if x = 0 (Positivity)
- (3)
‖cx‖ = |c| ‖x‖ (Homogeneity)
- (4)
‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)
These four axioms express some of the familiar properties of Euclidean distance. The Euclidean distance between two vectors x and y is computed as ‖x − y‖₂ = (Σᵢ (xᵢ − yᵢ)²)^(1/2).
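The axioms above can be spot-checked numerically. The short sketch below implements the family of Lp vector norms (with p = ∞ as the maximum of absolute components) and verifies the four axioms on concrete vectors:

```python
import numpy as np

def p_norm(x, p):
    """L_p norm of a vector: (sum_i |x_i|^p)^(1/p); p = np.inf gives max_i |x_i|."""
    if p == np.inf:
        return float(np.max(np.abs(x)))
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])
euclidean = p_norm(x - y, 2)   # Euclidean (L2) distance between x and y

# Numerically spot-check the four axioms for the L1, L2 and L-infinity norms
for p in (1, 2, np.inf):
    assert p_norm(x, p) >= 0.0                                      # (1) nonnegativity
    assert p_norm(np.zeros(2), p) == 0.0                            # (2) positivity at zero
    assert np.isclose(p_norm(-2.5 * x, p), 2.5 * p_norm(x, p))      # (3) homogeneity
    assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12  # (4) triangle inequality
```

For x = (3, −4) this gives ‖x‖₁ = 7, ‖x‖₂ = 5 and ‖x‖∞ = 4, showing how the L∞-norm isolates the single largest component, the property the proposed model uses to attend to edge points.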
Optimization problem
To settle the above issues while retaining their advantages, we combine the Lp-norm and the L∞-norm in the optimization model of LSSVM and introduce a novel optimization problem. The first term is a regularization term that controls the sparsity of the final classifier; the superscript and subscript denote, respectively, the reciprocal of the power of the Lp-norm and the type of the norm vector.
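The model's equations are not reproduced in this excerpt. To make the combination of the two norms concrete, the hypothetical sketch below poses the special case p = 1, an L1-norm regularizer plus an L∞-norm (worst-case) error, as a linear program via `scipy.optimize.linprog`; it is an illustrative stand-in under that assumption, not the authors' QPP formulation, and the trade-off parameter C is arbitrary:

```python
import numpy as np
from scipy.optimize import linprog

def linf_l1_fit(X, Y, C=10.0):
    """Fit f(x) = w.x + b by  min ||w||_1 + C * max_i |Y_i - f(x_i)|.

    Variables z = [w+ (d), w- (d), b, t]: w = w+ - w- linearizes the L1 term,
    and the scalar t models the L-infinity loss via 2n constraints
    |Y_i - f(x_i)| <= t."""
    n, d = X.shape
    c = np.concatenate([np.ones(2 * d), [0.0], [C]])
    #  Y_i - x_i.w - b <= t   and   x_i.w + b - Y_i <= t
    A1 = np.hstack([-X, X, -np.ones((n, 1)), -np.ones((n, 1))])
    A2 = np.hstack([X, -X, np.ones((n, 1)), -np.ones((n, 1))])
    A_ub = np.vstack([A1, A2])
    b_ub = np.concatenate([-Y, Y])
    bounds = [(0, None)] * (2 * d) + [(None, None), (0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    z = res.x
    return z[:d] - z[d:2 * d], z[2 * d]

# Targets depend only on feature 0; features 1..4 are irrelevant
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
Y = 2.0 * X[:, 0] + 1.0
w, b = linf_l1_fit(X, Y, C=10.0)
```

Because the L∞ term binds on the worst-case sample, the fit is forced to respect edge points, while the L1 term drives the weights of the irrelevant features to exactly zero.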
Experiments
In this section, we investigate the classification performance, feature selection ability and execution time of our proposed Lp-L∞-LSSVM on several publicly available benchmark datasets and a practical application: the fault detection of railway turnouts. We compare it with four baseline models, including norm-based SVM and LSSVM variants. More concretely, we analyze its effectiveness from the following aspects:
(1) The influence of the Lp-norm on feature selection and the sparsity of the solution in our Lp-L∞-LSSVM.
Conclusions
In this paper we have put forward a novel classifier for the traditional LSSVM, with the Lp-norm on the structural risk and the L∞-norm on the empirical risk, named Lp-L∞-LSSVM. Lp-L∞-LSSVM not only possesses feature selection ability, but also achieves good classification performance on both i.i.d. and non-i.i.d. datasets, especially in anomaly detection: the fault detection of China railway turnouts. Moreover, another obvious merit is that we have designed an effective SMO-like algorithm to solve the model while guaranteeing convergence at a low computational cost.
CRediT authorship contribution statement
Ting Ke: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Lidong Zhang: Validation, Formal analysis. Xuechun Ge: Validation, Software. Hui Lv: Writing - review & editing, Supervision, Data curation. Min Li: Resources, Supervision, Data curation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is supported by the Science Research Program of Tianjin Municipal Education Commission (No. 2018KJ115) and the Project of Humanities and Social Science Fund of the Ministry of Education in China (No. 19YJCZH251).
References (35)
- Seeking multi-threshold directly from support vectors for image segmentation. Neurocomputing (2005)
- Extended compressed tracking via random projection based on MSERs and online LS-SVM learning. Pattern Recognit. (2016)
- PUMAD: PU metric learning for anomaly detection. Inform. Sci. (2020)
- Face recognition via weighted sparse representation. J. Vis. Commun. Image Represent. (2013)
- Nonparallel hyperplane support vector machine for binary classification problems. Inf. Sci. (2014)
- Sparse Lq-norm least squares support vector machine with feature selection. Pattern Recognit. (2018)
- Robust face detection using local CNN and SVM based on kernel combination. Neurocomputing (2016)
- Least squares twin bounded support vector machines based on L1-norm distance metric for classification. Pattern Recognit. (2018)
- Biased p-norm support vector machine for PU learning. Neurocomputing (2014)
- UCI repository of machine learning databases (1998)
- Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol.
- Support vector networks. Mach. Learn.
- Gene selection for cancer classification using support vector machines. Mach. Learn.
- Making large scale support vector machine learning practical
- Topics in Matrix Analysis