Construct a robust least squares support vector machine based on Lp-norm and L∞-norm

https://doi.org/10.1016/j.engappai.2020.104134

Abstract

Although some Lp-norm LSSVMs possess feature selection and prediction ability, they still suffer from two common issues. (i) They tend to ignore edge points, because the L2-norm metric is used to measure the classification error of the training samples. Edge points are important in some practical applications and on datasets that are not independent and identically distributed (non-i.i.d.). (ii) They require high computational time and storage space on large scale datasets. To resolve these two shortcomings while retaining the feature selection ability, we adopt the L∞-norm to measure the classification error of the training samples and still use the Lp-norm (0<p<1) to measure the maximum margin between the two parallel support planes, obtaining a novel LSSVM classifier, denoted Lp-L∞-LSSVM. Our Lp-L∞-LSSVM has three advantages: (1) The L∞-norm on the empirical risk ensures the effective recognition of edge points, thereby improving the robustness and generalization ability of the classifier. (2) The Lp-norm on the structural risk provides feature selection ability, for both the linearly and non-linearly separable cases, and is suitable for the small sample size (SSS) problem. (3) Inspired by the sequential minimal optimization (SMO) algorithm, we design an iterative heuristic algorithm that breaks the large quadratic programming problem (QPP) into a series of smallest possible QPPs, avoiding high time consumption. This algorithm not only guarantees convergence to the optimal solution but also consumes less computational time and storage space on large scale datasets. Finally, extensive numerical experiments verify these claims and demonstrate outstanding classification performance and feature selection ability simultaneously.

Introduction

In data mining and machine learning, the Support Vector Machine (SVM) (Cortes and Vapnik, 1995) has become a popular classification technique. At present, SVM has been successfully applied to various practical problems, such as text categorization (Joachims et al., 1998, Ke et al., 2014), face detection (Tao et al., 2016), gene identification (Guyon et al., 2002), image segmentation (Chen and Wang, 2005), financial regression (Lin et al., 2006) and time series analysis (Muller et al., 1999). This technique obtains an optimal separating hyper-plane by maximizing the margin between two parallel support planes and minimizing the classification error on the training samples (Vapnik, 1998). However, searching for this plane requires solving a quadratic programming problem (QPP) with inequality constraints, which consumes considerable computational time and storage space. To reduce the time complexity and storage space, on the one hand, several algorithms were designed, such as the sequential minimal optimization (SMO) algorithm (Platt John, 1999), SVMlight (Joachims, 1999) and Libsvm (Chang and Lin, 2011). Specifically, the SMO algorithm divides the large QPP into a series of smallest QPPs, which avoids high time consumption and can handle large scale datasets. On the other hand, the least squares Support Vector Machine (LSSVM) (Suykens and Vandewalle, 1999) and the twin Support Vector Machine (TWSVM) (Shao et al., 2014) were proposed. LSSVM obtains good classification performance on many problems by solving a system of linear equations instead of a QPP (Mall and Suykens, 2015, Gao et al., 2016). Even so, it may suffer from two disadvantages. (i) It may ignore edge points due to the use of the L2-norm to measure the classification error. In some practical problems, edge points are often exactly the samples that need to be classified correctly, such as in anomaly detection (Rolf, 2006, Ju et al., 2020). (ii) It may become ill-conditioned or singular when the sample size is much smaller than the number of features; that is, LSSVM is not suitable for the small sample size (SSS) problem. The aim of this paper is to propose a modified LSSVM classifier that resolves these issues.
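To make the contrast with QPP-based solvers concrete, here is a minimal sketch of the classical LSSVM training step, which reduces to a single linear system. This is the well-known formulation of Suykens and Vandewalle (1999); the variable names and the RBF kernel choice are our own illustration, not taken from this paper.

```python
# Minimal LSSVM sketch: training reduces to one linear system, no QPP solver.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and the rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def lssvm_train(X, y, C=1.0, gamma=1.0):
    l = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    # KKT conditions: [[0, y^T], [y, Omega + I/C]] @ [b; alpha] = [0; 1]
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(l) / C
    rhs = np.concatenate(([0.0], np.ones(l)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, y, alpha, b, X_test, gamma=1.0):
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y) + b)
```

Solving this dense (l+1)-by-(l+1) system costs O(l^3) time and O(l^2) memory, which is precisely the large-scale bottleneck discussed above.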

In general, feature selection is a good way to avoid the SSS problem. For example, the L1-norm SVM (Mangasarian, 2006), with its linear programming formulation, can accomplish prediction and feature selection simultaneously. As an improvement of the L0-norm SVM, the Lp-norm SVM (Kloft et al., 2011) has been presented. Experimental results showed that the Lp-norm not only makes the classifier more suitable for selecting features but also improves the classification performance (Jawanpuri et al., 2014). Inspired by the Lp-norm SVM, Lp-norm LSSVMs were proposed (Shao et al., 2018, Lu et al., 2013). The Lp-norm LSSVM solves the least squares problem in the primal space and achieves feature selection and prediction effectively. However, it also introduces a new issue: its algorithm suffers from high time complexity and storage requirements.
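As a quick illustration of how a sparsity-inducing norm performs feature selection, the sketch below contrasts L1- and L2-penalized linear SVMs on synthetic data. We use scikit-learn's LinearSVC as a stand-in, since the Lp-norm LSSVMs cited above are not available in standard libraries.

```python
# Sketch: an L1 penalty drives most weights exactly to zero; the surviving
# coordinates are the selected features. L2 keeps nearly all weights nonzero.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

sparse_svm = LinearSVC(penalty='l1', dual=False, C=0.1, max_iter=10000).fit(X, y)
dense_svm = LinearSVC(penalty='l2', dual=True, C=0.1, max_iter=10000).fit(X, y)

print('nonzero weights, L1 penalty:', np.sum(sparse_svm.coef_ != 0))
print('nonzero weights, L2 penalty:', np.sum(dense_svm.coef_ != 0))
```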

In order to retain the feature selection ability brought by the Lp-norm, while avoiding high computational time and the neglect of edge points, we combine the Lp-norm and the L∞-norm in the optimization model of LSSVM and construct a robust classification approach, named Lp-L∞-LSSVM for short. Lp-L∞-LSSVM has four merits:

  • 1.

    This new classifier avoids over-fitting on the SSS problem, because the Lp-norm on the structural risk performs feature selection.

  • 2.

    Lp-L∞-LSSVM can effectively detect edge points and improve model robustness, especially in the presence of non-independent and identically distributed (non-i.i.d.) samples.

  • 3.

    Since the primal optimization model is non-smooth and non-differentiable, we design an efficient iterative algorithm: after translating the model into a QPP, we devise an SMO-like algorithm that guarantees convergence to the optimal solution with low computational time (see the sketch after this list).

  • 4.

    Last but not least, it is worth mentioning that our method can be extended to other learning areas and further enriches norm-based LSSVM in theory; this is work we plan to pursue in the future.
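The sketch referenced in merit 3 follows. It shows the generic SMO decomposition idea of Platt (1999) on the standard SVM dual: the large QPP is broken into two-variable subproblems that are solved analytically. This is only an illustration of the decomposition principle; the paper adapts it to the Lp-L∞-LSSVM model, whose subproblems differ.

```python
# Simplified SMO on the standard SVM dual: repeatedly pick a pair (i, j),
# solve the two-variable QPP in closed form, and update the bias b.
import numpy as np

def simplified_smo(K, y, C=1.0, tol=1e-3, max_passes=5):
    l = len(y)
    alpha, b, passes = np.zeros(l), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(l):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = np.random.choice([t for t in range(l) if t != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Box bounds keep the equality constraint sum(alpha*y)=0 feasible.
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Recover b from whichever multiplier stayed strictly interior.
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] \
                     - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] \
                     - y[j] * (alpha[j] - aj_old) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

Each iteration touches only two multipliers and two kernel rows, which is why such schemes need neither a general QPP solver nor the full kernel matrix in memory.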

This paper is organized as follows. Section 2 briefly reviews the background, including some norm-based distances and norm-based LSSVMs. Section 3 proposes our Lp-L∞-LSSVM approach, with its optimization model, algorithm, theoretical analysis and feature selection. Section 4 describes the experiments and results on real-world datasets and a practical application: fault detection of railway turnouts. Finally, we conclude our work in Section 5.

Section snippets

Norms for vectors

The norm is a basic concept in mathematics (Johnson Charles, 1991, Kahneman and Miller Dale, 1986). In functional analysis, the four axioms for a norm on a real or complex vector space $V$ must hold for all $x, y \in V$ and all $c \in \mathbb{R}$:

  • (1)

    $\|x\| \geq 0$ (Nonnegativity)

  • (2)

    $\|x\| = 0$ if and only if $x = 0$ (Positivity)

  • (3)

    $\|cx\| = |c|\,\|x\|$ (Homogeneity)

  • (4)

    $\|x + y\| \leq \|x\| + \|y\|$ (Triangle Inequality)

These four axioms express some of the familiar properties of Euclidean distance, which between points $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ is computed as $\left((a_1-b_1)^2+\cdots+(a_n-b_n)^2\right)^{1/2}$ …
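For concreteness, the norms used throughout this paper can be checked numerically. A small sketch follows; note that for 0 < p < 1 the "Lp-norm" violates the triangle inequality and is strictly speaking only a quasi-norm.

```python
# The vector norms used in this paper, checked against numpy's reference values.
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l2 = np.sqrt(np.sum(x**2))      # Euclidean (L2) norm
l1 = np.sum(np.abs(x))          # L1 norm
linf = np.max(np.abs(x))        # L-infinity norm: the largest |x_i|
p = 0.5                         # 0 < p < 1, as in the Lp regularizer
lp_p = np.sum(np.abs(x)**p)     # ||x||_p^p (a quasi-norm for p < 1)

assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(linf, np.linalg.norm(x, np.inf))
```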

Optimization problem

To settle the above issues while retaining their advantages, we combine the Lp-norm and the L∞-norm in the optimization model of LSSVM and introduce the following novel optimization problem:

$$\min_{(w,\, b,\, \eta_1, \ldots, \eta_l)} \ \frac{1}{2}\|w\|_p^p + C\,\|\eta\|_\infty \qquad \text{s.t.} \quad \big|y_i\big(\langle w, \varphi(x_i)\rangle + b\big) - 1\big| = \eta_i, \quad i = 1, 2, \ldots, l,$$

where $\|\eta\|_\infty = \max\{|\eta_1|, \ldots, |\eta_l|\}$. The first term $\|w\|_p^p$ is a regularization term that controls the sparsity of the final classifier; the subscript and superscript of $\|w\|_p^p$ denote the type of norm and its power, respectively. It is a …
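To illustrate the structure of this model, the sketch below solves its convex p = 1 analogue as a linear program: the L∞ loss collapses to a single scalar t, and splitting w = u − v linearizes the L1 term. This simplification is our own, for exposition only; the paper's 0 < p < 1 case is non-convex and requires the authors' iterative scheme instead.

```python
# LP sketch of the p = 1, linear-kernel analogue:
#   min 0.5*||w||_1 + C*t   s.t.  |y_i (w.x_i + b) - 1| <= t
import numpy as np
from scipy.optimize import linprog

def l1_linf_svm(X, y, C=1.0):
    l, n = X.shape
    # Variables z = [u (n), v (n), b (1), t (1)], with w = u - v, t = ||eta||_inf.
    c = np.concatenate([0.5 * np.ones(2 * n), [0.0, C]])
    Yx = y[:, None] * X
    row = lambda s: np.hstack([s * Yx, -s * Yx, s * y[:, None], -np.ones((l, 1))])
    A_ub = np.vstack([row(+1), row(-1)])   # y_i(w.x+b)-1 <= t  and  -(...) <= t
    b_ub = np.concatenate([np.ones(l), -np.ones(l)])
    bounds = [(0, None)] * (2 * n) + [(None, None), (0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')
    z = res.x
    return z[:n] - z[n:2 * n], z[2 * n]    # weights w, bias b
```

The single variable t bounds the worst-case (edge-point) error, which is exactly how the L∞ loss forces the classifier to pay attention to edge points rather than averaging them away.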

Experiments

In this section, we investigate the classification performance, feature selection ability and execution time of our proposed Lp-L∞-LSSVM on several publicly available benchmark datasets and a practical application: fault detection of railway turnouts. We compare it with four models: Lp-SVM, L2-LSSVM, Lp-LSSVM and L2-SVM. More concretely, we analyze its effectiveness from the following aspects:

(1) The influence of the Lp-norm on feature selection and sparse solutions in our Lp-L∞-LSSVM …
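A skeleton of such a comparison protocol, using only publicly packaged baselines (the paper's Lp-based models are not in standard libraries, so L1- and L2-penalized stand-ins appear here), might look as follows:

```python
# Cross-validated accuracy comparison of SVM baselines on a benchmark dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = load_breast_cancer(return_X_y=True)
models = {
    'L2-SVM (RBF)':    SVC(kernel='rbf', C=1.0),
    'L2-SVM (linear)': LinearSVC(penalty='l2', dual=True, max_iter=10000),
    'L1-SVM (sparse)': LinearSVC(penalty='l1', dual=False, max_iter=10000),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale features, then classify
    scores = cross_val_score(pipe, X, y, cv=10)
    print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')
```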

Conclusions

In this paper we have put forward a novel classifier with the Lp-norm on the structural risk and the L∞-norm on the empirical risk for the traditional LSSVM, named Lp-L∞-LSSVM. Lp-L∞-LSSVM not only possesses feature selection ability but also achieves good classification performance on both i.i.d. and non-i.i.d. datasets, especially in anomaly detection: the fault detection of China railway turnouts. Moreover, another notable merit is that we have designed an effective SMO-like algorithm to ensure convergence while reducing computational time and storage space on large scale datasets.

CRediT authorship contribution statement

Ting Ke: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Lidong Zhang: Validation, Formal analysis. Xuechun Ge: Validation, Software. Hui Lv: Writing - review & editing, Supervision, Data curation. Min Li: Resources, Supervision, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported by the Science Research Program of Tianjin Municipal Education Commission (No. 2018KJ115), Project of Humanities and Social Science Fund of Ministry of Education in China (No. 19YJCZH251).

References (35)

  • Chang, C.C., et al., 2011. Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol.
  • Cortes, C., et al., 1995. Support vector networks. Mach. Learn.
  • Guyon, I., et al., 2002. Gene selection for cancer classification using support vector machine. Mach. Learn.
  • Jawanpuri, P., Varma, M., Nath, S., 2014. On P-norm path following in multiple kernel learning for non-linear feature...
  • Joachims, T. Making large scale support vector machine learning practical.
  • Joachims, T., Nédellec, C., Rouveirol, C., 1998. Text categorization with support vector machines: learning with many...
  • Johnson, C.R., 1991. Topics in Matrix Analysis.