Construct a robust least squares support vector machine based on Lp-norm and L∞-norm

https://doi.org/10.1016/j.engappai.2020.104134

Abstract

Although some Lp-norm LSSVMs possess feature selection and prediction ability, they still suffer from two common issues. (i) They tend to ignore edge points, because the L2-norm metric is used to measure the classification error of the training samples. Edge points are important in some practical applications and on datasets that are not independent and identically distributed (non-i.i.d.). (ii) They require high computational time and storage space on large scale datasets. To resolve these two shortcomings while retaining the feature selection ability, we adopt the L∞-norm to measure the classification error of the training samples and still use the Lp-norm (0<p<1) to measure the maximum margin between the two parallel support planes, obtaining a novel LSSVM classifier, denoted Lp-L∞-LSSVM. Our Lp-L∞-LSSVM has three advantages: (1) The L∞-norm on the empirical risk ensures the effective recognition of edge points, thereby improving the robustness and generalization ability of the classifier. (2) The Lp-norm on the structural risk provides feature selection ability, for both the linearly and non-linearly separable cases, and is suitable for the small sample size (SSS) problem. (3) Inspired by the sequential minimal optimization (SMO) algorithm, we design an iterative heuristic algorithm that breaks the large quadratic programming problem (QPP) into a series of smallest possible QPPs, avoiding high time consumption. This algorithm not only guarantees convergence to the optimal solution but also consumes less computational time and storage space on large scale datasets. Finally, extensive numerical experiments verify these claims and demonstrate outstanding classification performance and feature selection ability simultaneously.

Introduction

In data mining and machine learning, the Support Vector Machine (SVM) (Cortes and Vapnik, 1995) has become a popular classification technique. At present, SVM has been successfully applied to various practical problems, such as text categorization (Joachims et al., 1998, Ke et al., 2014), face detection (Tao et al., 2016), gene identification (Guyon et al., 2002), image segmentation (Chen and Wang, 2005), financial regression (Lin et al., 2006) and time series analysis (Muller et al., 1999). This technique obtains an optimal separating hyper-plane by maximizing the margin between two parallel support planes and minimizing the classification error on the training samples (Vapnik, 1998). However, searching for this plane requires solving a quadratic programming problem (QPP) with inequality constraints, which consumes considerable computational time and storage space. To reduce the time complexity and storage space, on the one hand, several algorithms were designed, such as the sequential minimal optimization (SMO) algorithm (Platt John, 1999), SVMlight (Joachims, 1999) and Libsvm (Chang and Lin, 2011). Specifically, the SMO algorithm divides the large QPP into a series of smallest QPPs, which avoids high time consumption and can handle large scale datasets. On the other hand, the least squares Support Vector Machine (LSSVM) (Suykens and Vandewalle, 1999) and the twin Support Vector Machine (TWSVM) (Shao et al., 2014) were proposed. LSSVM obtains good classification performance on many problems by solving a system of linear equations instead of a QPP (Mall and Suykens, 2015, Gao et al., 2016). Even so, it may suffer from two disadvantages. (i) It may ignore edge points due to the use of the L2-norm to measure the classification error. In some practical problems, edge points are often exactly the samples that need to be classified correctly, such as in anomaly detection (Rolf, 2006, Ju et al., 2020). (ii) It may become ill-conditioned or singular when the sample size is much smaller than the number of features; that is, LSSVM is not suitable for the small sample size (SSS) problem. The aim of this paper is to propose a modified LSSVM classifier that resolves these issues.
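To make the contrast with QPP-based solvers concrete, here is a minimal sketch of the classical LSSVM training step, which reduces to a single linear system. This is the well-known formulation of Suykens and Vandewalle (1999); the variable names and the RBF kernel choice are our own illustration, not taken from this paper.

```python
# Minimal LSSVM sketch: training reduces to one linear system, no QPP solver.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and the rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def lssvm_train(X, y, C=1.0, gamma=1.0):
    l = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    # KKT conditions: [[0, y^T], [y, Omega + I/C]] @ [b; alpha] = [0; 1]
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(l) / C
    rhs = np.concatenate(([0.0], np.ones(l)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, y, alpha, b, X_test, gamma=1.0):
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y) + b)
```

Solving this dense (l+1)-by-(l+1) system costs O(l^3) time and O(l^2) memory, which is precisely the large-scale bottleneck discussed above.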

In general, feature selection is a good way to avoid the SSS problem. For example, the L1-norm SVM (Mangasarian, 2006), with its linear programming formulation, can accomplish prediction and feature selection simultaneously. As an improvement of the L0-norm SVM, the Lp-norm SVM (Kloft et al., 2011) has been presented. Experimental results showed that the Lp-norm not only makes the classifier more suitable for selecting features but also improves the classification performance (Jawanpuri et al., 2014). Inspired by the Lp-norm SVM, Lp-norm LSSVMs were proposed (Shao et al., 2018, Lu et al., 2013). The Lp-norm LSSVM solves the least squares problem in the primal space and achieves feature selection and prediction effectively. However, it also introduces a new issue: its algorithm suffers from high time complexity and storage requirements.
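As a quick illustration of how a sparsity-inducing norm performs feature selection, the sketch below contrasts L1- and L2-penalized linear SVMs on synthetic data. We use scikit-learn's LinearSVC as a stand-in, since the Lp-norm LSSVMs cited above are not available in standard libraries.

```python
# Sketch: an L1 penalty drives most weights exactly to zero; the surviving
# coordinates are the selected features. L2 keeps nearly all weights nonzero.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

sparse_svm = LinearSVC(penalty='l1', dual=False, C=0.1, max_iter=10000).fit(X, y)
dense_svm = LinearSVC(penalty='l2', dual=True, C=0.1, max_iter=10000).fit(X, y)

print('nonzero weights, L1 penalty:', np.sum(sparse_svm.coef_ != 0))
print('nonzero weights, L2 penalty:', np.sum(dense_svm.coef_ != 0))
```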

In order to retain the feature selection ability brought by the Lp-norm, while avoiding high computational time and the neglect of edge points, we combine the Lp-norm and the L∞-norm in the optimization model of LSSVM and construct a robust classification approach, named Lp-L∞-LSSVM for short. Lp-L∞-LSSVM has four merits:

  • 1.

    This new classifier avoids over-fitting on the SSS problem, because the Lp-norm on the structural risk performs feature selection.

  • 2.

    Lp-L∞-LSSVM can effectively detect edge points and improve model robustness, especially in the presence of non-independent and identically distributed (non-i.i.d.) samples.

  • 3.

    Since the primal optimization model is non-smooth and non-differentiable, we design an efficient iterative algorithm: after translating the model into a QPP, we devise an SMO-like algorithm that guarantees convergence to the optimal solution with low computational time (see the sketch after this list).

  • 4.

    Last but not least, it is worth mentioning that our method can be extended to other learning areas and further enriches norm-based LSSVM in theory; this is work we plan to pursue in the future.
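The sketch referenced in merit 3 follows. It shows the generic SMO decomposition idea of Platt (1999) on the standard SVM dual: the large QPP is broken into two-variable subproblems that are solved analytically. This is only an illustration of the decomposition principle; the paper adapts it to the Lp-L∞-LSSVM model, whose subproblems differ.

```python
# Simplified SMO on the standard SVM dual: repeatedly pick a pair (i, j),
# solve the two-variable QPP in closed form, and update the bias b.
import numpy as np

def simplified_smo(K, y, C=1.0, tol=1e-3, max_passes=5):
    l = len(y)
    alpha, b, passes = np.zeros(l), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(l):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = np.random.choice([t for t in range(l) if t != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Box bounds keep the equality constraint sum(alpha*y)=0 feasible.
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Recover b from whichever multiplier stayed strictly interior.
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] \
                     - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] \
                     - y[j] * (alpha[j] - aj_old) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

Each iteration touches only two multipliers and two kernel rows, which is why such schemes need neither a general QPP solver nor the full kernel matrix in memory.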

This paper is organized as follows. Section 2 briefly reviews the background, including some norm-based distances and norm-based LSSVMs. Section 3 proposes our Lp-L∞-LSSVM approach, with its optimization model, algorithm, theoretical analysis and feature selection. Section 4 describes the experiments and results on real-world datasets and a practical application: fault detection of railway turnouts. Finally, we conclude our work in Section 5.

Section snippets

Norms for vectors

The norm is a basic concept in mathematics (Johnson Charles, 1991, Kahneman and Miller Dale, 1986). In functional analysis, the four axioms for a norm on a real or complex vector space $V$ must hold for all $x, y \in V$ and all $c \in \mathbb{R}$:

  • (1)

    $\|x\| \geq 0$ (Nonnegativity)

  • (2)

    $\|x\| = 0$ if and only if $x = 0$ (Positivity)

  • (3)

    $\|cx\| = |c|\,\|x\|$ (Homogeneity)

  • (4)

    $\|x + y\| \leq \|x\| + \|y\|$ (Triangle Inequality)

These four axioms express some of the familiar properties of Euclidean distance, which between points $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ is computed as $\left((a_1-b_1)^2+\cdots+(a_n-b_n)^2\right)^{1/2}$ …
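For concreteness, the norms used throughout this paper can be checked numerically. A small sketch follows; note that for 0 < p < 1 the "Lp-norm" violates the triangle inequality and is strictly speaking only a quasi-norm.

```python
# The vector norms used in this paper, checked against numpy's reference values.
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l2 = np.sqrt(np.sum(x**2))      # Euclidean (L2) norm
l1 = np.sum(np.abs(x))          # L1 norm
linf = np.max(np.abs(x))        # L-infinity norm: the largest |x_i|
p = 0.5                         # 0 < p < 1, as in the Lp regularizer
lp_p = np.sum(np.abs(x)**p)     # ||x||_p^p (a quasi-norm for p < 1)

assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(linf, np.linalg.norm(x, np.inf))
```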

Optimization problem

To settle the above issues while retaining their advantages, we combine the Lp-norm and the L∞-norm in the optimization model of LSSVM and introduce the following novel optimization problem:

$$\min_{(w,\, b,\, \eta_1, \ldots, \eta_l)} \ \frac{1}{2}\|w\|_p^p + C\,\|\eta\|_\infty \qquad \text{s.t.} \quad \big|y_i\big(\langle w, \varphi(x_i)\rangle + b\big) - 1\big| = \eta_i, \quad i = 1, 2, \ldots, l,$$

where $\|\eta\|_\infty = \max\{|\eta_1|, \ldots, |\eta_l|\}$. The first term $\|w\|_p^p$ is a regularization term that controls the sparsity of the final classifier; the subscript and superscript of $\|w\|_p^p$ denote the type of norm and its power, respectively. It is a …
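To illustrate the structure of this model, the sketch below solves its convex p = 1 analogue as a linear program: the L∞ loss collapses to a single scalar t, and splitting w = u − v linearizes the L1 term. This simplification is our own, for exposition only; the paper's 0 < p < 1 case is non-convex and requires the authors' iterative scheme instead.

```python
# LP sketch of the p = 1, linear-kernel analogue:
#   min 0.5*||w||_1 + C*t   s.t.  |y_i (w.x_i + b) - 1| <= t
import numpy as np
from scipy.optimize import linprog

def l1_linf_svm(X, y, C=1.0):
    l, n = X.shape
    # Variables z = [u (n), v (n), b (1), t (1)], with w = u - v, t = ||eta||_inf.
    c = np.concatenate([0.5 * np.ones(2 * n), [0.0, C]])
    Yx = y[:, None] * X
    row = lambda s: np.hstack([s * Yx, -s * Yx, s * y[:, None], -np.ones((l, 1))])
    A_ub = np.vstack([row(+1), row(-1)])   # y_i(w.x+b)-1 <= t  and  -(...) <= t
    b_ub = np.concatenate([np.ones(l), -np.ones(l)])
    bounds = [(0, None)] * (2 * n) + [(None, None), (0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')
    z = res.x
    return z[:n] - z[n:2 * n], z[2 * n]    # weights w, bias b
```

The single variable t bounds the worst-case (edge-point) error, which is exactly how the L∞ loss forces the classifier to pay attention to edge points rather than averaging them away.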

Experiments

In this section, we investigate the classification performance, feature selection ability and execution time of our proposed Lp-L∞-LSSVM on several publicly available benchmark datasets and a practical application: fault detection of railway turnouts. We compare it with four models: Lp-SVM, L2-LSSVM, Lp-LSSVM and L2-SVM. More concretely, we analyze its effectiveness from the following aspects:

(1) The influence of the Lp-norm on feature selection and sparse solutions in our Lp-L∞-LSSVM …
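A skeleton of such a comparison protocol, using only publicly packaged baselines (the paper's Lp-based models are not in standard libraries, so L1- and L2-penalized stand-ins appear here), might look as follows:

```python
# Cross-validated accuracy comparison of SVM baselines on a benchmark dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = load_breast_cancer(return_X_y=True)
models = {
    'L2-SVM (RBF)':    SVC(kernel='rbf', C=1.0),
    'L2-SVM (linear)': LinearSVC(penalty='l2', dual=True, max_iter=10000),
    'L1-SVM (sparse)': LinearSVC(penalty='l1', dual=False, max_iter=10000),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale features, then classify
    scores = cross_val_score(pipe, X, y, cv=10)
    print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')
```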

Conclusions

In this paper we have put forward a novel classifier with the Lp-norm on the structural risk and the L∞-norm on the empirical risk for the traditional LSSVM, named Lp-L∞-LSSVM. Lp-L∞-LSSVM not only possesses feature selection ability but also achieves good classification performance on both i.i.d. and non-i.i.d. datasets, especially in anomaly detection: the fault detection of China railway turnouts. Moreover, another notable merit is that we have designed an effective SMO-like algorithm to ensure convergence while reducing computational time and storage space on large scale datasets.

CRediT authorship contribution statement

Ting Ke: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Lidong Zhang: Validation, Formal analysis. Xuechun Ge: Validation, Software. Hui Lv: Writing - review & editing, Supervision, Data curation. Min Li: Resources, Supervision, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported by the Science Research Program of Tianjin Municipal Education Commission (No. 2018KJ115), Project of Humanities and Social Science Fund of Ministry of Education in China (No. 19YJCZH251).

References (35)

  • Chang, C.C., et al., 2011. Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol.
  • Cortes, C., et al., 1995. Support vector networks. Mach. Learn.
  • Guyon, I., et al., 2002. Gene selection for cancer classification using support vector machine. Mach. Learn.
  • Jawanpuri, P., Varma, M., Nath, S., 2014. On P-norm path following in multiple kernel learning for non-linear feature...
  • Joachims, T. Making large scale support vector machine learning practical.
  • Joachims, T., Nédellec, C., Rouveirol, C., 1998. Text categorization with support vector machines: learning with many...
  • Johnson, C.R., 1991. Topics in Matrix Analysis.