
Neurocomputing

Volume 530, 14 April 2023, Pages 188-204

Over-relaxed multi-block ADMM algorithms for doubly regularized support vector machines

https://doi.org/10.1016/j.neucom.2023.01.082

Abstract

As a classical machine learning model, the support vector machine (SVM) has attracted much attention due to its rigorous theoretical foundation and powerful discriminative performance. The doubly regularized SVM (DRSVM) is an important variant of SVM based on elastic-net regularization, which accounts for both the sparsity and the stability of the model. To tackle the explosive growth in data dimensionality and data volume, the alternating direction method of multipliers (ADMM) can be used to train the DRSVM model. ADMM is an effective iterative algorithm for solving convex optimization problems that decomposes a large problem into a series of tractable subproblems, and it is also well suited to distributed computing. However, a lack of guaranteed convergence and a slow convergence rate are two critical limitations of ADMM. In this paper, a 3-block ADMM algorithm based on the over-relaxation technique is proposed to accelerate DRSVM training, termed the over-relaxed DRSVM (O-RDRSVM). The main strategy of the over-relaxation technique is to blend information from the previous iteration into the next iteration to improve the convergence of ADMM. We also propose a distributed version of O-RDRSVM, termed DO-RDRSVM, to better exploit parallel and distributed computing. Moreover, we develop a fast O-RDRSVM algorithm (FO-RDRSVM) and a fast DO-RDRSVM algorithm (FDO-RDRSVM), which further reduce the computational cost of O-RDRSVM and DO-RDRSVM by employing the matrix inversion lemma. Convergence analyses guarantee the effectiveness of our algorithms for DRSVM training. Finally, extensive experiments on public datasets demonstrate the advantages of our algorithms in terms of convergence rate and training time while maintaining accuracy and sparsity comparable to those of previous works.

Introduction

In the past two decades, machine learning techniques [1], [2] have undergone rapid development, and many variants and research directions have emerged [3], [4]. As a classical machine learning model, the support vector machine (SVM) [5] has been extensively studied [6], [7], [8] and effectively applied in a variety of fields, such as face recognition [9], [10], image and signal classification [11], [12], and information forecasting [13], [14]. The original SVM was proposed for linearly separable problems and does not account for the common case of overlapping classes. To overcome this defect, the basic SVM was extended to the soft margin SVM [15] and the twin SVM [16], which allow a small number of misclassifications. To handle classification problems with different requirements, the one-class SVM [17] was developed to detect outliers without supervision, and the multi-class SVM presented in [18] addresses multi-class classification. Moreover, by combining SVM with other machine learning methods, its classification performance can be further improved [19].

In recent years, the “big data” era has posed new challenges to SVM [20], [21]. One of the main difficulties is the dramatic increase in the dimensionality of realistic data, which may contain noisy features and thus places higher demands on the sparsity of SVM. The common soft margin SVM can be regarded as an SVM regularized by the L2 norm, without consideration of sparsity. Therefore, Ji et al. [22] introduced L1-norm regularization into the original SVM and investigated the use of the sparse L1-SVM to select effective features. Furthermore, the doubly regularized SVM (DRSVM) [23], with strong sparsity and generalization performance, was proposed by using elastic-net regularization [24], which combines the advantages of L1-norm and L2-norm regularization. The huberized SVM [25] also employs the elastic net to regularize the SVM model while replacing the classical hinge loss function with the differentiable huberized hinge loss.

Other challenges include the explosive growth in data volume and the distributed storage of training data, which require corresponding distributed algorithms [26], [27], [28]. Unfortunately, the widely used SVM trained by the sequential minimal optimization (SMO) algorithm [29] has difficulty handling distributed data storage [30]. To develop a sparse and general distributed SVM algorithm, the alternating direction method of multipliers (ADMM) [31] can be employed to train DRSVM. ADMM is an iterative algorithm commonly used to solve large-scale convex optimization problems [32], [33], [34]. It splits a complex optimization problem into tractable subproblems and then iteratively solves all subproblems in turn until convergence. Since these subproblems usually admit efficient solution methods and can be solved independently, ADMM is well suited to both non-distributed problems and distributed consensus problems [31]. Therefore, this paper focuses on training DRSVM and the distributed DRSVM using the ADMM and distributed ADMM algorithms, respectively. In fact, Ye et al. [35] first presented a prototype algorithm using ADMM to train the DRSVM model.
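To make the splitting idea concrete, the following is a minimal sketch of the 2-block ADMM in scaled form applied to a lasso-type problem. It is illustrative only and is not the DRSVM solver developed in this paper; the function name and the choice of the lasso problem are ours.

    # Minimal, illustrative 2-block ADMM in scaled form for the lasso problem
    #   min_x 0.5*||A x - b||^2 + lam*||z||_1   s.t.  x - z = 0.
    # Each subproblem has a cheap closed-form update (a linear solve and a
    # soft-threshold), which is what makes ADMM attractive for splitting.
    import numpy as np

    def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
        n, p = A.shape
        x, z, u = np.zeros(p), np.zeros(p), np.zeros(p)  # u is the scaled dual
        Atb = A.T @ b
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))  # factor once, reuse
        for _ in range(n_iter):
            # x-subproblem: (A^T A + rho I) x = A^T b + rho (z - u)
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            # z-subproblem: soft-thresholding (proximal step of the L1 norm)
            v = x + u
            z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
            # dual ascent on the consensus constraint x = z
            u = u + x - z
        return z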

However, the convergence of ADMM has always been a difficult issue. For a convex minimization model with three uncoupled variables, Chen et al. [36] proved that the corresponding 3-block ADMM is not necessarily convergent. Extending the 3-block ADMM to the m-block ADMM (m ≥ 3), a simple argument shows that the m-block ADMM without additional conditions does not necessarily converge either. Another factor that limits the performance of ADMM is its convergence rate. In [36], the worst-case convergence rate of the 3-block ADMM was established as O(1/t) in the ergodic sense, where t is the iteration number. Tao et al. [37] established the same worst-case convergence rate for the m-block ADMM. This sublinear rate is too slow for large-scale data, especially when the subproblems in ADMM must themselves be solved iteratively.
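For reference, the m-block ADMM discussed here targets separable convex problems of the following standard form, updating the blocks in Gauss–Seidel fashion and then the multiplier (the notation is generic and may differ from the paper's):

    \begin{aligned}
    &\min_{x_1,\dots,x_m}\ \sum_{i=1}^{m} f_i(x_i)
      \quad \text{s.t.}\ \sum_{i=1}^{m} A_i x_i = c,\\
    &L_\rho(x_1,\dots,x_m,\mu) = \sum_{i=1}^{m} f_i(x_i)
      + \mu^{T}\Big(\sum_{i=1}^{m} A_i x_i - c\Big)
      + \frac{\rho}{2}\,\Big\|\sum_{i=1}^{m} A_i x_i - c\Big\|_2^2,\\
    &x_i^{k+1} = \arg\min_{x_i}\, L_\rho\big(x_1^{k+1},\dots,x_{i-1}^{k+1},\,x_i,\,x_{i+1}^{k},\dots,x_m^{k},\,\mu^{k}\big),
      \qquad i = 1,\dots,m,\\
    &\mu^{k+1} = \mu^{k} + \rho\Big(\sum_{i=1}^{m} A_i x_i^{k+1} - c\Big).
    \end{aligned}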

To accelerate the convergence of ADMM, the over-relaxation technique was investigated in [38], [39], [40]. This technique additionally applies the results of the previous iteration to the results of the next iteration, making fuller use of the information from previous iterations to accelerate convergence. It has been applied to various numerical algebra and optimization problems, including the successive over-relaxation (SOR) method [41], [42] and image reconstruction [43], [44]. The results show that the over-relaxation technique indeed improves the convergence rate. However, the existing studies focus on the 2-block ADMM and its extensions, and the effectiveness of over-relaxation for the 3-block ADMM when training DRSVM cannot be guaranteed. The convergence of the over-relaxed 3-block ADMM and the over-relaxed m-block ADMM also needs to be reconsidered.
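In the standard 2-block setting with constraint $Ax + Bz = c$ and scaled dual variable $u$, the over-relaxation step is commonly written as follows, with relaxation parameter $\alpha \in (1, 2)$ ($\alpha = 1$ recovers plain ADMM). This is the textbook form, not yet the 3-block variant studied in this paper:

    \begin{aligned}
    x^{k+1} &= \arg\min_{x}\, L_\rho\big(x, z^{k}, u^{k}\big),\\
    \widehat{x}^{k+1} &= \alpha\, A x^{k+1} - (1-\alpha)\big(B z^{k} - c\big)
      \qquad \text{(relaxed combination of new and old iterates)},\\
    z^{k+1} &= \arg\min_{z}\, g(z) + \frac{\rho}{2}\,\big\|\widehat{x}^{k+1} + B z - c + u^{k}\big\|_2^2,\\
    u^{k+1} &= u^{k} + \widehat{x}^{k+1} + B z^{k+1} - c.
    \end{aligned}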

In this paper, we take both convergence and the convergence rate into consideration. In contrast to previous works using the non-relaxed ADMM, the over-relaxation technique is applied to improve the convergence of the 3-block ADMM and the corresponding m-block ADMM when training DRSVM and the distributed DRSVM. We first propose an over-relaxed 3-block ADMM for training DRSVM at a faster convergence rate, termed the over-relaxed DRSVM (O-RDRSVM). Second, to meet the challenge of distributed data storage, O-RDRSVM is extended to its distributed version, DO-RDRSVM. We fully exploit the structure of the distributed DRSVM model to achieve global consensus of the over-relaxed m-block ADMM with as little information interaction as possible. Third, for high-dimensional datasets, we transform the large-scale matrix inversion problem into a small-scale matrix inversion problem and then propose the fast O-RDRSVM (FO-RDRSVM) and the fast DO-RDRSVM (FDO-RDRSVM) to reduce the computational burden of the algorithms. Furthermore, we prove the convergence of O-RDRSVM and DO-RDRSVM, while the convergence properties of FO-RDRSVM and FDO-RDRSVM are equivalent to those of O-RDRSVM and DO-RDRSVM, respectively. To the best of our knowledge, no convergence analysis has previously been performed for the over-relaxed multi-block ADMM applied to DRSVM training. Extensive experiments empirically demonstrate that our proposed algorithms not only maintain high classification accuracy and sparsity but also effectively accelerate convergence and reduce the training time, handling high-dimensional and distributed data more efficiently.
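The matrix inversion lemma mentioned above is the Sherman–Morrison–Woodbury identity. One common consequence, written here in generic notation (the specific matrices used in FO-RDRSVM and FDO-RDRSVM are not shown in this excerpt), trades a $p \times p$ inversion for an $n \times n$ one, which is much cheaper when the number of samples $n$ is far smaller than the dimension $p$:

    (\lambda I_p + X^{T}X)^{-1}
      = \frac{1}{\lambda}\Big(I_p - X^{T}\big(\lambda I_n + X X^{T}\big)^{-1}X\Big),
    \qquad X \in \mathbb{R}^{n \times p},\ \lambda > 0.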

The contributions of this paper include the following:

  • 1. We apply the over-relaxation technique to the 3-block ADMM to accelerate DRSVM training while maintaining the sparsity and generalization ability of DRSVM.
  • 2. We establish the distributed DRSVM and train it in a consensus form using the over-relaxed multi-block ADMM to handle large-scale and distributed data more efficiently.
  • 3. The matrix inversion lemma is introduced into the iterative process of ADMM to further reduce the training time when facing high-dimensional tasks.
  • 4. The convergence of our proposed algorithms is guaranteed both theoretically and empirically.

The remainder of this paper is organized as follows. In Section 2, we introduce the theoretical basis of regularized SVMs and the ADMM algorithm. In Section 3, we describe O-RDRSVM, DO-RDRSVM, FO-RDRSVM and FDO-RDRSVM in detail and then analyze their convergence. The experimental results are presented and discussed in Section 4. Section 5 concludes this paper with final remarks and future research directions.

Section snippets

Singly regularized SVM

To cope with data that are not linearly separable, a fast approach is to use the soft margin SVM (SMSVM) [29] trained by the SMO-type decomposition method. We consider a binary classification training task $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^{T} \in \mathbb{R}^{p}$ is the $i$th sample and $y_i \in \{-1, 1\}$ is the corresponding label. SMSVM aims to find an optimal hyperplane that separates the two classes of data with the largest margin by solving the following optimization problem:

$$\min_{\beta,\beta_0,\xi}\ \frac{\lambda}{2}\,\|\beta\|_2^{2} + \sum_{i=1}^{n}\xi_i \quad \text{s.t.}\ y_i\,(x_i^{T}\beta + \beta_0) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n.$$
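The doubly regularized SVM studied in this paper augments the hinge loss with an elastic-net penalty; a commonly cited form of the objective is sketched below (illustrative notation; the scaling of the loss term and the exact parameterization may differ from the paper's equations):

    \min_{\beta,\beta_0}\ \sum_{i=1}^{n}\big[1 - y_i\,(x_i^{T}\beta + \beta_0)\big]_{+}
      \;+\; \lambda_1\,\|\beta\|_1 \;+\; \frac{\lambda_2}{2}\,\|\beta\|_2^{2}.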

Over-relaxed DRSVM

Since the convergence rate of the ADMM iterations in Eqs. (8a), (8b), (8c), (8d) is relatively slow, we use the over-relaxation technique in [46] to accelerate the algorithm. The specific strategy of the over-relaxation technique is to linearly combine the partial results of the next iteration with the results of the previous iteration and then apply the combined results to subsequent updates, which can accelerate the convergence of the ADMM algorithm [47], [48], [49]. In the case of the
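To make the strategy concrete, the sketch below inserts the relaxation step into the lasso ADMM shown in the Introduction (constraint $x - z = 0$, so the relaxed point is $\alpha x^{k+1} + (1-\alpha) z^{k}$). It only illustrates where the linear combination enters the subsequent updates and is not the paper's O-RDRSVM iteration.

    # Over-relaxed 2-block ADMM for the lasso splitting x = z (illustrative only).
    # alpha in (1, 2) over-relaxes; alpha = 1 recovers the plain ADMM updates.
    import numpy as np

    def relaxed_lasso_admm(A, b, lam, rho=1.0, alpha=1.6, n_iter=200):
        n, p = A.shape
        x, z, u = np.zeros(p), np.zeros(p), np.zeros(p)
        Atb = A.T @ b
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
        for _ in range(n_iter):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            x_hat = alpha * x + (1.0 - alpha) * z       # over-relaxation step
            v = x_hat + u
            z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
            u = u + x_hat - z                           # dual update uses x_hat
        return z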

Experiments

In this section, we evaluate the performance of our proposed O-RDRSVM, DO-RDRSVM, FO-RDRSVM, and FDO-RDRSVM algorithms. In the first experiment, we compare O-RDRSVM and DO-RDRSVM with their non-relaxed versions to verify their convergence and acceleration abilities, and further evaluate the effect of the over-relaxation parameter α on the accelerated convergence. Subsequently, we compare the performances of SMSVM, L1-SVM, DRSVM, O-RDRSVM, and FO-RDRSVM on low-dimensional and high-dimensional

Conclusions

In this paper, we proposed O-RDRSVM to accelerate the training process of DRSVM by introducing the over-relaxation technique into the 3-block ADMM algorithm. The over-relaxation technique makes full use of the information from past iterations to accelerate the convergence of the ADMM algorithm. Moreover, we constructed a distributed DRSVM in consensus form, extended O-RDRSVM to the field of distributed computing and developed its distributed version DO-RDRSVM. DO-RDRSVM is independent of the

CRediT authorship contribution statement

Yunwei Dai: Methodology, Software, Investigation, Writing - original draft. Yuao Zhang: Methodology, Writing - review & editing. Qingbiao Wu: Conceptualization, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12271479).

Yunwei Dai received the B.S. degree in Information and Computing Science from Zhejiang University, Hangzhou, China, in 2020. He is currently working toward the Ph.D. degree in Computational Mathematics at Zhejiang University, Hangzhou, China. His research interests include optimization, neural networks and machine learning.

References (55)

  • R. Zhang et al., Privacy-preserving decentralized power system economic dispatch considering carbon capture power plants and carbon emission trading scheme via over-relaxed ADMM, Int. J. Electr. Power Energy Syst. (2020)
  • G.-B. Ye et al., Split Bregman method for large scale fused lasso, Comput. Stat. Data Anal. (2011)
  • C.Y. Deng, A generalization of the Sherman–Morrison–Woodbury formula, Appl. Math. Lett. (2011)
  • S. Al-Janabi et al., A nifty collaborative analysis to predicting a novel tool (DRFLLS) for missing values estimation, Soft Comput. (2020)
  • D.V. Carvalho et al., Machine learning interpretability: A survey on methods and metrics, Electronics (2019)
  • M. Ahmadi et al., FWNNet: Presentation of a new classifier of brain tumor diagnosis based on fuzzy logic and the wavelet-based neural network using machine-learning methods, Comput. Intell. Neurosci. (2021)
  • S. Al-Janabi et al., An innovative synthesis of deep learning techniques (DCapsNet & DCOM) for generation electrical renewable energy from wind energy, Soft Comput. (2020)
  • B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth...
  • M.A. Mahdi et al., A novel software to improve healthcare base on predictive analytics and mobile services for cloud data centers
  • M. Tanveer et al., Comprehensive review on twin support vector machines, Ann. Oper. Res. (2022)
  • M. Zangeneh Soroush et al., EEG artifact removal using sub-space decomposition, nonlinear dynamics, stationary wavelet transform and machine learning algorithms, Front. Physiol. (2022)
  • A.D. Jia et al., Detection of cervical cancer cells based on strong feature CNN-SVM network, Neurocomputing (2020)
  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)
  • M. Tanveer, Newton method for implicit Lagrangian twin support vector machines, Int. J. Mach. Learn. Cybern. (2015)
  • A. Nomani et al., PSOWNNs-CNN: A computational radiology for breast cancer diagnosis improvement based on image processing using machine learning methods, Comput. Intell. Neurosci. (2022)
  • S. Al-Janabi et al., A new method for prediction of air pollution based on intelligent computation, Soft Comput. (2020)
  • V.K. Chauhan et al., Problem formulations and solvers in linear SVM: A review, Artif. Intell. Rev. (2019)
Yuao Zhang received the M.S. degree from the Department of Mathematics, Zhejiang Sci-Tech University, Hangzhou, China, in 2020. He is currently pursuing the Ph.D. degree at Zhejiang University, Hangzhou. His research interests include machine learning, neural networks, and deep learning.

Qingbiao Wu received the Ph.D. degree from the Department of Mathematics, Zhejiang University, Hangzhou, China. He was the Director of the Institute of Science and Engineering Computing, Zhejiang University. He is currently a Full Professor with the Department of Mathematics, Zhejiang University. He has published over 80 journal articles in image processing, neural networks, and numerical algebra. His research interests include neural computing, deep learning, and numerical computation methods. He served as the Deputy Editor of the International Journal of Information Processing and Management.
