An accelerated optimization algorithm for the elastic-net extreme learning machine

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Extreme learning machine (ELM) has received considerable attention due to its rapid learning speed and powerful fitting capability. One of its important variants, the elastic-net ELM (Enet-ELM), was recently proposed to improve its sparsity and stability simultaneously. In the era of big data, however, the explosive growth of data volume and dimensionality poses a serious challenge to Enet-ELM. The alternating direction method of multipliers (ADMM), on the other hand, is a powerful iterative algorithm for solving large-scale optimization problems by splitting a large problem into a set of tractable sub-problems, but its performance is strongly limited by its convergence behavior and convergence rate. In this paper, we therefore develop a novel Enet-ELM algorithm based on the over-relaxed ADMM, termed over-relaxed Enet-ELM (OE-ELM), which accelerates model training by incorporating the result of the previous iteration into the next update. In addition, we propose a parallel version of OE-ELM (POE-ELM) for parallel and distributed computation, which is trained with the consensus over-relaxed ADMM algorithm. Finally, the convergence analysis of the two proposed algorithms establishes the effectiveness of model training, and extensive experiments on classification and regression datasets demonstrate their competitiveness in accuracy and convergence rate.
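
For context, the over-relaxation mechanism referred to above can be summarized by the standard relaxed ADMM iteration from the optimization literature (a generic sketch for \(\min f({\textbf {x}}) + g({\textbf {z}})\) subject to \({\textbf {A}}{} {\textbf {x}} + {\textbf {B}}{} {\textbf {z}} = {\textbf {c}}\), written in scaled dual form; the paper's actual updates are Eqs. (37)–(41) of the main text):

$$\begin{aligned} {\textbf {x}}^{k+1}&= \mathop {\mathrm{arg\,min}}\limits _{{\textbf {x}}} \Big \{ f({\textbf {x}}) + \frac{\rho }{2}\Vert {\textbf {A}}{} {\textbf {x}} + {\textbf {B}}{} {\textbf {z}}^k - {\textbf {c}} + {\textbf {u}}^k\Vert _2^2 \Big \}, \\ {\textbf {h}}^{k+1}&= \alpha {\textbf {A}}{} {\textbf {x}}^{k+1} - (1 - \alpha )({\textbf {B}}{} {\textbf {z}}^k - {\textbf {c}}), \\ {\textbf {z}}^{k+1}&= \mathop {\mathrm{arg\,min}}\limits _{{\textbf {z}}} \Big \{ g({\textbf {z}}) + \frac{\rho }{2}\Vert {\textbf {h}}^{k+1} + {\textbf {B}}{} {\textbf {z}} - {\textbf {c}} + {\textbf {u}}^k\Vert _2^2 \Big \}, \\ {\textbf {u}}^{k+1}&= {\textbf {u}}^k + {\textbf {h}}^{k+1} + {\textbf {B}}{} {\textbf {z}}^{k+1} - {\textbf {c}}, \end{aligned}$$

where the relaxation parameter satisfies \(\alpha \in (0, 2)\); choosing \(\alpha > 1\) (over-relaxation) mixes the previous iterate into the current update and typically accelerates convergence.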

Data availability

The datasets generated during the current study are available in the LIBSVM Data repository [https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/], the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/index.php], and Luis Torgo’s Regression DataSets [https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html].

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12271479).

Author information

Corresponding author

Correspondence to Qingbiao Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Proof of Theorem 1

Proof

Note that POE-ELM is equivalent to the iterations of Eqs. (37)–(41), so it suffices to prove that these iterations converge to the global optimal solution. Recall that \(\widetilde{{\textbf {B}}}_i^\mathsf {T}\widetilde{{\textbf {B}}}_j = {\textbf {0}}\) for all \(i \ne j\) (\(i, j = 1, \ldots , M\)), that both \(\widetilde{{\textbf {A}}}\) and \(\widetilde{{\textbf {C}}}\) have full column rank, and that \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\). By the first-order optimality conditions of Eqs. (37)–(41), we derive that

$$\begin{aligned}&\theta _j({\varvec{\beta }}_j) - \theta _j({\varvec{\beta }}_j^{k+1}) + ({\varvec{\beta }}_j \nonumber \\&\quad - {\varvec{\beta }}_j^{k+1})^\mathsf {T} \Bigg [ -\widetilde{{\textbf {B}}}_j^\mathsf {T} {\textbf {u}}^k + \rho \widetilde{{\textbf {B}}}_j^\mathsf {T} (\sum _{l=1}^{j}\widetilde{{\textbf {B}}}_l{\varvec{\beta }}_l^{k+1} \nonumber \\&\quad + \sum ^{M}_{l=j+1}\widetilde{{\textbf {B}}}_l{\varvec{\beta }}_l^k + \widetilde{{\textbf {A}}}{} {\textbf {a}}^k \nonumber \\&\quad + \widetilde{{\textbf {C}}}{} {\textbf {c}}^k + \widetilde{{\textbf {d}}}) \Bigg ] \ge 0, \ \forall {\varvec{\beta }}_j \in \mathbb {R}^L, j = 1, \ldots , M, \end{aligned}$$
(A1)
$$\begin{aligned}&\theta _{M+1}({\textbf {a}}) - \theta _{M+1}({\textbf {a}}^{k+1}) + ({\textbf {a}} - {\textbf {a}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {A}}}^\mathsf {T} {\textbf {u}}^k \nonumber \\&\quad + \rho \widetilde{{\textbf {A}}}^\mathsf {T} [\alpha \sum _{j=1}^M\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {A}}} ({\textbf {a}}^{k+1} - (1 - \alpha ){\textbf {a}}^k) + \alpha \widetilde{{\textbf {C}}}{} {\textbf {c}}^k \nonumber \\&\quad + \alpha \widetilde{{\textbf {d}}}] \Bigg \} \ge 0, \forall {\textbf {a}} \in \mathbb {R}^N, \end{aligned}$$
(A2)
$$\begin{aligned}&\theta _{M+2}({\textbf {c}}) - \theta _{M+2}({\textbf {c}}^{k+1}) + ({\textbf {c}} - {\textbf {c}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {C}}}^\mathsf {T} {\textbf {u}}^k \nonumber \\&\quad + \rho \widetilde{{\textbf {C}}}^\mathsf {T} [\alpha \sum _{j=1}^M\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {A}}} ({\textbf {a}}^{k+1} - (1 - \alpha ){\textbf {a}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^{k+1} - (1 - \alpha ){\textbf {c}}^k) \nonumber \\&\quad + \alpha \widetilde{{\textbf {d}}}] \Bigg \} \ge 0, \forall {\textbf {c}} \in \mathbb {R}^{L}, \end{aligned}$$
(A3)
$$\begin{aligned}&{\textbf {u}}^{k+1} = {\textbf {u}}^k - \rho \big [\alpha \sum _{j=1}^M\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}} ({\textbf {a}}^{k+1} \nonumber \\&\quad - (1 - \alpha ){\textbf {a}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^{k+1} \nonumber \\&\quad - (1 - \alpha ){\textbf {c}}^k) + \alpha \widetilde{{\textbf {d}}}\big ], \end{aligned}$$
(A4)

where \(\theta _j({\varvec{\beta }}_j) = \lambda (1 - \sigma )\Vert {\varvec{\beta }}_j \Vert ^2_2\), \(\theta _{M+1}({\textbf {a}}) = \sum ^M_{j = 1} \Vert {\textbf {a}}_j \Vert ^2_2\), \(\theta _{M+2}({\textbf {c}}) = \lambda \sigma \Vert {\textbf {c}} \Vert _1,\) and \({\textbf {u}} = -\rho [{\textbf {v}}_{11}^\mathsf {T}, \ldots , {\textbf {v}}_{1M}^\mathsf {T}, {\textbf {v}}_{21}^\mathsf {T}, \ldots , {\textbf {v}}_{2M}^\mathsf {T}]^\mathsf {T}\).

For simplicity, we also define

$$\begin{aligned}&{\textbf {p}}^k := \begin{pmatrix} {\varvec{\beta }}_1^k \\ \vdots \\ {\varvec{\beta }}_M^k \\ {\textbf {a}}^k \\ {\textbf {c}}^k \end{pmatrix}, \ {\textbf {q}}^k := \begin{pmatrix} {\varvec{\beta }}_1^k \\ \vdots \\ {\varvec{\beta }}_M^k \\ {\textbf {a}}^k \\ {\textbf {c}}^k \\ {\textbf {u}}^k \end{pmatrix}, \ {\textbf {r}}^k := \begin{pmatrix} {\textbf {a}}^k\\ {\textbf {c}}^k\\ {\textbf {u}}^k \end{pmatrix}, \end{aligned}$$
(A5)
$$\begin{aligned}&\widetilde{{\varvec{\beta }}}_j^k := {\varvec{\beta }}_j^{k+1}, \ \widetilde{{\textbf {a}}}^k := {\textbf {a}}^{k+1}, \ \widetilde{{\textbf {c}}}^k := {\textbf {c}}^{k+1}, \end{aligned}$$
(A6)
$$\begin{aligned}&\widetilde{{\textbf {u}}}^k := {\textbf {u}}^k - \rho (\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^k + \widetilde{{\textbf {C}}}{} {\textbf {c}}^k + \widetilde{{\textbf {d}}}), \end{aligned}$$
(A7)
$$\begin{aligned}&\widetilde{{\textbf {p}}}^k := \begin{pmatrix} \widetilde{{\varvec{\beta }}}_1^k\\ \vdots \\ \widetilde{{\varvec{\beta }}}_M^k \\ \widetilde{{\textbf {a}}}^k \\ \widetilde{{\textbf {c}}}^k \end{pmatrix}, \ \widetilde{{\textbf {q}}}^k := \begin{pmatrix} \widetilde{{\varvec{\beta }}}_1^k \\ \vdots \\ \widetilde{{\varvec{\beta }}}_M^k \\ \widetilde{{\textbf {a}}}^k \\ \widetilde{{\textbf {c}}}^k \\ \widetilde{{\textbf {u}}}^k \end{pmatrix}, \ \widetilde{{\textbf {r}}}^k := \begin{pmatrix} \widetilde{{\textbf {a}}}^k \\ \widetilde{{\textbf {c}}}^k \\ \widetilde{{\textbf {u}}}^k \end{pmatrix}, \end{aligned}$$
(A8)

where (\({\varvec{\beta }}_1^{k+1}, \ldots , {\varvec{\beta }}_M^{k+1}, {\textbf {a}}^{k+1}, {\textbf {c}}^{k+1}\)) is generated by Eqs. (37)–(41) from the given (\({\textbf {a}}^k, {\textbf {c}}^k, {\textbf {u}}^k\)). Then Eq. (A4) can be reformulated as

$$\begin{aligned} {\textbf {u}}^{k+1} = \widetilde{{\textbf {u}}}^k + \rho [\widetilde{{\textbf {A}}} ({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k)] + (1 - \alpha )({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k). \end{aligned}$$
(A9)
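
Eq. (A9) follows from Eq. (A4) by a one-line intermediate step: the definition (A7) gives

$$\begin{aligned} {\textbf {u}}^k - \widetilde{{\textbf {u}}}^k = \rho \Big (\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^k + \widetilde{{\textbf {C}}}{} {\textbf {c}}^k + \widetilde{{\textbf {d}}}\Big ), \end{aligned}$$

and substituting this identity (together with \(\widetilde{{\textbf {a}}}^k = {\textbf {a}}^{k+1}\) and \(\widetilde{{\textbf {c}}}^k = {\textbf {c}}^{k+1}\)) into the right-hand side of Eq. (A4) yields Eq. (A9).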

Substituting the definition of \(\widetilde{{\textbf {u}}}^k\) and Eq. (A9) into Eqs. (A1)–(A4), we have

$$\begin{aligned}&\theta _j({\varvec{\beta }}_j) - \theta _j({\varvec{\beta }}_j^{k+1}) + ({\varvec{\beta }}_j \nonumber \\&\quad - {\varvec{\beta }}_j^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {B}}}_j^\mathsf {T}\widetilde{{\textbf {u}}}^k - \rho \widetilde{{\textbf {B}}}_j^\mathsf {T} [\sum ^M_{l = j + 1} \widetilde{{\textbf {B}}}_l({\varvec{\beta }}_l^{k+1} - {\varvec{\beta }}_l^k)] \Bigg \} \nonumber \\&\quad \ge 0, \ \forall {\varvec{\beta }}_j \in \mathbb {R}^L, \end{aligned}$$
(A10)
$$\begin{aligned}&\theta _{M+1}({\textbf {a}}) - \theta _{M+1}({\textbf {a}}^{k+1}) \nonumber \\&\quad + ({\textbf {a}} - {\textbf {a}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {u}}}^k \nonumber \\&\quad - \rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) \nonumber \\&\quad - (1 - \alpha )\widetilde{{\textbf {A}}}^\mathsf {T}({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k) \Bigg \} \ge 0, \ \forall {\textbf {a}} \in \mathbb {R}^N, \end{aligned}$$
(A11)
$$\begin{aligned}&\theta _{M+2}({\textbf {c}}) - \theta _{M+2}({\textbf {c}}^{k+1}) \nonumber \\&\quad + ({\textbf {c}} - {\textbf {c}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {u}}}^k - \rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {A}}}({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) \nonumber \\&\quad - \rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k) \nonumber \\&\quad - (1 - \alpha )\widetilde{{\textbf {C}}}^\mathsf {T}({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k) \Bigg \} \ge 0, \ \forall {\textbf {c}} \in \mathbb {R}^{L}, \end{aligned}$$
(A12)
$$\begin{aligned}&(\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^{k+1} + \widetilde{{\textbf {C}}}{} {\textbf {c}}^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {d}}}) + \widetilde{{\textbf {A}}} ({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k) \nonumber \\&\quad - \frac{1}{\rho }({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k)= {\textbf {0}}. \end{aligned}$$
(A13)

By Eq. (A13), we derive that

$$\begin{aligned}&({\textbf {u}} - \widetilde{{\textbf {u}}}^k)^\mathsf {T}[(\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {C}}}{} {\textbf {c}}^{k+1} + \widetilde{{\textbf {d}}}) + \widetilde{{\textbf {A}}} ({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k) \nonumber \\&\quad - \frac{1}{\rho }({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k)] = 0, \ \forall {\textbf {u}} \in \mathbb {R}^{N+ML}. \end{aligned}$$
(A14)

Thus, by adding up Eqs. (A10)–(A12) and (A14), since \(\widetilde{{\textbf {B}}}_i^\mathsf {T}\widetilde{{\textbf {B}}}_j = {\textbf {0}}\) (\(\forall i \ne j\), \(i, j = 1, \ldots , M\)) and \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\), we can deduce the following inequality:

$$\begin{aligned} \theta ({\textbf {p}}) - \theta (\widetilde{{\textbf {p}}}^k) + ({\textbf {q}} - \widetilde{{\textbf {q}}}^k)^\mathsf {T}[{\mathrm{F}}(\widetilde{{\textbf {q}}}^k) - {\textbf {Q}}_0({\textbf {q}}^k - \widetilde{{\textbf {q}}}^k)] \ge 0, \end{aligned}$$
(A15)

where

$$\begin{aligned} {\textbf {Q}}_0 = \begin{pmatrix} {\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}\rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}(1 - \alpha )\widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}{\textbf {0}}&{}\rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}(1 - \alpha )\widetilde{{\textbf {C}}}^\mathsf {T} \\ {\textbf {0}}&{}-\widetilde{{\textbf {A}}}&{}-\widetilde{{\textbf {C}}}&{}\frac{1}{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A16)

By the definition of \({\textbf {q}}\), \({\textbf {q}}^k\), \(\widetilde{{\textbf {q}}}^k\) and \({\textbf {r}}\), \({\textbf {r}}^k\), \(\widetilde{{\textbf {r}}}^k\), Eq. (A15) can be simplified as

$$\begin{aligned} \theta ({\textbf {p}}) - \theta (\widetilde{{\textbf {p}}}^k) + ({\textbf {q}} - \widetilde{{\textbf {q}}}^k)^\mathsf {T}{\mathrm{F}}(\widetilde{{\textbf {q}}}^k) \ge ({\textbf {r}} - \widetilde{{\textbf {r}}}^k)^\mathsf {T}{} {\textbf {Q}}({\textbf {r}}^k - \widetilde{{\textbf {r}}}^k), \end{aligned}$$
(A17)

where

$$\begin{aligned} {\textbf {Q}} = \begin{pmatrix} \rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}(1 - \alpha )\widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}\rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}(1 - \alpha )\widetilde{{\textbf {C}}}^\mathsf {T} \\ -\widetilde{{\textbf {A}}}&{}-\widetilde{{\textbf {C}}}&{}\frac{1}{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A18)

Since \(\widetilde{{\textbf {A}}}\) and \(\widetilde{{\textbf {C}}}\) have full column rank (and the relaxation parameter satisfies \(0< \alpha < 2\)), it follows that

$$\begin{aligned} {\textbf {Q}}^\mathsf {T} + {\textbf {Q}} = \begin{pmatrix} 2\rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}- \alpha \widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}2\rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}- \alpha \widetilde{{\textbf {C}}}^\mathsf {T} \\ - \alpha \widetilde{{\textbf {A}}}&{}- \alpha \widetilde{{\textbf {C}}}&{}\frac{2}{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix} \end{aligned}$$
(A19)

is a positive definite matrix. Next, since Eq. (A9) can equivalently be written as

$$\begin{aligned} {\textbf {u}}^{k+1} = \ {\textbf {u}}^k + \rho [\widetilde{{\textbf {A}}}({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}}({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k)] - \alpha ({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k), \end{aligned}$$
(A20)

the following equation holds:

$$\begin{aligned} {\textbf {r}}^{k+1} = {\textbf {r}}^k - \gamma {\textbf {M}}({\textbf {r}}^k - \widetilde{{\textbf {r}}}^k), \end{aligned}$$
(A21)

where \(\gamma = 1\) and

$$\begin{aligned} {\textbf {M}} = \begin{pmatrix} {\textbf {I}}_N&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {I}}_L&{}{\textbf {0}} \\ - \rho \widetilde{{\textbf {A}}}&{}- \rho \widetilde{{\textbf {C}}}&{}\alpha {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A22)

Note that \({\textbf {M}}\) is a nonsingular matrix whose inverse is

$$\begin{aligned} {\textbf {M}}^{-1} = \begin{pmatrix} {\textbf {I}}_N&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {I}}_L&{}{\textbf {0}} \\ \frac{\rho }{\alpha }\widetilde{{\textbf {A}}}&{}\frac{\rho }{\alpha }\widetilde{{\textbf {C}}}&{}\frac{1}{\alpha }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A23)

Furthermore, we can define a symmetric and positive definite matrix as

$$\begin{aligned} {\textbf {P}} := {\textbf {Q}}{} {\textbf {M}}^{-1} = \begin{pmatrix} \frac{\rho }{\alpha }\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}\frac{1 - \alpha }{\alpha }\widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}\frac{\rho }{\alpha }\widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}\frac{1 - \alpha }{\alpha }\widetilde{{\textbf {C}}}^\mathsf {T} \\ \frac{1 - \alpha }{\alpha }\widetilde{{\textbf {A}}}&{}\frac{1 - \alpha }{\alpha }\widetilde{{\textbf {C}}}&{}\frac{1}{\alpha \rho }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A24)

Consequently, we find that

$$\begin{aligned} {\textbf {G}} := {\textbf {Q}}^\mathsf {T} + {\textbf {Q}} - \gamma {\textbf {M}}^\mathsf {T}{} {\textbf {P}}{} {\textbf {M}} = \begin{pmatrix} {\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {0}}&{}\frac{2 - \alpha }{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix} \end{aligned}$$
(A25)

is obviously positive semidefinite.

Finally, the matrices \({\textbf {Q}}^\mathsf {T} + {\textbf {Q}}\), \({\textbf {P}}\) and \({\textbf {G}}\) derived from (A19), (A24) and (A25) satisfy the convergence condition proposed in Lemma 1. As a result, the iterations of Eqs. (37)–(41), namely POE-ELM, converge to the global optimal solution. \(\square\)
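
The structural claims above can be sanity-checked numerically. The following sketch (ours, not part of the paper) builds random matrices satisfying the stated assumptions (full column rank and \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\)) with illustrative values of \(\rho > 0\) and a relaxation parameter \(\alpha \in (0, 2)\), and verifies that (A23) inverts (A22), that \({\textbf {P}}\) in (A24) is symmetric positive definite, that (A19) is positive definite, and that \({\textbf {G}}\) matches the positive semidefinite block form in (A25); all dimension choices are hypothetical.

```python
# Numerical sanity check (illustrative sketch, not from the paper) of the
# matrix identities (A19) and (A22)-(A25).
import numpy as np

rng = np.random.default_rng(0)
N, L, M_blocks, rho, alpha = 6, 3, 2, 0.7, 1.5   # illustrative sizes/parameters
K = N + M_blocks * L                              # row dimension of A~ and C~

# Build A~ (K x N) and C~ (K x L) with full column rank and A~^T C~ = 0
# by drawing orthonormal columns and splitting them between the two blocks.
Qfull, _ = np.linalg.qr(rng.standard_normal((K, N + L)))
A_t = Qfull[:, :N] @ np.diag(rng.uniform(1, 2, N))
C_t = Qfull[:, N:] @ np.diag(rng.uniform(1, 2, L))
assert np.allclose(A_t.T @ C_t, 0)

I_K, I_N, I_L = np.eye(K), np.eye(N), np.eye(L)
Z = lambda r, c: np.zeros((r, c))

# Q from (A18), M from (A22), M^{-1} from (A23); then P := Q M^{-1} as in (A24)
Q = np.block([[rho * A_t.T @ A_t, Z(N, L),           (1 - alpha) * A_t.T],
              [Z(L, N),           rho * C_t.T @ C_t, (1 - alpha) * C_t.T],
              [-A_t,              -C_t,              (1 / rho) * I_K]])
Mmat = np.block([[I_N,         Z(N, L),     Z(N, K)],
                 [Z(L, N),     I_L,         Z(L, K)],
                 [-rho * A_t,  -rho * C_t,  alpha * I_K]])
Minv = np.block([[I_N,                  Z(N, L),              Z(N, K)],
                 [Z(L, N),              I_L,                  Z(L, K)],
                 [(rho / alpha) * A_t,  (rho / alpha) * C_t,  (1 / alpha) * I_K]])
P = Q @ Minv
G = Q.T + Q - Mmat.T @ P @ Mmat
G_expected = np.block([[Z(N + L, N + L), Z(N + L, K)],
                       [Z(K, N + L),     ((2 - alpha) / rho) * I_K]])

assert np.allclose(Mmat @ Minv, np.eye(N + L + K))   # (A23) is the inverse of (A22)
assert np.allclose(P, P.T)                           # P is symmetric, as in (A24)
assert np.linalg.eigvalsh(Q.T + Q).min() > 0         # (A19) is positive definite
assert np.linalg.eigvalsh(P).min() > 0               # P is positive definite
assert np.allclose(G, G_expected)                    # G matches the block form in (A25)
assert np.linalg.eigvalsh(G).min() > -1e-10          # hence G is positive semidefinite
print("Matrix identities (A19) and (A22)-(A25) hold numerically.")
```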

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Dai, Y. & Wu, Q. An accelerated optimization algorithm for the elastic-net extreme learning machine. Int. J. Mach. Learn. & Cyber. 13, 3993–4011 (2022). https://doi.org/10.1007/s13042-022-01636-1
