An accelerated optimization algorithm for the elastic-net extreme learning machine

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Extreme learning machine (ELM) has received considerable attention due to its rapid learning speed and powerful fitting capability. One of its important variants, the elastic-net ELM (Enet-ELM), was recently proposed to improve its sparsity and stability simultaneously. In the era of big data, however, the explosive growth of data volume and dimensionality poses a serious challenge to Enet-ELM. The alternating direction method of multipliers (ADMM), on the other hand, is a powerful iterative algorithm for solving large-scale optimization problems by splitting a large problem into a set of tractable sub-problems, but its performance is strongly limited by its convergence behavior and convergence rate. In this paper, we therefore develop a novel Enet-ELM algorithm based on the over-relaxed ADMM, termed over-relaxed Enet-ELM (OE-ELM), which accelerates model training by incorporating the result of the previous iteration into the next update. In addition, we propose a parallel version of OE-ELM (POE-ELM) for parallel and distributed computation, which is trained with the consensus over-relaxed ADMM algorithm. Finally, the convergence analysis of the two proposed algorithms establishes the effectiveness of model training, and extensive experiments on classification and regression datasets demonstrate their competitiveness in accuracy and convergence rate.
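
For context, the over-relaxation mechanism referred to above can be summarized by the standard relaxed ADMM iteration from the optimization literature (a generic sketch for \(\min f({\textbf {x}}) + g({\textbf {z}})\) subject to \({\textbf {A}}{} {\textbf {x}} + {\textbf {B}}{} {\textbf {z}} = {\textbf {c}}\), written in scaled dual form; the paper's actual updates are Eqs. (37)–(41) of the main text):

$$\begin{aligned} {\textbf {x}}^{k+1}&= \mathop {\mathrm{arg\,min}}\limits _{{\textbf {x}}} \Big \{ f({\textbf {x}}) + \frac{\rho }{2}\Vert {\textbf {A}}{} {\textbf {x}} + {\textbf {B}}{} {\textbf {z}}^k - {\textbf {c}} + {\textbf {u}}^k\Vert _2^2 \Big \}, \\ {\textbf {h}}^{k+1}&= \alpha {\textbf {A}}{} {\textbf {x}}^{k+1} - (1 - \alpha )({\textbf {B}}{} {\textbf {z}}^k - {\textbf {c}}), \\ {\textbf {z}}^{k+1}&= \mathop {\mathrm{arg\,min}}\limits _{{\textbf {z}}} \Big \{ g({\textbf {z}}) + \frac{\rho }{2}\Vert {\textbf {h}}^{k+1} + {\textbf {B}}{} {\textbf {z}} - {\textbf {c}} + {\textbf {u}}^k\Vert _2^2 \Big \}, \\ {\textbf {u}}^{k+1}&= {\textbf {u}}^k + {\textbf {h}}^{k+1} + {\textbf {B}}{} {\textbf {z}}^{k+1} - {\textbf {c}}, \end{aligned}$$

where the relaxation parameter satisfies \(\alpha \in (0, 2)\); choosing \(\alpha > 1\) (over-relaxation) mixes the previous iterate into the current update and typically accelerates convergence.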

Data availability

The datasets generated during the current study are available in the LIBSVM Data repository [https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/], the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/index.php], and Luis Torgo’s Regression DataSets [https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html].

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 12271479).

Author information

Corresponding author

Correspondence to Qingbiao Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Proof of Theorem 1

Proof

Note that POE-ELM is equivalent to the iterations of Eqs. (37)–(41), so it suffices to prove that these iterations converge to the global optimal solution. Recall that \(\widetilde{{\textbf {B}}}_i^\mathsf {T}\widetilde{{\textbf {B}}}_j = {\textbf {0}}\) for all \(i \ne j\) (\(i, j = 1, \ldots , M\)), that both \(\widetilde{{\textbf {A}}}\) and \(\widetilde{{\textbf {C}}}\) have full column rank, and that \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\). By the first-order optimality conditions of Eqs. (37)–(41), we derive that

$$\begin{aligned}&\theta _j({\varvec{\beta }}_j) - \theta _j({\varvec{\beta }}_j^{k+1}) + ({\varvec{\beta }}_j \nonumber \\&\quad - {\varvec{\beta }}_j^{k+1})^\mathsf {T} \Bigg [ -\widetilde{{\textbf {B}}}_j^\mathsf {T} {\textbf {u}}^k + \rho \widetilde{{\textbf {B}}}_j^\mathsf {T} (\sum _{l=1}^{j}\widetilde{{\textbf {B}}}_l{\varvec{\beta }}_l^{k+1} \nonumber \\&\quad + \sum ^{M}_{l=j+1}\widetilde{{\textbf {B}}}_l{\varvec{\beta }}_l^k + \widetilde{{\textbf {A}}}{} {\textbf {a}}^k \nonumber \\&\quad + \widetilde{{\textbf {C}}}{} {\textbf {c}}^k + \widetilde{{\textbf {d}}}) \Bigg ] \ge 0, \ \forall {\varvec{\beta }}_j \in \mathbb {R}^L, j = 1, \ldots , M, \end{aligned}$$
(A1)
$$\begin{aligned}&\theta _{M+1}({\textbf {a}}) - \theta _{M+1}({\textbf {a}}^{k+1}) + ({\textbf {a}} - {\textbf {a}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {A}}}^\mathsf {T} {\textbf {u}}^k \nonumber \\&\quad + \rho \widetilde{{\textbf {A}}}^\mathsf {T} [\alpha \sum _{j=1}^M\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {A}}} ({\textbf {a}}^{k+1} - (1 - \alpha ){\textbf {a}}^k) + \alpha \widetilde{{\textbf {C}}}{} {\textbf {c}}^k \nonumber \\&\quad + \alpha \widetilde{{\textbf {d}}}] \Bigg \} \ge 0, \forall {\textbf {a}} \in \mathbb {R}^N, \end{aligned}$$
(A2)
$$\begin{aligned}&\theta _{M+2}({\textbf {c}}) - \theta _{M+2}({\textbf {c}}^{k+1}) + ({\textbf {c}} - {\textbf {c}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {C}}}^\mathsf {T} {\textbf {u}}^k \nonumber \\&\quad + \rho \widetilde{{\textbf {C}}}^\mathsf {T} [\alpha \sum _{j=1}^M\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {A}}} ({\textbf {a}}^{k+1} - (1 - \alpha ){\textbf {a}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^{k+1} - (1 - \alpha ){\textbf {c}}^k) \nonumber \\&\quad + \alpha \widetilde{{\textbf {d}}}] \Bigg \} \ge 0, \forall {\textbf {c}} \in \mathbb {R}^{L}, \end{aligned}$$
(A3)
$$\begin{aligned}&{\textbf {u}}^{k+1} = {\textbf {u}}^k - \rho \big [\alpha \sum _{j=1}^M\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}} ({\textbf {a}}^{k+1} \nonumber \\&\quad - (1 - \alpha ){\textbf {a}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^{k+1} \nonumber \\&\quad - (1 - \alpha ){\textbf {c}}^k) + \alpha \widetilde{{\textbf {d}}}\big ], \end{aligned}$$
(A4)

where \(\theta _j({\varvec{\beta }}_j) = \lambda (1 - \sigma )\Vert {\varvec{\beta }}_j \Vert ^2_2\), \(\theta _{M+1}({\textbf {a}}) = \sum ^M_{j = 1} \Vert {\textbf {a}}_j \Vert ^2_2\), \(\theta _{M+2}({\textbf {c}}) = \lambda \sigma \Vert {\textbf {c}} \Vert _1,\) and \({\textbf {u}} = -\rho [{\textbf {v}}_{11}^\mathsf {T}, \ldots , {\textbf {v}}_{1M}^\mathsf {T}, {\textbf {v}}_{21}^\mathsf {T}, \ldots , {\textbf {v}}_{2M}^\mathsf {T}]^\mathsf {T}\).

For simplicity, we also define

$$\begin{aligned}&{\textbf {p}}^k := \begin{pmatrix} {\varvec{\beta }}_1^k \\ \vdots \\ {\varvec{\beta }}_M^k \\ {\textbf {a}}^k \\ {\textbf {c}}^k \end{pmatrix}, \ {\textbf {q}}^k := \begin{pmatrix} {\varvec{\beta }}_1^k \\ \vdots \\ {\varvec{\beta }}_M^k \\ {\textbf {a}}^k \\ {\textbf {c}}^k \\ {\textbf {u}}^k \end{pmatrix}, \ {\textbf {r}}^k := \begin{pmatrix} {\textbf {a}}^k\\ {\textbf {c}}^k\\ {\textbf {u}}^k \end{pmatrix}, \end{aligned}$$
(A5)
$$\begin{aligned}&\widetilde{{\varvec{\beta }}}_j^k := {\varvec{\beta }}_j^{k+1}, \ \widetilde{{\textbf {a}}}^k := {\textbf {a}}^{k+1}, \ \widetilde{{\textbf {c}}}^k := {\textbf {c}}^{k+1}, \end{aligned}$$
(A6)
$$\begin{aligned}&\widetilde{{\textbf {u}}}^k := {\textbf {u}}^k - \rho (\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^k + \widetilde{{\textbf {C}}}{} {\textbf {c}}^k + \widetilde{{\textbf {d}}}), \end{aligned}$$
(A7)
$$\begin{aligned}&\widetilde{{\textbf {p}}}^k := \begin{pmatrix} \widetilde{{\varvec{\beta }}}_1^k\\ \vdots \\ \widetilde{{\varvec{\beta }}}_M^k \\ \widetilde{{\textbf {a}}}^k \\ \widetilde{{\textbf {c}}}^k \end{pmatrix}, \ \widetilde{{\textbf {q}}}^k := \begin{pmatrix} \widetilde{{\varvec{\beta }}}_1^k \\ \vdots \\ \widetilde{{\varvec{\beta }}}_M^k \\ \widetilde{{\textbf {a}}}^k \\ \widetilde{{\textbf {c}}}^k \\ \widetilde{{\textbf {u}}}^k \end{pmatrix}, \ \widetilde{{\textbf {r}}}^k := \begin{pmatrix} \widetilde{{\textbf {a}}}^k \\ \widetilde{{\textbf {c}}}^k \\ \widetilde{{\textbf {u}}}^k \end{pmatrix}, \end{aligned}$$
(A8)

where (\({\varvec{\beta }}_1^{k+1}, \ldots , {\varvec{\beta }}_M^{k+1}, {\textbf {a}}^{k+1}, {\textbf {c}}^{k+1}\)) is generated by Eqs. (37)–(41) from the given (\({\textbf {a}}^k, {\textbf {c}}^k, {\textbf {u}}^k\)). Then Eq. (A4) can be reformulated as

$$\begin{aligned} {\textbf {u}}^{k+1} = \widetilde{{\textbf {u}}}^k + \rho [\widetilde{{\textbf {A}}} ({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k)] + (1 - \alpha )({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k). \end{aligned}$$
(A9)
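
Eq. (A9) follows from Eq. (A4) by a one-line intermediate step: the definition (A7) gives

$$\begin{aligned} {\textbf {u}}^k - \widetilde{{\textbf {u}}}^k = \rho \Big (\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^k + \widetilde{{\textbf {C}}}{} {\textbf {c}}^k + \widetilde{{\textbf {d}}}\Big ), \end{aligned}$$

and substituting this identity (together with \(\widetilde{{\textbf {a}}}^k = {\textbf {a}}^{k+1}\) and \(\widetilde{{\textbf {c}}}^k = {\textbf {c}}^{k+1}\)) into the right-hand side of Eq. (A4) yields Eq. (A9).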

Substituting the definition of \(\widetilde{{\textbf {u}}}^k\) and Eq. (A9) into Eqs. (A1)–(A4), we have

$$\begin{aligned}&\theta _j({\varvec{\beta }}_j) - \theta _j({\varvec{\beta }}_j^{k+1}) + ({\varvec{\beta }}_j \nonumber \\&\quad - {\varvec{\beta }}_j^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {B}}}_j^\mathsf {T}\widetilde{{\textbf {u}}}^k - \rho \widetilde{{\textbf {B}}}_j^\mathsf {T} [\sum ^M_{l = j + 1} \widetilde{{\textbf {B}}}_l({\varvec{\beta }}_l^{k+1} - {\varvec{\beta }}_l^k)] \Bigg \} \nonumber \\&\quad \ge 0, \ \forall {\varvec{\beta }}_j \in \mathbb {R}^L, \end{aligned}$$
(A10)
$$\begin{aligned}&\theta _{M+1}({\textbf {a}}) - \theta _{M+1}({\textbf {a}}^{k+1}) \nonumber \\&\quad + ({\textbf {a}} - {\textbf {a}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {u}}}^k \nonumber \\&\quad - \rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) \nonumber \\&\quad - (1 - \alpha )\widetilde{{\textbf {A}}}^\mathsf {T}({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k) \Bigg \} \ge 0, \ \forall {\textbf {a}} \in \mathbb {R}^N, \end{aligned}$$
(A11)
$$\begin{aligned}&\theta _{M+2}({\textbf {c}}) - \theta _{M+2}({\textbf {c}}^{k+1}) \nonumber \\&\quad + ({\textbf {c}} - {\textbf {c}}^{k+1})^\mathsf {T} \Bigg \{ -\widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {u}}}^k - \rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {A}}}({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) \nonumber \\&\quad - \rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k) \nonumber \\&\quad - (1 - \alpha )\widetilde{{\textbf {C}}}^\mathsf {T}({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k) \Bigg \} \ge 0, \ \forall {\textbf {c}} \in \mathbb {R}^{L}, \end{aligned}$$
(A12)
$$\begin{aligned}&(\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^{k+1} + \widetilde{{\textbf {C}}}{} {\textbf {c}}^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {d}}}) + \widetilde{{\textbf {A}}} ({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k) \nonumber \\&\quad - \frac{1}{\rho }({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k)= {\textbf {0}}. \end{aligned}$$
(A13)

By Eq. (A13), we derive that

$$\begin{aligned}&({\textbf {u}} - \widetilde{{\textbf {u}}}^k)^\mathsf {T}[(\sum ^M_{j=1}\widetilde{{\textbf {B}}}_j{\varvec{\beta }}_j^{k+1} + \widetilde{{\textbf {A}}}{} {\textbf {a}}^{k+1} \nonumber \\&\quad + \widetilde{{\textbf {C}}}{} {\textbf {c}}^{k+1} + \widetilde{{\textbf {d}}}) + \widetilde{{\textbf {A}}} ({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}} ({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k) \nonumber \\&\quad - \frac{1}{\rho }({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k)] = 0, \ \forall {\textbf {u}} \in \mathbb {R}^{N+ML}. \end{aligned}$$
(A14)

Thus, by adding up Eqs. (A10)–(A12) and (A14), since \(\widetilde{{\textbf {B}}}_i^\mathsf {T}\widetilde{{\textbf {B}}}_j = {\textbf {0}}\) (\(\forall i \ne j\), \(i, j = 1, \ldots , M\)) and \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\), we can deduce the following inequality:

$$\begin{aligned} \theta ({\textbf {p}}) - \theta (\widetilde{{\textbf {p}}}^k) + ({\textbf {q}} - \widetilde{{\textbf {q}}}^k)^\mathsf {T}[{\mathrm{F}}(\widetilde{{\textbf {q}}}^k) - {\textbf {Q}}_0({\textbf {q}}^k - \widetilde{{\textbf {q}}}^k)] \ge 0, \end{aligned}$$
(A15)

where

$$\begin{aligned} {\textbf {Q}}_0 = \begin{pmatrix} {\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}\rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}(1 - \alpha )\widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}{\textbf {0}}&{}\rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}(1 - \alpha )\widetilde{{\textbf {C}}}^\mathsf {T} \\ {\textbf {0}}&{}-\widetilde{{\textbf {A}}}&{}-\widetilde{{\textbf {C}}}&{}\frac{1}{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A16)

By the definition of \({\textbf {q}}\), \({\textbf {q}}^k\), \(\widetilde{{\textbf {q}}}^k\) and \({\textbf {r}}\), \({\textbf {r}}^k\), \(\widetilde{{\textbf {r}}}^k\), Eq. (A15) can be simplified as

$$\begin{aligned} \theta ({\textbf {p}}) - \theta (\widetilde{{\textbf {p}}}^k) + ({\textbf {q}} - \widetilde{{\textbf {q}}}^k)^\mathsf {T}{\mathrm{F}}(\widetilde{{\textbf {q}}}^k) \ge ({\textbf {r}} - \widetilde{{\textbf {r}}}^k)^\mathsf {T}{} {\textbf {Q}}({\textbf {r}}^k - \widetilde{{\textbf {r}}}^k), \end{aligned}$$
(A17)

where

$$\begin{aligned} {\textbf {Q}} = \begin{pmatrix} \rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}(1 - \alpha )\widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}\rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}(1 - \alpha )\widetilde{{\textbf {C}}}^\mathsf {T} \\ -\widetilde{{\textbf {A}}}&{}-\widetilde{{\textbf {C}}}&{}\frac{1}{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A18)

Since \(\widetilde{{\textbf {A}}}\) and \(\widetilde{{\textbf {C}}}\) have full column rank (and the relaxation parameter satisfies \(0< \alpha < 2\)), it follows that

$$\begin{aligned} {\textbf {Q}}^\mathsf {T} + {\textbf {Q}} = \begin{pmatrix} 2\rho \widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}- \alpha \widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}2\rho \widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}- \alpha \widetilde{{\textbf {C}}}^\mathsf {T} \\ - \alpha \widetilde{{\textbf {A}}}&{}- \alpha \widetilde{{\textbf {C}}}&{}\frac{2}{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix} \end{aligned}$$
(A19)

is a positive definite matrix. Next, since Eq. (A9) can equivalently be written as

$$\begin{aligned} {\textbf {u}}^{k+1} = \ {\textbf {u}}^k + \rho [\widetilde{{\textbf {A}}}({\textbf {a}}^k - \widetilde{{\textbf {a}}}^k) + \widetilde{{\textbf {C}}}({\textbf {c}}^k - \widetilde{{\textbf {c}}}^k)] - \alpha ({\textbf {u}}^k - \widetilde{{\textbf {u}}}^k), \end{aligned}$$
(A20)

the following equation holds:

$$\begin{aligned} {\textbf {r}}^{k+1} = {\textbf {r}}^k - \gamma {\textbf {M}}({\textbf {r}}^k - \widetilde{{\textbf {r}}}^k), \end{aligned}$$
(A21)

where \(\gamma = 1\) and

$$\begin{aligned} {\textbf {M}} = \begin{pmatrix} {\textbf {I}}_N&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {I}}_L&{}{\textbf {0}} \\ - \rho \widetilde{{\textbf {A}}}&{}- \rho \widetilde{{\textbf {C}}}&{}\alpha {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A22)

Note that \({\textbf {M}}\) is a nonsingular matrix whose inverse is

$$\begin{aligned} {\textbf {M}}^{-1} = \begin{pmatrix} {\textbf {I}}_N&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {I}}_L&{}{\textbf {0}} \\ \frac{\rho }{\alpha }\widetilde{{\textbf {A}}}&{}\frac{\rho }{\alpha }\widetilde{{\textbf {C}}}&{}\frac{1}{\alpha }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A23)

Furthermore, we can define a symmetric and positive definite matrix as

$$\begin{aligned} {\textbf {P}} := {\textbf {Q}}{} {\textbf {M}}^{-1} = \begin{pmatrix} \frac{\rho }{\alpha }\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {A}}}&{}{\textbf {0}}&{}\frac{1 - \alpha }{\alpha }\widetilde{{\textbf {A}}}^\mathsf {T} \\ {\textbf {0}}&{}\frac{\rho }{\alpha }\widetilde{{\textbf {C}}}^\mathsf {T}\widetilde{{\textbf {C}}}&{}\frac{1 - \alpha }{\alpha }\widetilde{{\textbf {C}}}^\mathsf {T} \\ \frac{1 - \alpha }{\alpha }\widetilde{{\textbf {A}}}&{}\frac{1 - \alpha }{\alpha }\widetilde{{\textbf {C}}}&{}\frac{1}{\alpha \rho }{} {\textbf {I}}_{N+ML} \end{pmatrix}. \end{aligned}$$
(A24)

Consequently, we find that

$$\begin{aligned} {\textbf {G}} := {\textbf {Q}}^\mathsf {T} + {\textbf {Q}} - \gamma {\textbf {M}}^\mathsf {T}{} {\textbf {P}}{} {\textbf {M}} = \begin{pmatrix} {\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {0}}&{}{\textbf {0}} \\ {\textbf {0}}&{}{\textbf {0}}&{}\frac{2 - \alpha }{\rho }{} {\textbf {I}}_{N+ML} \end{pmatrix} \end{aligned}$$
(A25)

is obviously positive semidefinite.

Finally, the matrices \({\textbf {Q}}^\mathsf {T} + {\textbf {Q}}\), \({\textbf {P}}\) and \({\textbf {G}}\) derived from (A19), (A24) and (A25) satisfy the convergence condition proposed in Lemma 1. As a result, the iterations of Eqs. (37)–(41), namely POE-ELM, converge to the global optimal solution. \(\square\)
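
The structural claims above can be sanity-checked numerically. The following sketch (ours, not part of the paper) builds random matrices satisfying the stated assumptions (full column rank and \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\)) with illustrative values of \(\rho > 0\) and a relaxation parameter \(\alpha \in (0, 2)\), and verifies that (A23) inverts (A22), that \({\textbf {P}}\) in (A24) is symmetric positive definite, that (A19) is positive definite, and that \({\textbf {G}}\) matches the positive semidefinite block form in (A25); all dimension choices are hypothetical.

```python
# Numerical sanity check (illustrative sketch, not from the paper) of the
# matrix identities (A19) and (A22)-(A25).
import numpy as np

rng = np.random.default_rng(0)
N, L, M_blocks, rho, alpha = 6, 3, 2, 0.7, 1.5   # illustrative sizes/parameters
K = N + M_blocks * L                              # row dimension of A~ and C~

# Build A~ (K x N) and C~ (K x L) with full column rank and A~^T C~ = 0
# by drawing orthonormal columns and splitting them between the two blocks.
Qfull, _ = np.linalg.qr(rng.standard_normal((K, N + L)))
A_t = Qfull[:, :N] @ np.diag(rng.uniform(1, 2, N))
C_t = Qfull[:, N:] @ np.diag(rng.uniform(1, 2, L))
assert np.allclose(A_t.T @ C_t, 0)

I_K, I_N, I_L = np.eye(K), np.eye(N), np.eye(L)
Z = lambda r, c: np.zeros((r, c))

# Q from (A18), M from (A22), M^{-1} from (A23); then P := Q M^{-1} as in (A24)
Q = np.block([[rho * A_t.T @ A_t, Z(N, L),           (1 - alpha) * A_t.T],
              [Z(L, N),           rho * C_t.T @ C_t, (1 - alpha) * C_t.T],
              [-A_t,              -C_t,              (1 / rho) * I_K]])
Mmat = np.block([[I_N,         Z(N, L),     Z(N, K)],
                 [Z(L, N),     I_L,         Z(L, K)],
                 [-rho * A_t,  -rho * C_t,  alpha * I_K]])
Minv = np.block([[I_N,                  Z(N, L),              Z(N, K)],
                 [Z(L, N),              I_L,                  Z(L, K)],
                 [(rho / alpha) * A_t,  (rho / alpha) * C_t,  (1 / alpha) * I_K]])
P = Q @ Minv
G = Q.T + Q - Mmat.T @ P @ Mmat
G_expected = np.block([[Z(N + L, N + L), Z(N + L, K)],
                       [Z(K, N + L),     ((2 - alpha) / rho) * I_K]])

assert np.allclose(Mmat @ Minv, np.eye(N + L + K))   # (A23) is the inverse of (A22)
assert np.allclose(P, P.T)                           # P is symmetric, as in (A24)
assert np.linalg.eigvalsh(Q.T + Q).min() > 0         # (A19) is positive definite
assert np.linalg.eigvalsh(P).min() > 0               # P is positive definite
assert np.allclose(G, G_expected)                    # G matches the block form in (A25)
assert np.linalg.eigvalsh(G).min() > -1e-10          # hence G is positive semidefinite
print("Matrix identities (A19) and (A22)-(A25) hold numerically.")
```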

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Dai, Y. & Wu, Q. An accelerated optimization algorithm for the elastic-net extreme learning machine. Int. J. Mach. Learn. & Cyber. 13, 3993–4011 (2022). https://doi.org/10.1007/s13042-022-01636-1
