Abstract
Extreme learning machine (ELM) has received considerable attention for its rapid training speed and strong fitting capability. One of its important variants, the elastic-net ELM (Enet-ELM), was recently proposed to improve sparsity and stability simultaneously. In the era of big data, however, the explosive growth of data volume and dimensionality poses a major challenge to Enet-ELM. The alternating direction method of multipliers (ADMM), on the other hand, is a powerful iterative algorithm for solving large-scale optimization problems by splitting a large problem into a set of tractable sub-problems, but its performance is strongly constrained by its convergence behavior and convergence rate. In this paper, we therefore develop a novel Enet-ELM algorithm based on the over-relaxed ADMM, termed over-relaxed Enet-ELM (OE-ELM), which accelerates model training by blending the result of the previous iteration into the next one. We also propose a parallel version of OE-ELM (POE-ELM), trained with the consensus over-relaxed ADMM algorithm, to enable parallel and distributed computation. Finally, a convergence analysis of the two proposed algorithms establishes the validity of model training, and extensive experiments on classification and regression datasets demonstrate their competitiveness in accuracy and convergence rate.
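To make the over-relaxation idea concrete, the following is a minimal, self-contained sketch of an over-relaxed ADMM iteration for the elastic-net least-squares problem underlying Enet-ELM. It is not the authors' OE-ELM update (whose exact equations appear in the paper body); the splitting \({\textbf {b}} = {\textbf {z}}\) with a scaled dual variable follows the standard formulation of Boyd et al., and the function and parameter names are illustrative.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def elastic_net_admm(H, T, lam=1.0, sigma=0.5, rho=1.0, alpha=1.5, n_iter=200):
    """Over-relaxed ADMM for
        min_b ||H b - T||^2 + lam*sigma*||b||_1 + lam*(1 - sigma)*||b||_2^2
    via the splitting b = z. alpha in (1, 2) is the over-relaxation
    parameter; alpha = 1 recovers the standard ADMM iteration.
    """
    n = H.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # The b-update is a ridge-type linear system whose matrix is fixed
    # across iterations (it could be factorized once, e.g. by Cholesky).
    A = 2.0 * H.T @ H + (2.0 * lam * (1.0 - sigma) + rho) * np.eye(n)
    HtT = 2.0 * H.T @ T
    for _ in range(n_iter):
        b = np.linalg.solve(A, HtT + rho * (z - u))
        # Over-relaxation: blend the new b with the previous z before the
        # z- and u-updates; this blending is what accelerates convergence.
        b_hat = alpha * b + (1.0 - alpha) * z
        z = soft_threshold(b_hat + u, lam * sigma / rho)
        u = u + b_hat - z
    return z
```

For an ELM, `H` would be the hidden-layer output matrix and `T` the target vector. With `alpha = 1.0` the loop is plain ADMM; values around 1.5-1.8 are the usual over-relaxation choices reported by Boyd et al.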
Data availability
The datasets generated during the current study are available in the LIBSVM Data repository [https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/], the UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/index.php], and Luis Torgo’s Regression DataSets [https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html].
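As a pointer for reproducing the data pipeline (a hedged example, not from the paper: the file name "a9a" and the scikit-learn dependency are our assumptions), LIBSVM-format files downloaded from the repositories above can be read directly:

```python
from sklearn.datasets import load_svmlight_file

# Load a LIBSVM-format file; X is a sparse feature matrix,
# y the corresponding label/target vector.
X, y = load_svmlight_file("a9a")
print(X.shape, y.shape)
```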
References
Mühlroth C, Grottke M (2020) Artificial intelligence in innovation: how to spot emerging trends and technologies. IEEE Trans Eng Manage 69(2):493–510
Abu Arqub O, Abo-Hammour Z (2014) Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm. Inf Sci 279:396–415
Topol EJ (2019) High-performance medicine: the convergence of human and artificial intelligence. Nat Med 25(1):44–56
Abo-Hammour Z, Abu Arqub O, Alsmadi O et al (2014) An optimization algorithm for solving systems of singular boundary value problems. Appl Math Inf Sci 8(6):2809–2821
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
Huang G, Huang GB, Song S et al (2015) Trends in extreme learning machines: a review. Neural Netw 61:32–48
Alshamiri AK, Singh A, Surampudi BR (2018) Two swarm intelligence approaches for tuning extreme learning machine. Int J Mach Learn Cybern 9(8):1271–1283
Zhou L, Ma L (2019) Extreme learning machine-based heterogeneous domain adaptation for classification of hyperspectral images. IEEE Geosci Remote Sens Lett 16(11):1781–1785
Lv F, Han M (2019) Hyperspectral image classification based on multiple reduced kernel extreme learning machine. Int J Mach Learn Cybern 10(12):3397–3405
Zabala-Blanco D, Mora M, Barrientos RJ et al (2020) Fingerprint classification through standard and weighted extreme learning machines. Appl Sci 10(12):4125
Jang SI, Tan GC, Toh KA et al (2017) Online heterogeneous face recognition based on total-error-rate minimization. IEEE Trans Syst Man Cybern Syst 50(4):1286–1299
She Q, Zou J, Meng M et al (2021) Balanced graph-based regularized semi-supervised extreme learning machine for EEG classification. Int J Mach Learn Cybern 12(4):903–916
Nayak DR, Das D, Dash R et al (2020) Deep extreme learning machine with leaky rectified linear unit for multiclass classification of pathological brain images. Multimed Tools Appl 79(21–22):15381–15396
Sun W, Du Y, Zhang X et al (2021) Detection and recognition of text traffic signs above the road. Int J Sens Netw 35(2):69–78
Liu B, Zhou Y, Sun W (2020) Character-level text classification via convolutional neural network and gated recurrent unit. Int J Mach Learn Cybern 11(8):1939–1949
Deng W, Zheng Q, Chen L (2009) Regularized extreme learning machine. In: 2009 IEEE symposium on computational intelligence and data mining, IEEE, pp 389–395
Yıldırım H, Özkale MR (2020) An enhanced extreme learning machine based on Liu regression. Neural Process Lett 52(1):421–442
Huang GB, Chen L, Siew CK et al (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Martínez-Martínez JM, Escandell-Montero P, Soria-Olivas E et al (2011) Regularized extreme learning machine for regression problems. Neurocomputing 74(17):3716–3721
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol 67(2):301–320
Ghosh S (2011) On the grouped selection and model complexity of the adaptive elastic net. Stat Comput 21(3):451–462
Yıldırım H, Özkale MR (2021) LL-ELM: a regularized extreme learning machine based on L1-norm and Liu estimator. Neural Comput Appl 33(16):10469–10484
Huang GB, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122
Scardapane S, Wang D (2017) Randomness in neural networks: an overview. Wiley Interdisc Rev Data Min Knowl Discov 7(2):e1200
Markowska-Kaczmar U, Kosturek M (2021) Extreme learning machine versus classical feedforward network. Neural Comput Appl 33(22):15121–15144
Abo-Hammour Z, Abu Arqub O, Momani S et al (2014) Optimization solution of Troesch’s and Bratu’s problems of ordinary type using novel continuous genetic algorithm. Discrete Dyn Nat Soc 2014:401696
Wang Y, Dou Y, Liu X et al (2016) PR-ELM: parallel regularized extreme learning machine based on cluster. Neurocomputing 173:1073–1081
Dokeroglu T, Sevinc E (2019) Evolutionary parallel extreme learning machines for the data classification problem. Comput Ind Eng 130:237–249
Duan M, Li K, Liao X et al (2017) A parallel multiclassification algorithm for big data using an extreme learning machine. IEEE Trans Neural Netw Learn Syst 29(6):2337–2351
Luo M, Zhang L, Liu J et al (2017) Distributed extreme learning machine with alternating direction method of multiplier. Neurocomputing 261:164–170
Boyd S, Parikh N, Chu E et al (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Lai X, Cao J, Lin Z (2021) An accelerated maximally split ADMM for a class of generalized ridge regression. IEEE Trans Neural Netw Learn Syst: 1–15
Abu Arqub O, Abo-Hammour Z, Momani S et al (2012) Solving singular two-point boundary value problems using continuous genetic algorithm. Abstr Appl Anal 2012:205391
Wang H, Feng R, Han ZF et al (2017) ADMM-based algorithm for training fault tolerant RBF networks and selecting centers. IEEE Trans Neural Netw Learn Syst 29(8):3870–3878
Zhan Y, Bai Y, Zhang W et al (2018) A p-ADMM for sparse quadratic kernel-free least squares semi-supervised support vector machine. Neurocomputing 306:37–50
Lai X, Cao J, Huang X et al (2019) A maximally split and relaxed ADMM for regularized extreme learning machines. IEEE Trans Neural Netw Learn Syst 31(6):1899–1913
Song T, Li D, Liu Z et al (2019) Online ADMM-based extreme learning machine for sparse supervised learning. IEEE Access 7:64533–64544
Chen C, He B, Ye Y et al (2016) The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math Program 155(1–2):57–79
Tao M, Yuan X (2018) Convergence analysis of the direct extension of ADMM for multiple-block separable convex minimization. Adv Comput Math 44(3):773–813
Eckstein J, Bertsekas DP (1992) On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Program 55(1):293–318
França G, Bento J (2016) An explicit rate bound for over-relaxed ADMM. In: 2016 IEEE international symposium on information theory (ISIT), IEEE, pp 2104–2108
Alves MM, Eckstein J, Geremia M et al (2020) Relative-error inertial-relaxed inexact versions of Douglas–Rachford and ADMM splitting algorithms. Comput Optim Appl 75(2):389–422
Sun H, Tai XC, Yuan J (2021) Efficient and convergent preconditioned ADMM for the Potts models. SIAM J Sci Comput 43(2):B455–B478
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
Wang M, Wei M, Feng Y (2010) An iterative algorithm for a least squares solution of a matrix equation. Int J Comput Math 87(6):1289–1298
Inaba FK, Salles EOT, Perron S et al (2018) DGR-ELM: distributed generalized regularized ELM for classification. Neurocomputing 275:1522–1530
Donoho DL (1995) De-noising by soft-thresholding. IEEE Trans Inf Theory 41(3):613–627
He B, Yuan X (2014) On the direct extension of ADMM for multi-block separable convex programming and beyond: from variational inequality perspective. Optimization-Online 2014:4293
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):1–27
Dua D, Graff C (2019) UCI machine learning repository. https://archive.ics.uci.edu/ml. Accessed 8 December 2021
Torgo L (2017) Regression data sets. https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html. Accessed 8 December 2021
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 12271479).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A Proof of Theorem 1
Proof
Note that, since \(\widetilde{{\textbf {B}}}_i^\mathsf {T}\widetilde{{\textbf {B}}}_j = {\textbf {0}}\) for all \(i \ne j\) (\(i, j = 1, \ldots , M\)), both \(\widetilde{{\textbf {A}}}\) and \(\widetilde{{\textbf {C}}}\) have full column rank, and \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\), POE-ELM is equivalent to the iterations of Eqs. (37)–(41); hence it suffices to prove that these iterations converge to the global optimal solution. By the first-order optimality conditions of Eqs. (37)–(41), we derive that
where \(\theta _j({\varvec{\beta }}_j) = \lambda (1 - \sigma )\Vert {\varvec{\beta }}_j \Vert ^2_2\), \(\theta _{M+1}({\textbf {a}}) = \sum ^M_{j = 1} \Vert {\textbf {a}}_j \Vert ^2_2\), \(\theta _{M+2}({\textbf {c}}) = \lambda \sigma \Vert {\textbf {c}} \Vert _1,\) and \({\textbf {u}} = -\rho [{\textbf {v}}_{11}^\mathsf {T}, \ldots , {\textbf {v}}_{1M}^\mathsf {T}, {\textbf {v}}_{21}^\mathsf {T}, \ldots , {\textbf {v}}_{2M}^\mathsf {T}]^\mathsf {T}\).
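For the \(\ell _1\) term \(\theta _{M+2}({\textbf {c}}) = \lambda \sigma \Vert {\textbf {c}} \Vert _1\), the associated minimization step is the proximal operator of the \(\ell _1\) norm, i.e., elementwise soft-thresholding (Donoho, cited in the references). As a standard fact stated here for orientation (not a verbatim restatement of the paper’s Eq. (41)), with penalty parameter \(\rho\) it acts as

\[ \left[ \textrm{prox}_{(\lambda \sigma / \rho )\Vert \cdot \Vert _1}({\textbf {v}}) \right]_i = \textrm{sign}(v_i)\, \max \left\{ |v_i| - \frac{\lambda \sigma }{\rho },\ 0 \right\} . \]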
For simplicity, we also define
where (\({\varvec{\beta }}_1^{k+1}, \ldots , {\varvec{\beta }}_M^{k+1}, {\textbf {a}}^{k+1}, {\textbf {c}}^{k+1}\)) is generated by Eqs. (37)–(41) from given (\({\textbf {a}}^k, {\textbf {c}}^k, {\textbf {u}}^k\)). Then Eq. (A4) can be reformulated as
Substituting the definition of \(\widetilde{{\textbf {u}}}^k\) and Eq. (A9) into Eqs. (A1)–(A4), we have
By Eq. (A13), we derive that
Thus, by adding up Eqs. (A10)–(A12) and (A14), since \(\widetilde{{\textbf {B}}}_i^\mathsf {T}\widetilde{{\textbf {B}}}_j = {\textbf {0}}\) (\(\forall i \ne j\), \(i, j = 1, \ldots , M\)) and \(\widetilde{{\textbf {A}}}^\mathsf {T}\widetilde{{\textbf {C}}} = {\textbf {0}}\), we can deduce the following inequality:
where
By the definition of \({\textbf {q}}\), \({\textbf {q}}^k\), \(\widetilde{{\textbf {q}}}^k\) and \({\textbf {r}}\), \({\textbf {r}}^k\), \(\widetilde{{\textbf {r}}}^k\), Eq. (A15) can be simplified as
where
Since \(\widetilde{{\textbf {A}}}\) and \(\widetilde{{\textbf {C}}}\) have full column rank, it is obvious that
is a positive definite matrix. Then, since Eq. (A9) has the expression that
the following equation holds:
where \(\gamma = 1\) and
Note that \({\textbf {M}}\) is a nonsingular matrix whose inverse matrix is
Furthermore, we can define a symmetric and positive definite matrix as
Consequently, we find that
is obviously positive semidefinite.
Finally, the matrices \({\textbf {Q}}^\mathsf {T} + {\textbf {Q}}\), \({\textbf {P}}\) and \({\textbf {G}}\) derived from (A19), (A24) and (A25) satisfy the convergence condition proposed in Lemma 1. As a result, the iterations of Eqs. (37)–(41), namely POE-ELM, converge to the global optimal solution. \(\square\)
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Dai, Y. & Wu, Q. An accelerated optimization algorithm for the elastic-net extreme learning machine. Int. J. Mach. Learn. & Cyber. 13, 3993–4011 (2022). https://doi.org/10.1007/s13042-022-01636-1