
Negative correlation learning in the extreme learning machine framework

  • Original Article
  • Neural Computing and Applications

Abstract

Extreme learning machine (ELM) has been shown to be a suitable algorithm for classification problems. Several ensemble meta-algorithms have been developed to improve the generalization ability of ELM models. The ensemble approaches introduced in the ELM literature mainly come from the boosting and bagging frameworks. The generalization of these methods relies on data sampling procedures, under the assumption that the training data are heterogeneous enough to set up diverse base learners. The proposed ELM ensemble model overcomes this strong assumption by using the negative correlation learning (NCL) framework. An alternative diversity metric, based on the orthogonality of the outputs, is proposed. The formulation of the error function allows an analytical solution for the parameters of the ELM base learners, which significantly reduces the computational burden of the standard NCL ensemble method. The proposed ensemble method has been validated in an experimental study on a variety of benchmark datasets, comparing it with the existing ELM ensemble methods. The proposed method statistically outperforms the comparison ensemble methods in accuracy, while reporting a competitive computational burden (especially when compared to the baseline NCL-inspired method).
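To make the approach concrete, the following is a minimal sketch of the idea rather than the paper's exact formulation: each ELM base learner keeps the standard ridge closed-form solution, extended with a quadratic penalty that discourages its output from aligning with the normalized outputs of the other learners, so diversity is driven by squared dot products (see note 2 below). The sequential fitting order, the penalty strength lam, and all helper names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_hidden(X, W, b):
    """Random-feature hidden layer of a standard ELM (sigmoid activation)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def fit_elm(H, Y, C=1.0, penalty=None, lam=0.0):
    """Closed-form output weights: ridge ELM, optionally extended with a
    quadratic diversity penalty P, i.e.
        beta = (H^T H + I / C + lam * H^T P H)^{-1} H^T Y.
    """
    A = H.T @ H + np.eye(H.shape[1]) / C
    if penalty is not None:
        A += lam * H.T @ penalty @ H
    return np.linalg.solve(A, H.T @ Y)

# Toy binary problem with {-1, +1} targets.
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])[:, None]

S, D, lam = 5, 50, 0.1   # ensemble size, hidden neurons, penalty strength
ensemble, outputs = [], []
for s in range(S):
    W = rng.normal(size=(X.shape[1], D))
    b = rng.normal(size=D)
    H = elm_hidden(X, W, b)
    # Penalize alignment with the normalized outputs of the learners fitted
    # so far: P accumulates outer products, so beta^T H^T P H beta sums the
    # squared dot products between this learner's output and the others'.
    P = np.zeros((len(X), len(X)))
    for u in outputs:
        v = u / np.linalg.norm(u)
        P += v @ v.T
    beta = fit_elm(H, y, C=10.0, penalty=P if outputs else None, lam=lam)
    ensemble.append((W, b, beta))
    outputs.append(H @ beta)

# Ensemble prediction: average the base learners' outputs, then threshold.
pred = np.sign(np.mean([elm_hidden(X, W, b) @ b_ for W, b, b_ in ensemble],
                       axis=0))
print("train accuracy:", float(np.mean(pred == y)))
```

Because the penalty enters the objective quadratically, each learner's weights remain a single linear solve; this is what keeps the computational burden far below a gradient-based NCL ensemble.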


Notes

  1. Subscripts denote the iteration number (the initialization stage corresponds to the first iteration of the algorithm), and superscripts index the classifiers within the ensemble.

  2. The dot product is squared so that only the direction of the vector is considered; see the snippet after these notes.

  3. This was carried out using a Python repository developed by the authors for this purpose and published on GitHub (https://github.com/cperales/uci-download-process).

  4. A Python library implementing the algorithms used in these experiments has been developed by the authors and is publicly available on GitHub (https://github.com/cperales/pyridge).
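As a worked illustration of note 2, the short snippet below (the helper name alignment is hypothetical) shows why squaring the normalized dot product captures direction only: sign and magnitude are discarded, so antiparallel vectors count as fully aligned while orthogonal vectors score zero.

```python
import numpy as np

def alignment(u, v):
    """Squared cosine similarity: depends only on the directions of the
    vectors; their signs and magnitudes are discarded (note 2)."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return c ** 2

u = np.array([1.0, 2.0])
print(alignment(u, -3.0 * u))               # 1.0: antiparallel is fully aligned
print(alignment(u, np.array([-2.0, 1.0])))  # 0.0: orthogonal directions
```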


Author information


Corresponding author

Correspondence to Carlos Perales-González.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Perales-González, C., Carbonero-Ruz, M., Pérez-Rodríguez, J. et al. Negative correlation learning in the extreme learning machine framework. Neural Comput & Applic 32, 13805–13823 (2020). https://doi.org/10.1007/s00521-020-04788-9

