Abstract
In real applications of cognitive computation, data with imbalanced classes are often collected sequentially. In this situation, many current machine learning algorithms, e.g., the support vector machine, achieve weak classification performance, especially on the minority class. To solve this problem, a new hybrid sampling online extreme learning machine (ELM) for sequential imbalanced data is proposed in this paper. The key idea is to keep the majority and minority classes balanced while preserving the sequential distribution characteristics of the original data. The method consists of two stages. At the offline stage, we introduce the principal curve to build confidence regions for the minority and majority classes, respectively. Based on these two confidence zones, over-sampling of the minority class and under-sampling of the majority class are both conducted to generate new synthetic samples, and the initial ELM model is then established. At the online stage, we first choose the most valuable synthetic samples of the majority class in terms of sample importance. Afterwards, a new online fast leave-one-out cross-validation (LOO CV) algorithm based on Cholesky decomposition is proposed to determine whether or not to update the ELM network weights online. We also prove theoretically that the proposed method has an upper bound on information loss. Experimental results on seven UCI datasets and one real-world air pollutant forecasting dataset show that, compared with ELM, OS-ELM, meta-cognitive OS-ELM, and OS-ELM with the SMOTE strategy, the proposed method can simultaneously improve the classification performance of the minority and majority classes in terms of accuracy, G-mean value, and ROC curve. In conclusion, the proposed hybrid sampling online extreme learning machine can be effectively applied to the sequential imbalanced-data problem with better generalization performance and numerical stability.
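To make the core ingredients concrete, the following is a minimal sketch of a basic ELM fit with Cholesky-solved ridge output weights, together with the standard PRESS-style fast leave-one-out residuals (e_i / (1 - h_ii), where h_ii is the diagonal of the hat matrix). This is not the paper's full method: the hybrid sampling, principal-curve confidence regions, and online update rule are omitted, and the function names `elm_train` and `loo_errors` are hypothetical illustrations only.

```python
import numpy as np

def elm_train(X, y, n_hidden=30, reg=1e-3, seed=None):
    """Basic ELM: random hidden layer + ridge-regularized least-squares output.

    Minimal illustrative version; the paper's hybrid-sampling and online
    sequential machinery are not reproduced here.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden outputs
    # Output weights from regularized normal equations, solved via a
    # Cholesky factorization (the decomposition the fast LOO CV exploits).
    A = H.T @ H + reg * np.eye(n_hidden)
    L = np.linalg.cholesky(A)
    beta = np.linalg.solve(L.T, np.linalg.solve(L, H.T @ y))
    return W, b, beta, H, A

def loo_errors(H, y, beta, A):
    """Exact LOO residuals without retraining: e_i / (1 - h_ii),
    where h_ii = H_i A^{-1} H_i^T is the hat-matrix diagonal."""
    hat_diag = np.einsum('ij,jk,ik->i', H, np.linalg.inv(A), H)
    resid = y - H @ beta
    return resid / (1.0 - hat_diag)
```

For ridge-regularized least squares this shortcut is exact (a Sherman–Morrison identity), so model selection can be done from a single fit instead of N retrainings, which is what makes an online LOO criterion affordable.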
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Funding
This work was supported by the National Natural Science Foundation of China (No. 61572399, U1204609), the China Postdoctoral Science Foundation Specific Support (No. 2016T90944), the funding scheme of the University Science and Technology Innovation in Henan Province (No. 15HASTIT022), the funding scheme of University Young Core Instructor in Henan Province (No. 2014GGJS-046), the Foundation of Henan Normal University for Excellent Young Teachers (No. 14YQ007), the Major Science and Technology Foundation in Guangdong Province of China (No. 2015B010104002), and the Key Scientific Research Foundation of Henan Provincial University (No. 16A520015, 15A520078).
About this article
Cite this article
Mao, W., Jiang, M., Wang, J. et al. Online Extreme Learning Machine with Hybrid Sampling Strategy for Sequential Imbalanced Data. Cogn Comput 9, 780–800 (2017). https://doi.org/10.1007/s12559-017-9504-2