Abstract
Data stream mining is an active topic in data mining. Most existing algorithms assume that a data stream with concept drift is class-balanced. In real-world applications, however, data streams are often imbalanced as well as subject to concept drift, which makes learning more complex. In online learning, oversampling is commonly used: a small number of samples are selected from the previous data block by some strategy and added to the current data block to amplify the current minority class. With this approach, the number of stored samples, the oversampling method, and the weight calculation of the base classifiers all affect the classification performance of the ensemble classifier. This paper proposes a dynamic weighted selective ensemble (DWSE) learning algorithm for imbalanced data streams with concept drift. On the one hand, resampling the minority samples of the previous data block amplifies the minority class of the current data block and lets information from the previous block be absorbed when building a classifier, which reduces the impact of concept drift. The paper defines how the information content of each sample is calculated and gives the resampling and updating methods for the minority samples. On the other hand, concept drift degrades the performance of the base classifiers, and a decay factor is usually used to describe this degradation. A static decay factor, however, cannot accurately describe the degradation of a base classifier under concept drift. DWSE therefore defines a dynamic decay factor for each base classifier and uses the resulting attenuation to select which sub-classifiers to eliminate, which allows the algorithm to deal with concept drift better. In comparison with other algorithms, the results show that DWSE achieves better classification performance on both majority-class and minority-class samples.
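The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. It is not the DWSE algorithm itself: the abstract does not state the formulas for sample information content or for the dynamic decay factor, so the sketch uses simple stand-ins. Stored minority samples are appended in the order given (no information-content ranking), and each base classifier's weight is multiplied by exp(-rate * error), where the error is measured on the newest data block. All function names and the parameters `target`, `rate`, and `floor` are hypothetical.

```python
import math


def amplify_minority(block, stored, minority=1, target=0.4):
    """Append stored minority samples to `block` until the minority
    share reaches `target` or the store runs out.

    `block` and `stored` are lists of (features, label) pairs.
    Assumption: samples are taken in stored order, not ranked.
    """
    out = list(block)
    n_min = sum(1 for _, y in out if y == minority)
    for sample in stored:
        if out and n_min / len(out) >= target:
            break
        out.append(sample)
        n_min += 1
    return out


def update_weights(members, block, rate=2.0, floor=0.2):
    """Decay each member's weight by its error on the newest block and
    drop members whose weight falls below `floor`.

    `members` is a list of (predict_fn, weight) pairs.
    Assumption: the decay exp(-rate * error) is an illustrative choice.
    """
    if not block:
        return list(members)
    kept = []
    for predict, weight in members:
        err = sum(1 for x, y in block if predict(x) != y) / len(block)
        new_weight = weight * math.exp(-rate * err)
        if new_weight >= floor:
            kept.append((predict, new_weight))
    return kept


def vote(members, x):
    """Weighted majority vote; `members` must be non-empty."""
    score = {}
    for predict, weight in members:
        label = predict(x)
        score[label] = score.get(label, 0.0) + weight
    return max(score, key=score.get)
```

Because the decay depends on each member's error on the newest block, a classifier that still fits the current concept keeps its weight, while one trained on an outdated concept loses weight quickly and is eventually removed from the ensemble.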
Acknowledgements
This work is supported by the Natural Science Foundation Research Project of Shaanxi Province of China (No. 2020KRM156); the Shaanxi Provincial Education Department Scientific Research Program Foundation of China (No. 15JK1218); the Science and Technology Plan Project of Shangluo City of China (No. SK2019-84); the Science and Technology Research Project of Shangluo University of China (No. 17SKY003); and the Science and Technology Innovation Team Building Project of Shangluo University of China (No. 18SCX002).
About this article
Cite this article
Yan, Z., Hongle, D., Gang, K. et al. Dynamic weighted selective ensemble learning algorithm for imbalanced data streams. J Supercomput 78, 5394–5419 (2022). https://doi.org/10.1007/s11227-021-04084-w
DOI: https://doi.org/10.1007/s11227-021-04084-w