Dynamic weighted selective ensemble learning algorithm for imbalanced data streams

The Journal of Supercomputing

Abstract

Data stream mining is an active topic in data mining. Most existing algorithms assume that data streams with concept drift are balanced. In the real world, however, data streams are imbalanced as well as subject to concept drift, which makes the learning problem more complex. In online learning algorithms, oversampling is used to select a small number of samples from the previous data block according to some strategy and add them to the current data block, thereby enlarging the current minority class. With this approach, however, the number of stored samples, the oversampling method, and the weight calculation of the base classifiers all affect the classification performance of the ensemble classifier. This paper proposes a dynamic weighted selective ensemble (DWSE) learning algorithm for imbalanced data streams with concept drift. On the one hand, resampling the minority samples of the previous data block enlarges the minority class of the current data block and carries information from the previous block into the new classifier, which reduces the impact of concept drift. The paper defines how the information content of each sample is calculated and gives the methods for resampling and updating the minority samples. On the other hand, concept drift degrades the performance of the base classifiers, and a decay factor is usually used to describe this degradation. A static decay factor, however, cannot describe the degradation accurately under concept drift. DWSE therefore defines a dynamic decay factor for each base classifier and uses the resulting attenuation to select sub-classifiers for elimination, which lets the algorithm deal with concept drift better. In comparisons with other algorithms, the results show that DWSE achieves better classification performance on both majority-class and minority-class samples.
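The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the information-content measure or the dynamic decay formula, so the sketch substitutes uniform random selection of stored minority samples and an exponential decay whose rate grows with a member's recent error. The function names (`carry_over_minority`, `decayed_weight`, `prune`, `weighted_vote`) and the dictionary layout of an ensemble member are hypothetical.

```python
import math
import random
from collections import Counter


def carry_over_minority(stored_minority, current_block, minority_label, k, rng):
    """Append up to k stored minority samples to the current block.

    Uniform random choice is a placeholder (assumption); the paper selects
    samples by an information-content measure not given in the abstract.
    """
    chosen = rng.sample(stored_minority, min(k, len(stored_minority)))
    return current_block + [(x, minority_label) for x in chosen]


def decayed_weight(base_weight, recent_error, age):
    """Hypothetical dynamic decay: the decay rate grows with the recent error."""
    return base_weight * math.exp(-recent_error * age)


def prune(ensemble, threshold):
    """Keep only members whose decayed weight is still at least threshold."""
    return [m for m in ensemble if m["weight"] >= threshold]


def weighted_vote(ensemble, x):
    """Return the label with the largest total weight among the members."""
    votes = Counter()
    for member in ensemble:
        votes[member["predict"](x)] += member["weight"]
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two majority samples (label 0) in the current block, three stored minority samples.
    block = carry_over_minority([[0.9], [1.1], [1.3]], [([0.1], 0), ([0.2], 0)], 1, 2, rng)
    ensemble = [
        {"predict": lambda x: 0, "weight": decayed_weight(1.0, 0.6, 3)},  # old, error-prone
        {"predict": lambda x: 1, "weight": decayed_weight(1.0, 0.1, 1)},  # recent, accurate
    ]
    ensemble = prune(ensemble, threshold=0.3)
    print(len(block), len(ensemble), weighted_vote(ensemble, [1.0]))
```

In this sketch a member that is both old and inaccurate loses weight fastest and is the first to fall below the pruning threshold, which is the qualitative behaviour the abstract attributes to a dynamic, rather than static, decay factor.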

Acknowledgements

This work is supported by the Natural Science Foundation Research Project of Shaanxi Province of China (No. 2020KRM156); the Shaanxi Provincial Education Department Scientific Research Program Foundation of China (No. 15JK1218); the Science and Technology Plan Project of Shangluo City of China (No. SK2019-84); the Science and Technology Research Project of Shangluo University of China (No. 17SKY003); and the Science and Technology Innovation Team Building Project of Shangluo University of China (No. 18SCX002).

Author information

Corresponding author

Correspondence to Du Hongle.

About this article

Cite this article

Yan, Z., Hongle, D., Gang, K. et al. Dynamic weighted selective ensemble learning algorithm for imbalanced data streams. J Supercomput 78, 5394–5419 (2022). https://doi.org/10.1007/s11227-021-04084-w

  • DOI: https://doi.org/10.1007/s11227-021-04084-w
