Abstract
Data stream classification is an important research direction in data mining. In many practical applications, however, the complete training set cannot be collected at one time, and the data may be imbalanced and interspersed with concept drift, both of which can greatly degrade classification performance. To this end, an online dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream (DESW-ID) is proposed. The algorithm employs several balancing measures: it first resamples the data stream using a Poisson distribution and, if the stream is highly imbalanced, performs secondary sampling using a window that stores minority class instances, so that the current data reach a balanced state. To improve processing efficiency, a classifier selection ensemble is proposed that dynamically adjusts the number of classifiers, and the algorithm runs with an ADWIN detector to detect concept drift. The experimental results show that the proposed algorithm ranks first on average on all five classification performance metrics compared with state-of-the-art methods. The proposed algorithm therefore achieves better classification performance on imbalanced data streams with concept drift while also improving running efficiency.
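The two-stage balancing idea described in the abstract can be illustrated with a minimal sketch. This is not the DESW-ID implementation: the class name `WindowedPoissonResampler`, the parameters `window_size`, `imbalance_threshold` and `max_lambda`, the choice of Poisson rate (1 for the majority class, the inverse class ratio for the minority class), and the rule for how many windowed instances to replay are all assumptions made for illustration. The sketch also assumes a binary stream and leaves out drift detection and classifier selection.

```python
import math
import random
from collections import Counter, defaultdict, deque


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class WindowedPoissonResampler:
    """Hypothetical two-stage rebalancer for a binary stream (illustration only)."""

    def __init__(self, window_size=100, imbalance_threshold=0.1,
                 max_lambda=10.0, seed=0):
        self.rng = random.Random(seed)
        self.seen = Counter()     # raw class counts observed so far
        self.emitted = Counter()  # training copies handed to the learner
        # One bounded window of recent instances per class label.
        self.windows = defaultdict(lambda: deque(maxlen=window_size))
        self.imbalance_threshold = imbalance_threshold  # assumed cut-off
        self.max_lambda = max_lambda                    # assumed rate cap

    def process(self, x, y):
        """Return the (x, y) training copies to feed a learner for one arrival."""
        self.seen[y] += 1
        self.windows[y].append(x)
        minority = min(self.seen, key=self.seen.get)
        majority = max(self.seen, key=self.seen.get)
        ratio = self.seen[minority] / self.seen[majority]

        # Stage 1: Poisson resampling, with a larger rate for the minority class.
        lam = min(1.0 / ratio, self.max_lambda) if y == minority else 1.0
        batch = [(x, y)] * poisson(lam, self.rng)
        self.emitted[y] += len(batch)

        # Stage 2: under high imbalance, replay stored minority instances
        # until the emitted class counts are level (at most one window's worth).
        if minority != majority and ratio < self.imbalance_threshold:
            deficit = self.emitted[majority] - self.emitted[minority]
            window = list(self.windows[minority])
            for _ in range(max(0, min(deficit, len(window)))):
                batch.append((self.rng.choice(window), minority))
                self.emitted[minority] += 1
        return batch
```

In use, each arriving instance is passed through `process`, and every returned copy is given to the online learner in place of the raw instance. On a stream where one instance in twenty belongs to the minority class, the emitted training copies end up roughly balanced between the two classes.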
Funding
This work was supported by the National Natural Science Foundation of China (62062004), the Ningxia Natural Science Foundation Project (2020AAC03216, 2022AAC03279) and the Graduate Innovation Project of North Minzu University (YCX21085).
Author information
Contributions
MH completed the main work; XZ completed the coding of the model, some of the experiments and the writing of the main paper; ZC completed some of the experiments and produced the experimental diagrams; HW and ML participated in the coordination of the study and reviewed the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Human or animal rights
With the unanimous consent of all authors, we state that this paper concerns only research on a machine learning algorithm and does not involve human participants or animals. All data are open source and do not involve the interests of others.
About this article
Cite this article
Han, M., Zhang, X., Chen, Z. et al. Dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream. Knowl Inf Syst 65, 1105–1128 (2023). https://doi.org/10.1007/s10115-022-01791-5
DOI: https://doi.org/10.1007/s10115-022-01791-5