Abstract
The difficulties of learning from a nonstationary data stream are generally twofold. First, a dynamically structured learning framework is required to keep up with the evolution of unstable class concepts, i.e., concept drifts. Second, an imbalanced class distribution over the data stream demands a mechanism that strengthens the underrepresented class concepts to improve overall performance. To address these challenges, this paper proposes the recursive ensemble approach (REA). To counter the imbalanced learning problem in the training data chunk received at any timestamp t, i.e., \(\mathcal{S}_t\), REA adaptively pushes into \(\mathcal{S}_t\) part of the minority class examples received within [0, t − 1] to balance its skewed class distribution. Hypotheses are then developed progressively over time on all balanced training data chunks and combined into an ensemble classifier in a dynamically weighted manner, which addresses the concept drift issue in time. Theoretical analysis proves that REA can provide less erroneous predictions than a comparative algorithm. In addition, an empirical study on both synthetic benchmarks and a real-world data set validates the effectiveness of REA compared with other algorithms, using overall prediction accuracy and the ROC curve as evaluation metrics.
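The chunk-by-chunk procedure summarized above can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The abstract does not say how the minority examples from [0, t − 1] are chosen or how the hypotheses are weighted, so the following are all assumptions made for illustration: the k-nearest-neighbour selection rule in `balance_chunk`, the nearest-centroid base learner in `train_centroids`, the target minority share `ratio`, and the weights log(1/ε) computed on the newest chunk. Every name in the code is hypothetical.

```python
import math
from collections import defaultdict

# An example is (features, label); label 1 marks the minority class (assumption).
MINORITY = 1


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balance_chunk(chunk, minority_history, ratio=0.5, k=3):
    """Push earlier minority examples into `chunk` until its minority share
    reaches `ratio` (0 < ratio < 1) or the stored history runs out.

    Hypothetical selection rule: a stored minority example scores higher the
    more of its k nearest neighbours in the current chunk are minority, i.e.
    the more it resembles the *current* minority concept."""
    n_min = sum(1 for _, y in chunk if y == MINORITY)
    # Smallest m with (n_min + m) / (len(chunk) + m) >= ratio.
    needed = max(0, math.ceil((ratio * len(chunk) - n_min) / (1 - ratio)))

    def score(example):
        x, _ = example
        nearest = sorted(chunk, key=lambda e: sq_dist(e[0], x))[:k]
        return sum(1 for _, y in nearest if y == MINORITY)

    ranked = sorted(minority_history, key=score, reverse=True)  # stable sort
    return list(chunk) + ranked[:needed]


def train_centroids(data):
    """Stand-in base learner (assumption): a nearest-centroid classifier."""
    sums, counts = {}, defaultdict(int)
    for x, y in data:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    centroids = {y: tuple(s / counts[y] for s in sums[y]) for y in sums}

    def predict(x):
        return min(centroids, key=lambda y: sq_dist(centroids[y], x))

    return predict


class RecursiveEnsembleSketch:
    """Chunk-by-chunk ensemble loosely following the abstract's description."""

    def __init__(self, ratio=0.5, k=3):
        self.ratio, self.k = ratio, k
        self.hypotheses, self.weights = [], []
        self.minority_history = []  # minority examples from chunks 0..t-1

    def learn_chunk(self, chunk):
        balanced = balance_chunk(chunk, self.minority_history, self.ratio, self.k)
        self.hypotheses.append(train_centroids(balanced))
        # Re-weight every hypothesis on the newest chunk S_t (assumed rule:
        # w = log(1 / error), with the error clipped away from 0 and 1).
        self.weights = []
        for h in self.hypotheses:
            err = sum(1 for x, y in chunk if h(x) != y) / len(chunk)
            err = min(max(err, 1e-3), 1 - 1e-3)
            self.weights.append(math.log(1 / err))
        # Remember this chunk's minority examples for later chunks.
        self.minority_history.extend(e for e in chunk if e[1] == MINORITY)

    def predict(self, x):
        """Weighted majority vote over all hypotheses built so far."""
        votes = defaultdict(float)
        for h, w in zip(self.hypotheses, self.weights):
            votes[h(x)] += w
        return max(votes, key=votes.get)


chunk1 = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0), ((1.0, 1.0), 1)]
chunk2 = [((0.1, 0.0), 0), ((0.3, 0.2), 0), ((0.0, 0.2), 0), ((0.9, 1.1), 1)]
model = RecursiveEnsembleSketch()
model.learn_chunk(chunk1)
model.learn_chunk(chunk2)
print(model.predict((1.0, 0.9)))  # prints 1
```

With `ratio=0.5`, each four-example chunk holding one minority example asks for two extra minority examples. When `chunk2` arrives, the only stored one is the minority example from `chunk1`, so the balanced chunk has five examples. Re-weighting all hypotheses on the newest chunk is one simple way to let older hypotheses lose influence after a concept drift.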
Cite this article
Chen, S., He, H. Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach. Evolving Systems 2, 35–50 (2011). https://doi.org/10.1007/s12530-010-9021-y
DOI: https://doi.org/10.1007/s12530-010-9021-y