Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach

  • Original Paper
  • Evolving Systems

Abstract

The difficulties of learning from a nonstationary data stream are generally twofold. First, a dynamically structured learning framework is required to keep up with the evolution of unstable class concepts, i.e., concept drift. Second, the imbalanced class distribution over the data stream demands a mechanism that reinforces the underrepresented class concepts to improve overall performance. To alleviate the challenges raised by these issues, we propose the recursive ensemble approach (REA) in this paper. To address the imbalanced learning problem in the training data chunk received at any timestamp t, i.e., \({\mathcal{S}}_t\), REA adaptively pushes into \({\mathcal{S}}_t\) part of the minority class examples received within [0, t − 1] to balance its skewed class distribution. Hypotheses are then progressively developed over time for all balanced training data chunks and combined into an ensemble classifier in a dynamically weighted manner, which addresses concept drift over time. Theoretical analysis proves that REA can provide less erroneous prediction results than a comparative algorithm. In addition, an empirical study on both synthetic benchmarks and a real-world data set validates the effectiveness of REA against other algorithms in terms of overall prediction accuracy and ROC curve.
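The chunk-balancing and weighted-ensemble steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the past minority examples are selected, which base learner is used, or how the ensemble weights are computed. The sketch assumes a nearest-distance selection rule, a nearest-centroid base learner, and a log(1/error) weight measured on the newest balanced chunk. All names (`balance_chunk`, `RecursiveEnsembleSketch`, `target_ratio`) are hypothetical, and every chunk is assumed to be non-empty.

```python
import math

MINORITY = 1  # assumed label of the underrepresented class


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def balance_chunk(chunk, pool, target_ratio=0.5):
    """Return the chunk plus past minority examples, up to target_ratio."""
    current_min = [x for x, y in chunk if y == MINORITY]
    n_major = len(chunk) - len(current_min)
    # Minority count m needed so that m / (m + n_major) >= target_ratio.
    needed = math.ceil(target_ratio * n_major / (1 - target_ratio)) - len(current_min)
    if needed <= 0 or not pool:
        return list(chunk)

    # Assumed selection rule: prefer past minority examples that lie
    # closest to the minority examples of the current chunk.
    def closeness(x):
        return min((sq_dist(x, c) for c in current_min), default=0.0)

    chosen = sorted(pool, key=closeness)[:needed]
    return list(chunk) + [(x, MINORITY) for x in chosen]


def train_centroids(data):
    """Fit a nearest-centroid hypothesis (assumed base learner)."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in groups.items()}


def predict_one(model, x):
    """Predict the label whose centroid is nearest to x."""
    return min(model, key=lambda y: sq_dist(x, model[y]))


class RecursiveEnsembleSketch:
    def __init__(self, target_ratio=0.5):
        self.target_ratio = target_ratio
        self.pool = []     # minority examples received within [0, t - 1]
        self.models = []   # one hypothesis per balanced chunk
        self.weights = []  # one weight per hypothesis

    def partial_fit(self, chunk):
        balanced = balance_chunk(chunk, self.pool, self.target_ratio)
        self.models.append(train_centroids(balanced))
        # Assumed weighting: re-score every hypothesis on the newest
        # balanced chunk so that outdated concepts lose influence.
        self.weights = []
        for model in self.models:
            errors = sum(predict_one(model, x) != y for x, y in balanced)
            err = errors / len(balanced)
            self.weights.append(math.log(1.0 / max(err, 1e-6)))
        self.pool.extend(x for x, y in chunk if y == MINORITY)

    def predict(self, x):
        """Weighted vote over all hypotheses built so far."""
        votes = {}
        for model, weight in zip(self.models, self.weights):
            label = predict_one(model, x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)
```

In use, each chunk \({\mathcal{S}}_t\) would be passed to `partial_fit` as a list of `(features, label)` pairs as it arrives, and `predict` would be called between chunks. Re-scoring all hypotheses on the newest chunk is one simple way to obtain dynamic weights; the paper may use a different rule.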

Author information

Corresponding author

Correspondence to Haibo He.

About this article

Cite this article

Chen, S., He, H. Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach. Evolving Systems 2, 35–50 (2011). https://doi.org/10.1007/s12530-010-9021-y

  • DOI: https://doi.org/10.1007/s12530-010-9021-y
