A two ensemble system to handle concept drifting data streams: recurring dynamic weighted majority

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

We present an ensemble system, recurring dynamic weighted majority (RDWM), that maintains two ensembles of experts in order to handle drifting concepts accurately, in particular recurrent drifts. The primary online ensemble represents the current concepts, and the secondary ensemble represents the old concepts seen since the beginning of learning. An effective pruning methodology removes redundant and old classifiers that might otherwise interfere with learning new concepts. Experimental evaluation on datasets shows that RDWM achieves very high generalization accuracy irrespective of the speed or severity of drift or the presence of noise in the dataset.
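
The abstract does not give the update rules, so the following Python sketch is only an illustration of the two-ensemble idea, not the authors' algorithm. It assumes a weight-decay loop in the style of dynamic weighted majority for the primary ensemble, and it guesses at how a secondary ensemble might be used: pruned experts are archived, and an archived expert that agrees with the current label is recalled when the primary ensemble errs. The names `MajorityExpert`, `TwoEnsembleSketch`, `beta`, `prune_below`, and `max_archive` are all hypothetical.

```python
from collections import Counter


class MajorityExpert:
    """Toy base learner (assumption): predicts the label it has seen most often."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class TwoEnsembleSketch:
    """Illustrative two-ensemble learner; every rule here is an assumption."""

    def __init__(self, beta=0.5, prune_below=0.01, max_archive=5):
        self.beta = beta                # weight multiplier for a wrong expert
        self.prune_below = prune_below  # experts below this weight leave the primary
        self.max_archive = max_archive  # cap on the secondary ensemble size
        self.primary = [[MajorityExpert(), 1.0]]  # [expert, weight] pairs
        self.archive = []               # secondary ensemble of old experts
        self.recalls = 0                # how often an archived expert was reused

    def predict(self, x):
        # Weighted vote over the primary ensemble only.
        votes = Counter()
        for expert, weight in self.primary:
            votes[expert.predict(x)] += weight
        return votes.most_common(1)[0][0]

    def update(self, x, y):
        """Predict first, then learn from (x, y); returns the prediction."""
        global_prediction = self.predict(x)

        # Penalise every primary expert that was wrong on this example.
        for pair in self.primary:
            if pair[0].predict(x) != y:
                pair[1] *= self.beta

        # Rescale so the best expert has weight 1.0.
        top = max(weight for _, weight in self.primary)
        for pair in self.primary:
            pair[1] /= top

        # Prune weak experts, but archive them instead of discarding them.
        kept = []
        for pair in self.primary:
            if pair[1] >= self.prune_below:
                kept.append(pair)
            else:
                self.archive.append(pair[0])
        self.primary = kept  # never empty: the top weight is 1.0 after rescaling
        self.archive = self.archive[-self.max_archive:]

        # On an ensemble error, recall a matching old expert or create a new one.
        if global_prediction != y:
            recalled = next((e for e in self.archive if e.predict(x) == y), None)
            if recalled is not None:
                self.archive.remove(recalled)
                self.primary.append([recalled, 1.0])
                self.recalls += 1
            else:
                self.primary.append([MajorityExpert(), 1.0])

        # All primary experts train on the new example.
        for expert, _ in self.primary:
            expert.learn(x, y)
        return global_prediction


# Usage: a stream whose concept "a" recurs after a period of "b".
stream = ["a"] * 20 + ["b"] * 20 + ["a"] * 20
model = TwoEnsembleSketch()
mistakes = sum(model.update(i, label) != label for i, label in enumerate(stream))
```

With a base learner this simple, recalling an archived expert is no faster than training a new one; the sketch only shows where a secondary ensemble could plug into the update loop. A real base learner that is costly to retrain is where reuse would be expected to pay off.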

Author information

Corresponding author

Correspondence to Parneeta Sidhu.

Cite this article

Sidhu, P., Bhatia, M.P.S. A two ensemble system to handle concept drifting data streams: recurring dynamic weighted majority. Int. J. Mach. Learn. & Cyber. 10, 563–578 (2019). https://doi.org/10.1007/s13042-017-0738-9

  • DOI: https://doi.org/10.1007/s13042-017-0738-9