
An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Data streams are continuous sequences of data instances that arrive at very high speed and whose underlying concept distribution varies over time. We present a novel online ensemble approach, Diversified online ensembles detection (DOED), for handling such drifting concepts in data streams. Our approach maintains two ensembles of weighted experts, one with low diversity and one with high diversity, which are updated according to their accuracy in classifying new data instances. Drift is detected by comparing two accuracies of an ensemble: its accuracy on the most recent examples and its accuracy since the beginning of learning. The final prediction for an instance is the class predicted by whichever ensemble is more accurate on the recent examples. When an ensemble detects a drift, it is reinitialized while maintaining its diversity level. Experimental evaluation on various artificial and real-world datasets shows that DOED achieves very high accuracy in classifying new data instances, irrespective of dataset size, drift type, or the presence of noise. We also compare DOED with other learners on further performance metrics: the kappa statistic, model cost, evaluation time, and memory requirements. DOED proved to be highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
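The abstract describes the mechanism only at a high level, so the following Python sketch illustrates one way such a loop could be organised. It is not the authors' implementation, and every concrete choice in it is an assumption. The base learner `CentroidExpert` is a hypothetical online nearest class-mean classifier. Diversity is controlled as in online bagging, where each expert trains on an instance k ~ Poisson(`lam`) times and a smaller `lam` yields more diverse experts. Expert weights decay multiplicatively by `beta` whenever an expert errs, which is a weighted-majority-style update. An ensemble signals drift when its accuracy on the last `window` examples falls more than `delta` below its accuracy since it was last (re)initialised. All parameter values are illustrative.

```python
import math
import random
from collections import deque

def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

class CentroidExpert:
    """Hypothetical base learner: online nearest class-mean classifier."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def learn(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return 0  # untrained expert falls back to class 0
        return min(self.counts, key=lambda c: sum(
            (v - s / self.counts[c]) ** 2 for v, s in zip(x, self.sums[c])))

class Ensemble:
    """Weighted experts trained by online bagging; lam fixes the diversity level."""
    def __init__(self, lam, n=5, beta=0.5, window=50, delta=0.2, seed=0):
        self.lam, self.n, self.beta, self.delta = lam, n, beta, delta
        self.window, self.rng = window, random.Random(seed)
        self.reset()

    def reset(self):
        # Fresh experts and statistics, same lam: the diversity level is kept.
        self.experts = [CentroidExpert() for _ in range(self.n)]
        self.weights = [1.0 / self.n] * self.n
        self.recent = deque(maxlen=self.window)
        self.correct = self.seen = 0

    def predict(self, x):
        votes = {}
        for e, w in zip(self.experts, self.weights):
            c = e.predict(x)
            votes[c] = votes.get(c, 0.0) + w
        return max(votes, key=votes.get)  # weighted majority vote

    def recent_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def update(self, x, y):
        """Test-then-train on one labelled instance; return True on a drift signal."""
        hit = int(self.predict(x) == y)
        self.recent.append(hit)
        self.correct, self.seen = self.correct + hit, self.seen + 1
        for i, e in enumerate(self.experts):
            if e.predict(x) != y:
                self.weights[i] *= self.beta  # penalise experts that erred
            for _ in range(poisson(self.lam, self.rng)):
                e.learn(x, y)  # online bagging: train k ~ Poisson(lam) times
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        overall = self.correct / self.seen
        drift = self.seen >= self.window and self.recent_accuracy() < overall - self.delta
        if drift:
            self.reset()
        return drift

class DOEDSketch:
    def __init__(self, seed=0):
        self.low = Ensemble(lam=1.0, seed=seed)       # low diversity
        self.high = Ensemble(lam=0.1, seed=seed + 1)  # high diversity

    def predict(self, x):
        # Answer with the ensemble that is more accurate on recent examples.
        low_better = self.low.recent_accuracy() >= self.high.recent_accuracy()
        return (self.low if low_better else self.high).predict(x)

    def learn(self, x, y):
        return self.low.update(x, y), self.high.update(x, y)

# Usage: a synthetic 2-D stream whose labels flip abruptly at t = 1000.
rng = random.Random(1)
model, hits, low_drifts = DOEDSketch(), [], []
for t in range(2000):
    x = [rng.random(), rng.random()]
    y = int(x[0] > 0.5) if t < 1000 else int(x[0] <= 0.5)
    hits.append(model.predict(x) == y)  # prequential: predict first, then learn
    if model.learn(x, y)[0]:
        low_drifts.append(t)
print("low-diversity ensemble signalled drift at:", low_drifts)
print("accuracy on the last 500 instances:", sum(hits[-500:]) / 500)
```

Because both ensembles are kept alive, the sketch can fall back on whichever one currently tracks the stream better. For example, it can use the ensemble that has just been reinitialised and has relearned the new concept while the other is still adapting.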

Author information

Correspondence to Parneeta Sidhu.

About this article

Cite this article

Sidhu, P., Bhatia, M.P.S. An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection. Int. J. Mach. Learn. & Cyber. 6, 883–909 (2015). https://doi.org/10.1007/s13042-015-0366-1

  • DOI: https://doi.org/10.1007/s13042-015-0366-1
