Abstract
Data streams are continuous sequences of data instances that arrive at very high speed and whose underlying distribution changes over time. We present a novel online ensemble approach, Diversified Online Ensembles Detection (DOED), for handling such drifting concepts in data streams. DOED maintains two ensembles of weighted experts, one with low diversity and one with high diversity, and updates each according to its accuracy in classifying new data instances. It detects drift by comparing two accuracies for an ensemble: its accuracy on recent examples and its accuracy since the beginning of learning. The final prediction for an instance is the class predicted by the ensemble with the higher accuracy on recent examples. When an ensemble detects a drift, it is reinitialized while retaining its diversity level. Experimental evaluation on various artificial and real-world datasets shows that DOED achieves very high accuracy in classifying new data instances, irrespective of dataset size, type of drift, or presence of noise. We compare DOED with other learners in terms of new performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. The approach proved highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
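
The drift check and ensemble selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the window size, the statistical test, or the threshold used, so the sliding window, the fixed `margin`, and the names `AccuracyTracker` and `pick_ensemble` are all assumptions made for illustration.

```python
from collections import deque


class AccuracyTracker:
    """Tracks one ensemble's accuracy since the start and over a recent window."""

    def __init__(self, window=50):
        # Assumption: "recent examples" means a fixed-size sliding window.
        self.recent = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.correct = 0
        self.seen = 0

    def update(self, was_correct):
        hit = 1 if was_correct else 0
        self.recent.append(hit)
        self.correct += hit
        self.seen += 1

    def overall_accuracy(self):
        return self.correct / self.seen if self.seen else 0.0

    def recent_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drift_suspected(self, margin=0.2):
        # Hypothetical rule: flag a drift once the window is full and the
        # recent accuracy has fallen more than `margin` below the overall one.
        window_full = len(self.recent) == self.recent.maxlen
        gap = self.overall_accuracy() - self.recent_accuracy()
        return window_full and gap > margin

    def reset(self):
        # Called when the ensemble is reinitialized after a detected drift.
        self.recent.clear()
        self.correct = 0
        self.seen = 0


def pick_ensemble(low, high):
    """Return which ensemble's prediction to use: the better recent performer."""
    return "low" if low.recent_accuracy() >= high.recent_accuracy() else "high"
```

In use, each ensemble would own one tracker, call `update` after every labeled instance, and call `reset` when `drift_suspected` fires; how the low and high diversity levels are preserved on reinitialization is outside this sketch.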

Cite this article
Sidhu, P., Bhatia, M.P.S. An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection. Int. J. Mach. Learn. & Cyber. 6, 883–909 (2015). https://doi.org/10.1007/s13042-015-0366-1
DOI: https://doi.org/10.1007/s13042-015-0366-1