An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection

Sidhu, Parneeta; Bhatia, M. P. S.

doi:10.1007/s13042-015-0366-1

Original Article
Published: 30 April 2015

Volume 6, pages 883–909, (2015)

International Journal of Machine Learning and Cybernetics

Abstract

Data streams are continuous sequences of data instances that arrive at very high speed and whose underlying concept distribution varies over time. We present a novel online ensemble approach, Diversified Online Ensembles Detection (DOED), for handling these drifting concepts in data streams. Our approach maintains two ensembles of weighted experts, one with low diversity and one with high diversity, which are updated according to their accuracy in classifying new data instances. It detects drifts by comparing two accuracies for each ensemble: its accuracy on the recent examples and its accuracy since the beginning of learning. The final prediction for an instance is the class predicted by the ensemble that is more accurate on the recent examples. When an ensemble detects a drift, it is reinitialized while maintaining its diversity level. Experimental evaluation on various artificial and real-world datasets shows that DOED provides very high accuracy in classifying new data instances, irrespective of the size of the dataset, the type of drift, or the presence of noise. We compare DOED with other learners using new performance metrics such as the kappa statistic and model cost, as well as evaluation time and memory requirements. Our approach proved to be highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
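The two ideas in the abstract, comparing an ensemble's recent accuracy with its accuracy since the start of learning and predicting with whichever ensemble is more accurate on recent examples, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state how the two accuracies are compared, so the sliding-window size, the fixed `margin` threshold, and the names `AccuracyTracker` and `choose_ensemble` are all assumptions made for illustration.

```python
from collections import deque


class AccuracyTracker:
    """Tracks one ensemble's accuracy since the start and on recent examples.

    Hypothetical helper: the window size and the threshold rule below are
    illustrative assumptions, not details taken from the paper.
    """

    def __init__(self, window=50):
        self.recent = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.correct = 0                    # correct predictions since start
        self.seen = 0                       # instances seen since start

    def update(self, was_correct):
        """Record whether the ensemble classified the latest instance correctly."""
        hit = 1 if was_correct else 0
        self.recent.append(hit)
        self.correct += hit
        self.seen += 1

    def overall_accuracy(self):
        return self.correct / self.seen if self.seen else 0.0

    def recent_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drift_suspected(self, margin=0.2):
        """Assumed rule: flag a drift once the window is full and recent
        accuracy has fallen more than `margin` below overall accuracy."""
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.overall_accuracy() - self.recent_accuracy() > margin

    def reset(self):
        """Forget all statistics, e.g. after the ensemble is reinitialized."""
        self.recent.clear()
        self.correct = 0
        self.seen = 0


def choose_ensemble(low_tracker, high_tracker):
    """Return which ensemble ('low' or 'high' diversity) should make the
    final prediction: the one with better accuracy on recent examples."""
    if low_tracker.recent_accuracy() >= high_tracker.recent_accuracy():
        return "low"
    return "high"
```

In use, each ensemble would own one tracker that is updated after every labelled instance; when `drift_suspected()` returns true, that ensemble would be reinitialized and its tracker reset. A statistical test could replace the fixed margin without changing the surrounding structure.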

Author information

Authors and Affiliations

Division of CoE, Netaji Subhas Institute of Technology, Sec-3, Dwarka, New Delhi, 110078, India
Parneeta Sidhu & M. P. S. Bhatia

Corresponding author

Correspondence to Parneeta Sidhu.

About this article

Cite this article

Sidhu, P., Bhatia, M.P.S. An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection. Int. J. Mach. Learn. & Cyber. 6, 883–909 (2015). https://doi.org/10.1007/s13042-015-0366-1

Received: 29 October 2014
Accepted: 18 April 2015
Published: 30 April 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s13042-015-0366-1
