
Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification

Original Article, Evolving Systems

Abstract

Data streams have unique properties that make them suitable for precisely modeling many real-world data mining applications. Their most challenging property is the occurrence of “concept drift”. Recurring concepts are a type of concept drift that can be seen in most real-world problems. Detecting recurring concepts makes it possible to exploit knowledge obtained earlier in the learning process, which lets the learner adapt quickly whenever a concept reappears. In this paper, we propose a learning algorithm called Pool and Accuracy based Stream Classification (PASC), with several variations, which maintains a pool of classifiers to track recurring concepts. Each classifier describes one existing concept. Consecutive batches of instances are first classified by the pool of classifiers, using one of two approaches: the active classifier method or the weighted classifiers method. The true labels are then revealed, and the pool is updated at the end of the batch using one of three methods: exact Bayesian, Bayesian, or heuristic. Because the algorithm may assign multiple classifiers to a single concept, a classifier merging process is used to resolve this redundancy. Experimental results on real and artificial datasets show the effectiveness of the weighted classifiers method on datasets with sudden concept drift. In addition, the proposed updating methods outperform existing algorithms on datasets with arbitrary attributes. Finally, experiments show the advantage of the merging process on large datasets.
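The batch loop described in the abstract (classify a batch with the pool, reveal the true labels, then update the pool) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid base learner, the reuse threshold `threshold` (0.7 here), and weighting each classifier by its accuracy on the most recent labeled batch are all hypothetical choices. A simple threshold rule stands in for the paper's exact Bayesian, Bayesian, and heuristic update methods, and the merging process is omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

# A labeled instance: (feature vector, class label).
Instance = Tuple[Sequence[float], Hashable]


class NearestCentroid:
    """Hypothetical base learner: predicts the class whose running mean is closest."""

    def __init__(self) -> None:
        self.sums: Dict[Hashable, List[float]] = {}
        self.counts: Dict[Hashable, int] = {}

    def learn(self, batch: List[Instance]) -> None:
        for x, y in batch:
            s = self.sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        if not self.counts:
            return None

        def sq_dist(label: Hashable) -> float:
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=sq_dist)


def accuracy(clf: NearestCentroid, batch: List[Instance]) -> float:
    """Fraction of instances in the batch that clf labels correctly."""
    if not batch:
        return 0.0
    return sum(clf.predict(x) == y for x, y in batch) / len(batch)


class ClassifierPool:
    """Pool with one classifier per presumed concept (illustrative sketch only)."""

    def __init__(self, threshold: float = 0.7,
                 make_learner: Callable[[], NearestCentroid] = NearestCentroid) -> None:
        self.threshold = threshold          # hypothetical reuse threshold on batch accuracy
        self.make_learner = make_learner
        self.pool: List[NearestCentroid] = []
        self.weights: List[float] = []      # accuracy of each classifier on the last labeled batch
        self.active: Optional[int] = None   # index of the classifier chosen for the last batch

    def predict_active(self, x: Sequence[float]) -> Optional[Hashable]:
        """Active-classifier style: use only the currently selected classifier."""
        if self.active is None:
            return None
        return self.pool[self.active].predict(x)

    def predict_weighted(self, x: Sequence[float]) -> Optional[Hashable]:
        """Weighted style: every classifier votes, weighted by its last-batch accuracy."""
        votes: Dict[Hashable, float] = defaultdict(float)
        for clf, w in zip(self.pool, self.weights):
            label = clf.predict(x)
            if label is not None:
                votes[label] += w
        return max(votes, key=votes.__getitem__) if votes else None

    def update(self, batch: List[Instance]) -> int:
        """Called once the batch's true labels are revealed; returns the chosen index."""
        accs = [accuracy(clf, batch) for clf in self.pool]
        best = max(range(len(accs)), key=accs.__getitem__, default=None)
        if best is None or accs[best] < self.threshold:
            # No stored concept explains the batch well enough: start a new classifier.
            self.pool.append(self.make_learner())
            best = len(self.pool) - 1
        self.pool[best].learn(batch)  # refine the matched (or newly created) concept
        self.active = best
        self.weights = [accuracy(clf, batch) for clf in self.pool]
        return best


if __name__ == "__main__":
    concept_a = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
    concept_b = [(x, 1 - y) for x, y in concept_a]  # same inputs, flipped labels
    pool = ClassifierPool()
    print([pool.update(b) for b in (concept_a, concept_b, concept_a)])  # [0, 1, 0]
```

When concept A recurs in the third batch, the first classifier is reused rather than a new one being trained, which is the behavior a classifier pool is meant to provide. A merging step would additionally collapse classifiers that end up describing the same concept; it is not shown in this sketch.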

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. The Apache SpamAssassin Project: http://spamassassin.apache.org/

Acknowledgments

The authors thank Pooya Samangouei for taking part in editing the paper. This work was supported by Iran Telecommunication Research Center.

Author information

Corresponding author

Correspondence to Mohammad Javad Hosseini.

About this article

Cite this article

Hosseini, M.J., Ahmadi, Z. & Beigy, H. Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification. Evolving Systems 4, 43–60 (2013). https://doi.org/10.1007/s12530-012-9064-3

  • DOI: https://doi.org/10.1007/s12530-012-9064-3
