Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers

  • Original Article
Pattern Analysis and Applications

Abstract

This paper presents a novel ensemble classifier system designed to process data streams featuring occasional changes in their characteristics (concept drift). The ensemble is especially effective when the concepts reappear (recurring context). The system collects information on emerging contexts in a pool of elementary classifiers trained on subsequent data chunks. The pool is updated only when concept drift is detected. In contrast to other ensemble solutions, classifiers are not removed from the pool, and therefore, knowledge of past contexts is preserved for future use. To ensure high classification performance, the number of classifiers contributing to decision-making is fixed and limited. Only selected elements from the pool can join the decision-making ensemble. The process of selecting classifiers and adjusting their weights is realized by an evolutionary-based optimization algorithm that aims to minimize the system misclassification rate. Performance of the system is evaluated through a series of experiments presenting some key features of the system.
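The selection-and-weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the chromosome encoding (a list of pool indices plus a list of weights), the mutation-only evolutionary loop with elitism, the weighted-vote fusion rule, and every function name and parameter below are assumptions made for illustration. Classifiers are modeled as plain callables that map a sample to a label, and drift detection is reduced to a boolean flag supplied by the caller.

```python
import random
from collections import defaultdict


def update_pool(pool, new_classifier, drift_detected):
    """Append a classifier only when drift was detected; nothing is ever removed."""
    if drift_detected:
        pool.append(new_classifier)
    return pool


def weighted_vote(members, weights, x):
    """Return the label that collects the largest total weight."""
    scores = defaultdict(float)
    for clf, w in zip(members, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)


def error_rate(pool, chromosome, data):
    """Misclassification rate of the ensemble encoded by `chromosome`."""
    idx, weights = chromosome
    members = [pool[i] for i in idx]
    wrong = sum(weighted_vote(members, weights, x) != y for x, y in data)
    return wrong / len(data)


def random_chromosome(pool_size, k, rng):
    """Pick k distinct pool indices and k weights in [0, 1)."""
    return (rng.sample(range(pool_size), k), [rng.random() for _ in range(k)])


def mutate(chromosome, pool_size, rng):
    """Either swap one member for an unused pool classifier or perturb one weight."""
    idx, weights = list(chromosome[0]), list(chromosome[1])
    pos = rng.randrange(len(idx))
    if rng.random() < 0.5:
        candidates = [i for i in range(pool_size) if i not in idx]
        if candidates:
            idx[pos] = rng.choice(candidates)
    else:
        weights[pos] = min(1.0, max(0.0, weights[pos] + rng.gauss(0, 0.2)))
    return (idx, weights)


def evolve_ensemble(pool, data, k, generations=30, pop_size=20, seed=0):
    """Search for a fixed-size ensemble (k members) with low error on `data`."""
    rng = random.Random(seed)
    k = min(k, len(pool))
    population = [random_chromosome(len(pool), k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the better half survives unchanged, the rest are mutants of it.
        population.sort(key=lambda c: error_rate(pool, c, data))
        elite = population[: pop_size // 2]
        offspring = [mutate(rng.choice(elite), len(pool), rng)
                     for _ in range(pop_size - len(elite))]
        population = elite + offspring
    return min(population, key=lambda c: error_rate(pool, c, data))


# Toy usage: 1-D threshold stumps stand in for classifiers trained on data chunks.
pool = []
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    update_pool(pool, lambda x, t=t: int(x > t), drift_detected=True)
data = [(i / 10 + 0.05, int(i / 10 + 0.05 > 0.5)) for i in range(10)]
best = evolve_ensemble(pool, data, k=3)
print(error_rate(pool, best, data))
```

In this sketch the pool can grow without bound while the ensemble size `k` stays fixed, which mirrors the separation between stored knowledge and decision-making cost described above. The full article should be consulted for the actual encoding, operators, and drift detector.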



Acknowledgments

This work was supported by the Polish National Science Centre under grant N N519 576638, realized in the years 2010-2013.

Author information

Corresponding author

Correspondence to Konrad Jackowski.

About this article

Cite this article

Jackowski, K. Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers. Pattern Anal Applic 17, 709–724 (2014). https://doi.org/10.1007/s10044-013-0318-x

  • DOI: https://doi.org/10.1007/s10044-013-0318-x
