
Modeling recurring concepts in data streams: a graph-based framework

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Classifying a stream of non-stationary data with recurrent drift is a challenging task that has attracted considerable interest in recent years. Existing approaches that handle recurrent concepts maintain a pool of concepts/classifiers and use it for future classification to reduce the error on instances from a recurring concept. However, the number of classifiers in the pool usually grows very quickly, because the accurate detection of an underlying concept is a challenging task in itself. Thus, the pool may contain many concepts that represent the same underlying concept. This paper proposes the GraphPool framework, which refines the pool of concepts by applying a merging mechanism whenever necessary: after receiving a new batch of data, we extract a concept representation from the current batch that takes the correlation among features into account. Then, we compare the representation of the current batch to the concept representations in the pool using a statistical multivariate likelihood test. If more than one concept is similar to the current batch, all the corresponding concepts are merged. GraphPool not only keeps the concepts but also maintains the transitions among concepts via a first-order Markov chain. The current state is maintained at all times, and new instances are predicted based on it. Keeping these transitions helps to recover quickly from drifts in some real-world problems with periodic behavior. Comprehensive experimental results on synthetic and real-world data show the effectiveness of the framework in terms of performance and pool management.
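
The pool-refinement and transition-tracking steps described above can be illustrated with a minimal sketch. The code below is an illustration under simplifying assumptions, not the authors' implementation: the class and parameter names (Concept, GraphPool, threshold), the per-batch mean/covariance summary, the average Gaussian log-likelihood used as a stand-in for the paper's multivariate likelihood test, and the weighted covariance pooling used when merging are all assumptions introduced for this example.

import numpy as np
from collections import defaultdict

class Concept:
    # A concept summarized by the mean vector and covariance matrix of the
    # batches assigned to it (a simplification made for this sketch).
    def __init__(self, X):
        self.mean = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.n = len(X)

    def merge(self, other):
        # Pool the statistics of two concepts judged to represent the same
        # underlying concept (weighted averaging, not exact pooled covariance).
        n = self.n + other.n
        self.mean = (self.n * self.mean + other.n * other.mean) / n
        self.cov = (self.n * self.cov + other.n * other.cov) / n
        self.n = n

    def avg_log_likelihood(self, X):
        # Average Gaussian log-likelihood of a batch under this concept;
        # a stand-in for the multivariate likelihood test used in the paper.
        d = X.shape[1]
        diff = X - self.mean
        inv = np.linalg.inv(self.cov)
        _, logdet = np.linalg.slogdet(self.cov)
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        return float(np.mean(-0.5 * (quad + logdet + d * np.log(2 * np.pi))))

class GraphPool:
    # Pool of concepts plus first-order Markov transition counts between them.
    def __init__(self, threshold=-10.0):
        self.concepts = {}                              # concept id -> Concept
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.current = None                             # current state (concept id)
        self.next_id = 0
        self.threshold = threshold                      # assumed similarity cut-off

    def _redirect(self, old, new):
        # After 'old' has been merged into 'new', move its transition counts.
        for dests in self.transitions.values():
            if old in dests:
                dests[new] += dests.pop(old)
        if old in self.transitions:
            for dst, count in self.transitions.pop(old).items():
                self.transitions[new][dst] += count
        if self.current == old:
            self.current = new

    def process_batch(self, X):
        # Compare the new batch to every pooled concept; create, reuse or merge.
        batch = Concept(X)
        similar = [cid for cid, c in self.concepts.items()
                   if c.avg_log_likelihood(X) > self.threshold]
        if not similar:
            cid, self.next_id = self.next_id, self.next_id + 1
            self.concepts[cid] = batch                  # genuinely new concept
        else:
            cid = similar[0]
            self.concepts[cid].merge(batch)             # update matched concept
            for other in similar[1:]:                   # collapse duplicate concepts
                self.concepts[cid].merge(self.concepts.pop(other))
                self._redirect(other, cid)
        if self.current is not None:
            self.transitions[self.current][cid] += 1    # first-order Markov edge
        self.current = cid
        return cid

With this bookkeeping, the outgoing transition counts of the current state give the most likely next concept, which is how keeping the transition graph can speed up recovery from drifts in periodic settings.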


Notes

  1. http://users.rowan.edu/~polikar/research/NSE/.

  2. http://sourceforge.net/projects/moa-datastream/files/Datasets/Classification/.

  3. Raw data were extracted from http://db.csail.mit.edu/labdata/labdata.html.

  4. Raw data were extracted from ftp://ftp.ncdc.noaa.gov/pub/data/gsod/.

  5. We tried to compare our method to the one presented by Yang et al. [53], as it was proposed to handle recurrent concepts and takes an approach close to, yet different from, ours. Unfortunately, we could not reach the corresponding author, and some unclear parts in the description of the method prevented a reimplementation. We compare the concept similarity algorithm proposed in [53] to our statistical similarity test in the following subsections.

  6. We have used the implementation provided at https://sites.google.com/site/moaextensions/ for RCD, Learn++.NSE and DWM.

  7. We have used the code provided in the MOA framework.

  8. The experiment had finished for only 10% of the data after 72 h.

References

  1. Aggarwal CC (2014) Data classification: algorithms and applications. CRC Press, Boca Raton

  2. Aggarwal CC, Han J, Wang J, Yu PS (2004) On demand classification of data streams. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 503–508

  3. Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley, New York

  4. Ángel AM, Bartolo GJ, Ernestina M (2016) Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst Appl 46:87–105

  5. Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavalda R, Morales-Bueno R (2006) Early drift detection method. In: Proceedings of the fourth international workshop on knowledge discovery from data streams, vol 6, pp 77–86

  6. Bengio Y, Frasconi P (1996) Input-output HMMs for sequence processing. IEEE Trans Neural Netw 7(5):1231–1249

  7. Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the seventh SIAM international conference on data mining (SDM), SIAM, pp 443–448

  8. Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 139–148

  9. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010a) MOA: massive online analysis. J Mach Learn Res 11:1601–1604

  10. Bifet A, Holmes G, Pfahringer B (2010b) Leveraging bagging for evolving data streams. In: Machine learning and knowledge discovery in databases: proceedings of the European conference on machine learning (ECML/PKDD), Springer, pp 135–150

  11. Bifet A, Read J, Zliobaite I, Pfahringer B, Holmes G (2013) Pitfalls in benchmarking data stream classification and how to avoid them. In: Machine learning and knowledge discovery in databases: proceedings of the European conference on machine learning (ECML/PKDD), Springer, pp 465–479

  12. Borchani H, Martínez AM, Masegosa AR, Langseth H, Nielsen TD, Salmerón A, Fernández A, Madsen AL, Sáez R (2015) Modeling concept drift: a probabilistic graphical model based approach. In: Proceedings of the international symposium on intelligent data analysis, Springer, pp 72–83

  13. Brzeziński D, Stefanowski J (2011) Accuracy updated ensemble for data streams with concept drift. In: Proceedings of the 6th international conference on hybrid artificial intelligence systems, Springer, pp 155–163

  14. Brzezinski D, Stefanowski J (2014) Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Trans Neural Netw Learn Syst 25(1):81–94

  15. Dietterich TG (2002) Machine learning for sequential data: a review. In: Caelli T, Amin A, Duin RPW, de Ridder D, Kamel M (eds) Structural, syntactic, and statistical pattern recognition. Springer, pp 15–30

  16. Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531

  17. Gama J (2010) Knowledge discovery from data streams. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton

  18. Gama J, Kosina P (2014) Recurrent concepts in data streams classification. Knowl Inf Syst 40(3):489–507

  19. Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):1–44

  20. Gomes JB, Gaber MM, Sousa PA, Menasalvas E (2013) Mining recurring concepts in a dynamic feature space. IEEE Trans Neural Netw Learn Syst 25(1):95–110

  21. Gonçalves PM Jr, Barros RS (2013) RCD: a recurring concept drift framework. Pattern Recognit Lett 34(9):1018–1025

  22. Hahsler M, Dunham MH (2011) Temporal structure learning for clustering massive data streams in real-time. In: Proceedings of the 2011 SIAM international conference on data mining (SDM), SIAM, pp 664–675

  23. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18

  24. Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, University of New South Wales

  25. Hosseini MJ, Ahmadi Z, Beigy H (2011) Pool and accuracy based stream classification: a new ensemble algorithm on data stream classification using recurring concepts detection. In: Proceedings of the IEEE 11th international conference on data mining workshops (ICDMW), IEEE, pp 588–595

  26. Hosseini MJ, Ahmadi Z, Beigy H (2012) New management operations on classifiers pool to track recurring concepts. In: Proceedings of the 14th international conference on data warehousing and knowledge discovery (DaWaK), Springer, pp 327–339

  27. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 97–106

  28. Jaber G, Cornuéjols A, Tarroux P (2013) Online learning: searching for the best forgetting strategy under concept drift. In: Proceedings of the 20th international conference neural information processing (ICONIP), Springer, pp 400–408

  29. Kalnis P, Mamoulis N, Bakiras S (2005) On discovering moving clusters in spatio–temporal data. In: Proceedings of the 9th international symposium on advances in spatial and temporal databases (SSTD), Springer, pp 364–381

  30. Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391

  31. Kolter JZ, Maloof MA (2005) Using additive expert ensembles to cope with concept drift. In: Proceedings of the 22nd international conference on machine learning (ICML), ACM, pp 449–456

  32. Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790

  33. Krempl G, Zliobaite I, Brzeziński D, Hüllermeier E, Last M, Lemaire V, Noack T, Shaker A, Sievi S, Spiliopoulou M, Stefanowski J (2014) Open challenges for data stream mining research. ACM SIGKDD Explor Newslett 16(1):1–10

  34. Kuncheva LI (2004) Classifier ensembles for changing environments. In: Proceedings of the 5th international workshop on multiple classifier systems (MCS), Springer, pp 1–15

  35. Lazarescu M (2005) A multi-resolution learning approach to tracking concept drift and recurrent concepts. In: Proceedings of the 5th international workshop on pattern recognition in information systems (PRIS), pp 52–61

  36. Lewandowski D, Kurowicka D, Joe H (2009) Generating random correlation matrices based on vines and extended onion method. J Multivar Anal 100(9):1989–2001

  37. Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inf Comput 108(2):212–261

  38. Masud MM, Chen Q, Khan L, Aggarwal C, Gao J, Han J, Thuraisingham B (2010) Addressing concept-evolution in concept-drifting data streams. In: Proceedings of the IEEE 10th international conference on data mining (ICDM), IEEE, pp 929–934

  39. Minku LL, Yao X (2012) DDD: a new ensemble approach for dealing with concept drift. IEEE Trans Knowl Data Eng 24(4):619–633

  40. Muirhead RJ (2009) Aspects of multivariate statistical theory, vol 197. Wiley, Hoboken

  41. Nishida K, Yamauchi K, Omori T (2005) ACE: adaptive classifiers-ensemble system for concept-drifting environments. In: Proceedings of the 6th international workshop on multiple classifier systems (MCS), Springer, pp 176–185

  42. Ntoutsi I, Spiliopoulou M, Theodoridis Y (2009) Tracing cluster transitions for different cluster types. Control Cybern 38(1):239–259

  43. Oliveira MDB, Gama J (2010) MEC—monitoring clusters’ transitions. In: Proceedings of the fifth starting AI researchers’ symposium (STAIRS), pp 212–224

  44. Oza NC (2005) Online bagging and boosting. IEEE Int Conf Syst Man Cybern 3:2340–2345

  45. Oza NC, Russell S (2001) Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 359–364

  46. Ramamurthy S, Bhatnagar R (2007) Tracking recurrent concept drift in streaming data using ensemble classifiers. In: Proceedings of the sixth international conference on machine learning and applications (ICMLA), IEEE, pp 404–409

  47. Sakthithasan S, Pears R, Bifet A, Pfahringer B (2015) Use of ensembles of Fourier spectra in capturing recurrent concepts in data streams. In: Proceedings of the international joint conference on neural networks (IJCNN), pp 1–8

  48. Spiliopoulou M, Ntoutsi I, Theodoridis Y, Schult R (2006) MONIC: modeling and monitoring cluster transitions. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 706–711

  49. Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 377–382

  50. Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 226–235

  51. Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Disc 30(4):964–994

  52. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101

  53. Yang Y, Wu X, Zhu X (2006) Mining in anticipation for concept change: proactive–reactive prediction in data streams. Data Min Knowl Discov 13(3):261–289

  54. Zliobaite I, Pechenizkiy M, Gama J (2016) An overview of concept drift applications. In: Japkowicz N, Stefanowski J (eds) Big data analysis: new algorithms for a new society. Springer, pp 91–114

Author information

Corresponding author

Correspondence to Zahra Ahmadi.

About this article

Cite this article

Ahmadi, Z., Kramer, S. Modeling recurring concepts in data streams: a graph-based framework. Knowl Inf Syst 55, 15–44 (2018). https://doi.org/10.1007/s10115-017-1070-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-017-1070-0

Keywords

Navigation