Modeling recurring concepts in data streams: a graph-based framework

Ahmadi, Zahra; Kramer, Stefan

doi:10.1007/s10115-017-1070-0

Regular Paper
Published: 12 June 2017

Volume 55, pages 15–44, (2018)

Knowledge and Information Systems

Abstract

Classifying a stream of non-stationary data with recurrent drift is a challenging task that has attracted considerable interest in recent years. All existing approaches to recurrent concepts maintain a pool of concepts/classifiers and use it in future classifications to reduce the error on instances from a recurring concept. However, the number of classifiers in the pool usually grows very fast, because accurately detecting the underlying concept is itself difficult. As a result, many entries in the pool may represent the same underlying concept. This paper proposes the GraphPool framework, which refines the pool of concepts by applying a merging mechanism whenever necessary. After receiving a new batch of data, we extract a concept representation from the current batch that takes the correlation among features into account. We then compare the current batch representation to the concept representations in the pool using a statistical multivariate likelihood test. If more than one concept is similar to the current batch, all the corresponding concepts are merged. GraphPool not only keeps the concepts but also maintains the transitions among them via a first-order Markov chain. The current state is maintained at all times, and new instances are predicted based on it. Keeping these transitions helps to recover quickly from drifts in some real-world problems with periodic behavior. Comprehensive experiments on synthetic and real-world data show the effectiveness of the framework in terms of performance and pool management.
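
The pool management described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Concept`, `ConceptPool`, `summarize` and `close_means` are hypothetical, each concept is reduced to a mean vector and an instance count, and the similarity check is a simple distance threshold on mean vectors that stands in for the statistical multivariate likelihood test (which, per the abstract, also considers feature correlation). Classifier training is omitted. The sketch only shows the control flow: summarize a batch, merge every pooled concept that matches it, and count transitions between concepts as a first-order Markov chain.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Concept:
    """Hypothetical concept summary: a mean vector and an instance count."""
    mean: List[float]
    count: int

    def merge(self, other: "Concept") -> None:
        # Count-weighted average of the two mean vectors.
        total = self.count + other.count
        self.mean = [(a * self.count + b * other.count) / total
                     for a, b in zip(self.mean, other.mean)]
        self.count = total


def summarize(batch: Sequence[Sequence[float]]) -> Concept:
    """Reduce a batch of feature vectors to its mean vector and size."""
    n, dims = len(batch), len(batch[0])
    return Concept([sum(row[j] for row in batch) / n for j in range(dims)], n)


def close_means(a: Concept, b: Concept, tol: float = 1.0) -> bool:
    """Placeholder check (assumption): Euclidean distance between means.
    The paper uses a statistical multivariate likelihood test instead."""
    return sum((x - y) ** 2 for x, y in zip(a.mean, b.mean)) ** 0.5 <= tol


class ConceptPool:
    def __init__(self, similar: Callable[[Concept, Concept], bool] = close_means):
        self.similar = similar
        self.concepts: Dict[int, Concept] = {}
        # transitions[src][dst] = number of observed src -> dst batch transitions
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.current: Optional[int] = None
        self._next_id = 0

    def _absorb(self, keep: int, gone: int) -> None:
        """Merge concept `gone` into `keep` and redirect its graph edges."""
        self.concepts[keep].merge(self.concepts.pop(gone))
        for dst, n in self.transitions.pop(gone, {}).items():  # outgoing edges
            self.transitions[keep][dst] += n
        for row in self.transitions.values():                  # incoming edges
            if gone in row:
                row[keep] += row.pop(gone)
        if self.current == gone:
            self.current = keep

    def update(self, batch: Sequence[Sequence[float]]) -> int:
        """Process one batch and return the id of the now-current concept."""
        summary = summarize(batch)
        matches = [cid for cid, c in self.concepts.items()
                   if self.similar(c, summary)]
        if matches:
            keep = matches[0]
            for cid in matches[1:]:        # several matches: merge them all
                self._absorb(keep, cid)
            self.concepts[keep].merge(summary)
        else:                              # no match: add a new concept
            keep = self._next_id
            self._next_id += 1
            self.concepts[keep] = summary
        if self.current is not None:
            self.transitions[self.current][keep] += 1
        self.current = keep
        return keep

    def likely_next(self) -> Optional[int]:
        """Most frequently observed successor of the current concept, if any."""
        row = self.transitions.get(self.current)
        return max(row, key=row.get) if row else None


# Two alternating toy concepts, one near the origin and one near (10, 10).
A = [[0.0, 0.1], [0.2, -0.1]]
B = [[10.0, 10.2], [9.8, 10.0]]
pool = ConceptPool()
ids = [pool.update(batch) for batch in [A, B, A, B, A]]
print(ids)                 # [0, 1, 0, 1, 0]
print(pool.likely_next())  # 1
```

In this toy stream the pool stays at two concepts, and the transition counts suggest that concept 1 tends to follow concept 0, which is the kind of periodic structure the Markov chain is meant to exploit.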

Notes

http://users.rowan.edu/~polikar/research/NSE/.
http://sourceforge.net/projects/moa-datastream/files/Datasets/Classification/.
Raw data were extracted from http://db.csail.mit.edu/labdata/labdata.html.
Raw data were extracted from ftp://ftp.ncdc.noaa.gov/pub/data/gsod/.
We tried to compare our method to the method presented by Yang et al. [53], as it was proposed to handle recurrent concepts and takes an approach that is close to, yet different from, the one in this paper. Unfortunately, we could not reach the corresponding author, and some parts of the method's description were too unclear to allow reimplementation. We have compared the concept similarity algorithm proposed in [53] to our statistical similarity test in the following subsections.
We have used the implementation provided at https://sites.google.com/site/moaextensions/ for RCD, Learn++.NSE and DWM.
We have used the code provided in the MOA framework.
After 72 h, the experiment had finished for only 10% of the data.

References

Aggarwal CC (2014) Data classification: algorithms and applications. CRC Press, Boca Raton
Aggarwal CC, Han J, Wang J, Yu PS (2004) On demand classification of data streams. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 503–508
Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley, New York
Ángel AM, Bartolo GJ, Ernestina M (2016) Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst Appl 46:87–105
Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà R, Morales-Bueno R (2006) Early drift detection method. In: Proceedings of the fourth international workshop on knowledge discovery from data streams, vol 6, pp 77–86
Bengio Y, Frasconi P (1996) Input-output HMMs for sequence processing. IEEE Trans Neural Netw 7(5):1231–1249
Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the seventh SIAM international conference on data mining (SDM), SIAM, pp 443–448
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 139–148
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010a) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Bifet A, Holmes G, Pfahringer B (2010b) Leveraging bagging for evolving data streams. In: Machine learning and knowledge discovery in databases: proceedings of European conference on machine learning (ECML/PKDD), Springer, pp 135–150
Bifet A, Read J, Zliobaite I, Pfahringer B, Holmes G (2013) Pitfalls in benchmarking data stream classification and how to avoid them. In: Machine learning and knowledge discovery in databases: proceedings of European conference on machine learning (ECML/PKDD), Springer, pp 465–479
Borchani H, Martínez AM, Masegosa AR, Langseth H, Nielsen TD, Salmerón A, Fernández A, Madsen AL, Sáez R (2015) Modeling concept drift: a probabilistic graphical model based approach. In: Proceedings of the international symposium on intelligent data analysis, Springer, pp 72–83
Brzeziński D, Stefanowski J (2011) Accuracy updated ensemble for data streams with concept drift. In: Proceedings of the 6th international conference on hybrid artificial intelligence systems, Springer, pp 155–163
Brzeziński D, Stefanowski J (2014) Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Trans Neural Netw Learn Syst 25(1):81–94
Dietterich TG (2002) Machine learning for sequential data: a review. In: Caelli T, Amin A, Duin RPW, de Ridder D, Kamel M (eds) Structural, syntactic, and statistical pattern recognition. Springer, pp 15–30
Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531
Gama J (2010) Knowledge discovery from data streams. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, CRC Press, Boca Raton
Gama J, Kosina P (2014) Recurrent concepts in data streams classification. Knowl Inf Syst 40(3):489–507
Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):1–44
Gomes JB, Gaber MM, Sousa PA, Menasalvas E (2013) Mining recurring concepts in a dynamic feature space. IEEE Trans Neural Netw Learn Syst 25(1):95–110
Gonçalves PM Jr, Barros RS (2013) RCD: a recurring concept drift framework. Pattern Recognit Lett 34(9):1018–1025
Hahsler M, Dunham MH (2011) Temporal structure learning for clustering massive data streams in real-time. In: Proceedings of the 2011 SIAM international conference on data mining (SDM), SIAM, pp 664–675
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18
Harries M (1999) Splice-2 comparative evaluation: electricity pricing. University of New South Wales, Technical report
Hosseini MJ, Ahmadi Z, Beigy H (2011) Pool and accuracy based stream classification: a new ensemble algorithm on data stream classification using recurring concepts detection. In: Proceedings of the IEEE 11th international conference on data mining workshops (ICDMW), IEEE, pp 588–595
Hosseini MJ, Ahmadi Z, Beigy H (2012) New management operations on classifiers pool to track recurring concepts. In: Proceedings of the 14th international conference on data warehousing and knowledge discovery (DaWaK), Springer, pp 327–339
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 97–106
Jaber G, Cornuéjols A, Tarroux P (2013) Online learning: searching for the best forgetting strategy under concept drift. In: Proceedings of the 20th international conference neural information processing (ICONIP), Springer, pp 400–408
Kalnis P, Mamoulis N, Bakiras S (2005) On discovering moving clusters in spatio–temporal data. In: Proceedings of the 9th international symposium on advances in spatial and temporal databases (SSTD), Springer, pp 364–381
Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391
Kolter JZ, Maloof MA (2005) Using additive expert ensembles to cope with concept drift. In: Proceedings of the 22nd international conference on machine learning (ICML), ACM, pp 449–456
Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790
Krempl G, Zliobaite I, Brzeziński D, Hüllermeier E, Last M, Lemaire V, Noack T, Shaker A, Sievi S, Spiliopoulou M, Stefanowski J (2014) Open challenges for data stream mining research. ACM SIGKDD Explor Newslett 16(1):1–10
Kuncheva LI (2004) Classifier ensembles for changing environments. In: Proceedings of the 5th international workshop on multiple classifier systems (MCS), Springer, pp 1–15
Lazarescu M (2005) A multi-resolution learning approach to tracking concept drift and recurrent concepts. In: Proceedings of the 5th international workshop on pattern recognition in information systems (PRIS), pp 52–61
Lewandowski D, Kurowicka D, Joe H (2009) Generating random correlation matrices based on vines and extended onion method. J Multivar Anal 100(9):1989–2001
Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inf Comput 108(2):212–261
Masud MM, Chen Q, Khan L, Aggarwal C, Gao J, Han J, Thuraisingham B (2010) Addressing concept-evolution in concept-drifting data streams. In: Proceedings of the IEEE 10th international conference on data mining (ICDM), IEEE, pp 929–934
Minku LL, Yao X (2012) DDD: a new ensemble approach for dealing with concept drift. IEEE Trans Knowl Data Eng 24(4):619–633
Muirhead RJ (2009) Aspects of multivariate statistical theory, vol 197. Wiley, Hoboken
Nishida K, Yamauchi K, Omori T (2005) ACE: adaptive classifiers-ensemble system for concept-drifting environments. In: Proceedings of the 6th international workshop on multiple classifier systems (MCS), Springer, pp 176–185
Ntoutsi I, Spiliopoulou M, Theodoridis Y (2009) Tracing cluster transitions for different cluster types. Control Cybern 38(1):239–259
Oliveira MDB, Gama J (2010) MEC—monitoring clusters’ transitions. In: Proceedings of the fifth starting AI researchers’ symposium (STAIRS), pp 212–224
Oza NC (2005) Online bagging and boosting. IEEE Int Conf Syst Man Cybern 3:2340–2345
Oza NC, Russell S (2001) Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 359–364
Ramamurthy S, Bhatnagar R (2007) Tracking recurrent concept drift in streaming data using ensemble classifiers. In: Proceedings of the sixth international conference on machine learning and applications (ICMLA), IEEE, pp 404–409
Sakthithasan S, Pears R, Bifet A, Pfahringer B (2015) Use of ensembles of Fourier spectra in capturing recurrent concepts in data streams. In: Proceedings of the international joint conference on neural networks (IJCNN), pp 1–8
Spiliopoulou M, Ntoutsi I, Theodoridis Y, Schult R (2006) MONIC: modeling and monitoring cluster transitions. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 706–711
Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 377–382
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), ACM, pp 226–235
Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 30(4):964–994
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
Yang Y, Wu X, Zhu X (2006) Mining in anticipation for concept change: proactive–reactive prediction in data streams. Data Min Knowl Discov 13(3):261–289
Zliobaite I, Pechenizkiy M, Gama J (2016) An overview of concept drift applications. In: Japkowicz N, Stefanowski J (eds) Big data analysis: new algorithms for a new society. Springer, pp 91–114

Author information

Authors and Affiliations

Institut für Informatik, Johannes Gutenberg-Universität, Mainz, Germany
Zahra Ahmadi & Stefan Kramer

Corresponding author

Correspondence to Zahra Ahmadi.

About this article

Cite this article

Ahmadi, Z., Kramer, S. Modeling recurring concepts in data streams: a graph-based framework. Knowl Inf Syst 55, 15–44 (2018). https://doi.org/10.1007/s10115-017-1070-0

Received: 24 October 2016
Revised: 17 April 2017
Accepted: 27 May 2017
Published: 12 June 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10115-017-1070-0