Abstract
Data stream mining has become an important research area over the past decade due to the increasing amount of data available today. Sources from various domains generate a near-limitless volume of data in temporal order. Such data are referred to as data streams, and are generally nonstationary because the characteristics of the data evolve over time. This phenomenon is called concept drift, and it is an issue of great importance in the literature because it makes models obsolete by decreasing their predictive performance. In the presence of concept drift, it is necessary to adapt to changes in the data to build more robust and effective classifiers. Drift detectors are designed to run jointly with classification models, updating them when a significant change in the data distribution is observed. In this paper, we present an implicit (unsupervised) algorithm called One-Class Drift Detector (OCDD), which uses a one-class learner with a sliding window to detect concept drift. We perform a comprehensive evaluation of 17 prevalent, mostly recent concept drift detection methods and an adaptive classifier using 13 datasets. The results show that OCDD outperforms the other methods by producing models with better predictive performance on both real-world and synthetic datasets.
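The abstract describes the detector only at a high level. As a rough illustration of the general idea (a one-class learner fitted on a sliding window, with drift signalled when too many recent instances look like outliers), a minimal sketch might look as follows. The use of scikit-learn's `OneClassSVM`, the function name `detect_drifts`, and the parameters `window_size` and `outlier_ratio` are assumptions made for this example; it is not the authors' implementation, which is available in the repository listed under Code availability.

```python
from collections import deque

import numpy as np
from sklearn.svm import OneClassSVM


def detect_drifts(stream, window_size=200, outlier_ratio=0.5):
    """Return stream indices at which a drift is signalled.

    Illustrative sketch only: the learner, the default values and the
    retraining policy are assumptions, not taken from the paper.
    """
    stream = np.asarray(stream, dtype=float)
    # Fit the one-class learner on the first window of instances.
    model = OneClassSVM(nu=0.1, gamma="scale").fit(stream[:window_size])
    window = deque(maxlen=window_size)  # most recent instances
    flags = deque(maxlen=window_size)   # 1 if an instance was flagged as an outlier
    drifts = []
    for i in range(window_size, len(stream)):
        x = stream[i]
        window.append(x)
        # OneClassSVM.predict returns -1 for outliers and +1 for inliers.
        flags.append(1 if model.predict(x.reshape(1, -1))[0] == -1 else 0)
        # Signal drift once a full window contains too many outliers.
        if len(flags) == window_size and sum(flags) / window_size > outlier_ratio:
            drifts.append(i)
            # Refit on the most recent instances and start a fresh window.
            model = OneClassSVM(nu=0.1, gamma="scale").fit(np.array(window))
            window.clear()
            flags.clear()
    return drifts


if __name__ == "__main__":
    # Synthetic stream with an abrupt change in the mean halfway through.
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 1, (1000, 2)), rng.normal(5, 1, (1000, 2))])
    print(detect_drifts(data))
```

Because the check uses only the feature vectors and no class labels, a detector of this kind can run without waiting for ground-truth labels, which is what makes the approach implicit (unsupervised).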
Code availability
The source code is available at: https://github.com/ogozuacik/one-class-drift-detection.
Acknowledgements
We would like to thank two anonymous referees and Alper Can for their valuable comments and pointers on this article.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Availability of data and material
Datasets are available at: https://github.com/ogozuacik/concept-drift-datasets-scikit-multiflow.
Additional information
This study is partially supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK), Grant No. 117E870.
Cite this article
Gözüaçık, Ö., Can, F. Concept learning using one-class classifiers for implicit drift detection in evolving data streams. Artif Intell Rev 54, 3725–3747 (2021). https://doi.org/10.1007/s10462-020-09939-x
DOI: https://doi.org/10.1007/s10462-020-09939-x