Concept learning using one-class classifiers for implicit drift detection in evolving data streams

Artificial Intelligence Review

Abstract

Data stream mining has become an important research area over the past decade due to the increasing amount of data available today. Sources from various domains generate a near-limitless volume of data in temporal order. Such data are referred to as data streams, and they are generally nonstationary because the characteristics of the data evolve over time. This phenomenon is called concept drift, and it is an issue of great importance in the literature, since it makes models obsolete by decreasing their predictive performance. In the presence of concept drift, it is necessary to adapt to changes in the data to build more robust and effective classifiers. Drift detectors are designed to run jointly with classification models, updating them when a significant change in the data distribution is observed. In this paper, we present an implicit (unsupervised) algorithm called One-Class Drift Detector (OCDD), which uses a one-class learner with a sliding window to detect concept drift. We perform a comprehensive evaluation on 17 prevalent, mostly recent concept drift detection methods and an adaptive classifier using 13 datasets. The results show that OCDD outperforms the other methods by producing models with better predictive performance on both real-world and synthetic datasets.
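The general idea of pairing a one-class learner with a sliding window can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked under Code availability). The abstract does not specify the learner or the decision rule, so everything below is an assumption made for illustration: the toy centroid-distance model `CentroidOneClass`, the function `detect_drifts`, the parameters `window_size` and `outlier_ratio`, and the rule of signalling drift when the fraction of outliers in the window exceeds a threshold.

```python
import math
import random
from collections import deque
from itertools import islice


class CentroidOneClass:
    """Toy one-class model (an assumption): flags points far from the training centroid."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, rows):
        dims = len(rows[0])
        self.center = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        dists = sorted(math.dist(r, self.center) for r in rows)
        # Radius that covers roughly `quantile` of the training points.
        self.radius = dists[min(len(dists) - 1, int(self.quantile * len(dists)))]
        return self

    def is_outlier(self, row):
        return math.dist(row, self.center) > self.radius


def detect_drifts(stream, window_size=100, outlier_ratio=0.5):
    """Return the stream indices at which the sketch signals drift (no labels used)."""
    stream = iter(stream)
    reference = list(islice(stream, window_size))  # initial training data
    if len(reference) < window_size:
        raise ValueError("stream is shorter than one window")
    model = CentroidOneClass().fit(reference)
    window = deque(maxlen=window_size)  # most recent instances
    flags = deque(maxlen=window_size)   # outlier flag for each instance in the window
    drifts = []
    for index, row in enumerate(stream, start=window_size):
        window.append(row)
        flags.append(model.is_outlier(row))
        # Hypothetical rule: drift when too many recent instances look unfamiliar.
        if len(window) == window_size and sum(flags) / window_size > outlier_ratio:
            drifts.append(index)
            model = CentroidOneClass().fit(list(window))  # relearn the current concept
            window.clear()
            flags.clear()
    return drifts


# Synthetic stream with an abrupt change in the feature distribution at index 500.
rng = random.Random(0)
before = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
after = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(500)]
drifts = detect_drifts(before + after)
print(drifts)
```

In this sketch the detector only looks at feature values, which is what makes it implicit (unsupervised). In a full pipeline, a drift signal would typically also trigger an update of the accompanying classifier.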

Code availability

The source code is available at https://github.com/ogozuacik/one-class-drift-detection.

Acknowledgements

We would like to thank two anonymous referees and Alper Can for their valuable comments and pointers on this article.

Author information

Corresponding author

Correspondence to Fazli Can.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Availability of data and material

Datasets are available at https://github.com/ogozuacik/concept-drift-datasets-scikit-multiflow.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This study is partially supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant No. 117E870.

About this article

Cite this article

Gözüaçık, Ö., Can, F. Concept learning using one-class classifiers for implicit drift detection in evolving data streams. Artif Intell Rev 54, 3725–3747 (2021). https://doi.org/10.1007/s10462-020-09939-x

  • DOI: https://doi.org/10.1007/s10462-020-09939-x
