A drift detection method based on dynamic classifier selection

Data Mining and Knowledge Discovery

Abstract

Machine learning algorithms can be applied to many practical problems, such as spam, fraud, and intrusion detection and customer preferences, among others. In most of these problems, data arrive as streams, which means that the data distribution may change over time, leading to concept drift. The literature offers many supervised methods that detect drift explicitly by monitoring the classification error. However, these methods may be infeasible in real-world applications where fully labeled data are not available, and they may detect a drift only after a significant decrease in accuracy. There are also blind approaches, in which the decision model is updated continuously; however, this may lead to unnecessary system updates. To overcome these drawbacks, we propose in this paper a semi-supervised drift detector that uses an ensemble of classifiers based on self-training online learning and dynamic classifier selection. For each unknown sample, a dynamic selection strategy chooses, among the ensemble's members, the classifier most likely to classify it correctly. The prediction assigned by the chosen classifier is then used to compute an estimate of the error produced by the ensemble members. The proposed method monitors this pseudo-error to detect drifts and updates the decision model only after a drift has been detected. The method is relevant because it supports both drift detection and reaction and is applicable to several practical problems. The experiments conducted indicate that the proposed method attains high performance and detection rates while reducing the amount of labeled data used to detect drift.
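The abstract describes a three-step pipeline: pick one ensemble member per sample by dynamic selection, use its prediction as a pseudo-label, and monitor the resulting pseudo-error for drift. The sketch below illustrates that pipeline under assumptions that are ours, not necessarily the authors': selection uses the Overall Local Accuracy (OLA) rule of Woods et al. (1997) on a small labeled set `dsel`; the pseudo-error is the fraction of members that disagree with the pseudo-label; and the monitor is a DDM-style threshold test, p + s > p_min + 3 s_min. All names (`ola_select`, `pseudo_error`, `DDMStyleDetector`, `first_drift`, `dsel`) are hypothetical, and the self-training and model-update steps are omitted.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interfaces for this sketch: a classifier maps a feature vector
# to a class label; `dsel` is a small labeled set used only for selection.
Classifier = Callable[[Sequence[float]], int]
Labeled = Tuple[Sequence[float], int]


def ola_select(members: List[Classifier], dsel: List[Labeled],
               x: Sequence[float], k: int = 5) -> int:
    """Index of the member most accurate on the k labeled neighbours of x
    (Overall Local Accuracy). Assumes `dsel` is non-empty."""
    neighbours = sorted(dsel, key=lambda item: math.dist(item[0], x))[:k]

    def local_accuracy(clf: Classifier) -> float:
        return sum(clf(z) == y for z, y in neighbours) / len(neighbours)

    # max() keeps the first member on ties.
    return max(range(len(members)), key=lambda i: local_accuracy(members[i]))


def pseudo_error(members: List[Classifier], dsel: List[Labeled],
                 x: Sequence[float], k: int = 5) -> float:
    """Fraction of members disagreeing with the pseudo-label assigned by the
    dynamically selected member; the true label of x is never used."""
    chosen = ola_select(members, dsel, x, k)
    pseudo_label = members[chosen](x)
    return sum(m(x) != pseudo_label for m in members) / len(members)


class DDMStyleDetector:
    """DDM-style rule on error values in [0, 1]: flag a drift when
    p + s > p_min + drift_level * s_min, with s = sqrt(p * (1 - p) / n)."""

    def __init__(self, min_samples: int = 30, drift_level: float = 3.0) -> None:
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.p = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, err: float) -> bool:
        self.n += 1
        self.p += (err - self.p) / self.n  # running mean of the pseudo-error
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return False
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        # Strict inequality: an all-zero pseudo-error stream never triggers.
        return self.p + s > self.p_min + self.drift_level * self.s_min


def first_drift(members: List[Classifier], dsel: List[Labeled],
                stream: List[Sequence[float]], k: int = 5) -> Optional[int]:
    """Position of the first detected drift in the stream, or None."""
    detector = DDMStyleDetector()
    for t, x in enumerate(stream):
        if detector.update(pseudo_error(members, dsel, x, k)):
            # A full system would update the ensemble here and reset the
            # detector; this sketch just reports where the drift was flagged.
            return t
    return None


# Toy usage: three threshold classifiers on 1-D inputs.
members = [lambda x, t=t: int(x[0] > t) for t in (0.0, 0.5, 1.0)]
dsel = [((v,), int(v > 0.5)) for v in (-1.0, -0.5, 0.2, 0.4, 0.6, 0.8, 1.2, 2.0)]
stable = [(-1.5,), (2.5,)] * 100   # all members agree far from the thresholds
shifted = stable + [(0.7,)] * 50   # members disagree near x = 0.7
print(first_drift(members, dsel, stable))   # None
print(first_drift(members, dsel, shifted))  # 200
```

Because the pseudo-error here is a disagreement measure, this sketch only reacts to changes that alter how the members disagree (for example, data moving into a region near their decision boundaries); a change in labels inside a region where all members agree would go unnoticed. The strict inequality in the test avoids the false alarm a plain DDM-style rule would raise when the pseudo-error stays at exactly zero.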

Acknowledgements

The authors gratefully acknowledge the financial support granted by PNPD-CAPES (Coordination for the Improvement of Higher Education Personnel) and FAPEAM (Amazonas Research Foundation) for this research through process number 009/2012 (RHTI-Doutorado).

Author information

Correspondence to Felipe Pinagé.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Pinagé, F., dos Santos, E.M. & Gama, J. A drift detection method based on dynamic classifier selection. Data Min Knowl Disc 34, 50–74 (2020). https://doi.org/10.1007/s10618-019-00656-w

  • DOI: https://doi.org/10.1007/s10618-019-00656-w
