A drift detection method based on dynamic classifier selection

Pinagé, Felipe; dos Santos, Eulanda M.; Gama, João

doi:10.1007/s10618-019-00656-w

Published: 11 October 2019

Data Mining and Knowledge Discovery, volume 34, pages 50–74 (2020)

Abstract

Machine learning algorithms can be applied to many practical problems, such as spam, fraud and intrusion detection, and customer preferences, among others. In most of these problems, data arrive in streams, which means that the data distribution may change over time, leading to concept drift. The literature offers many supervised methods that detect drift explicitly by monitoring the error. However, these methods may be infeasible in real-world applications where fully labeled data are not available, and they may require a significant decrease in accuracy before a drift can be detected. There are also blind approaches, in which the decision model is updated constantly; however, this may lead to unnecessary system updates. To overcome these drawbacks, we propose a semi-supervised drift detector that uses an ensemble of classifiers based on self-training online learning and dynamic classifier selection. For each unknown sample, a dynamic selection strategy chooses, among the ensemble’s component members, the classifier most likely to classify it correctly. The prediction assigned by the chosen classifier is used to compute an estimate of the error produced by the ensemble members. The proposed method monitors this pseudo-error in order to detect drifts and updates the decision model only after a drift is detected. The method is relevant in that it allows drift detection and reaction and is applicable to several practical problems. The experiments conducted indicate that the proposed method attains high performance and detection rates while reducing the amount of labeled data used to detect drift.
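
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base learners, the selection rule, or the monitoring test, so everything below is an assumption made for illustration. The one-dimensional `ThresholdClassifier`, the nearest-neighbour local-accuracy selection in `select_classifier`, the disagreement-based `pseudo_error`, and the DDM-style rule in `PseudoErrorMonitor` (with its hypothetical `warmup` and `factor` parameters) are all stand-ins. Self-training and the model update after detection are omitted.

```python
import math


class ThresholdClassifier:
    """Hypothetical 1-D base learner: predicts 1 when x >= threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return 1 if x >= self.threshold else 0


def select_classifier(ensemble, x, validation, k=3):
    """Assumed selection rule: pick the member that is most accurate on the
    k labeled validation points (x_v, y_v) closest to the unlabeled sample x."""
    neighbours = sorted(validation, key=lambda p: abs(p[0] - x))[:k]

    def local_accuracy(clf):
        return sum(clf.predict(xv) == yv for xv, yv in neighbours)

    return max(ensemble, key=local_accuracy)


def pseudo_error(ensemble, x, validation, k=3):
    """Treat the selected member's prediction as a pseudo-label and return
    the fraction of ensemble members that disagree with it."""
    chosen = select_classifier(ensemble, x, validation, k)
    pseudo_label = chosen.predict(x)
    disagreeing = sum(clf.predict(x) != pseudo_label for clf in ensemble)
    return disagreeing / len(ensemble)


class PseudoErrorMonitor:
    """Assumed DDM-style monitor: signals drift when the running mean plus
    its standard deviation exceeds the best level seen so far by `factor`
    standard deviations. Expects values in [0, 1]."""

    def __init__(self, warmup=30, factor=3.0):
        self.warmup = warmup
        self.factor = factor
        self.n = 0
        self.total = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, value):
        self.n += 1
        self.total += value
        p = self.total / self.n
        s = math.sqrt(max(0.0, p * (1.0 - p)) / self.n)
        if self.n < self.warmup:
            return False  # too few samples for a stable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # remember the best level so far
        return p + s > self.p_min + self.factor * self.s_min
```

In use, each unlabeled sample would be passed through `pseudo_error` and the result fed to `PseudoErrorMonitor.update`; a `True` return is the point at which, in this sketch, labels would be requested and the ensemble retrained.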

Acknowledgements

The authors gratefully acknowledge the financial support granted for this research by PNPD-CAPES (Coordination for the Improvement of Higher Education Personnel) and FAPEAM (Amazonas Research Foundation) through process number 009/2012 (RHTI-Doutorado).

Author information

Authors and Affiliations

Institute of Computing, Federal University of Amazonas, Manaus, AM, Brazil
Felipe Pinagé & Eulanda M. dos Santos
Department of Informatics, Federal University of Paraná, Curitiba, PR, Brazil
Felipe Pinagé
Institute of Engineering and Computer Systems, University of Porto, Porto, Portugal
João Gama

Corresponding author

Correspondence to Felipe Pinagé.

Additional information

Responsible editor: Charu Aggarwal.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pinagé, F., dos Santos, E.M. & Gama, J. A drift detection method based on dynamic classifier selection. Data Min Knowl Disc 34, 50–74 (2020). https://doi.org/10.1007/s10618-019-00656-w

Received: 14 March 2017
Accepted: 01 October 2019
Published: 11 October 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s10618-019-00656-w
