Benchmarking Concept Drift Detectors for Online Machine Learning

  • Conference paper
Model and Data Engineering (MEDI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13761)

Abstract

Concept drift detection is an essential step in maintaining the accuracy of online machine learning. The main task is to detect changes in the data distribution that might cause changes in the decision boundaries of a classification algorithm. Upon drift detection, the classification algorithm may reset its model or concurrently grow a new learning model. Over the past fifteen years, several drift detection methods have been proposed, most of which have been implemented within the Massive Online Analysis (MOA) framework. A couple of studies have compared these drift detectors. However, such studies have focused merely on detection accuracy, and most of them use synthetic data sets only. Additionally, they do not consider drift detectors that are not integrated into MOA. Furthermore, none of the studies has considered other metrics such as resource consumption and runtime characteristics, which are of utmost importance from an operational point of view.
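To make the detect-then-reset loop concrete, the following is a minimal Python sketch of a DDM-style detector, in the spirit of the error-rate monitoring approach of Gama et al. It is not MOA's implementation and not one of the detectors as benchmarked in this paper. The class name `DDMSketch`, its method names, the synthetic error stream, and the default thresholds (30 instances, 2 and 3 standard deviations) are assumptions chosen for illustration.

```python
import math


class DDMSketch:
    """Illustrative DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        # Forget all statistics; a learner would reset its model at the same time.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Consume one error indicator (1 = misclassified) and return a state."""
        self.n += 1
        self.errors += error
        if self.n < self.min_instances:
            return "stable"
        p = self.errors / self.n              # running error rate
        s = math.sqrt(p * (1 - p) / self.n)   # its standard deviation
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s     # remember the best point seen so far
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def first_drift(errors):
    """Return the index of the first drift signal, or None if there is none."""
    detector = DDMSketch()
    for i, e in enumerate(errors):
        if detector.update(e) == "drift":
            return i
    return None


# Synthetic stream (an assumption): 10% errors for 1000 instances, then 50%.
stream = [1 if i % 10 == 0 else 0 for i in range(1000)] + [i % 2 for i in range(1000)]
drift_index = first_drift(stream)
print("first drift signalled at index", drift_index)
```

On this synthetic stream the error rate jumps at index 1000, and the sketch signals drift shortly afterwards. A learner wired to such a detector would discard or replace its model at that point.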

In this paper, we fill this gap. Namely, we evaluate the performance of sixteen different drift detection methods using three different metrics: accuracy, runtime, and memory usage. To guarantee a fair comparison, MOA is used. Fourteen of the algorithms are implemented in MOA, and we integrate the remaining two (ADWIN++ and SDDM) into MOA.
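The three metrics can be illustrated with a small measurement harness. The sketch below is a Python stand-in under stated assumptions, not the paper's setup: the paper's notes link to JMH and VisualVM, which suggests JVM-level measurement, whereas this example uses `time.perf_counter` and `tracemalloc` from the Python standard library. The toy `WindowRateDetector`, the `benchmark` function, and the reading of accuracy as detection delay plus a false-alarm flag are hypothetical choices made for illustration.

```python
import time
import tracemalloc
from collections import deque


class WindowRateDetector:
    """Hypothetical toy detector: compares the error rate of the most recent
    window against the rate of the first full window."""

    def __init__(self, window=100, margin=0.2):
        self.window = window
        self.margin = margin
        self.reference = None                 # error rate of the first full window
        self.recent = deque(maxlen=window)    # sliding window of 0/1 errors

    def update(self, error):
        """Return True when the recent error rate exceeds the reference by margin."""
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        rate = sum(self.recent) / self.window
        if self.reference is None:
            self.reference = rate
            return False
        return rate - self.reference > self.margin


def benchmark(detector, errors, drift_point):
    """Run a detector over an error stream and collect three simple metrics."""
    tracemalloc.start()
    start = time.perf_counter()
    detected_at = None
    for i, e in enumerate(errors):
        if detector.update(e):
            detected_at = i
            break                             # stop at the first signal
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    false_alarm = detected_at is not None and detected_at < drift_point
    delay = None
    if detected_at is not None and not false_alarm:
        delay = detected_at - drift_point     # instances between drift and signal
    return {
        "delay": delay,
        "false_alarm": false_alarm,
        "seconds": seconds,                   # wall-clock time for the run
        "peak_bytes": peak_bytes,             # peak Python allocations while tracing
    }


# Synthetic stream (an assumption): 10% errors for 1000 instances, then 50%.
stream = [1 if i % 10 == 0 else 0 for i in range(1000)] + [i % 2 for i in range(1000)]
result = benchmark(WindowRateDetector(), stream, drift_point=1000)
print(result)
```

Running every detector through the same harness on the same stream is what makes such numbers comparable, which mirrors the motivation for using a single framework in the evaluation. Note that `tracemalloc` only tracks Python-level allocations, so the memory figure is a rough indicator rather than a process-level measurement.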

Notes

  1. https://github.com/mahmoudmahgoub/moa

  2. https://github.com/openjdk/jmh

  3. https://visualvm.github.io/

Acknowledgments

The work of Ahmed Awad is funded by the European Regional Development Funds (Mobilitas Plus Programme grant MOBTT75).

Author information

Corresponding author

Correspondence to Ahmed Awad.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mahgoub, M., Moharram, H., Elkafrawy, P., Awad, A. (2023). Benchmarking Concept Drift Detectors for Online Machine Learning. In: Fournier-Viger, P., Hassan, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2022. Lecture Notes in Computer Science, vol 13761. Springer, Cham. https://doi.org/10.1007/978-3-031-21595-7_4

  • DOI: https://doi.org/10.1007/978-3-031-21595-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21594-0

  • Online ISBN: 978-3-031-21595-7

  • eBook Packages: Computer Science, Computer Science (R0)
