Unsupervised Online Concept Drift Detection Based on Divergence and EWMA

  • Conference paper
Web and Big Data (APWeb-WAIM 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13421)

Abstract

Concept drift is a common challenge in data stream mining: the underlying distribution of incoming data changes unpredictably over time. A classifier in this setting must adapt to the drift; otherwise its classification accuracy degrades severely. To detect concept drift promptly and accurately, this paper proposes CDDDE, an unsupervised online Concept Drift Detection algorithm based on Jensen-Shannon Divergence and the Exponentially Weighted Moving Average (EWMA). CDDDE measures the difference between the data distributions in sliding windows and computes the drift threshold dynamically with EWMA, using no labels during detection. Once concept drift is detected, a new classifier is trained on the current and subsequent data. Experiments on artificial and real-world datasets show that CDDDE detects concept drift efficiently and that the retrained classifier improves classification accuracy on subsequent data. Compared with some supervised algorithms, CDDDE achieves higher detection accuracy and classification accuracy on most datasets.
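
The abstract names the two ingredients (Jensen-Shannon divergence between windowed distributions and an EWMA-based dynamic threshold) but not their exact combination. The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' CDDDE implementation. Assumed here and not taken from the paper: one-dimensional data binned into equal-width histograms, non-overlapping windows compared against a reference window, a threshold of the form EWMA mean plus `k` EWMA standard deviations, and the hypothetical parameters `lam`, `k`, `warmup`, and `min_std`.

```python
import math
import random


def histogram(values, lo, hi, bins):
    """Normalized equal-width histogram of values over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def detect_drift(stream, window=200, bins=10, lo=0.0, hi=1.0,
                 lam=0.2, k=3.0, warmup=3, min_std=1e-3):
    """Return start indices of windows flagged as drifted (no labels used).

    Assumed rule (illustrative, not from the paper): flag a window when its
    divergence from the reference window exceeds ewma_mean + k * ewma_std.
    """
    drifts = []
    reference = stream[:window]
    mean, var, seen = 0.0, 0.0, 0
    for start in range(window, len(stream) - window + 1, window):
        current = stream[start:start + window]
        d = js_divergence(histogram(reference, lo, hi, bins),
                          histogram(current, lo, hi, bins))
        threshold = mean + k * max(math.sqrt(var), min_std)
        if seen >= warmup and d > threshold:
            drifts.append(start)
            reference = current          # new concept becomes the reference
            mean, var, seen = 0.0, 0.0, 0
            continue
        if seen == 0:
            mean = d                     # initialize the EWMA with the first value
        else:
            diff = d - mean
            mean += lam * diff           # EWMA of the divergence
            var = (1 - lam) * (var + lam * diff * diff)  # EW variance
        seen += 1
    return drifts


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stream: the distribution shifts abruptly at index 1000.
    data = ([random.uniform(0.0, 0.5) for _ in range(1000)]
            + [random.uniform(0.5, 1.0) for _ in range(1000)])
    print(detect_drift(data))
```

In a full pipeline, each index returned by `detect_drift` would mark the point at which a new classifier is trained on the current and subsequent data, as the abstract describes. The `min_std` floor is a design choice of this sketch: it keeps the threshold from collapsing to the EWMA mean when successive divergences are nearly identical.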

Acknowledgements

This research was supported by the National Natural Science Foundation of China under Grant No. 62072236.

Author information

Corresponding author

Correspondence to Yunlong Zhao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fan, Q., Liu, C., Zhao, Y., Li, Y. (2023). Unsupervised Online Concept Drift Detection Based on Divergence and EWMA. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13421. Springer, Cham. https://doi.org/10.1007/978-3-031-25158-0_10

  • DOI: https://doi.org/10.1007/978-3-031-25158-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25157-3

  • Online ISBN: 978-3-031-25158-0

  • eBook Packages: Computer Science, Computer Science (R0)
