Abstract
Concept drift is a common challenge in data stream mining: the underlying distribution of incoming data changes unpredictably over time. A classifier in data stream mining must adapt to concept drift; otherwise its classification accuracy degrades. To detect concept drift promptly and accurately, this paper proposes CDDDE, an unsupervised online Concept Drift Detection algorithm based on Jensen-Shannon Divergence and the Exponentially Weighted Moving Average (EWMA). CDDDE detects concept drift by measuring the difference between data distributions within sliding windows and computes the drift threshold dynamically using EWMA, without using labels during detection. Once concept drift is detected, a new classifier is trained on the current and subsequent data. Experiments on artificial and real-world datasets show that CDDDE detects concept drift efficiently and that the retrained classifier improves classification accuracy on subsequent data. Compared with some supervised algorithms, CDDDE achieves higher detection accuracy and classification accuracy on most datasets.
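The detection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CDDDE implementation: the abstract does not give the window layout, the binning, or the exact threshold rule, so everything here is an assumption made for illustration. That includes the names `histogram`, `js_divergence`, and `JsdEwmaDetector`, the parameters `window`, `lam`, `k`, and `warmup`, the restriction to a one-dimensional feature with fixed bin edges, and the threshold "EWMA mean plus `k` EWMA standard deviations".

```python
import bisect
import math
from collections import deque


def histogram(values, edges):
    """Relative frequencies of `values` over fixed bin edges (out-of-range values are clamped)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


class JsdEwmaDetector:
    """Hypothetical label-free detector: JSD between a reference window and a sliding window,
    compared against a threshold derived from EWMA statistics of past JSD values."""

    def __init__(self, edges, window=20, lam=0.2, k=3.0, warmup=5):
        self.edges = edges      # assumed fixed histogram bin edges
        self.window = window    # assumed size of both windows
        self.lam = lam          # EWMA smoothing factor
        self.k = k              # assumed threshold width in standard deviations
        self.warmup = warmup    # JSD observations required before testing
        self.reset()

    def reset(self):
        self.reference = []
        self.current = deque(maxlen=self.window)
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Consume one value; return True if drift is flagged at this step."""
        if len(self.reference) < self.window:
            self.reference.append(x)
            return False
        self.current.append(x)
        if len(self.current) < self.window:
            return False

        d = js_divergence(histogram(self.reference, self.edges),
                          histogram(self.current, self.edges))
        # Dynamic threshold from the EWMA statistics seen so far (before including d).
        threshold = self.mean + self.k * math.sqrt(self.var)
        if self.n >= self.warmup and d > threshold:
            self.reset()  # start collecting a new reference window after the drift
            return True

        # Incremental EWMA mean and variance of the divergence.
        if self.n == 0:
            self.mean = d
        else:
            diff = d - self.mean
            self.mean += self.lam * diff
            self.var = (1 - self.lam) * (self.var + self.lam * diff * diff)
        self.n += 1
        return False
```

A caller would feed the stream one value at a time, for example `drift_points = [i for i, x in enumerate(stream) if detector.update(x)]`, and retrain the classifier on the current and subsequent data whenever `update` returns `True`. No labels are consumed at any point, which matches the unsupervised setting of the abstract; the specific reset-on-drift policy is a design choice of this sketch.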
Acknowledgements
This research was supported by the National Natural Science Foundation of China under Grant No. 62072236.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fan, Q., Liu, C., Zhao, Y., Li, Y. (2023). Unsupervised Online Concept Drift Detection Based on Divergence and EWMA. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13421. Springer, Cham. https://doi.org/10.1007/978-3-031-25158-0_10
DOI: https://doi.org/10.1007/978-3-031-25158-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25157-3
Online ISBN: 978-3-031-25158-0
eBook Packages: Computer Science, Computer Science (R0)