
Concept drift detection and accelerated convergence of online learning

  • Regular Paper
Knowledge and Information Systems

Abstract

Streaming data has become an important form of data in the era of big data, and concept drift, one of its most important problems, has been studied extensively. However, noise and overly small training samples can also cause fluctuations in classification performance, and these fluctuations are easily confused with true concept drift. To solve this problem, an improved concept drift detection method is proposed, and accelerating the convergence of the model after concept drift is also studied. First, effective fluctuation sites are obtained by a group detection method. Second, the authenticity of a concept drift is determined by tracking the testing accuracy at reference sites near the effective fluctuation site. Lastly, in the convergence acceleration stage, a time sequential distance is designed to measure the similarity of sequential data blocks from different time periods, and the noncritical disturbance data with the largest time sequential distance are removed sequentially to improve the convergence speed of the model after concept drift occurs. The experimental results demonstrate that the proposed method not only better distinguishes true from false concept drift but also improves the convergence speed of the model.
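The first two steps, flagging a fluctuation and then checking nearby reference sites, can be illustrated with a rough sketch. This is not the authors' group detection method, whose details the abstract does not give. It is a simple stand-in that assumes per-block test accuracies are available, flags a block when accuracy drops by more than a threshold, and accepts the drift as true only if accuracy stays low over the next few blocks. The function names, the `drop` threshold, and the `n_ref` window are all hypothetical.

```python
from typing import List


def fluctuation_sites(acc: List[float], drop: float = 0.1) -> List[int]:
    """Indices of blocks whose accuracy fell by more than `drop`
    relative to the previous block (hypothetical stand-in criterion)."""
    return [i for i in range(1, len(acc)) if acc[i - 1] - acc[i] > drop]


def is_true_drift(acc: List[float], site: int,
                  n_ref: int = 3, drop: float = 0.1) -> bool:
    """Treat `site` as true drift if the mean accuracy over the next
    `n_ref` reference blocks stays more than `drop` below the accuracy
    just before the site. A quick recovery is read as noise."""
    baseline = acc[site - 1]
    refs = acc[site + 1: site + 1 + n_ref]
    if not refs:
        return False  # no reference blocks observed yet
    return baseline - sum(refs) / len(refs) > drop


# Made-up accuracies: a one-block dip (noise) and a sustained drop (drift).
accuracies = [0.90, 0.91, 0.70, 0.90, 0.89, 0.91, 0.60, 0.62, 0.61, 0.63]
sites = fluctuation_sites(accuracies)
confirmed = [s for s in sites if is_true_drift(accuracies, s)]
print(sites, confirmed)
```

With these made-up values, both dips are flagged as fluctuation sites, but only the second, where accuracy does not recover, is kept as a drift.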



Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (62276157, U21A20513, 62076154, U1805263, 61503229), the Special Foundation from the Central Finance to Support the Development of Local University (YDZX20201400001224), and the Natural Science Foundation of Shanxi Province (No. 201901D111033).

Author information

Corresponding author

Correspondence to Wenjian Wang.


About this article


Cite this article

Guo, H., Li, H., Sun, N. et al. Concept drift detection and accelerated convergence of online learning. Knowl Inf Syst 65, 1005–1043 (2023). https://doi.org/10.1007/s10115-022-01790-6


  • DOI: https://doi.org/10.1007/s10115-022-01790-6
