research-article

Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams

Published: 12 May 2023

Abstract

Learning from streaming data is challenging because the distribution of incoming data may change over time, a phenomenon known as concept drift. The predictive patterns, or experience, learned under one distribution may become irrelevant as conditions change under concept drift, but may become relevant again when those conditions recur. Adaptive learning methods adapt a classifier to concept drift by identifying which distribution, or concept, is currently present in order to determine which experience is relevant. Identifying a concept requires a stored representation for comparison, and the quality of that representation is key to accurate identification. Existing concept representations are based on meta-features, efficient univariate summaries of a concept. However, no single meta-feature can fully represent a concept, leading to severe accuracy loss when an existing representation cannot describe the concept drift that occurs. To avoid these failure cases, we propose the first general framework for combining a diverse range of meta-features into a single representation. We solve two main challenges. First, we present a method for efficiently computing, storing, and querying an arbitrary set of meta-features as a single representation, and show that a combination of meta-features can avoid the failure cases seen with existing methods. Second, we present the first method for dynamically learning which meta-features distinguish concepts in any given dataset, significantly improving performance. Our proposed approach enables state-of-the-art feature selection methods, such as mutual information, to be applied to concept representation meta-features for the first time. We investigate tradeoffs between memory budget and classification performance, observing accuracy increases of up to 16% from dynamically weighting the contribution of each meta-feature.
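
The idea of combining several meta-features into one weighted representation can be illustrated with a small sketch. This is not the authors' framework: the three meta-features (mean, standard deviation, lag-1 autocorrelation), the function names, and the synthetic Gaussian streams are all assumptions made for illustration, and a simple Fisher-score-style ratio stands in for the mutual-information weighting mentioned in the abstract. The two synthetic concepts share the same mean and differ only in spread, so a mean-only representation could not separate them.

```python
import math
import random
from statistics import fmean, pstdev


def meta_features(window):
    """Summarise one window of numeric observations as [mean, std, lag-1 autocorrelation]."""
    mu = fmean(window)
    sigma = pstdev(window)
    if sigma == 0.0:
        acf1 = 0.0  # autocorrelation is undefined for a constant window; use 0.0
    else:
        cov = sum((a - mu) * (b - mu) for a, b in zip(window, window[1:]))
        acf1 = cov / (len(window) * sigma ** 2)
    return [mu, sigma, acf1]


def fingerprint(windows):
    """Hypothetical concept representation: per-meta-feature mean and std over windows."""
    columns = list(zip(*(meta_features(w) for w in windows)))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


def fisher_weights(fingerprints):
    """Weight each meta-feature by between-concept variance over within-concept variance."""
    n_features = len(fingerprints[0][0])
    raw = []
    for j in range(n_features):
        between = pstdev([fp[0][j] for fp in fingerprints]) ** 2
        within = fmean([fp[1][j] ** 2 for fp in fingerprints])
        raw.append(between / (within + 1e-12))
    total = sum(raw)
    if total == 0.0:
        return [1.0 / n_features] * n_features  # no feature separates the concepts
    return [r / total for r in raw]


def weighted_distance(vector, fp, weights):
    """Weighted, variance-normalised distance between a query vector and a fingerprint."""
    means, stds = fp
    return math.sqrt(sum(
        w * (v - m) ** 2 / (s ** 2 + 1e-9)
        for v, m, s, w in zip(vector, means, stds, weights)
    ))


def sample_windows(scale, n_windows=20, size=200):
    """Synthetic stream segments: zero-mean Gaussian noise with the given spread."""
    return [[random.gauss(0.0, scale) for _ in range(size)] for _ in range(n_windows)]


random.seed(0)
# Concepts "A" and "B" have the same mean and differ only in standard deviation.
stored = {"A": fingerprint(sample_windows(1.0)), "B": fingerprint(sample_windows(3.0))}
weights = fisher_weights(list(stored.values()))

# A new window drawn from concept "B" is matched against the stored fingerprints.
query = meta_features(sample_windows(3.0, n_windows=1)[0])
best = min(stored, key=lambda name: weighted_distance(query, stored[name], weights))
print("weights:", [round(w, 3) for w in weights])
print("closest stored concept:", best)
```

In this toy setting the learned weights concentrate on the standard-deviation meta-feature, since it is the only one that differs between the two concepts. That mirrors, in a very reduced form, the motivation for learning which meta-features distinguish concepts in a given dataset.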


        • Published in

          ACM Transactions on Knowledge Discovery from Data  Volume 17, Issue 8
          September 2023
          348 pages
          ISSN:1556-4681
          EISSN:1556-472X
          DOI:10.1145/3596449

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 May 2023
          • Online AM: 7 March 2023
          • Accepted: 27 February 2023
          • Revised: 30 December 2022
          • Received: 17 June 2022