Abstract
Learning from streaming data is challenging because the distribution of incoming data may change over time, a phenomenon known as concept drift. The predictive patterns, or experience, learned under one distribution may become irrelevant as conditions change, but may become relevant again when those conditions reoccur. Adaptive learning methods adapt a classifier to concept drift by identifying which distribution, or concept, is currently present in order to determine which experience is relevant. Identifying a concept requires storing some representation for comparison, and the quality of that representation is key to accurate identification. Existing concept representations are based on meta-features, efficient univariate summaries of a concept. However, no single meta-feature can fully represent a concept, leading to severe accuracy loss when an existing representation cannot describe the concept drift that occurs. To avoid these failure cases, we propose the first general framework for combining a diverse range of meta-features into a single representation. We solve two main challenges. First, we present a method of efficiently computing, storing, and querying an arbitrary set of meta-features as a single representation, and show that a combination of meta-features can avoid the failure cases seen with existing methods. Second, we present the first method for dynamically learning which meta-features distinguish concepts in any given dataset, significantly improving performance. Our proposed approach enables state-of-the-art feature selection methods, such as mutual information, to be applied to concept representation meta-features for the first time. We investigate tradeoffs between memory budget and classification performance, observing accuracy increases of up to 16% by dynamically weighting the contribution of each meta-feature.
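As a rough illustration of the idea of combining several meta-features into one weighted representation, the following minimal sketch may help. It is not the authors' implementation: the function names (`fingerprint`, `fisher_like_weights`, `identify`), the choice of meta-features (per-feature mean and standard deviation, mean label, error rate), the Fisher-score-style weighting used as a stand-in for a feature selection method, and the weighted squared distance are all assumptions made for this example.

```python
import math
from statistics import mean, pvariance


def fingerprint(window):
    """Summarise a window of (features, label, prediction) triples as one vector.

    Hypothetical meta-features: per-feature mean and standard deviation,
    mean label, and classifier error rate.
    """
    xs = [x for x, _, _ in window]
    vec = []
    for j in range(len(xs[0])):
        col = [x[j] for x in xs]
        vec.append(mean(col))
        vec.append(math.sqrt(pvariance(col)))
    vec.append(mean([y for _, y, _ in window]))
    vec.append(mean([1.0 if y != p else 0.0 for _, y, p in window]))
    return vec


def fisher_like_weights(stored, eps=1e-9):
    """Weight each meta-feature by between-concept over within-concept variance.

    `stored` maps a concept id to a list of fingerprints seen for that concept.
    This ratio is an assumed stand-in for a feature selection score.
    """
    dims = len(next(iter(stored.values()))[0])
    weights = []
    for d in range(dims):
        concept_means = [mean([fp[d] for fp in fps]) for fps in stored.values()]
        between = pvariance(concept_means)
        within = mean([pvariance([fp[d] for fp in fps]) for fps in stored.values()])
        weights.append(between / (within + eps))
    total = sum(weights) or 1.0
    return [w / total for w in weights]


def identify(current, stored, weights):
    """Return the stored concept whose mean fingerprint is closest to `current`."""
    def dist(a, b):
        return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b))

    centroids = {cid: [mean(col) for col in zip(*fps)] for cid, fps in stored.items()}
    return min(centroids, key=lambda cid: dist(current, centroids[cid]))


def make_window(shift, scale, n=30):
    """Deterministic toy data: feature 0 depends on the concept, feature 1 does not."""
    return [([shift + 0.1 * (i % 3), scale * (i % 5)], i % 2, i % 2) for i in range(n)]


# Two toy concepts that differ only in the level of feature 0.
stored = {
    "A": [fingerprint(make_window(0.0, 1.0)), fingerprint(make_window(0.0, 3.0))],
    "B": [fingerprint(make_window(1.0, 1.0)), fingerprint(make_window(1.0, 3.0))],
}
weights = fisher_like_weights(stored)
current = fingerprint(make_window(1.0, 2.0))
print(identify(current, stored, weights))
```

In this toy setup, the meta-features derived from feature 1 vary within each concept but not between concepts, so the weighting pushes their contribution towards zero and identification relies on the mean of feature 0.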
Index Terms
- Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams