Using spectral entropy and Bernoulli map to handle concept drift
Introduction
A data stream consists of an open-ended sequence of input data that arrive sequentially and continuously over time (Gama & Gaber, 2007). To deal with data stream mining, learning algorithms must be capable of handling concept drift, a phenomenon inherent to the dynamics of the data distribution over time (Tsymbal, 2004). In this context, learning algorithms may combine a classifier with a drift detector to identify changes in the distribution of the data, in order to rapidly adapt or replace the predictive model.
Assuming that the only information received by the detector is the feedback of the prediction, typically 0 if the prediction is correct or 1 if it is incorrect, one can consider the error stream as a sequence of Bernoulli trials (Ross, Adams, Tasoulis, & Hand, 2012). In fact, some drift detectors reduce the problem of change detection to that of identifying significant changes in the estimated distribution of the error rate, under the hypothesis of independent observations (Ross et al., 2012, Gama et al., 2014, Žliobaite et al., 2015).
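Viewed this way, each prediction outcome is a draw from a Bernoulli distribution whose parameter is the model's current error rate, and a drift appears as a change in that parameter. A minimal sketch (all parameter values hypothetical) of such an error stream with an abrupt change:

```python
import random

def bernoulli_error_stream(n, p_before, p_after, change_point, seed=42):
    """Simulate a binary error stream: 1 = misclassified, 0 = correct.

    Each position is an independent Bernoulli trial; the success
    probability jumps from p_before to p_after at change_point,
    emulating an abrupt concept drift that degrades accuracy.
    """
    rng = random.Random(seed)
    return [int(rng.random() < (p_before if t < change_point else p_after))
            for t in range(n)]

stream = bernoulli_error_stream(2000, p_before=0.10, p_after=0.35,
                                change_point=1000)
rate_before = sum(stream[:1000]) / 1000
rate_after = sum(stream[1000:]) / 1000
```

Under the i.i.d. hypothesis, a detector only has to decide whether the Bernoulli parameter generating `stream` has changed.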
Several proposals have been presented in the literature for the detection of concept changes based on the error rate of predictive models (Barros & Santos, 2018). In general, the non-stationarity of the error rate distribution grounds most of the approaches: detectors signal distribution changes by evaluating the accuracy of the predictions using techniques derived from sequential analysis or statistical process control, or by monitoring distributions over sliding windows (Gama et al., 2014).
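As an illustration of the sequential-analysis family, the following is a minimal one-sided CUSUM over a binary error stream. It is a sketch of the general technique, not the tuned configuration of any particular detector; the threshold and drift-magnitude values are hypothetical:

```python
import random

def cusum_detect(errors, target=0.1, drift_magnitude=0.05, threshold=5.0):
    """One-sided CUSUM: accumulate deviations of the 0/1 error indicators
    above the acceptable rate (target + drift_magnitude) and signal when
    the cumulative sum exceeds threshold. Returns the index of the first
    alarm, or None if no alarm is raised."""
    g = 0.0
    for t, e in enumerate(errors):
        g = max(0.0, g + (e - target - drift_magnitude))
        if g > threshold:
            return t
    return None

# error rate jumps from ~0.05 to ~0.5 at index 100
rng = random.Random(0)
errs = [int(rng.random() < 0.05) for _ in range(100)] + \
       [int(rng.random() < 0.5) for _ in range(100)]
alarm = cusum_detect(errs)
no_alarm = cusum_detect([0] * 200)   # stationary, error-free stream
```

After the change the statistic grows by roughly 0.35 per step on average, so the alarm fires shortly after the change point, while the error-free stream never triggers.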
Despite the efficiency of many detectors based on evaluating the distribution of the error rate, empirical studies have shown that the error rate can be influenced by temporal dependence (Žliobaite et al., 2015, Bifet, 2017). This effect is related to the propagation of performance caused by successive incremental updates of the model (Bifet, 2017). Considering that a prediction needs to be made for each observation over time, a sequence of models is constructed incrementally, taking into account all or a subset of the previous model h_t, the previous observations x_t, and the true labels y_t, defined as h_{t+1} = f(h_t, x_t, y_t), where f is the algorithm for model update (Žliobaite et al., 2015). Thus, relevant information about the temporal dependence among the prediction errors is disregarded when only the non-stationarity of the error rate distribution is assessed.
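This incremental construction corresponds to the prequential (test-then-train) protocol and can be sketched as follows; `predict` and `update` are hypothetical stand-ins for any incremental learner's interface, and the majority-class learner is a toy example:

```python
def prequential_error_stream(model, stream, predict, update):
    """Test-then-train: predict each (x, y), record the 0/1 error, then
    update the model with the revealed label -- so the model at time t+1
    depends on the model, observation, and label at time t."""
    errors = []
    for x, y in stream:
        errors.append(int(predict(model, x) != y))
        model = update(model, x, y)   # h_{t+1} = f(h_t, x_t, y_t)
    return errors

# toy learner: always predict the majority class seen so far
counts = {0: 0, 1: 0}
predict = lambda m, x: max(m, key=m.get)
def update(m, x, y):
    m = dict(m)
    m[y] += 1
    return m

# ten examples of class 1 followed by ten of class 0
data = [(None, 1)] * 10 + [(None, 0)] * 10
errs = prequential_error_stream(counts, data, predict, update)
```

Because each model depends on its predecessor, the errors after the class switch arrive in a long run rather than independently, which is exactly the temporal dependence the error-rate-only view discards.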
Recently, new approaches based on dynamical system tools have been proposed for concept drift detection in data streams containing temporal dependence (Vallim and Mello, 2014, Costa et al., 2016, Costa et al., 2017). These studies suggest that concept drift should be characterized by temporal relationships among the data, and that concept drift detection can be improved by analyzing data dependencies in a phase space (Takens, 1981), as well as by means of the divergence between power spectra produced from time-shifted windows (Vallim & Mello, 2014). Thus, these approaches take into account possible time dependencies, assuming that the observations are not necessarily independent and identically distributed (i.i.d.).
Motivated by those new approaches, here we assess the error stream in an alternative way. Basically, we map the error stream into a time series using the Bernoulli Map (Ott, 2002), in order to retain information about the temporal relationships among the prediction errors. Then, we use Spectral Entropy (Powell and Percival, 1979, Inouye et al., 1991) to evaluate divergences between power spectra, rather than the error stream distribution, in order to identify concept drift.
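The exact mapping is presented in Section 3; as a rough sketch of the idea, one can read sliding windows of the binary error stream as binary expansions of points on an orbit of the Bernoulli shift map x_{n+1} = 2x_n mod 1, and then compute the normalized spectral entropy of the resulting series. This construction and its parameters are illustrative assumptions, not necessarily the paper's definition:

```python
import cmath
import math
import random

def bernoulli_map_series(errors, depth=16):
    """Embed the binary error stream as points of the Bernoulli shift map:
    x_n = sum_{k=0}^{depth-1} e_{n+k} / 2^(k+1), so that x_{n+1} is
    (approximately) 2*x_n mod 1, i.e. the shift acting on the stream."""
    return [sum(errors[n + k] / 2 ** (k + 1) for k in range(depth))
            for n in range(len(errors) - depth)]

def spectral_entropy(series):
    """Normalized Shannon entropy of the power spectrum (periodogram):
    near 1 for broadband, noise-like series; low for periodic ones."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    # power at positive frequencies via a direct DFT (O(n^2), fine here)
    power = []
    for f in range(1, n // 2 + 1):
        s = sum(centered[t] * cmath.exp(-2j * math.pi * f * t / n)
                for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power) or 1.0
    p = [w / total for w in power]
    h = -sum(q * math.log(q) for q in p if q > 0)
    return h / math.log(len(p))   # normalize to [0, 1]

# periodic errors (strong temporal dependence) concentrate spectral power
# in a few frequencies; random errors spread it across the spectrum
periodic = [1, 0, 0, 0] * 40
rng = random.Random(1)
random_errs = [int(rng.random() < 0.5) for _ in range(160)]
se_periodic = spectral_entropy(bernoulli_map_series(periodic))
se_random = spectral_entropy(bernoulli_map_series(random_errs))
```

A drift detector in this spirit would monitor such entropy values over successive windows and signal when they diverge, instead of tracking the error rate itself.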
We evaluated the feasibility of this new approach to concept drift detection using artificial and real-world datasets with base learners Naive Bayes (John & Langley, 1995) and Hoeffding Tree (Hulten, Spencer, & Domingos, 2001), available in the Massive Online Analysis (MOA) framework (Bifet, Holmes, Kirkby, & Pfahringer, 2010). In the artificial scenarios, the experiments considered both abrupt and gradual drifts, aiming to assess the detectors in different time intervals of concept transition. In addition to accuracy, the following metrics were used to assess the algorithms: Missed Detection Rate (MDR), Mean Time to Detection (MTD), Mean Time between False Alarms (MTFA), and Mean Time Ratio (MTR) (Basseville and Nikiforov, 1993, Bifet, 2017).
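As a concrete reading of these detection metrics, the sketch below computes MDR, MTD, MTFA, and MTR from known drift positions and detector alarms, using the common definition MTR = (MTFA / MTD) * (1 - MDR); the matching window `horizon` is a hypothetical parameter:

```python
def detection_metrics(drift_points, alarms, horizon):
    """MDR: fraction of drifts never flagged within `horizon` steps;
    MTD: mean delay of the first alarm after each detected drift;
    MTFA: mean gap between alarms that match no drift (false alarms);
    MTR = (MTFA / MTD) * (1 - MDR): higher is better."""
    delays, matched = [], set()
    for d in drift_points:
        hits = [a for a in alarms if d <= a < d + horizon]
        if hits:
            delays.append(hits[0] - d)
            matched.update(hits)
    mdr = 1 - len(delays) / len(drift_points)
    mtd = sum(delays) / len(delays) if delays else float('inf')
    false_alarms = sorted(a for a in alarms if a not in matched)
    gaps = [b - a for a, b in zip(false_alarms, false_alarms[1:])]
    mtfa = sum(gaps) / len(gaps) if gaps else float('inf')
    mtr = (mtfa / mtd) * (1 - mdr) if mtd > 0 else float('inf')
    return mdr, mtd, mtfa, mtr

# two true drifts; two true detections (delays 40 and 25), two false alarms
mdr, mtd, mtfa, mtr = detection_metrics(
    drift_points=[1000, 3000], alarms=[120, 620, 1040, 3025], horizon=500)
```

Here both drifts are caught (MDR = 0) with a mean delay of 32.5 steps, and the two unmatched alarms are 500 steps apart, giving MTR = 500 / 32.5.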
The rest of this paper is organized as follows: Section 2 surveys the published literature on concept drift detection methods; Section 3 presents the Bernoulli Shift Map and explains the proposed drift detection algorithm; Section 4 presents brief descriptions of the datasets and details the experimental setup; Section 5 reports the results and discusses the corresponding statistical evaluation; and, finally, Section 6 draws conclusions and presents future work.
Section snippets
Related work: drift detection methods
Different approaches have been proposed in the literature to learn from data streams containing concept drift (Barros and Santos, 2018, Barros and Santos, 2019). Likewise, drift detectors use distinct strategies to assess the performance of the base learner and signal concept drifts. In this context, a common configuration is to use a concept drift detector to analyze the error stream from a base learner such that a new instance of the base learner replaces the previous classifier whenever an
Handling temporal dependence on error streams
Despite the fact that the error rate can be influenced by temporal dependence, most approaches presented in Section 2 have assumed that the error stream is composed of statistically independent observations. Consequently, no theoretical guarantees can be provided by these previous approaches (Žliobaite et al., 2015, Bifet, 2017). Inspired by Vallim and Mello, 2014, Costa et al., 2016, Costa et al., 2017, this work considers the possibility of temporal dependence in the errors produced
Experimental settings
This section describes the experiments designed to evaluate the SEDD algorithm, comparing it to the following drift detectors: CUSUM, derived from sequential analysis; ADWIN, STEPD, FTDD, and FHDDM, based on distribution tracking using sliding windows; and DDM and RDDM, categorized as statistical process control-based methods. All the concept drift detection methods were analyzed with synthetic and real-world datasets.
Synthetic datasets provide a controlled scenario and allow us to
Experimental results and analysis
This section presents the results of the experiments carried out to compare the detection algorithms using several statistical evaluations, assessing their accuracy and drift detection performance over the artificial dataset configurations, as well as their accuracy on the real-world datasets, using the NB and HT base learners.
Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 show the results obtained using HT and the drift detectors considering the
Conclusion
Inspired by new approaches based on dynamical system tools (Costa et al., 2016, Costa et al., 2017) and by the surrogate stability concept proposed by Vallim and Mello (2014) for concept drift detection in data streams, this article proposed SEDD, a new method that takes into account temporal dependencies in error streams and does not rely on the assumption that the data are i.i.d. Specifically, SEDD maps the temporal relationships among the prediction errors into a time series using the
CRediT authorship contribution statement
Rohgi Toshio Meneses Chikushi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft. Roberto Souto Maior de Barros: Conceptualization, Validation, Writing - review & editing, Supervision. Marilu Gomes N. Monte da Silva: Conceptualization, Supervision. Bruno Iran Ferreira Maciel: Software, Validation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Rohgi Chikushi is, and Bruno Maciel was, a PhD student; both were supported by postgraduate grants from CAPES. Roberto S. M. Barros is supported by research Grant No. 310092/2019-1 from CNPq.
References (53)
- RDDM: Reactive drift detection method. Expert Systems with Applications (2017)
- Wilcoxon rank sum test drift detector. Neurocomputing (2018)
- A large-scale comparison of concept drift detectors. Information Sciences (2018)
- An overview and comprehensive comparison of ensembles for concept drift. Information Fusion (2019)
- Concept drift detection based on Fisher's Exact test. Information Sciences (2018)
- Multidimensional surrogate stability to detect data stream concept drift. Expert Systems with Applications (2017)
- Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and Clinical Neurophysiology (1991)
- Digitally generating true orbits of binary shift chaotic maps and their conjugates. Communications in Nonlinear Science and Numerical Simulation (2018)
- Exponentially weighted moving average charts for detecting concept drift. Pattern Recognition Letters (2012)
- A differential evolution based method for tuning concept drift detectors in data streams. Information Sciences (2019)
- Testing for nonlinearity in time series: the method of surrogate data. Physica D: Nonlinear Phenomena
- Recurrence quantification analysis of the logistic equation with transients. Physics Letters A
- Proposal of a new stability concept to detect changes in unsupervised data streams. Expert Systems with Applications
- Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering
- A boosting-like online learning ensemble
- Detection of Abrupt Changes: Theory and Application
- Classifier concept drift detection and the illusion of progress
- Learning from time-changing data with adaptive windowing
- MOA: massive online analysis. Journal of Machine Learning Research
- Detecting dynamical changes in time series using the permutation entropy. Physical Review E
- Using dynamical systems tools to detect concept drift in data streams. Expert Systems with Applications
- Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society: Series A (General)
- Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research
- On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society
- Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Transactions on Knowledge and Data Engineering
Cited by (13)
- A survey on machine learning for recurring concept drifting data streams. Expert Systems with Applications (2023)
- From concept drift to model degradation: An overview on performance-aware drift detectors. Knowledge-Based Systems (2022)
- Review on novelty detection in the non-stationary environment. Knowledge and Information Systems (2024)
- Improved chaotic sparrow search algorithm and application based on Gaussian cloud. Guangdianzi Jiguang/Journal of Optoelectronics Laser (2023)