Using spectral entropy and Bernoulli map to handle concept drift
Introduction
A data stream consists of an open-ended sequence of input data that arrive sequentially and continuously over time (Gama & Gaber, 2007). To deal with data stream mining, learning algorithms must be capable of handling concept drift, a phenomenon inherent to the dynamics of the data distribution over time (Tsymbal, 2004). In this context, learning algorithms may combine a classifier with a drift detector to identify changes in the distribution of the data, in order to rapidly adapt or replace the predictive model.
Assuming that the only information received by the detector is the feedback of the prediction, typically 0 if the prediction is correct or 1 if it is incorrect, one can consider the error stream as a sequence of Bernoulli trials (Ross, Adams, Tasoulis, & Hand, 2012). In fact, some drift detectors reduce the problem of change detection to that of identifying significant changes in the estimated distribution of the error rate, under the hypothesis of independent observations (Ross et al., 2012, Gama et al., 2014, Žliobaite et al., 2015).
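Viewed this way, each prediction outcome is a draw from a Bernoulli distribution whose parameter is the model's current error rate, and a drift appears as a change in that parameter. A minimal sketch (all parameter values hypothetical) of such an error stream with an abrupt change:

```python
import random

def bernoulli_error_stream(n, p_before, p_after, change_point, seed=42):
    """Simulate a binary error stream: 1 = misclassified, 0 = correct.

    Each position is an independent Bernoulli trial; the success
    probability jumps from p_before to p_after at change_point,
    emulating an abrupt concept drift that degrades accuracy.
    """
    rng = random.Random(seed)
    return [int(rng.random() < (p_before if t < change_point else p_after))
            for t in range(n)]

stream = bernoulli_error_stream(2000, p_before=0.10, p_after=0.35,
                                change_point=1000)
rate_before = sum(stream[:1000]) / 1000
rate_after = sum(stream[1000:]) / 1000
```

Under the i.i.d. hypothesis, a detector only has to decide whether the Bernoulli parameter generating `stream` has changed.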
Several proposals have been presented in the literature for the detection of concept changes based on the error rate of predictive models (Barros & Santos, 2018). In general, the non-stationarity of the error rate distribution grounds most of the approaches: detectors signal distribution changes by evaluating the accuracy of the predictions using techniques derived from sequential analysis or statistical process control, or by monitoring distributions over sliding windows (Gama et al., 2014).
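As an illustration of the sequential-analysis family, the following is a minimal one-sided CUSUM over a binary error stream. It is a sketch of the general technique, not the tuned configuration of any particular detector; the threshold and drift-magnitude values are hypothetical:

```python
import random

def cusum_detect(errors, target=0.1, drift_magnitude=0.05, threshold=5.0):
    """One-sided CUSUM: accumulate deviations of the 0/1 error indicators
    above the acceptable rate (target + drift_magnitude) and signal when
    the cumulative sum exceeds threshold. Returns the index of the first
    alarm, or None if no alarm is raised."""
    g = 0.0
    for t, e in enumerate(errors):
        g = max(0.0, g + (e - target - drift_magnitude))
        if g > threshold:
            return t
    return None

# error rate jumps from ~0.05 to ~0.5 at index 100
rng = random.Random(0)
errs = [int(rng.random() < 0.05) for _ in range(100)] + \
       [int(rng.random() < 0.5) for _ in range(100)]
alarm = cusum_detect(errs)
no_alarm = cusum_detect([0] * 200)   # stationary, error-free stream
```

After the change the statistic grows by roughly 0.35 per step on average, so the alarm fires shortly after the change point, while the error-free stream never triggers.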
Despite the efficiency of many detectors based on evaluating the distribution of the error rate, empirical studies have shown that the error rate can be influenced by temporal dependence (Žliobaite et al., 2015, Bifet, 2017). This effect is related to the propagation of performance caused by successive incremental updates of the model (Bifet, 2017). Considering that a prediction needs to be made for each observation over time, a sequence of models is constructed incrementally, taking into account all or a subset of the previous model h_t, the previous observations x_t, and the true labels y_t, defined as h_{t+1} = f(h_t, x_t, y_t), where f is the algorithm for model update (Žliobaite et al., 2015). Thus, relevant information about the temporal dependence among the prediction errors is disregarded when only the non-stationarity of the error rate distribution is assessed.
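This incremental construction corresponds to the prequential (test-then-train) protocol and can be sketched as follows; `predict` and `update` are hypothetical stand-ins for any incremental learner's interface, and the majority-class learner is a toy example:

```python
def prequential_error_stream(model, stream, predict, update):
    """Test-then-train: predict each (x, y), record the 0/1 error, then
    update the model with the revealed label -- so the model at time t+1
    depends on the model, observation, and label at time t."""
    errors = []
    for x, y in stream:
        errors.append(int(predict(model, x) != y))
        model = update(model, x, y)   # h_{t+1} = f(h_t, x_t, y_t)
    return errors

# toy learner: always predict the majority class seen so far
counts = {0: 0, 1: 0}
predict = lambda m, x: max(m, key=m.get)
def update(m, x, y):
    m = dict(m)
    m[y] += 1
    return m

# ten examples of class 1 followed by ten of class 0
data = [(None, 1)] * 10 + [(None, 0)] * 10
errs = prequential_error_stream(counts, data, predict, update)
```

Because each model depends on its predecessor, the errors after the class switch arrive in a long run rather than independently, which is exactly the temporal dependence the error-rate-only view discards.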
Recently, new approaches based on dynamical system tools have been proposed for concept drift detection in data streams containing temporal dependence (Vallim and Mello, 2014, Costa et al., 2016, Costa et al., 2017). These studies suggest that concept drift should be characterized by temporal relationships among the data, and that concept drift detection can be improved by analyzing data dependencies in a phase space (Takens, 1981), as well as by means of the divergence between power spectra produced from time-shifted windows (Vallim & Mello, 2014). Thus, these approaches take into account possible time dependencies, assuming that the observations are not necessarily independent and identically distributed (i.i.d.).
Motivated by those new approaches, here we assess the error stream in an alternative way. Basically, we map the error stream into a time series using the Bernoulli Map (Ott, 2002), in order to retain information about the temporal relationships among the prediction errors. Then, we use Spectral Entropy (Powell and Percival, 1979, Inouye et al., 1991) to evaluate divergences between power spectra, rather than the error stream distribution, in order to identify concept drift.
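The exact mapping is presented in Section 3; as a rough sketch of the idea, one can read sliding windows of the binary error stream as binary expansions of points on an orbit of the Bernoulli shift map x_{n+1} = 2x_n mod 1, and then compute the normalized spectral entropy of the resulting series. This construction and its parameters are illustrative assumptions, not necessarily the paper's definition:

```python
import cmath
import math
import random

def bernoulli_map_series(errors, depth=16):
    """Embed the binary error stream as points of the Bernoulli shift map:
    x_n = sum_{k=0}^{depth-1} e_{n+k} / 2^(k+1), so that x_{n+1} is
    (approximately) 2*x_n mod 1, i.e. the shift acting on the stream."""
    return [sum(errors[n + k] / 2 ** (k + 1) for k in range(depth))
            for n in range(len(errors) - depth)]

def spectral_entropy(series):
    """Normalized Shannon entropy of the power spectrum (periodogram):
    near 1 for broadband, noise-like series; low for periodic ones."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    # power at positive frequencies via a direct DFT (O(n^2), fine here)
    power = []
    for f in range(1, n // 2 + 1):
        s = sum(centered[t] * cmath.exp(-2j * math.pi * f * t / n)
                for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power) or 1.0
    p = [w / total for w in power]
    h = -sum(q * math.log(q) for q in p if q > 0)
    return h / math.log(len(p))   # normalize to [0, 1]

# periodic errors (strong temporal dependence) concentrate spectral power
# in a few frequencies; random errors spread it across the spectrum
periodic = [1, 0, 0, 0] * 40
rng = random.Random(1)
random_errs = [int(rng.random() < 0.5) for _ in range(160)]
se_periodic = spectral_entropy(bernoulli_map_series(periodic))
se_random = spectral_entropy(bernoulli_map_series(random_errs))
```

A drift detector in this spirit would monitor such entropy values over successive windows and signal when they diverge, instead of tracking the error rate itself.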
We evaluated the feasibility of this new approach to concept drift detection using artificial and real-world datasets with base learners Naive Bayes (John & Langley, 1995) and Hoeffding Tree (Hulten, Spencer, & Domingos, 2001), available in the Massive Online Analysis (MOA) framework (Bifet, Holmes, Kirkby, & Pfahringer, 2010). In the artificial scenarios, the experiments considered both abrupt and gradual drifts, aiming to assess the detectors in different time intervals of concept transition. In addition to accuracy, the following metrics were used to assess the algorithms: Missed Detection Rate (MDR), Mean Time to Detection (MTD), Mean Time between False Alarms (MTFA), and Mean Time Ratio (MTR) (Basseville and Nikiforov, 1993, Bifet, 2017).
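As a concrete reading of these detection metrics, the sketch below computes MDR, MTD, MTFA, and MTR from known drift positions and detector alarms, using the common definition MTR = (MTFA / MTD) * (1 - MDR); the matching window `horizon` is a hypothetical parameter:

```python
def detection_metrics(drift_points, alarms, horizon):
    """MDR: fraction of drifts never flagged within `horizon` steps;
    MTD: mean delay of the first alarm after each detected drift;
    MTFA: mean gap between alarms that match no drift (false alarms);
    MTR = (MTFA / MTD) * (1 - MDR): higher is better."""
    delays, matched = [], set()
    for d in drift_points:
        hits = [a for a in alarms if d <= a < d + horizon]
        if hits:
            delays.append(hits[0] - d)
            matched.update(hits)
    mdr = 1 - len(delays) / len(drift_points)
    mtd = sum(delays) / len(delays) if delays else float('inf')
    false_alarms = sorted(a for a in alarms if a not in matched)
    gaps = [b - a for a, b in zip(false_alarms, false_alarms[1:])]
    mtfa = sum(gaps) / len(gaps) if gaps else float('inf')
    mtr = (mtfa / mtd) * (1 - mdr) if mtd > 0 else float('inf')
    return mdr, mtd, mtfa, mtr

# two true drifts; two true detections (delays 40 and 25), two false alarms
mdr, mtd, mtfa, mtr = detection_metrics(
    drift_points=[1000, 3000], alarms=[120, 620, 1040, 3025], horizon=500)
```

Here both drifts are caught (MDR = 0) with a mean delay of 32.5 steps, and the two unmatched alarms are 500 steps apart, giving MTR = 500 / 32.5.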
The rest of this paper is organized as follows: Section 2 surveys the published literature on concept drift detection methods; Section 3 presents the Bernoulli Shift Map and explains the proposed drift detection algorithm; Section 4 presents brief descriptions of the datasets and details the experimental setup; Section 5 reports the results and discusses the corresponding statistical evaluation; and, finally, Section 6 draws conclusions and presents future work.
Section snippets
Related work: drift detection methods
Different approaches have been proposed in the literature to learn from data streams containing concept drift (Barros and Santos, 2018, Barros and Santos, 2019). Likewise, drift detectors use distinct strategies to assess the performance of the base learner and signal concept drifts. In this context, a common configuration is to use a concept drift detector to analyze the error stream from a base learner such that a new instance of the base learner replaces the previous classifier whenever an
Handling temporal dependence on error streams
Despite the fact that the error rate can be influenced by temporal dependence, most approaches presented in Section 2 have assumed that the error stream is composed of statistically independent observations. Consequently, no theoretical guarantees can be provided by these previous approaches (Žliobaite et al., 2015, Bifet, 2017). Inspired by Vallim and Mello, 2014, Costa et al., 2016, Costa et al., 2017, this work considers the possibility of temporal dependence in the errors produced
Experimental settings
This section describes the experiments designed to evaluate the SEDD algorithm, comparing it to the following drift detectors: CUSUM, derived from sequential analysis; ADWIN, STEPD, FTDD, and FHDDM, based on distribution tracking using sliding windows; and DDM and RDDM, categorized as statistical process control-based methods. All the concept drift detection methods were analyzed with synthetic and real-world datasets.
Synthetic datasets provide a controlled scenario and allow us to
Experimental results and analysis
This section presents the results of the experiments carried out to compare the detection algorithms using several statistical evaluations, assessing their accuracy and drift detection performance over the artificial dataset configurations, as well as their accuracy on the real-world datasets, using the NB and HT base learners.
Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 show the results obtained using HT and the drift detectors considering the
Conclusion
Inspired by new approaches based on dynamical system tools (Costa et al., 2016, Costa et al., 2017) and by the surrogate stability concept proposed by Vallim and Mello (2014) for concept drift detection in data streams, this article proposed SEDD, a new method that takes into account temporal dependencies in error streams and does not rely on the assumption that the data are i.i.d. Specifically, SEDD maps the temporal relationships among the prediction errors into a time series using the
CRediT authorship contribution statement
Rohgi Toshio Meneses Chikushi: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft. Roberto Souto Maior de Barros: Conceptualization, Validation, Writing - review & editing, Supervision. Marilu Gomes N. Monte da Silva: Conceptualization, Supervision. Bruno Iran Ferreira Maciel: Software, Validation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Rohgi Chikushi is, and Bruno Maciel was, a PhD student; both were supported by postgraduate grants from CAPES. Roberto S. M. Barros is supported by research Grant No. 310092/2019-1 from CNPq.
References (53)
- RDDM: Reactive drift detection method. Expert Systems with Applications (2017)
- Wilcoxon rank sum test drift detector. Neurocomputing (2018)
- A large-scale comparison of concept drift detectors. Information Sciences (2018)
- An overview and comprehensive comparison of ensembles for concept drift. Information Fusion (2019)
- Concept drift detection based on Fisher's Exact test. Information Sciences (2018)
- Multidimensional surrogate stability to detect data stream concept drift. Expert Systems with Applications (2017)
- Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalography and Clinical Neurophysiology (1991)
- Digitally generating true orbits of binary shift chaotic maps and their conjugates. Communications in Nonlinear Science and Numerical Simulation (2018)
- Exponentially weighted moving average charts for detecting concept drift. Pattern Recognition Letters (2012)
- A differential evolution based method for tuning concept drift detectors in data streams. Information Sciences (2019)
- Testing for nonlinearity in time series: the method of surrogate data. Physica D: Nonlinear Phenomena
- Recurrence quantification analysis of the logistic equation with transients. Physics Letters A
- Proposal of a new stability concept to detect changes in unsupervised data streams. Expert Systems with Applications
- Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering
- A boosting-like online learning ensemble
- Detection of Abrupt Changes: Theory and Application
- Classifier concept drift detection and the illusion of progress
- Learning from time-changing data with adaptive windowing
- MOA: massive online analysis. Journal of Machine Learning Research
- Detecting dynamical changes in time series using the permutation entropy. Physical Review E
- Using dynamical systems tools to detect concept drift in data streams. Expert Systems with Applications
- Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society: Series A (General)
- Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research
- On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society
- Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Transactions on Knowledge and Data Engineering
Cited by (13)
- A survey on machine learning for recurring concept drifting data streams. Expert Systems with Applications (2023)
- From concept drift to model degradation: An overview on performance-aware drift detectors. Knowledge-Based Systems (2022)
- Review on novelty detection in the non-stationary environment. Knowledge and Information Systems (2024)
- Improved chaotic sparrow search algorithm and application based on Gaussian cloud. Guangdianzi Jiguang/Journal of Optoelectronics Laser (2023)