Time domain graph-based anomaly detection approach applied to a real industrial problem
Introduction
Supervised methods for anomaly detection (Pang et al., 2020, Modenesi and Braga, 2009) assume that statistical profiles of abnormalities have been sampled in advance, so that a separator model can be induced. This approach is not always possible, since it may be unfeasible to fully observe the space that characterizes abnormalities, particularly the most severe ones, imposing difficulties for learning dichotomous models of anomaly detection. When little prior information about anomalies exists, detection using single class approaches (Alvarenga et al., 2021) is an alternative. Regular operation behavior and drifts in sampled data may be easier to observe and to detect once process engineers, maintenance staff and operators provide information about plant operation conditions. From this perspective, anomaly detection can be treated as a drift detection problem (Takahashi and Braga, 2020, Gama et al., 2014). Therefore, changes in data distribution, associated with some learned threshold, may establish the limits between acceptable and unacceptable fluctuations in observed data.
The presented drift detection method is based on information extracted from the data set through a Gabriel graph (GG) (Gabriel and Sokal, 1969, Torres et al., 2015) and its dominating set (S) (Haynes et al., 1998). The principle of the presented method is based on the idea that S represents a skeleton of the data by uncovering the most influential observations within a time frame (Sun et al., 2017). Graph-based methods have been applied previously to extract structural information from data in different scenarios, particularly in the framework of complex networks (Wu et al., 2018).
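As a minimal illustration of the two structures named above, the sketch below builds a Gabriel graph from a small set of points and then extracts a dominating set. Two points are joined when no third point falls strictly inside the ball whose diameter is the segment between them. The greedy selection of the dominating set is an assumption made for brevity (finding a minimum dominating set is NP-hard in general); the function names are hypothetical and the paper may compute S differently.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_graph(points):
    """Return the Gabriel graph as an adjacency dict {index: set of neighbours}.

    Points i and j are adjacent when no other point k satisfies
    d(i,k)^2 + d(j,k)^2 < d(i,j)^2, i.e. no point lies strictly inside
    the ball whose diameter is the segment from i to j.
    """
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) < d_ij
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_dominating_set(adj):
    """Greedy approximation (an assumption, not the paper's procedure):
    repeatedly pick the vertex that dominates the most still-undominated
    vertices, counting itself and its neighbours."""
    undominated = set(adj)
    chosen = []
    while undominated:
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.append(best)
        undominated -= {best} | adj[best]
    return chosen


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
graph = gabriel_graph(points)
skeleton = greedy_dominating_set(graph)
```

For four collinear, evenly spaced points the graph is a simple chain, and two interior points are enough to dominate every vertex. The brute-force construction is cubic in the number of points, which is acceptable only for short time frames.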
This work shows that the dominating set incorporates statistical properties of the data set sampled within a time frame. Drifts are observed from frame to frame as changes in the resulting dominating sets. The elements of S, which are key points of the frame structure, grant some level of representativeness to the outcome, since they correspond to samples observed at different timestamps.
Deep autoencoder approaches to data set drift and anomaly detection are based on similar principles, as drifts are considered deviations in temporal data. In contrast to the presented method, instead of learning an explicit data structure within time frames in input space, autoencoders learn mapping functions induced by a loss function in an optimization process. In light of the recent success of autoencoders in anomaly detection applications (Cook et al., 2020, Alhajri et al., 2019, Zhou and Paffenroth, 2017, Bulusu et al., 2020), an autoencoder is also considered in this paper as a model for anomaly detection, providing a baseline for general performance comparison.
Statistical approaches are also applied to anomaly detection problems. In this work, the method described in Modenesi and Braga (2009), here named PCA-WC, is used for comparison. The method weights every principal component, estimated on the training set, by its respective variance to evaluate the degree of novelty of a test sample.
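The idea of a variance-weighted principal component score can be sketched as follows. This is one plausible reading only, not the formulation of Modenesi and Braga (2009): each squared projection of a test sample onto a principal component is divided by that component's training variance, so deviations along directions that vary little in training count more. The function names `fit_pca` and `novelty_score`, the `eps` guard, and the synthetic data are all assumptions.

```python
import numpy as np


def fit_pca(train):
    """Estimate mean, component variances, and principal directions
    from a training matrix of shape (n_samples, n_features)."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    variances, components = np.linalg.eigh(cov)  # ascending eigenvalues
    order = np.argsort(variances)[::-1]          # sort descending
    return mean, variances[order], components[:, order]


def novelty_score(x, mean, variances, components, eps=1e-12):
    """Assumed weighting: squared projections scaled by inverse variance."""
    z = (x - mean) @ components
    return float(np.sum(z ** 2 / (variances + eps)))


rng = np.random.default_rng(0)
# Synthetic training data: large spread on axis 0, small spread on axis 1.
train = rng.normal(size=(500, 2)) * np.array([3.0, 0.3])
mean, variances, components = fit_pca(train)

score_low_var_axis = novelty_score(mean + np.array([0.0, 3.0]), mean, variances, components)
score_high_var_axis = novelty_score(mean + np.array([3.0, 0.0]), mean, variances, components)
```

Under this reading, the same displacement yields a much larger score along the low-variance axis than along the high-variance one, which is the behaviour expected of a novelty measure.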
The methods above are applied to a real industrial problem which contains drifts and tagged faults.
The paper is organized as follows: Section 2 details the real industrial data set. Section 3 presents the basic principles and theorems of graph theory used in this work. The proposed approach is presented in Section 4. Section 5 presents the results. Finally, conclusions are presented in Section 6.
Section snippets
Anomaly detection in a real industrial setting
The data were collected from a real industrial process at a large mining company and relate to an iron ore pulp pump employed in a flotation unit. This pump is a crucial element of the production process, serving as the main equipment that transfers ore concentrate from the flotation tank to the subsequent production steps. Targeting such equipment is important, since a typical mining operation has hundreds of key production pumps, often lacking predictive failure detection systems
Background
The basic principles and previous works which are important for introducing the method are presented next.
The proposed approach
The principle of the method is that drifts and anomalies can be observed as changes in time frame structures of data. Structural information is represented by a Gabriel graph and its dominating set, which contains the most influential samples within the spatial representation of the data. Using two windows with data, one as reference and the other with most recent data, Gabriel graphs and their respective dominating sets are created, then a metric based on Euclidean distance(s) is used to
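The exact metric is not visible in this excerpt, so the sketch below only illustrates the general idea of comparing the dominating sets of a reference window and a recent window with Euclidean distances. It uses a symmetric mean nearest-neighbour distance as a stand-in; that choice, and the names `directed_mean_nn` and `dominating_set_distance`, are assumptions rather than the paper's definition. The inputs are the coordinates of the dominating-set points of each window.

```python
import math


def directed_mean_nn(src, dst):
    """Average distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def dominating_set_distance(ref_pts, recent_pts):
    """Symmetric mean nearest-neighbour distance between two point sets
    (a stand-in metric, assumed for illustration)."""
    return 0.5 * (
        directed_mean_nn(ref_pts, recent_pts)
        + directed_mean_nn(recent_pts, ref_pts)
    )


reference = [(0.0, 0.0), (1.0, 0.0)]
shifted = [(0.0, 3.0), (1.0, 3.0)]

same = dominating_set_distance(reference, reference)
moved = dominating_set_distance(reference, shifted)
```

Identical sets give a distance of zero, while a rigid shift of the recent set by 3 units gives a distance of 3, so the value grows with the displacement of the frame's skeleton.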
Experiments
To validate the proposed method, tests were carried out with the real data set of an industrial process presented in Section 2. Three tests in total are presented in this section: an anomaly detection test with the proposed metric, a comparative test between the proposed metric and an autoencoder, and a test to assess the sensitivity to the window length and threshold parameters. In all experiments, the proposed method was used in real time, with a new distance drs calculated for every new
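The roles of the window length and the threshold can be illustrated with a small streaming loop. This is a sketch under stated assumptions: the first `window_len` observations form a fixed reference frame, a sliding frame of the same length holds the most recent data, and a distance is recomputed at each new observation once the sliding frame is full. The default `mean_shift` distance is a deliberately simple stand-in, not the graph-based metric of the paper, and all names are hypothetical.

```python
import math
from collections import deque


def mean_point(window):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(window)
    return [sum(col) / n for col in zip(*window)]


def mean_shift(ref, recent):
    """Stand-in frame distance: Euclidean distance between window means."""
    return math.dist(mean_point(ref), mean_point(recent))


def detect_drift(stream, window_len, threshold, frame_distance=mean_shift):
    """Return (time index, distance, flag) for every step at which the
    sliding window is full; flag is True when distance exceeds threshold."""
    reference = []
    recent = deque(maxlen=window_len)
    results = []
    for t, x in enumerate(stream):
        if len(reference) < window_len:
            reference.append(x)  # fill the fixed reference frame first
            continue
        recent.append(x)         # oldest sample drops out automatically
        if len(recent) == window_len:
            d = frame_distance(reference, list(recent))
            results.append((t, d, d > threshold))
    return results


stream = [(0.0,)] * 6 + [(5.0,)] * 3
results = detect_drift(stream, window_len=3, threshold=1.0)
flags = [flag for _, _, flag in results]
```

In this toy stream the level jumps at index 6, and the flag is raised from that step onward. A longer window would smooth the distance and delay the alarm, while a lower threshold would raise it on smaller fluctuations.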
Conclusions
The distance metric presented in the paper was able to capture the essence of observable drifts in graph structure. Experiments show that its outcome is quite similar to the one yielded by the autoencoder, and this consistency of results points to the robustness of the principle. Graph-based approaches in machine learning often have issues related to their computational cost; however, they are particularly suitable for the current problem, since sliding window graphs can be obtained
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the following Brazilian research funding agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
References (31)
- et al. Online learning of neural networks using random projections and sliding window: A case study of a real industrial process. Eng. Appl. Artif. Intell. (2021)
- et al. Width optimization of rbf kernels for binary classification of support vector machines: a density estimation-based approach. Pattern Recognit. Lett. (2019)
- et al. Dominating complex networks by identifying minimum skeletons. Chaos, Solitons Fractals (2017)
- Bootstrap tests for distributional treatment effects in instrumental variable models. J. Am. Stat. Assoc. (2002)
- et al. Survey for anomaly detection of iot botnets using machine learning auto-encoders. Int. J. Appl. Eng. Res. (2019)
- Bhattacharjee, S.D., Yuan, J., Jiaqi, Z., Tan, Y.P., 2017. Context-aware graph-based analysis for detecting anomalous...
- Bulusu, S., Kailkhura, B., Li, B., Varshney, P.K., Song, D., 2020. Anomalous Instance Detect. Deep Learn.: A Surv.,...
- Chollet, F., et al.,...
- et al. Anomaly detection for iot time-series data: A survey. IEEE Internet Things J. (2020)
- Dai, H., Zhu, F., Lim, E.P., Pang, H., 2012. Detecting anomalies in bipartite graphs with mutual dependency principles...