
Computers in Industry

Volume 142, November 2022, 103714

Time domain graph-based anomaly detection approach applied to a real industrial problem

https://doi.org/10.1016/j.compind.2022.103714

Highlights

  • A graph-based approach to anomaly detection is presented.

  • This work shows that the dominating set incorporates statistical properties of a data set.

  • The concept of space coverage to encapsulate a data set, using dominating sets, is presented.

  • The new approach yields results similar to those of deep autoencoders, without the need for extensive parameter tuning.

Abstract

Detecting anomalies in industrial processes is a critical task. Early fault detection can reduce company costs and, most importantly, may prevent accidents and environmental damage. Anomaly detection can be treated as drift detection, as both aim at identifying changes in data that happen unexpectedly over time. In this paper, a graph-based approach to anomaly detection is presented. Based on graph theory and set coverage, the method yields results similar to those of deep autoencoders, without the need for extensive parameter tuning.

Introduction

Supervised methods for anomaly detection (Pang et al., 2020, Modenesi and Braga, 2009) assume that statistical profiles of abnormalities have been sampled in advance, so that a separator model can be induced. This approach is not always possible, since it may be infeasible to fully observe the space that characterizes abnormalities, particularly the most severe ones, which makes it difficult to learn dichotomous models for anomaly detection. When little prior information about anomalies exists, detection using single-class approaches (Alvarenga et al., 2021) is an alternative. Regular operating behavior and drifts in sampled data may be easier to observe and to detect once process engineers, maintenance staff, and operators provide information about plant operating conditions. From this perspective, anomaly detection can be treated as a drift detection problem (Takahashi and Braga, 2020, Gama et al., 2014). Changes in data distribution, combined with a learned threshold, may therefore establish the limits between acceptable and unacceptable fluctuations in the observed data.

The presented drift detection method is based on information extracted from the data set through a Gabriel graph (GG) (Gabriel and Sokal, 1969, Torres et al., 2015) and its dominating set (S) (Haynes et al., 1998). The principle of the presented method is based on the idea that S represents a skeleton of the data by uncovering the most influential observations within a time frame (Sun et al., 2017). Graph-based methods have been applied previously to extract structural information from data in different scenarios, particularly in the framework of complex networks (Wu et al., 2018).
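To make these structural ingredients concrete, the sketch below builds a Gabriel graph from a window of samples and extracts a dominating set with a simple greedy heuristic. This is a minimal illustration under assumptions: the snippets shown here do not state which dominating-set algorithm the authors use, and the function names are hypothetical.

```python
import numpy as np

def gabriel_graph(X):
    """Adjacency matrix of the Gabriel graph: samples i and j are linked when
    no other sample lies inside the closed disc whose diameter is segment ij."""
    n = len(X)
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # squared pairwise distances
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if all(d2[i, k] + d2[j, k] > d2[i, j]
                   for k in range(n) if k not in (i, j)):
                adj[i, j] = adj[j, i] = True
    return adj

def greedy_dominating_set(adj):
    """Greedy approximation of a dominating set S: repeatedly add the vertex
    covering the largest number of still-uncovered vertices."""
    n = adj.shape[0]
    covered = np.zeros(n, dtype=bool)
    S = []
    while not covered.all():
        gains = [(~covered[np.append(np.where(adj[v])[0], v)]).sum()
                 for v in range(n)]
        v = int(np.argmax(gains))
        S.append(v)
        covered[v] = True
        covered[adj[v]] = True
    return S
```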

This work shows that the dominating set incorporates statistical properties of the data set sampled within a time frame. Drifts are observed from frame to frame as changes in the yielded dominating sets. The elements of S, which are key points of the frame structure, grant some level of representativeness to the outcome, since they correspond to samples observed at different timestamps.

Deep autoencoder approaches to data set drifts and anomaly detection are based on similar principles, as drifts are considered deviations in temporal data. In contrast to the presented method, which learns an explicit data structure within time frames in input space, autoencoders learn mapping functions induced by a loss function in an optimization process. In light of the recent success of autoencoders in anomaly detection applications (Cook et al., 2020, Alhajri et al., 2019, Zhou and Paffenroth, 2017, Bulusu et al., 2020), an autoencoder is also considered in this paper as an anomaly detection model, providing a basis for general performance comparison.
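For reference, a minimal dense autoencoder of the kind commonly used as an anomaly detection baseline is sketched below in Keras. The architecture, layer sizes, and the use of reconstruction error as the anomaly score are illustrative assumptions, since the configuration used in the paper is not given in these snippets.

```python
import numpy as np
from tensorflow import keras

def build_autoencoder(n_features, code_dim=4):
    """Small dense autoencoder trained to reconstruct normal-operation data."""
    inputs = keras.Input(shape=(n_features,))
    h = keras.layers.Dense(16, activation="relu")(inputs)
    code = keras.layers.Dense(code_dim, activation="relu")(h)
    h = keras.layers.Dense(16, activation="relu")(code)
    outputs = keras.layers.Dense(n_features, activation="linear")(h)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(model, X_new):
    """Per-sample reconstruction error; large errors suggest drift or anomaly."""
    rec = model.predict(X_new, verbose=0)
    return np.mean((X_new - rec) ** 2, axis=1)
```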

Statistics-based approaches are also applied to anomaly detection problems. In this work, the PCA-based method described in Modenesi and Braga (2009), here named PCA-WC, is used for comparison. The method weights every principal component, estimated on the training set, by its respective variance to evaluate the degree of novelty of a test sample.
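One plausible reading of that weighting is a Mahalanobis-style score in the principal-component basis, sketched below with scikit-learn. The exact weighting used by PCA-WC may differ, so this is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pca(X_train, n_components=None):
    """Estimate principal components and their variances on the training set."""
    return PCA(n_components=n_components).fit(X_train)

def novelty_score(pca, x):
    """Variance-weighted distance of a test sample in the principal-component
    basis; larger values indicate a higher degree of novelty."""
    z = pca.transform(np.asarray(x).reshape(1, -1))[0]  # projections onto the PCs
    return float(np.sum(z ** 2 / pca.explained_variance_))
```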

The methods above are applied to a real industrial problem which contains drifts and tagged faults.

The paper is organized as follows: Section 2 details the real industrial data set. Section 3 presents the basic graph theory principles and theorems used. The proposed approach is presented in Section 4. Section 5 presents the results. Finally, conclusions are presented in Section 6.

Section snippets

Anomaly detection in a real industrial setting

The data were collected from a real industrial process of a large mining company and are related to an iron ore pulp pump employed in a flotation unit. This pump is a crucial element of the production process and plays a major role as the main equipment to transfer ore concentrate from the flotation tank to the subsequent production steps. Targeting such equipment is crucial, since a typical mining operation has hundreds of key production pumps, often lacking predictive failure detection systems

Background

The basic principles and previous works which are important for introducing the method are presented next.

The proposed approach

The principle of the method is that drifts and anomalies can be observed as changes in the time frame structures of the data. Structural information is represented by a Gabriel graph and its dominating set, which contains the most influential samples within the spatial representation of the data. Using two data windows, one as a reference and the other holding the most recent data, Gabriel graphs and their respective dominating sets are created, and then a metric based on Euclidean distances is used to
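Since the snippet cuts off before the metric is defined, the sketch below stands in with a symmetric mean nearest-neighbour Euclidean distance between the two dominating sets, reusing gabriel_graph and greedy_dominating_set from the earlier sketch; the actual metric in the paper may differ.

```python
import numpy as np

def window_distance(X_ref, X_cur):
    """Compare the Gabriel-graph dominating sets of a reference window and the
    most recent window; the symmetric mean nearest-neighbour Euclidean distance
    between the two skeletons is used here as a stand-in metric."""
    S_ref = X_ref[greedy_dominating_set(gabriel_graph(X_ref))]
    S_cur = X_cur[greedy_dominating_set(gabriel_graph(X_cur))]
    d = np.linalg.norm(S_ref[:, None, :] - S_cur[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```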

Experiments

In order to validate the proposed method, tests with the real data set of an industrial process, presented in Section 2, were carried out. Three tests in total are presented in this section: anomaly detection with the proposed metric, a comparative test between the proposed metric and an autoencoder, and a test to assess the parameter sensitivity of the window length and threshold. In all experiments, the proposed method was used in real time, with a new distance d_rs calculated for every new
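A hypothetical online loop of this kind is sketched below; the window length and threshold are placeholder values, not the ones evaluated in the paper, and window_distance refers to the stand-in metric sketched in the previous section.

```python
from collections import deque
import numpy as np

def monitor(stream, X_reference, window_len=100, threshold=1.0):
    """For every new sample, refresh the recent window and compute a new
    distance d_rs against the reference window; values above the threshold
    are flagged as possible drifts or anomalies."""
    recent = deque(maxlen=window_len)
    alarms = []
    for t, x in enumerate(stream):
        recent.append(np.asarray(x, dtype=float))
        if len(recent) == window_len:
            d_rs = window_distance(X_reference, np.stack(recent))
            alarms.append((t, d_rs, d_rs > threshold))
    return alarms
```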

Conclusions

The distance metric presented in the paper was able to capture the essence of observable drifts in the graph structure. Experiments show that its outcome is quite similar to the one yielded by the autoencoder, so the consistency of the results points to the robustness of the principle. Graph-based approaches in machine learning often have issues related to their computational cost; however, they are particularly suitable for the current problem, since sliding-window graphs can be obtained

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the following Brazilian research funding agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

References (31)

  • Eberle, W., Holder, L., 2009. Mining for insider threats in business transactions and processes. 2009 IEEE Symposium on...
  • Gabriel, K.R., et al., 1969. A new statistical approach to geographic variation analysis. Syst. Zool.
  • Gama, J., et al., 2014. A survey on concept drift adaptation. ACM Comput. Surv. (CSUR).
  • Goldenberg, I., et al., 2018. Survey of distance measures for quantifying concept drift and shift in numeric data. Knowl. Inf. Syst.
  • Haynes, T.W., et al., 1998. Fundamentals of Domination in Graphs.