Layer-constrained variational autoencoding kernel density estimation model for anomaly detection

https://doi.org/10.1016/j.knosys.2020.105753

Abstract

Unsupervised techniques typically rely on the probability density distribution of the data to detect anomalies, where objects with low probability density are considered abnormal. However, modeling the density distribution of high-dimensional data is known to be hard, which makes detecting anomalies in high-dimensional data challenging. State-of-the-art methods address this problem by first applying dimension reduction to the data and then detecting anomalies in the low-dimensional space. Unfortunately, the low-dimensional space does not necessarily preserve the density distribution of the original high-dimensional data, which jeopardizes the effectiveness of anomaly detection. In this work, we propose a novel high-dimensional anomaly detection method called LAKE. The key idea of LAKE is to unify the representation learning capacity of a layer-constrained variational autoencoder with the density estimation power of kernel density estimation (KDE). A probability density distribution of the high-dimensional data can then be learned, which effectively separates the anomalies from normal data. LAKE consolidates the merits of both worlds, namely the layer-constrained variational autoencoder and KDE, by using a probability density-aware strategy in the training process of the autoencoder. Extensive experiments on six public benchmark datasets demonstrate that our method significantly outperforms the state-of-the-art methods in detecting anomalies and achieves up to 37% improvement in F1 score.

Introduction

Anomaly detection is a fundamental and hence well-studied problem in many areas, including cyber-security [1], manufacturing [2], system management [3], and medicine [4]. At the core of anomaly detection lies density estimation, regardless of the dimensionality of the data. In general, normal data is abundant and follows a certain distribution, whereas abnormal data is scarce and scattered; anomalies therefore reside in low-density areas.
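To make the notion of density-based anomaly scoring concrete, a standard kernel density estimate (the textbook form, not notation taken from this paper) assigns each point $x$ the density

\[ \hat{f}_h(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \]

where the $x_i$ are the $n$ reference points, $K$ is a kernel function (e.g., Gaussian), $h$ is the bandwidth, and $d$ is the data dimension; objects with small $\hat{f}_h(x)$ fall in low-density regions and are flagged as anomalous.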

Although excellent progress has been achieved in anomaly detection in the past decades, anomaly detection in complex and high-dimensional data remains a challenge. Density estimation in the original data space becomes harder as dimensionality grows, because noise and extraneous features increasingly degrade the estimate. Unfortunately, in many real-world problems the dimensionality of the data can be very large, for example in video surveillance [5], medical anomaly detection [6], and cyber-intrusion detection [7]. To address this issue, a two-step approach [4] is usually applied and has proved successful: it first reduces the dimensionality of the data and then performs density estimation in the latent low-dimensional space. Additionally, spectral anomaly detection [8], [9], [10] and alternative dimensionality reduction [11], [12], [13] techniques are used to find a lower-dimensional representation of the original high-dimensional data in which anomalies and normal instances are expected to be separated from each other. However, the low-dimensional space does not necessarily preserve the density distribution of the original data, and thus estimating the density in the low-dimensional space cannot effectively identify the anomalies in high-dimensional data.

Recently, deep learning has achieved great success in anomaly detection [7]. Autoencoders [14] and a range of variants have been widely used for unsupervised anomaly detection, such as the deep autoencoder, the variational autoencoder (VAE) [15], and the adversarial autoencoder (AAE) [16]. The core idea of these methods is to encode the input data into a low-dimensional representation and then decode that representation back into the original data space by minimizing the reconstruction error. In this process, the essential features of the original data are extracted into the latent space by training the autoencoder, leaving out noise and unnecessary features. Several recent studies have applied this structure to practical problems, yet it remains largely underexplored. For example, in person re-identification studies [17], [18], [19], an autoencoder is used as a feature descriptor to learn robust features of human appearance. In the study of anomaly detection, AnoGAN [6] uses an adversarial autoencoder to detect anomalies in image data, but it only takes advantage of the reconstruction error and does not make full use of the low-dimensional representation. ALAD [20] considers both the data distribution and the reconstruction error based on bi-directional GANs, deriving adversarially learned features for the anomaly detection task. Nevertheless, ALAD still only uses reconstruction errors based on the adversarially learned features to determine whether a data sample is anomalous. DAGMM [21] combines a deep autoencoder and a Gaussian mixture model (GMM) for anomaly detection. However, real-world data may not only be high-dimensional but also lack a clear predefined distribution, and GMM requires manual parameter tuning when modeling the density distribution of the input data, which seriously affects detection performance.

Furthermore, as the example in Fig. 1 shows, even when the anomalous points are separated from the normal points in the low-dimensional representation space learned by an autoencoder, the distribution of normal data may be arbitrary rather than follow one kind of prior distribution (e.g., GMM). On the other hand, some anomalous data may form dense clusters. This is an intractable problem for both neighbor-based and energy-based anomaly detection methods. Additionally, there are always some normal points scattered near the dense normal clusters. These factors pose severe challenges for anomaly detection in large-scale, high-dimensional data.

In this paper, we propose a novel Layer-constrained variational Autoencoding Kernel density Estimation model (LAKE), a deep learning framework that addresses the aforementioned challenges in anomaly detection on high-dimensional datasets. LAKE is a probability density-aware model that unifies the representation learning capacity of a layer-constrained variational autoencoder with the density estimation power of KDE to provide a probability density estimate of high-dimensional data for effectively identifying anomalies.

On the one hand, we propose a layer-constrained variational autoencoder to obtain a low-dimensional representation that captures the essential characteristics of the input data. Unlike the standard VAE, the layer-constrained variational autoencoder considers the reconstruction errors at all corresponding layers of the encoder and decoder while keeping the KL divergence term unchanged. Since the layer-constrained variational autoencoder accounts for both the reconstruction error and the distribution of the data in the latent space, the density distribution of the high-dimensional data is preserved in the low-dimensional representation. On the other hand, LAKE uses KDE to estimate the probability density distribution of the training data. Unlike DAGMM, which needs the number of mixture components to be specified manually, LAKE can model arbitrarily distributed data, and the kernel function of the KDE model can be chosen flexibly to approximate the probability density distribution of the data. Because the layer-constrained VAE encodes the input data into low-dimensional representations while preserving its key features, an object with a high density value is likely to be normal, while one with a low density value is considered abnormal.
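To make the layer-constrained reconstruction term concrete, below is a minimal sketch of such a loss for a small fully connected encoder/decoder; the layer sizes, the unweighted sum of loss terms, and the names LayerConstrainedVAE and lake_style_loss are illustrative assumptions rather than the authors' exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LayerConstrainedVAE(nn.Module):
        """Sketch of a VAE whose loss also matches corresponding encoder/decoder layers."""
        def __init__(self, d_in=120, h1=64, h2=32, d_z=8):
            super().__init__()
            self.enc1, self.enc2 = nn.Linear(d_in, h1), nn.Linear(h1, h2)
            self.mu, self.logvar = nn.Linear(h2, d_z), nn.Linear(h2, d_z)
            self.dec2, self.dec1 = nn.Linear(d_z, h2), nn.Linear(h2, h1)
            self.out = nn.Linear(h1, d_in)

        def forward(self, x):
            e1 = torch.relu(self.enc1(x))                   # encoder layer 1
            e2 = torch.relu(self.enc2(e1))                  # encoder layer 2
            mu, logvar = self.mu(e2), self.logvar(e2)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
            d2 = torch.relu(self.dec2(z))                   # decoder layer mirroring e2
            d1 = torch.relu(self.dec1(d2))                  # decoder layer mirroring e1
            return self.out(d1), (e1, e2), (d1, d2), mu, logvar

    def lake_style_loss(model, x):
        x_hat, (e1, e2), (d1, d2), mu, logvar = model(x)
        recon = F.mse_loss(x_hat, x)                        # input-level reconstruction error
        layers = F.mse_loss(d1, e1) + F.mse_loss(d2, e2)    # layer-wise reconstruction constraints
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # standard VAE KL term
        return recon + layers + kl

Training would minimize this loss on normal data only, in line with the usual autoencoder-based anomaly detection setup.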

However, as shown in Fig. 1, some abnormal objects may form a dense cluster due to their common anomalous characteristics. Such abnormal objects may not be detected by simply applying density estimation to the global data, because some normal objects always fall scattered at the margin of the distribution. Fortunately, each individual abnormal object can easily be distinguished by estimating its density value separately under the KDE model trained on the sampled training data. Therefore, we propose a probability density estimation strategy for the training and testing process. Specifically, we use sampled training data to learn a probability density distribution in LAKE; at test time, we estimate the density value of each data object separately under the learned probability density distribution.
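A minimal sketch of this train/test strategy, using scikit-learn's KernelDensity as a stand-in for the KDE component; the random features, Gaussian kernel, bandwidth of 0.5, and 10th-percentile threshold below are illustrative assumptions, not values taken from the paper.

    import numpy as np
    from sklearn.neighbors import KernelDensity

    # z_train: latent features of sampled (normal) training data; z_test: features of test objects.
    rng = np.random.default_rng(0)
    z_train = rng.normal(size=(1000, 9))
    z_test = rng.normal(size=(200, 9))

    # Fit the density model on the sampled training data only.
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(z_train)

    # Score each test object separately under the learned distribution.
    log_density = kde.score_samples(z_test)
    threshold = np.percentile(kde.score_samples(z_train), 10)
    is_anomaly = log_density < threshold   # low-density objects are flagged as anomalous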

Extensive experiments on six public benchmark datasets demonstrate that LAKE has superior performance compared to the state-of-the-art models, with up to 37% improvement in standard F1 score for anomaly detection. It is worth noting that LAKE achieves better results with fewer training samples than existing deep learning-based methods.

To summarize, we make the following contributions:

  • We propose a layer-constrained variational autoencoding kernel density estimation model for anomaly detection from high-dimensional datasets.

  • We propose a probability density-aware strategy that learns a probability density distribution of the high-dimensional data during training and effectively detects abnormal objects during testing.

  • We conduct extensive evaluations on six benchmark datasets. Experimental results demonstrate that our method significantly outperforms the state-of-the-art methods.

Section snippets

Related work

Varieties of research focus on anomaly detection in data mining and machine learning [22]. Distance-based anomaly detection [23] uses a global density criterion to detect anomalies. Density-based methods [24], [25] aim to detect local outliers and thus use local relative density as the anomaly criterion. Several studies [26], [27], [28] apply KDE to density-based local outlier detection to improve detection accuracy. However, such methods rely on an appropriate distance metric, which is difficult to define for high-dimensional data.

The proposed LAKE model

Experiments

In this section, we use six public benchmark datasets to evaluate the effectiveness and robustness of our proposed model in anomaly detection. The code of the baseline methods, released by ALAD, is available on GitHub; the source code of our proposed method is also available on GitHub.

Conclusion

In this paper, we propose a layer-constrained variational autoencoding kernel density estimation model (LAKE) for anomaly detection in high-dimensional data. LAKE mainly consists of two parts: the compression network and the KDE model. The compression network obtains a low-dimensional representation while retaining the key features using a layer-constrained variational autoencoder. The KDE model takes the low-dimensional representation and reconstruction error features as input, and learns a probability density distribution of the training data that is used to identify anomalies.
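As a rough sketch of how those feeds could be assembled before fitting the KDE model; the concatenation of the latent code with a per-sample reconstruction error below is an assumption about the feature construction, since the exact recipe is not spelled out in this snippet.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=(100, 8))                  # latent codes from the compression network (placeholder values)
    x = rng.normal(size=(100, 120))                # original inputs (placeholder values)
    x_hat = x + 0.1 * rng.normal(size=x.shape)     # reconstructions (placeholder values)

    recon_err = np.linalg.norm(x - x_hat, axis=1, keepdims=True)   # per-sample reconstruction error
    kde_input = np.concatenate([z, recon_err], axis=1)             # low-dimensional code + error feature for KDE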

CRediT authorship contribution statement

Peng Lv: Methodology, Software, Data curation, Visualization, Writing - original draft. Yanwei Yu: Conceptualization, Supervision, Writing - review & editing, Funding acquisition. Yangyang Fan: Data curation, Investigation, Writing - review & editing. Xianfeng Tang: Validation, Writing - review & editing. Xiangrong Tong: Resources, Funding acquisition.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. This work is partially supported by the National Natural Science Foundation of China under Grant Nos.: 61773331, 61572418 and 61403328. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.

References (39)

  • Varun Chandola et al., Anomaly detection: A survey, ACM Comput. Surv. (2009)
  • Waqas Sultani, Chen Chen, Mubarak Shah, Real-world anomaly detection in surveillance videos, in: Proceedings of the...
  • Thomas Schlegl et al., Unsupervised anomaly detection with generative adversarial networks to guide marker discovery
  • Raghavendra Chalapathy et al., Deep learning for anomaly detection: A survey (2019)
  • Tsuyoshi Idé et al., Eigenspace-based anomaly detection in computer systems
  • Weiren Yu et al., On anomalous hotspot discovery in graph streams
  • Ryohei Fujimaki et al., An approach to spacecraft anomaly detection problem using kernel feature space
  • Emmanuel J. Candès et al., Robust principal component analysis?, J. ACM (2011)
  • Chong Zhou et al., Anomaly detection with robust deep autoencoders