Layer-constrained variational autoencoding kernel density estimation model for anomaly detection☆
Introduction
Anomaly detection is a fundamental and hence well-studied problem in many areas, including cyber-security [1], manufacturing [2], system management [3], and medicine [4]. At the core of anomaly detection lies density estimation, whether the data is multi-dimensional or high-dimensional. In general, normal data are plentiful and consistent with a certain distribution, while abnormal data are scarce and scattered; anomalies therefore reside in low-density regions.
Although excellent progress has been made on anomaly detection over the past decades, anomaly detection in complex, high-dimensional data remains a challenge. Density estimation in the original data space becomes hard as dimensionality grows, because noise and extraneous features increasingly distort the estimate. Unfortunately, real-world data are often very high-dimensional, as in video surveillance [5], medical anomaly detection [6], and cyber-intrusion detection [7]. To address this issue, a two-step approach [4] is commonly applied and has proved successful: it first reduces the dimensionality of the data and then performs density estimation in the latent low-dimensional space. Additionally, spectral anomaly detection [8], [9], [10] and other dimensionality reduction techniques [11], [12], [13] are used to find a lower-dimensional representation of the original high-dimensional data, in which anomalies and normal instances are expected to be separated from each other. However, the low-dimensional space does not necessarily preserve the density distribution of the original data, so estimating density in that space may fail to identify the anomalies present in the high-dimensional data.
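The two-step idea can be illustrated with a minimal sketch. PCA and a Gaussian KDE here are generic stand-ins for the learned components discussed later; the data, dimensions, bandwidth, and threshold are all illustrative choices, not the paper's implementation:

```python
# Two-step anomaly detection sketch:
# (1) reduce dimensionality, (2) estimate density in the latent space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 50))      # dense, consistent cluster
anomalies = rng.uniform(-6.0, 6.0, size=(10, 50))  # sparse, scattered points
data = np.vstack([normal, anomalies])

latent = PCA(n_components=5).fit_transform(data)   # step 1: compress
kde = KernelDensity(bandwidth=0.5).fit(latent)     # step 2: estimate density
scores = kde.score_samples(latent)                 # log-density per sample

# Points in low-density regions are flagged as anomalies.
threshold = np.percentile(scores, 5)
flagged = np.where(scores < threshold)[0]
```

As the text notes, this only works when the compression step preserves the density structure of the original data, which motivates the layer-constrained design below.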
Recently, deep learning has achieved great success in anomaly detection [7]. The autoencoder [14] and a range of its variants have been widely used for unsupervised anomaly detection, such as the deep autoencoder, the variational autoencoder (VAE) [15], and the adversarial autoencoder (AAE) [16]. The core idea of these methods is to encode the input data into a low-dimensional representation and then decode that representation back into the original data space while minimizing the reconstruction error. In this process, training the autoencoder extracts the essential features of the original data into the latent space, free of noise and unnecessary features. Several recent studies have applied this structure to practical problems, yet it remains largely underexplored. For example, in person re-identification studies [17], [18], [19], an autoencoder is used to learn robust feature descriptors of human appearance. In anomaly detection, AnoGAN [6] uses an adversarial autoencoder to detect anomalies in image data, but it exploits only the reconstruction error and does not make full use of the low-dimensional representation. ALAD [20] considers both the data distribution and the reconstruction error based on bi-directional GANs, deriving adversarially learned features for the anomaly detection task; nevertheless, ALAD still relies only on reconstruction errors over those features to decide whether a sample is anomalous. DAGMM [21] combines a deep autoencoder with a Gaussian mixture model (GMM) for anomaly detection. However, real-world data may not only be high-dimensional but also lack a clear predefined distribution, and the GMM requires manual parameter tuning when modeling the density distribution of the input data, which seriously affects detection performance.
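The reconstruction-error principle shared by these autoencoder variants can be sketched as follows. The small MLP trained to reproduce its input is a generic stand-in, not any of the cited architectures, and the synthetic data and layer sizes are assumptions for illustration:

```python
# Reconstruction error as an anomaly score: points far from the training
# manifold reconstruct poorly through a bottlenecked network.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 20))  # normal data on an 8-d subspace

# An MLP trained to reproduce its own input acts as an autoencoder;
# the 16-unit hidden layer is the compressive bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
ae.fit(X, X)

def recon_error(points):
    """Per-sample mean squared reconstruction error."""
    return np.mean((points - ae.predict(points)) ** 2, axis=1)

normal_err = recon_error(X)
anomaly = rng.normal(loc=8.0, size=(5, 20))  # far from the training manifold
anomaly_err = recon_error(anomaly)           # expected to be much larger
```

Methods such as AnoGAN stop at this score; LAKE, as described below, additionally uses the latent representation itself for density estimation.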
Furthermore, as the example in Fig. 1 shows, although an autoencoder separates the anomalous points from the normal points in the low-dimensional representation space, the distribution of normal data may be arbitrary rather than follow one kind of prior distribution (e.g., a GMM). On the other hand, some anomalous data may form dense clusters. This is an intractable problem for both neighbor-based and energy-based anomaly detection methods. Additionally, some normal points are always scattered near the dense normal clusters. These factors pose severe challenges for anomaly detection in large-scale high-dimensional data.
In this paper, we propose a novel Layer-constrained variational Autoencoding Kernel density Estimation model (LAKE), a deep learning framework that addresses the aforementioned challenges in anomaly detection on high-dimensional datasets. LAKE is a probability density-aware model that unifies the representation learning capacity of a layer-constrained variational autoencoder with the density estimation power of KDE to provide a probability density estimate of high-dimensional data for effectively identifying anomalies.
On the one hand, we propose a layer-constrained variational autoencoder to obtain a low-dimensional representation that captures the essential characteristics of the input data. Unlike the standard VAE, the layer-constrained variational autoencoder considers the reconstruction errors on all corresponding layers of the encoder and decoder while keeping the KL divergence term unchanged. Because it accounts for both the reconstruction error and the distribution of data in the latent space, the density distribution of the high-dimensional data is preserved in the low-dimensional representation. On the other hand, LAKE uses KDE to estimate the probability density distribution of the training data. Unlike DAGMM, which requires manually specifying the number of Gaussian mixture components, LAKE can model arbitrarily distributed data, and the kernel function of the KDE model can be chosen flexibly to match the probability density distribution of the data. Since the layer-constrained VAE encodes the input into low-dimensional representations while preserving its key features, an object with a high density value is likely normal, while one with a low density value is considered abnormal.
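A hedged sketch of such a layer-constrained objective: the standard VAE KL term plus a reconstruction error over every matching encoder/decoder layer pair. The function name, the weighting factor `lam`, and the random activations are illustrative placeholders; in the actual model the activations come from the network:

```python
# Layer-constrained VAE loss sketch: sum layer-wise reconstruction errors,
# keep the usual KL divergence term against a standard normal prior.
import numpy as np

def layer_constrained_loss(enc_layers, dec_layers, mu, log_var, lam=1.0):
    """enc_layers[i] and dec_layers[i] are activations of corresponding
    encoder/decoder layers (decoder listed so layer i mirrors encoder layer i)."""
    recon = sum(np.mean((e - d) ** 2) for e, d in zip(enc_layers, dec_layers))
    # KL( N(mu, exp(log_var)) || N(0, 1) ), the standard VAE term
    kl = -0.5 * np.mean(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + lam * kl

# Toy activations for a 3-layer encoder/decoder pair (batch of 4)
rng = np.random.default_rng(0)
enc = [rng.normal(size=(4, n)) for n in (20, 10, 5)]
dec = [a + 0.1 * rng.normal(size=a.shape) for a in enc]  # near-perfect mirror
loss = layer_constrained_loss(enc, dec,
                              mu=rng.normal(size=(4, 3)),
                              log_var=np.zeros((4, 3)))
```

Constraining every layer pair, rather than only the input/output pair, is what pushes the latent space to preserve the density structure described above.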
However, as shown in Fig. 1, some abnormal objects may form a dense cluster due to their shared anomalous characteristics. Such objects may not be detected by simply applying density estimation to the global data, because some normal objects always fall discretely on the margin of the distribution. Fortunately, each individual abnormal object can easily be distinguished from the density distribution of the sampled training data by estimating its density value under the trained KDE model. We therefore propose a probability density estimation strategy for the training and testing process. Specifically, we use sampled training data to learn a probability density distribution in LAKE; at test time, we estimate the density value of each data object separately against the trained probability density distribution.
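The strategy above can be sketched as follows: fit a KDE on a sample of the training representations, then score each test object independently, so that even a tight cluster of anomalies still receives a low density value because the density model was fit on training data only. All data, sample sizes, and bandwidth choices here are illustrative assumptions:

```python
# Train/test density estimation strategy sketch.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(2)
train_latent = rng.normal(size=(1000, 5))  # latent codes of normal training data
sample = train_latent[rng.choice(len(train_latent), size=200, replace=False)]

# Training: learn the density distribution from sampled training data only.
kde = KernelDensity(kernel="gaussian", bandwidth=0.7).fit(sample)

# Testing: score each object separately against the fixed training density.
# A dense *cluster* of anomalies still scores low, since it lies far from
# the training distribution.
test = np.vstack([rng.normal(size=(50, 5)),                        # normal-like
                  rng.normal(loc=4.0, scale=0.2, size=(50, 5))])   # tight anomalous cluster
log_density = kde.score_samples(test)
predicted_anomaly = log_density < np.median(log_density)
```

Scoring each object against the training density, rather than the density of the test set itself, is what prevents a compact anomaly cluster from masquerading as a normal mode.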
Extensive experiments on six public benchmark datasets demonstrate that LAKE has superior performance compared to the state-of-the-art models, with up to a 37% improvement in standard score for anomaly detection. It is worth noting that LAKE achieves better results with fewer training samples than existing deep learning-based methods.
To summarize, we make the following contributions:
- We propose a layer-constrained variational autoencoding kernel density estimation model for anomaly detection in high-dimensional datasets.
- We propose a probability density-aware strategy that learns a probability density distribution of the high-dimensional data during training and effectively detects abnormal objects during testing.
- We conduct extensive evaluations on six benchmark datasets. Experimental results demonstrate that our method significantly outperforms the state-of-the-art methods.
Related work
A wide variety of research in data mining and machine learning focuses on anomaly detection [22]. Distance-based anomaly detection [23] uses a global density criterion to detect anomalies. Density-based methods [24], [25] aim to detect local outliers, and thus use local relative density as the anomaly criterion. Several studies [26], [27], [28] apply KDE to density-based local outlier detection to improve detection accuracy. However, such methods rely on an appropriate distance metric, which is …
The proposed LAKE model
Experiments
In this section, we use six public benchmark datasets to evaluate the effectiveness and robustness of our proposed model in anomaly detection. The code of the baseline methods, released by the authors of ALAD, is available on GitHub, as is the source code of our proposed method.
Conclusion
In this paper, we propose a layer-constrained variational autoencoding kernel density estimation model (LAKE) for anomaly detection in high-dimensional data. LAKE mainly consists of two parts: the compression network and the KDE model. The compression network obtains a low-dimensional representation that retains the key features of the input using a layer-constrained variational autoencoder. The KDE model takes the low-dimensional representation and reconstruction error features as input, and learns a …
CRediT authorship contribution statement
Peng Lv: Methodology, Software, Data curation, Visualization, Writing - original draft. Yanwei Yu: Conceptualization, Supervision, Writing - review & editing, Funding acquisition. Yangyang Fan: Data curation, Investigation, Writing - review & editing. Xianfeng Tang: Validation, Writing - review & editing. Xiangrong Tong: Resources, Funding acquisition.
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. This work is partially supported by the National Natural Science Foundation of China under Grant Nos.: 61773331, 61572418 and 61403328. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.
References (39)
- et al., High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning, Pattern Recognit. (2016)
- et al., Low-rank local tangent space embedding for subspace clustering, Inform. Sci. (2020)
- et al., Multiple metric learning based on bar-shape descriptor for person re-identification, Pattern Recognit. (2017)
- et al., Maximal granularity structure and generalized multi-view discriminant analysis for person re-identification, Pattern Recognit. (2018)
- et al., Similarity learning with joint transfer constraints for person re-identification, Pattern Recognit. (2020)
- et al., A comparative evaluation of outlier detection algorithms: Experiments and analyses, Pattern Recognit. (2018)
- et al., F-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks, Med. Image Anal. (2019)
- et al., Fast anomaly detection for streaming data
- et al., Isolation forest
- et al., HiCS: High contrast subspaces for density-based outlier ranking
- Anomaly detection: A survey, ACM Comput. Surv.
- Unsupervised anomaly detection with generative adversarial networks to guide marker discovery
- Deep learning for anomaly detection: A survey
- Eigenspace-based anomaly detection in computer systems
- On anomalous hotspot discovery in graph streams
- An approach to spacecraft anomaly detection problem using kernel feature space
- Robust principal component analysis?, J. ACM
- Anomaly detection with robust deep autoencoders
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105753.