1 Introduction

Given a collection of data, it is often desirable to automatically determine which of its instances are unusual. Commonly referred to as anomaly detection, this is a fundamental machine learning task with numerous applications in fields such as astronomy [11, 43], medicine [5, 46, 51], fault detection [18], and intrusion detection [15, 19]. Traditional algorithms often focus on the low-dimensional regime and face difficulties when applied to high-dimensional data such as images or speech. In addition, they require manual feature engineering.

Deep learning omits manual feature engineering and has become the de-facto approach for tackling many high-dimensional machine learning tasks. This is largely a testament to its experimental performance: deep learning has helped to achieve impressive results in image classification [24], and is setting new standards in domains such as natural language processing [25, 50] and speech recognition [3].

In this paper we present a novel deep learning based approach to anomaly detection which uses generative adversarial networks (GANs) [17]. GANs have achieved state-of-the-art performance in high-dimensional generative modeling. In a GAN, two neural networks – the discriminator and the generator – are pitted against each other. In the process the generator learns to map random samples from a low-dimensional latent space to a high-dimensional space, mimicking the target dataset. If the generator has successfully learned a good approximation of the training data’s distribution, it is reasonable to assume that, for a sample drawn from the data distribution, there exists some point in the GAN’s latent space which, when passed through the generator network, closely resembles this sample. We use this correspondence to perform anomaly detection with GANs (ADGAN).

In Sect. 2 we give an overview of previous work on anomaly detection and discuss the modeling assumptions of this paper. Section 3 contains a description of our proposed algorithm. In our experiments, see Sect. 4, we both validate our method against traditional methods and showcase ADGAN’s ability to detect anomalies in high-dimensional data.

2 Background

Here we briefly review previous work on anomaly detection, touch on generative models, and highlight the methodology of GANs.

2.1 Related Work

Anomaly Detection. Research on anomaly detection has a long history, with early work going back as far as [12], and is concerned with finding unusual or anomalous samples in a corpus of data. An extensive overview of traditional anomaly detection methods as well as open challenges can be found in [6]. For a recent empirical comparison of various existing approaches, see [13].

Generative models yield a whole family of anomaly detectors through estimation of the data distribution p. Given data, we estimate \(\hat{p} \approx p\) and declare those samples which are unlikely under \(\hat{p}\) to be anomalous. This guideline is roughly followed by traditional non-parametric methods such as kernel density estimation (KDE) [40], which were applied to intrusion detection in [53]. Other research targeted mixtures of Gaussians for active learning of anomalies [42], hidden Markov models for registering network attacks [39], and dynamic Bayesian networks for traffic incident detection [48].
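As a concrete illustration of this density-based recipe, consider the following sketch, which fits a KDE on training data and flags low-likelihood test points as anomalous. It assumes scikit-learn is available; the data arrays, bandwidth, and threshold are placeholders rather than values used in this paper.

import numpy as np
from sklearn.neighbors import KernelDensity

# Toy stand-ins for training data (assumed normal) and test data.
X_train = np.random.randn(1000, 20)
X_test = np.vstack([np.random.randn(100, 20),        # in-distribution
                    np.random.randn(100, 20) + 5.0]) # shifted, likely anomalous

# Estimate p_hat with a Gaussian kernel; the bandwidth here is illustrative.
kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(X_train)

# Low log-likelihood under p_hat translates into a high anomaly score.
anomaly_score = -kde.score_samples(X_test)

# Declare the least likely fraction of samples anomalous (the cutoff is a choice).
is_anomaly = anomaly_score > np.quantile(anomaly_score, 0.9)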

Deep Generative Models. Recently, variational autoencoders (VAEs) [22] have been proposed as a deep generative model. By optimizing over a variational lower bound on the likelihood of the data, the parameters of a neural network are tuned in such a way that samples resembling the data may be generated from a Gaussian prior. Another generative approach is to train a pair of deep convolutional neural networks in an autoencoder setup (DCAE) [33] and to produce samples by decoding random points on the compression manifold. Unfortunately, none of these approaches yields a tractable way of estimating p. Our approach uses a deep generative model in the context of anomaly detection.

Deep Learning for Anomaly Detection. Non-parametric anomaly detection methods suffer from the curse of dimensionality and are thus often inadequate for the interpretation and analysis of high-dimensional data. Deep neural networks have been found to obviate many problems that arise in this context. As a hybrid between the two approaches, deep belief networks were coupled with one-class support vector machines to detect anomalies in [14]. We found that this technique did not work well for image datasets, and indeed the authors included no such experiments in their paper.

A recent work proposed an end-to-end deep learning approach, aimed specifically at the task of anomaly detection [45]. Similarly, one may employ a network that was pretrained on a different task, such as classification on ImageNet [8], and then use this network’s intermediate features to extract relevant information from images. We tested this approach in our experimental section.

GANs, which we discuss in greater depth in the next section, have garnered much attention, with their performance surpassing that of previous deep generative methods. Concurrently with this work, [46] developed an anomaly detection framework that uses GANs in a similar way as we do. We discuss the differences between our work and theirs in Sect. 3.2.

Fig. 1.

An illustration of ADGAN. In this example, ones from MNIST are considered normal (\(y_c=1\)). After an initial draw from \(p_z\), the loss between the first generation \(g_{\theta _0}(z_0)\) and the image x whose anomaly we are assessing is computed. This information is used to generate a subsequent image \(g_{\theta _{1}}(z_1)\) that is more like x. After k steps, samples are scored. If x is similar to the training data (red example, \(y=y_c\)), then a similar object should be contained in the image of \(g_{\theta _k}\). For a dissimilar x (blue example, \(y \ne y_c\)), no similar image is found, resulting in a large loss. (Color figure online)

2.2 Generative Adversarial Networks

GANs, which lie at the heart of ADGAN, have set a new state-of-the-art in generative image modeling. They provide a framework to generate samples that are approximately distributed according to p, the distribution of the training data \(\{ x_i \}_{i=1}^n \triangleq \mathcal {X} \subseteq \mathbb {R}^d\). To achieve this, GANs attempt to learn the parametrization of a neural network, the so-called generator \(g_\theta \), that maps low-dimensional samples drawn from some simple noise prior \(p_z\) (e.g. a multivariate Gaussian) to samples in the image space, thereby inducing a distribution \(q_\theta \) (the push-forward of \(p_z\) with respect to \(g_\theta \)) that approximates p. To this end, a second neural network, the discriminator \(d_\omega \), learns to classify the data from p and \(q_\theta \). Through an alternating training procedure the discriminator becomes better at separating samples from p and samples from \(q_\theta \), while the generator adjusts \(\theta \) to fool the discriminator, thereby approximating p more closely. The objective function of the GAN framework is thus:

$$\begin{aligned} \min _{\theta } \max _{\omega } \, \Big \{ V(\theta , \omega ) = \mathbb {E}_{x\sim p}[\log d_\omega (x)] + \mathbb {E}_{z\sim p_z}[\log (1 - d_\omega (g_\theta (z)))] \Big \}, \end{aligned}$$
(1)

where z is a vector residing in a latent space of dimensionality \(d' \ll d\). A recent work showed that this minimax optimization (1) equates to an empirical lower bound of an f-divergence [37].

GAN training is difficult in practice, which has been shown to be a consequence of vanishing gradients in high-dimensional spaces [1]. These instabilities can be countered by training on integral probability metrics (IPMs) [35, 49], one instance of which is the 1-Wasserstein distance. This distance, informally, is the amount of work needed to transport one density onto another, and it forms the basis of the Wasserstein GAN (WGAN) [2]. The objective function for WGANs is

$$\begin{aligned} \min _{\theta } \max _{\omega \in \varOmega } \, \Big \{ W(\theta , \omega ) = \mathbb {E}_{x\sim p}[d_\omega (x)] - \mathbb {E}_{z\sim p_z}[d_\omega (g_\theta (z))] \Big \}, \end{aligned}$$
(2)

where the parametrization of the discriminator is restricted to allow only 1-Lipschitz functions, i.e. \(\varOmega = \{ \omega : \Vert d_\omega \Vert _{\mathrm{L}} \le 1 \}\). Compared to classic GANs, we have observed WGAN training to be much more stable, and we therefore use it in our experiments; see Sect. 4.
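For concreteness, the following is a minimal PyTorch-style sketch of one alternating WGAN update, assuming a generator g, a critic d, their optimizers, and a batch x_real already exist. The Lipschitz constraint is enforced here by simple weight clipping as in the original WGAN; all names and hyperparameters are illustrative, not the exact settings of our experiments.

import torch

def wgan_step(g, d, opt_g, opt_d, x_real, d_latent=256, clip=0.01):
    # One alternating update of critic and generator on the WGAN objective (2).
    batch_size = x_real.size(0)

    # Critic update: maximize E_x[d(x)] - E_z[d(g(z))].
    z = torch.randn(batch_size, d_latent)
    x_fake = g(z).detach()
    loss_d = -(d(x_real).mean() - d(x_fake).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # Crude Lipschitz constraint via weight clipping.
    for p in d.parameters():
        p.data.clamp_(-clip, clip)

    # Generator update: minimize -E_z[d(g(z))].
    z = torch.randn(batch_size, d_latent)
    loss_g = -d(g(z)).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()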

3 Algorithm

Our proposed method (ADGAN, see Algorithm 1) sets in after GAN training has converged. If the generator has indeed captured the distribution of the training data then, given a new sample \(x \sim p\), there should exist a point z in the latent space such that \(g_\theta (z) \approx x\). Additionally, we expect points away from the support of p to have no representation in the latent space, or at least to occupy only a small portion of the probability mass in the latent distribution, since they are easily discerned by \(d_\omega \) as not coming from p. Thus, given a test sample x, if there exists no z such that \(g_\theta (z) \approx x\), or if such a z is difficult to find, then it can be inferred that x is not distributed according to p, i.e. it is anomalous. Our algorithm hinges on this hypothesis, which we illustrate in Fig. 1.

Fig. 2.

The coordinates \((z_1,z_2)\) of 500 samples from MNIST are shown, represented in a latent space with \(d'=2\). At different iterations t of ADGAN, no particular structure arises in the z-space: samples belonging to the normal and the anomalous class are scattered freely. Note that this behavior also prevents \(p_z(z_t)\) from providing a sensible anomaly score. The sizes of the points correspond to the reconstruction loss between generated samples and their original image, \(\ell (g_\theta (z_t), x)\). The normal and anomalous classes differ markedly in terms of this metric. (Color figure online)

3.1 ADGAN

To find z, we initialize from \(z_0 \sim p_z\), where \(p_z\) is the same noise prior also used during GAN training. For \(t=1,\dots ,k\) steps, we backpropagate the reconstruction loss \(\ell \) between \(g_\theta (z_t)\) and x, making the subsequent generation \(g_\theta (z_{t+1})\) more like x. At each iteration, we also allow a small amount of flexibility in the parametrization of the generator, resulting in a series of generations \(g_{\theta _0}(z_0), \dots , g_{\theta _k}(z_k)\) that resemble x more and more closely. Adjusting \(\theta \) gives the generator additional representative capacity, which we found to improve the algorithm’s performance. Note that these adjustments to \(\theta \) are not part of the GAN training procedure, and \(\theta \) is reset to its original trained value for each new test point.

To limit the risk of seeding in unsuitable regions and address the non-convex nature of the underlying optimization problem, the search is initialized from \(n_\text {seed}\) individual points. The key idea underlying ADGAN is that if the generator was trained on the same distribution x was drawn from, then the average over the final set of reconstruction losses \(\{\ell (x,g_{\theta _{j,k}}(z_{j,k}))\}_{j=1}^{n_\text {seed}}\) will assume low values, and high values otherwise. In Fig. 2 we track a collection of samples through their search in a latent space of dimensionality \(d'=2\).

Our method may also be understood from the standpoint of approximate inversion of the generator. In this sense, the above backpropagation finds latent vectors z that lie close to \(g_\theta ^{-1}(x)\). Inversion of the generator was previously studied in [7], where it was verified experimentally that this task can be carried out with high fidelity. In addition, [29] showed that generated images can be successfully recovered by backpropagating through the latent space. Jointly optimizing latent vectors and the generator parametrization via backpropagation of reconstruction losses was investigated in detail by [4]. The authors found that it is possible to train the generator entirely without a discriminator, still yielding a model that incorporates many of the desirable properties of GANs, such as smooth interpolations between samples.

Algorithm 1. ADGAN.
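A minimal PyTorch-style sketch of the search that Algorithm 1 describes follows, assuming a trained generator g and a single test image x. The parametrization \(\theta \) is reset for every seed via a deep copy; the learning rate is a placeholder rather than the value used in our experiments.

import copy
import torch

def adgan_score(g, x, d_latent=256, n_seed=64, k=5, lr=0.05):
    # Anomaly score of a test image x under a trained generator g.
    losses = []
    for _ in range(n_seed):
        # Work on a copy so the trained parametrization theta is reset per seed.
        g_j = copy.deepcopy(g)
        z = torch.randn(1, d_latent, requires_grad=True)
        opt = torch.optim.Adam([z] + list(g_j.parameters()), lr=lr)
        for _ in range(k):
            # Squared L2 reconstruction loss between the current generation and x.
            loss = ((g_j(z) - x) ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        losses.append(((g_j(z) - x) ** 2).sum().item())
    # Average final reconstruction loss over all seeds: a high value flags x as anomalous.
    return sum(losses) / n_seed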

3.2 Alternative Approaches

Given that GAN training also yields a discriminator for discerning between real and fake samples, one might reasonably consider applying the discriminator directly for detecting anomalies. However, once converged, the discriminator exploits checkerboard-like artifacts on the pixel level, induced by the generator architecture [31, 38]. While it perfectly separates real from forged data, it is not equipped to deal with samples which are completely unlike the training data. This line of reasoning is verified experimentally in Sect. 4.

Another approach we considered was to evaluate the likelihood of the final latent vectors \(\{z_{j,k}\}_{j=1}^{n_\text {seed}}\) under the noise prior \(p_z\). This approach was tested experimentally in Sect. 4, and while it showed some promise, it was consistently outperformed by ADGAN.
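This alternative score is simple to compute; a short sketch, assuming the final latent vectors are collected in a list z_final and the prior is a standard normal as in our setup, could read as follows.

import numpy as np
from scipy.stats import multivariate_normal

def prior_likelihood_score(z_final, d_latent=256):
    # Negative average log-likelihood of the final latent vectors under the
    # standard normal prior; low likelihood would indicate an anomaly.
    prior = multivariate_normal(mean=np.zeros(d_latent))
    return -np.mean([prior.logpdf(z) for z in z_final])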

In [46], the authors propose a technique for anomaly detection (called AnoGAN) which uses GANs in a way somewhat similar to our proposed algorithm. Their algorithm also begins by training a GAN. Given a test point x, their algorithm searches for a point z in the latent space such that \(g_\theta (z) \approx x\) and computes the reconstruction loss. Additionally they use an intermediate discriminator layer \(d_\omega '\) and compute the loss between \(d_\omega '(g_\theta (z))\) and \(d_\omega '(x)\). They use a convex combination of these two quantities as their anomaly score.
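To make the comparison concrete, a rough sketch of such an AnoGAN-style score, as described above, might look as follows; it assumes a trained generator g, a function d_feat returning the intermediate discriminator layer, and an already optimized latent vector z, and the mixing weight lam is a placeholder.

import torch

def anogan_style_score(g, d_feat, z, x, lam=0.9):
    # Convex combination of reconstruction loss and discriminator feature loss.
    recon_loss = (g(z) - x).abs().sum()                  # residual loss
    feat_loss = (d_feat(g(z)) - d_feat(x)).abs().sum()   # feature-matching loss
    return lam * recon_loss + (1.0 - lam) * feat_loss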

In ADGAN we never use the discriminator, which is discarded after training. This makes it easy to couple ADGAN with any GAN-based approach, e.g. LSGAN [32], but also with any other differentiable generator network, such as VAEs or moment matching networks [27]. In addition, we account for the non-convexity of the underlying optimization by seeding from multiple areas in the latent space. Lastly, during inference we update not only the latent vectors z, but jointly update the parametrization \(\theta \) of the generator.

4 Experiments

Here we present experimental evidence of the efficacy of ADGAN. We compare our algorithm to competing methods on a controlled, classification-type task and show anomalous samples from popular image datasets. Our main findings are that ADGAN:

  • outperforms non-parametric as well as available deep learning approaches on two controlled experiments where ground truth information is available;

  • may be used on large, unsupervised data (such as LSUN bedrooms) to detect anomalous samples that coincide with what we as humans would deem unusual.

Table 1. ROC-AUC of classic anomaly detection methods. For both MNIST and CIFAR-10, each model was trained on every class, as indicated by \(y_c\), and then used to score against remaining classes. Results for KDE and OC-SVM are reported both in conjunction with PCA, and after transforming images with a pre-trained Alexnet.

4.1 Datasets

Our experiments are carried out on three benchmark datasets of varying complexity: (i) MNIST [26], which contains grayscale scans of handwritten digits. (ii) CIFAR-10 [23], which contains color images of real-world objects belonging to ten classes. (iii) LSUN [52], a dataset of images that show different scenes (such as bedrooms, bridges, or conference rooms). For all datasets we keep the default training and test splits. All images are rescaled to assume pixel values in \([-1, 1]\).

4.2 Methods and Hyperparameters

We tested the performance of ADGAN against four traditional, non-parametric approaches commonly used for anomaly detection: (i) KDE [40] with a Gaussian kernel. The bandwidth is determined from maximum likelihood estimation over ten-fold cross validation, with \(h \in \{ 2^0, 2^{1/2}, \dots , 2^4\}\). (ii) One-class support vector machine (OC-SVM) [47] with a Gaussian kernel. The inverse length scale is selected with automated tuning, as proposed by [16], and we set \(\nu =0.1\). (iii) Isolation forest (IF) [30], which was largely stable to changes in its parametrization. (iv) Gaussian mixture model (GMM). We allowed the number of components to vary over \(\{2, 3, \dots , 20\}\) and selected suitable hyperparameters by evaluating the Bayesian information criterion.
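The following compact sketch shows how such baselines and their hyperparameter searches could be set up with scikit-learn, assuming flattened feature matrices X_train and X_test (e.g. after the dimensionality reduction described below). The grids mirror the ones stated above, and signs are chosen so that a higher score means more anomalous.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity
from sklearn.svm import OneClassSVM

X_train = np.random.randn(500, 40)   # placeholder features
X_test = np.random.randn(100, 40)

# (i) KDE: bandwidth chosen by maximizing held-out likelihood via ten-fold CV.
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": 2.0 ** np.arange(0, 4.5, 0.5)}, cv=10).fit(X_train)
kde_score = -grid.best_estimator_.score_samples(X_test)

# (ii) OC-SVM with a Gaussian kernel and nu = 0.1; gamma would be tuned as in [16].
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
ocsvm_score = -ocsvm.decision_function(X_test)

# (iii) Isolation forest with default parametrization.
iforest = IsolationForest().fit(X_train)
if_score = -iforest.score_samples(X_test)

# (iv) GMM: number of components selected via the Bayesian information criterion.
gmms = [GaussianMixture(n_components=c).fit(X_train) for c in range(2, 21)]
gmm = min(gmms, key=lambda m: m.bic(X_train))
gmm_score = -gmm.score_samples(X_test)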

For the methods above we reduced the feature dimensionality before performing anomaly detection. This was done via PCA [41], varying the dimensionality over \(\{20, 40, \dots , 100\}\); we simply report the results for which best performance on a small holdout set was attained. As an alternative to a linear projection, we evaluated the performance of KDE and OC-SVM after instead applying a non-linear transformation to the image data via an Alexnet [24] pretrained on ImageNet. In this case, anomaly detection is carried out on the representation from the final convolutional layer of Alexnet. This representation is then projected down via PCA, as the runtime of KDE and OC-SVM otherwise becomes problematic.
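A minimal sketch of this feature pipeline, assuming torchvision is available, could look as follows; the image preprocessing, the batch, and the PCA dimensionality are placeholders, and the exact model-loading call depends on the torchvision version.

import torch
from sklearn.decomposition import PCA
from torchvision import models

# Pretrained Alexnet; .features ends with the final convolutional block.
alexnet = models.alexnet(pretrained=True).features.eval()

def conv_features(images):
    # Map a batch of (N, 3, 224, 224) images to flattened conv-layer features.
    with torch.no_grad():
        return alexnet(images).flatten(start_dim=1).numpy()

feats = conv_features(torch.randn(128, 3, 224, 224))   # placeholder batch
feats_reduced = PCA(n_components=100).fit_transform(feats)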

Fig. 3.

ROC curves for one-versus-all prediction of competing methods on MNIST (left) and CIFAR-10 (right), averaged over all classes. KDE and OC-SVM are shown in conjunction with PCA; for detailed performance statistics see Table 1.

We also report the performance of two end-to-end deep learning approaches: VAEs and DCAEs. For the DCAE we scored according to reconstruction losses, interpreting a high loss as indicative of a new sample differing from samples seen during training. In VAEs we scored by evaluating the evidence lower bound (ELBO). We found this to perform much better than thresholding directly via the prior likelihood in the latent space or other more exotic approaches, such as scoring from the variance of the inference network.
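As an illustration of these two scoring rules, a small sketch follows. It assumes a trained autoencoder given by encode and decode functions, and a trained VAE whose encoder returns the mean and log-variance of a Gaussian posterior; a single-sample Monte Carlo estimate of the negative ELBO serves as the anomaly score.

import torch

def dcae_score(encode, decode, x):
    # DCAE: reconstruction loss of a test image x; a high value flags x as anomalous.
    return ((decode(encode(x)) - x) ** 2).sum().item()

def vae_score(encoder, decoder, x):
    # VAE: negative ELBO, i.e. reconstruction term plus KL to the standard normal prior.
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparametrization trick
    recon = ((decoder(z) - x) ** 2).sum()                   # Gaussian likelihood up to constants
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1.0).sum()
    return (recon + kl).item()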

In both DCAEs and VAEs we use a convolutional architecture similar to that of DCGAN [44], with batch normalization [20] and ReLU activations in each layer. We also report the performance of AnoGAN. To put it on equal footing, we pair it with DCGAN [44], the same architecture also used for training in our approach.

ADGAN requires a trained generator. For this purpose, we trained on the WGAN objective (2), as this was much more stable than using GANs. The architecture was fixed to that of DCGAN [44]. Following [34] we set the dimensionality of the latent space to \(d'=256\).

For ADGAN, the searches in the latent space were initialized from the same noise prior that the GAN was trained on (in our case a normal distribution). To take into account the non-convexity of the problem, we seeded with \(n_\text {seed}=64\) points. For the optimization of latent vectors and the parameters of the generator we used the Adam optimizer [21]. When searching for a point in the latent space to match a test point, we found that more iterations helped the performance, but this gain saturates quickly. As a trade-off between execution time and accuracy we found \(k=5\) to be a good value, and used this in the results we report. Unless otherwise noted, we measured reconstruction quality with a squared \(L_2\) loss.

4.3 One-Versus-All Classification

The first task is designed to quantify the performance of competing methods. The experimental setup closely follows the original publication on OC-SVMs [47] and we begin by training models on data from a single class from MNIST. Then we evaluate each model’s performance on 5000 items randomly selected from the test set, which contains samples from all classes. In each trial, we label the classes unseen in training as anomalous.

Ideally, a method assigns images from anomalous classes (say, digits 1-9) a higher anomaly score than images belonging to the normal class (zeros). Varying the decision threshold yields the receiver operating characteristic (ROC), shown in Fig. 3 (left). The second experiment follows this guideline with the colored images from CIFAR-10, and the resulting ROC curves are shown in Fig. 3 (right). In Table 1, we report the AUCs that resulted from leaving out each individual class.
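For reference, this per-class evaluation can be reproduced with a few lines of scikit-learn, assuming a score_fn that maps a single image to its anomaly score and labeled test arrays X_test and y_test; all names here are placeholders.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def one_vs_all_auc(score_fn, X_test, y_test, normal_class):
    # AUC for one normal class versus all others; higher scores mean more anomalous.
    scores = np.array([score_fn(x) for x in X_test])
    is_anomaly = (y_test != normal_class).astype(int)
    fpr, tpr, _ = roc_curve(is_anomaly, scores)   # ROC curve as in Fig. 3
    return roc_auc_score(is_anomaly, scores), (fpr, tpr)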

Fig. 4.

Starting from the top left, the first three rows show samples contained in the LSUN bedrooms validation set which, according to ADGAN, are the most anomalous (have the highest anomaly score). Again starting from the top left corner, the bottom rows contain images deemed normal (have the lowest score).

In these controlled experiments we highlight the ability of ADGAN to outperform traditional methods at the task of detecting anomalies in a collection of high-dimensional image samples. While neither table explicitly contains results from scoring the samples using the GAN discriminator, we did run these experiments for both datasets. Performance was weak, with an average AUC of 0.625 for MNIST and 0.513 for CIFAR-10. Scoring according to the prior likelihood \(p_z\) of the final latent vectors worked only slightly better, resulting in an average AUC of 0.721 for MNIST and 0.554 for CIFAR-10. Figure 2 gives an additional visual intuition as to why scoring via the prior likelihood fails to give a sensible anomaly score: anomalous samples do not get sent to low probability regions of the Gaussian distribution.

Fig. 5.

Scenes from LSUN showing conference rooms as ranked by ADGAN. The top rows contain anomalous samples, the bottom rows scenes categorized as normal.

Fig. 6.

Scenes from LSUN showing churches, ranked by ADGAN. Top rows: anomalous samples. Bottom rows: normal samples.

4.4 Unsupervised Anomaly Detection

In the second task we showcase the use of ADGAN in a practical setting where no ground truth information is available. For this we first trained a generator on LSUN scenes. We then used ADGAN to find the most anomalous images within the corresponding validation sets containing 300 images. The images associated with the highest and lowest anomaly scores of three different scene categories are shown in Figs. 4, 5, and 6. Note that the large training set sizes in this experiment would complicate the use of non-parametric methods such as KDE and OC-SVMs.
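For illustration, ranking a validation set by anomaly score (reusing the adgan_score sketch from Sect. 3.1, and assuming images is a list of preprocessed tensors) might look as follows.

scores = [adgan_score(g, x) for x in images]
ranked = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
most_anomalous, least_anomalous = ranked[:9], ranked[-9:]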

To additionally quantify the performance on LSUN, we built a test set by combining the 300 validation samples of each scene. After training the generator on bedrooms only, we recorded whether ADGAN assigns bedroom images low anomaly scores while assigning high scores to samples showing any of the remaining scenes. This resulted in an AUC of 0.641.

As can be seen by visually inspecting the LSUN scenes flagged as anomalous, our method has the ability to discern usual from unusual samples. We infer that ADGAN is able to incorporate many properties of an image. It does not merely look at colors, but also takes into account whether shown geometries are canonical, or whether an image contains a foreign object (like a caption). In contrast, samples that are assigned a low anomaly score are in line with a class’s Ideal Form. They show plain colors, are devoid of foreign objects, and were shot from conventional angles. In the case of bedrooms, some of the least anomalous samples are literally just a bed in a room.

5 Conclusion

We showed that searching the latent space of the generator can be leveraged for anomaly detection tasks. To that end, our proposed method: (i) delivers state-of-the-art performance on standard image benchmark datasets; (ii) can be used to scan large collections of unlabeled images for anomalous samples.

To the best of our knowledge, we also reported the first results of using VAEs for anomaly detection. We remain optimistic that their performance can be boosted by additional tuning of the underlying neural network architecture or an informed substitution of the latent prior.

Accounting for unsuitable initializations and jointly optimizing latent vectors and the generator parametrization are key ingredients that help ADGAN achieve strong experimental performance. Nonetheless, we are confident that our method can be improved further by approaches such as initializing from an approximate inversion of the generator, as in ALI [9, 10], or replacing the reconstruction loss with a more elaborate variant, such as the Laplacian pyramid loss [28].