Abstract
Learning representations that clearly distinguish between normal and abnormal data is key to the success of anomaly detection. Most existing anomaly detection algorithms use activation representations from forward propagation while not exploiting gradients from backpropagation to characterize data. Gradients capture the model updates required to represent data, and anomalies require more drastic updates to be fully represented than normal data. Hence, we propose utilizing backpropagated gradients as representations to characterize model behavior on anomalies and, consequently, to detect them. We show that the proposed method using gradient-based representations achieves state-of-the-art anomaly detection performance on benchmark image recognition datasets. We also highlight the computational efficiency and simplicity of the proposed method in comparison with other state-of-the-art methods relying on adversarial networks or autoregressive models, which require at least 27 times more model parameters than the proposed method.
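To make the core idea concrete, the following is a minimal sketch, assuming a PyTorch autoencoder; the tiny architecture and all names here are illustrative rather than the paper's implementation, and the actual GradCon method additionally constrains the gradients during training (see Appendix A.3) instead of scoring raw gradient magnitude.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAE(nn.Module):
    """Illustrative stand-in for the paper's convolutional autoencoder;
    this architecture is hypothetical."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(784, 32)
        self.dec = nn.Linear(32, 784)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

def gradient_representation(model, x):
    """Backpropagate the reconstruction loss of one input and return the
    flattened weight gradients as that input's representation."""
    model.zero_grad()
    loss = F.mse_loss(model(x), x)
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

model = TinyAE()
x = torch.rand(1, 784)
g = gradient_representation(model, x)
# Anomalies require more drastic model updates than normal data, so a
# simple cue is the gradient magnitude; GradCon itself scores gradient
# alignment with the gradients accumulated during training.
score = g.norm().item()
```

Since anomalies demand larger model updates, statistics of this gradient representation, such as its norm or its deviation from the gradients observed on training data, can serve as anomaly cues.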
References
Abati, D., Porrello, A., Calderara, S., Cucchiara, R.: Latent space autoregression for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 481–490 (2019)
Achille, A., et al.: Task2Vec: task embedding for meta-learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6430–6439 (2019)
Achille, A., Paolini, G., Soatto, S.: Where is the information in a deep neural network? arXiv preprint arXiv:1905.12213 (2019)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Drucker, H., Le Cun, Y.: Double backpropagation increasing generalization performance. In: IJCNN 1991-Seattle International Joint Conference on Neural Networks, vol. 2, pp. 145–150. IEEE (1991)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1735–1742. IEEE (2006)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems, pp. 487–493 (1999)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016)
Kwon, G., Prabhushankar, M., Temel, D., AlRegib, G.: Distorted representation space characterization through backpropagated gradients. In: 2019 26th IEEE International Conference on Image Processing (ICIP). IEEE (2019)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Luc, P., Neverova, N., Couprie, C., Verbeek, J., LeCun, Y.: Predicting deeper into the future of semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 648–657 (2017)
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. arXiv preprint arXiv:1511.05644 (2015)
Mu, F., Liang, Y., Li, Y.: Gradients as features for deep representation learning. In: International Conference on Learning Representations (2020)
Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al.: Conditional image generation with PixelCNN decoders. In: Advances in Neural Information Processing Systems, pp. 4790–4798 (2016)
Peng, X., Zou, C., Qiao, Yu., Peng, Q.: Action recognition with stacked fisher vectors. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 581–595. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_38
Perera, P., Nallapati, R., Xiang, B.: OCGAN: one-class novelty detection using GANs with constrained latent representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2898–2906 (2019)
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)
Pidhorskyi, S., Almohsen, R., Doretto, G.: Generative probabilistic novelty detection with adversarial autoencoders. In: Advances in Neural Information Processing Systems, pp. 6822–6833 (2018)
Ross, A.S., Doshi-Velez, F.: Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In: Thirty-second AAAI Conference on Artificial Intelligence (2018)
Ruff, L., et al.: Deep one-class classification. In: International Conference on Machine Learning, pp. 4393–4402 (2018)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533 (1986)
Sabokrou, M., Khalooei, M., Fathy, M., Adeli, E.: Adversarially learned one-class classifier for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3379–3388 (2018)
Sakurada, M., Yairi, T.: Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, p. 4. ACM (2014)
Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the fisher vector: theory and practice. Int. J. Comput. Vis. 105(3), 222–245 (2013)
Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Niethammer, M., et al. (eds.) IPMI 2017. LNCS, vol. 10265, pp. 146–157. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59050-9_12
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Sokolić, J., Giryes, R., Sapiro, G., Rodrigues, M.R.: Robust large margin deep neural networks. IEEE Trans. Signal Process. 65(16), 4265–4280 (2017)
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1), 45–66 (2004)
Temel, D., Kwon, G., Prabhushankar, M., AlRegib, G.: CURE-TSR: challenging unreal and real environments for traffic sign recognition. arXiv preprint arXiv:1712.02463 (2017)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Zhou, C., Paffenroth, R.C.: Anomaly detection with robust deep autoencoders. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 665–674. ACM (2017)
Zong, B., et al.: Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In: International Conference on Learning Representations (2018)
A Appendix
In Sect. A.1, we compare the performance of GradCon with other benchmarking and state-of-the-art algorithms on fMNIST. In Sect. A.2, we perform a statistical analysis and highlight the separation between inliers and outliers achieved by the gradient-based representations in CIFAR-10. In Sect. A.3, we analyze different parameter settings for GradCon. Finally, we provide additional details on the CURE-TSR dataset in Sect. A.4.
1.1 A.1 Additional Results on fMNIST
We compared the performance of GradCon with other benchmarking and state-of-the-art algorithms using CIFAR-10 and MNIST in Tables 3 and 4. In Table 5 of the paper, we mainly focused on a rigorous comparison between GradCon and GPND, which shows the second best performance in terms of the average AUROC on fMNIST. In this section, we report the average AUROC performance of GradCon in comparison with that of additional benchmarking and state-of-the-art algorithms on fMNIST in Table 7. The same experimental setup for fMNIST described in Sect. 5.1 is used, and the test set contains the same number of inliers and outliers. GradCon outperforms all the compared algorithms, including GPND. Whereas ALOCC, OCGAN, and GPND all rely on adversarial training to further constrain the activation-based representations, GradCon achieves the best performance on fMNIST based only on a convolutional autoencoder (CAE) and requires significantly less computation.
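As a side note, the AUROC evaluation on a balanced test set can be reproduced with a short sketch; `roc_auc_score` is from scikit-learn, and the random scores below are placeholders for the model's actual per-sample anomaly scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_auroc(inlier_scores, outlier_scores):
    """AUROC for a test set with equal numbers of inliers and outliers;
    higher anomaly scores should indicate outliers."""
    scores = np.concatenate([inlier_scores, outlier_scores])
    labels = np.concatenate([np.zeros(len(inlier_scores)),   # 0 = inlier
                             np.ones(len(outlier_scores))])  # 1 = outlier
    return roc_auc_score(labels, scores)

# Example with synthetic scores; real scores come from the trained model.
rng = np.random.default_rng(0)
print(evaluate_auroc(rng.normal(0, 1, 500), rng.normal(1, 1, 500)))
```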
Fig. 6. Histogram analysis of the activation losses and the gradient loss in CIFAR-10. For each class, we calculate the activation losses and the gradient loss from inliers and outliers, and visualize the losses from all 10 classes using histograms. The percentage of overlap is calculated by dividing the number of samples in the overlapped region of the histograms by the total number of samples.
1.2 A.2 Histogram Analysis on CIFAR-10
In Fig. 5, we presented a histogram analysis on grayscale digit images from MNIST to explain the state-of-the-art performance achieved by GradCon. In this section, we perform the same histogram analysis on color images of general objects in CIFAR-10 to further highlight the separation between inliers and outliers achieved by the gradient-based representations. We obtain histograms for CIFAR-10 through the same procedure used to generate the MNIST histograms visualized in Fig. 5. In Fig. 6, we visualize the histograms of the reconstruction error, the latent loss, and the gradient loss in CIFAR-10, and report the percentage of overlap between the inlier and outlier histograms. The measured error on each representation is expected to differentiate inliers from outliers and to achieve as small an overlap between histograms as possible. The gradient loss shows the smallest overlap compared to the two losses defined on activation-based representations. This statistical analysis also supports the superior performance of GradCon over other algorithms based on the reconstruction error or the latent loss reported in Table 3.
Comparing the MNIST histograms in Fig. 5 with those from CIFAR-10 shows that the gradient loss becomes more effective as the data becomes more complicated and challenging for anomaly detection. In MNIST, simple low-level features such as curved or straight edges can serve as class-discriminant features for anomaly detection. CIFAR-10, on the other hand, contains images with richer structure and features than MNIST; normal and abnormal data are therefore not easily separable, and the overlap between histograms is significantly larger in CIFAR-10 than in MNIST. In CIFAR-10, the overlap of the gradient loss is smaller than the second smallest overlap, achieved by the reconstruction error, by \(12.4\%\). In MNIST, the overlap of the gradient loss is smaller than the second smallest overlap by \(5.7\%\). GradCon also outperforms other state-of-the-art methods by a larger AUROC margin in CIFAR-10 than in MNIST. These overlap and performance differences show that the contribution of the gradient loss becomes more significant when the data is complicated and challenging for anomaly detection.
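A hedged sketch of the overlap statistic follows; the shared bin edges and the min-count reading of "samples in the overlapped region" are our assumptions, since the exact binning convention is not spelled out above.

```python
import numpy as np

def histogram_overlap(inlier_losses, outlier_losses, bins=100):
    """Fraction of samples falling in the overlapped region of the
    inlier and outlier loss histograms; smaller means better separated."""
    edges = np.histogram_bin_edges(
        np.concatenate([inlier_losses, outlier_losses]), bins=bins)
    h_in, _ = np.histogram(inlier_losses, bins=edges)
    h_out, _ = np.histogram(outlier_losses, bins=edges)
    # Per bin, the overlapped region holds the smaller of the two counts.
    overlapped = np.minimum(h_in, h_out).sum()
    return overlapped / (len(inlier_losses) + len(outlier_losses))

# Well-separated losses yield a small overlap percentage.
rng = np.random.default_rng(0)
print(histogram_overlap(rng.normal(0, 1, 1000), rng.normal(2, 1, 1000)))
```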
1.3 A.3 Parameter Setting for the Gradient Loss
We analyze the impact of different parameter settings on the performance of GradCon. The final anomaly score of GradCon is given as \(\mathcal {L} + \beta \mathcal {L}_{grad}\), where \(\mathcal {L}\) is the reconstruction error and \(\mathcal {L}_{grad}\) is the gradient loss. While we use the \(\alpha \) parameter to weight the gradient loss and constrain the gradients during training, we observe that the gradient loss generally performs better as an anomaly score than the reconstruction error. Hence, we use \(\beta = n \alpha \), where n is a constant, to weight the gradient loss more heavily in the anomaly score. We evaluate the average AUROC performance of GradCon with different \(\beta \) values on CIFAR-10 in Fig. 7; in particular, we vary the scaling constant n to sweep \(\beta \) along the x-axis of the plot. The performance of GradCon improves as \(\beta \) increases over the range \(\beta \in \left[ 0, 2\alpha \right] \). Moreover, GradCon consistently achieves state-of-the-art performance when \(\beta \ge 1.67 \alpha \): specifically, it always outperforms OCGAN, which achieves the second best average AUROC of 0.657 in CIFAR-10. This analysis shows that GradCon achieves the best performance in CIFAR-10 across a wide range of \(\beta \).
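The score and the \(\beta\) sweep can be sketched as follows; the per-sample losses and the value of \(\alpha\) below are placeholders, since their actual values come from the trained CAE and its training configuration.

```python
import numpy as np

def anomaly_score(recon_error, grad_loss, alpha, n):
    """Combined anomaly score L + beta * L_grad with beta = n * alpha."""
    beta = n * alpha
    return recon_error + beta * grad_loss

# Placeholder per-sample losses; in practice these come from the CAE.
rng = np.random.default_rng(0)
recon_errors = rng.random(1000)
grad_losses = rng.random(1000)
for n in (0.0, 0.5, 1.0, 1.67, 2.0):   # sweep the scaling constant n
    scores = anomaly_score(recon_errors, grad_losses, alpha=1.0, n=n)
    # ...evaluate AUROC on `scores` as in Sect. A.1...
```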
1.4 A.4 Additional Details on CURE-TSR Dataset
We visualize traffic sign images with 8 different challenge types and 5 different challenge levels in Fig. 8. Level 5 images contain the most severe challenge effects, while level 1 images are the least affected by the challenging conditions. Since level 1 images are perceptually most similar to the challenge-free images, it is more challenging for anomaly detection algorithms to classify them as outliers. The gradient loss from CAE + Grad outperforms the reconstruction error from CAE on all level 1 challenge types. This result shows that the gradient loss consistently outperforms the reconstruction error even when inliers and outliers become relatively similar under mild challenging conditions.
Cite this paper
Kwon, G., Prabhushankar, M., Temel, D., AlRegib, G. (2020). Backpropagated gradient representations for anomaly detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_13