Extensive framework based on novel convolutional and variational autoencoder based on maximization of mutual information for anomaly detection

  • Original Article
  • Published in Neural Computing and Applications

A Correction to this article was published on 02 August 2021


Abstract

In the present study, we propose a general framework based on a convolutional kernel and a variational autoencoder (CVAE) for anomaly detection on both complex image and vector datasets. The main idea is to maximize mutual information (MMI) by regularizing three pairs of representations: (1) the original input and the latent-space representation, (2) the output of the first convolutional layer and the input of the last convolutional layer, and (3) the original input and the decoder output. The proposed CVAE is therefore optimized by combining the representations learned across these three MMI objectives, defined on both local and global variables, with the original Kullback–Leibler divergence training objective. This provides additional supervision for detecting anomalies in image and vector data using convolutional and fully connected layers, respectively. To the best of our knowledge, this is the first CVAE that detects anomalies by regularizing multiple discriminator spaces. To evaluate the reliability of the proposed CVAE-MMI, we compared it with a convolutional autoencoder-based model trained with the original objective function, and we further compared its performance with state-of-the-art approaches on both image and vector datasets. The proposed structure outperformed the state of the art with high and stable area under the curve (AUC) values.



References

  1. Calderara S, Heinemann U, Prati A, Cucchiara R, Tishby N (2011) Detecting anomalies in people's trajectories using spectral graph analysis. Comput Vis Image Underst 115(8):1099–1111


  2. Hasan M, Choi J, Neumann J, RoyChowdhury AK, Davis LS (2016) Learning temporal regularity in video sequences. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 733–742

  3. Kumar A (2008) Computer-vision-based fabric defect detection: a survey. IEEE Trans Ind Electron 55(1):348–363


  4. Wang Y, Liu M, Bao Z, Zhang S (2019) Stacked sparse autoencoder with PCA and SVM for data-based line trip fault diagnosis in power systems. Neural Comput Appl 31(10):6719–6731


  5. Schlegl T, Seeböck P, Waldstein SM, Schmidt-Erfurth U, Langs G (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International conference on information processing in medical imaging. Springer, pp 146–157

  6. Kavitha MS, Kurita T, Park S-Y, Chien S-I, Bae J-S, Ahn B-C (2017) Deep vector-based convolutional neural network approach for automatic recognition of colonies of induced pluripotent stem cells. PLoS ONE 12(12):e0189974


  7. Radovanović M, Nanopoulos A, Ivanović M (2014) Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans Knowl Data Eng 27(5):1369–1382


  8. Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) Lof: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp 93–104

  9. Ravanbakhsh M, Nabi M, Sangineto E, Marcenaro L, Regazzoni C, Sebe N (2017) Abnormal event detection in videos using generative adversarial nets. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 1577–1581

  10. Chalapathy R, Menon AK, Chawla S (2017) Robust, deep and inductive anomaly detection. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 36–51

  11. Kim D, Yang H, Chung M, Cho S, Kim H, Kim M, Kim K, Kim E (2018) Squeezed convolutional variational autoencoder for unsupervised anomaly detection in edge device industrial internet of things. In: 2018 international conference on information and computer technologies (ICICT). IEEE, pp 67–71

  12. Zenati H, Foo CS, Lecouat B, Manek G, Chandrasekhar VR (2018) Efficient gan-based anomaly detection. arXiv:1802.06222

  13. Nowozin S, Cseke B, Tomioka R (2016) f-gan: training generative neural samplers using variational divergence minimization. In: Advances in neural information processing systems, pp 271–279

  14. Chen Z, Yeo CK, Lee BS, Lau CT (2018) Autoencoder-based network anomaly detection. In: 2018 wireless telecommunications symposium (WTS). IEEE, pp 1–5

  15. Pol A, Berger V, Cerminara G, Germain C, Pierini M (2019) Anomaly detection with conditional variational autoencoders. In: IEEE International conference on machine learning and applications (ICMLA), pp 1651–1657

  16. An J, Cho S (2015) Variational autoencoder based anomaly detection using reconstruction probability. Spec Lect IE 2(1):1–18


  17. Liu Y, Li Z, Zhou C, Jiang Y, Sun J, Wang M, He X (2019) Generative adversarial active learning for unsupervised outlier detection. IEEE Trans Knowl Data Eng 32:1517–1528


  18. Kawachi Y, Koizumi Y, Harada N (2018) Complementary set variational autoencoder for supervised anomaly detection. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2366–2370

  19. Perera P, Nallapati R, Xiang B (2019) Ocgan: one-class novelty detection using gans with constrained latent representations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2898–2906

  20. Belghazi MI, Baratin A, Rajeswar S, Ozair S, Bengio Y, Courville A, Hjelm RD (2018) MINE: mutual information neural estimation. arXiv:1801.04062, ICML

  21. Ji X, Henriques JF, Vedaldi A (2018) Invariant information distillation for unsupervised image segmentation and clustering. arXiv:1807.06653

  22. Oord Avd, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv:1807.03748

  23. Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B (2015) Adversarial autoencoders. arXiv:1511.05644

  24. Bereziński P, Jasiul B, Szpyrka M (2015) An entropy-based network anomaly detection method. Entropy 17(4):2367–2408


  25. Koch-Janusz M, Ringel Z (2018) Mutual information, neural networks and the renormalization group. Nat Phys 14(6):578–582


  26. Huang W, Zhang J, Sun H, Ma H, Cai Z (2017) An anomaly detection method based on normalized mutual information feature selection and quantum wavelet neural network. Wirel Pers Commun 96(2):2693–2713


  27. Jagota A (1991) Novelty detection on a very large number of memories stored in a hopfield-style network. In: IJCNN-91-Seattle international joint conference on neural networks, vol 2. IEEE, p 905

  28. Moya MM, Koch MW, Hostetler LD (1993) One-class classifier networks for target recognition applications. NASA STI/recon technical report N93

  29. Ritter G, Gallegos MT (1997) Outliers in statistical pattern recognition and an application to automatic chromosome classification. Pattern Recognit Lett 18(6):525–539


  30. Wang G, Yang J, Li R (2017) Imbalanced SVM-based anomaly detection algorithm for imbalanced training datasets. ETRI J 39(5):621–631


  31. Khreich W, Khosravifar B, Hamou-Lhadj A, Talhi C (2017) An anomaly detection system based on variable N-gram features and one-class SVM. Inf Softw Technol 91:186–197


  32. Tax DM, Duin RP (1999) Support vector domain description. Pattern Recognit Lett 20(11–13):1191–1199


  33. Yeung D-Y, Chow C (2002) Parzen-window network intrusion detectors. In: Object recognition supported by user interaction for service robots, vol 4. IEEE, pp 385–388

  34. Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253


  35. Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 427–438

  36. Yu Q, Kavitha MS, Kurita T (2019) Detection of one dimensional anomalies using a vector-based convolutional autoencoder. In: Asian conference on pattern recognition. Springer, pp 516–529

  37. Marchi E, Vesperini F, Eyben F, Squartini S, Schuller B (2015) A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 1996–2000

  38. Sun J, Wang X, Xiong N, Shao J (2018) Learning sparse representation with variational auto-encoder for anomaly detection. IEEE Access 6:33353–33361


  39. Li D, Chen D, Goh J, Ng S-K (2018) Anomaly detection with generative adversarial networks for multivariate time series. arXiv:1809.04758

  40. Schlegl T, Seeböck P, Waldstein SM, Langs G, Schmidt-Erfurth U (2019) f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Med Image Anal 54:30–44


  41. Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv:1312.6114

  42. Park D, Hoshi Y, Kemp CC (2018) A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robot Autom Lett 3(3):1544–1551


  43. Xu J, Durrett G (2018) Spherical latent spaces for stable variational autoencoders. In: Proceedings of the empirical methods in natural language processing, pp 4503–4513

  44. Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in neural information processing systems, pp 2172–2180

  45. Dieng AB, Kim Y, Rush AM, Blei DM (2019) Avoiding latent variable collapse with generative skip models. In: Proceedings on artificial intelligence and statistics, pp 2397–2405

  46. Hjelm RD, Fedorov A, Lavoie-Marchildon S, Grewal K, Bachman P, Trischler A, Bengio Y (2018) Learning deep representations by mutual information estimation and maximization. arXiv:1808.06670

  47. Yu S, Principe JC (2019) Understanding autoencoders with information theoretic concepts. Neural Netw 117:104–123


  48. Abati D, Porrello A, Calderara S, Cucchiara R (2019) Latent space autoregression for novelty detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 481–490

  49. Ruff L, Vandermeulen R, Goernitz N, Deecke L, Siddiqui SA, Binder A, Müller E, Kloft M (2018) Deep one-class classification. In: International conference on machine learning, pp 4393–4402

  50. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  51. Coates A, Ng A, Lee H (2011) An analysis of single-layer networks in unsupervised feature learning. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 215–223

  52. Pfahringer B (2000) Winning the kdd99 classification cup: bagged boosting. ACM SIGKDD Explor Newsl 1(2):65–66


  53. Yeh I-C, Lien C-H (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36(2):2473–2480


  54. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255

  55. Bishop CM (2007) Pattern recognition and machine learning (information science and statistics), 1st edn. Springer, Berlin


  56. Akçay S, Atapour-Abarghouei A, Breckon TP (2019) Skip-GANomaly: skip-connected and adversarially trained encoder–decoder anomaly detection. arXiv:1901.08954

  57. Clifton L, Clifton DA, Watkinson PJ, Tarassenko L (2011) Identification of patient deterioration in vital-sign data using one-class support vector machines, pp 125–131

  58. Van den Oord A, Kalchbrenner N, Espeholt L, Vinyals O, Graves A et al (2016) Conditional image generation with pixelcnn decoders. In: Advances in neural information processing systems, pp 4790–4798

  59. Adler A, Elad M, Hel-Or Y, Rivlin E (2015) Sparse coding with anomaly detection. J Signal Process Syst 79(2):179–188


  60. Abe N, Zadrozny B, Langford J (2006) Outlier detection by active learning. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 504–509

  61. Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, pp 157–166

  62. Hou D, Cong Y, Sun G, Liu J, Xu X (2019) Anomaly detection via adaptive greedy model. Neurocomputing 330:369–379


  63. Akçay S, Atapour-Abarghouei A, Breckon TP (2018) Ganomaly: semi-supervised anomaly detection via adversarial training. In: Asian conference on computer vision. Springer, pp 622–637

  64. Bergmann P, Batzner K, Fauser M, Sattlegger D, Steger C (2021) The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. Int J Comput Vis 1–22


Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Number 16K00239.

Author information


Corresponding author

Correspondence to Takio Kurita.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

In Sect. 3.2, we have defined the loss function (Eq. 15) as follows

$$\begin{aligned} \begin{aligned} L_{\mathrm{MMI}}&=\lambda _{\mathrm{KLD}} KL(P(Z)||Q(Z))-\lambda I(X,Z)\\&\quad -\lambda _O I(X,Y)- \lambda _H I(L_1,L'_1)\\&=\lambda _{\mathrm{KLD}} \int p(\mathbf {z}) \hbox {log}\frac{p(\mathbf {z})}{q(\mathbf {z})}\hbox {d}\mathbf {z}\\&\quad -\lambda \int \int p(\mathbf {z}|x)p(x)\hbox {log}\frac{p(\mathbf {z}|x)}{p(\mathbf {z})}\hbox {d}x\hbox {d}\mathbf {z}\\&\quad -\lambda _O \int \int p(y|x)p(x)\hbox {log}\frac{p(y|x)}{p(y)}\hbox {d}x\hbox {d}y\\&\quad -\lambda _H \int \int p(l'_1|l_1)p(l_1)\hbox {log}\frac{p(l'_1|l_1)}{p(l'_1)}\hbox {d}l_1\hbox {d}l'_1 \end{aligned} \end{aligned}$$
(16)

where \(\lambda _{\mathrm{KLD}}\), \(\lambda\), \(\lambda _O\) and \(\lambda _H\) are the weighting parameters used to adjust the impact of individual losses on the overall objective function.
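The way these four weighted terms are assembled can be sketched in a few lines. The helper below is a hypothetical illustration (the function name and signature are not from the paper): it assumes the KL regularizer and the three mutual-information estimates have already been computed as scalars, and combines them exactly as in Eq. 16, with the MI terms entering negatively because they are maximized while the overall loss is minimized.

```python
import numpy as np

def total_mmi_loss(kld, mi_latent, mi_output, mi_hidden,
                   lam_kld=1.0, lam=1.0, lam_o=1.0, lam_h=1.0):
    """Combine the four terms of Eq. 16: the KL term is minimized,
    while the three mutual-information terms (latent I(X,Z), output
    I(X,Y), hidden I(L1,L1')) are maximized, hence their minus signs."""
    return lam_kld * kld - lam * mi_latent - lam_o * mi_output - lam_h * mi_hidden
```

Raising any of the \(\lambda\) weights shifts the balance toward the corresponding regularizer without changing the others.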

We transform Eq. 16 to obtain the following:

$$\begin{aligned} \begin{aligned} L_{\mathrm{MMI}}&=\lambda _{\mathrm{KLD}} \int p(\mathbf {z}) \hbox {log}\frac{p(\mathbf {z})}{q(\mathbf {z})}\hbox {d}\mathbf {z}\\&\quad -\lambda \int \int p(\mathbf {z}|x)p(x)\hbox {log}\frac{p(\mathbf {z}|x)}{p(\mathbf {z})}\hbox {d}x\hbox {d}\mathbf {z}\\&\quad -\lambda _O \int \int p(y|x)p(x)\hbox {log}\frac{p(y|x)}{p(y)}\hbox {d}x\hbox {d}y\\&\quad -\lambda _H \int \int p(l'_1|l_1)p(l_1)\hbox {log}\frac{p(l'_1|l_1)}{p(l'_1)}\hbox {d}l_1\hbox {d}l'_1\\&=\int \int p(\mathbf {z}|x)p(x) \bigg [ \lambda _{\mathrm{KLD}} \hbox {log}\frac{p(\mathbf {z}|x)}{q(\mathbf {z})}\\&\quad -(\lambda _{\mathrm{KLD}}+\lambda ) \hbox {log}\frac{p(\mathbf {z}|x)}{p(\mathbf {z})}\bigg ]\hbox {d}x\hbox {d}\mathbf {z} \\&\quad -\lambda _O \int \int p(y|x)p(x)\hbox {log}\frac{p(y|x)}{p(y)}\hbox {d}x\hbox {d}y\\&\quad -\lambda _H\int \int p(l'_1|l_1)p(l_1)\hbox {log}\frac{p(l'_1|l_1)}{p(l'_1)}\hbox {d}l_1\hbox {d}l'_1 \end{aligned} \end{aligned}$$
(17)

We define \(\lambda _L=\lambda _{\mathrm{KLD}}+\lambda\), and thus Eq. 17 can be written as follows:

$$\begin{aligned} \begin{aligned} L_{\mathrm{MMI}}=&\,\lambda _{\mathrm{KLD}} E_{x \sim p(x)}[D_{\mathrm{KL}}(P(Z|X)||Q(Z))]\\&\quad -\lambda _L \int \int p(\mathbf {z}|x)p(x)\hbox {log}\frac{p(\mathbf {z}|x)}{p(\mathbf {z})}\hbox {d}x\hbox {d}\mathbf {z}\\&\quad -\lambda _O \int \int p(y|x)p(x)\hbox {log}\frac{p(y|x)}{p(y)}\hbox {d}x\hbox {d}y\\&\quad -\lambda _H \int \int p(l'_1|l_1)p(l_1)\hbox {log}\frac{p(l'_1|l_1)}{p(l'_1)}\hbox {d}l_1\hbox {d}l'_1 \end{aligned} \end{aligned}$$
(18)

The first term of the loss function can be simply expressed as follows

$$\begin{aligned} E_{x \sim p(x)}[D_{\mathrm{KL}}(P(Z|X)||Q(Z))]=\sum \limits _{x \in X} \frac{1}{2}(-\hbox {log}\sigma ^2(x)+\mu ^2(x)+\sigma ^2(x)-1) \end{aligned}$$
(19)

where \(\mu ( . )\) and \(\sigma ( . )\) represent the mean and standard deviation of the approximate posterior given x, respectively [41].
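This closed-form term can be checked numerically from the encoder outputs. The helper below is a hypothetical NumPy sketch (the paper's network is not reproduced here); `mu` and `log_var` stand for the encoder's mean and log-variance outputs, and averaging over the batch is an assumption of this sketch.

```python
import numpy as np

def vae_kl_term(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) as in Eq. 19,
    summed over latent dimensions and averaged over the batch.
    mu, log_var: arrays of shape (batch, latent_dim)."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    # 0.5 * (-log sigma^2 + mu^2 + sigma^2 - 1), elementwise
    kl = 0.5 * (-log_var + mu**2 + np.exp(log_var) - 1.0)
    return kl.sum(axis=1).mean()
```

As expected, a standard-normal posterior (`mu = 0`, `log_var = 0`) gives a KL term of zero.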

Then, the mutual information \(I(X,Z)\) is converted into a KL divergence as follows:

$$\begin{aligned} \begin{aligned} I(X,Z)&=\int \int p(\mathbf {z}|x)p(x)\hbox {log}\frac{p(\mathbf {z}|x)}{p(\mathbf {z})}\hbox {d}x\hbox {d}\mathbf {z}\\&=\int \int p(\mathbf {z}|x)p(x)\hbox {log}\frac{p(\mathbf {z}|x)p(x)}{p(\mathbf {z})p(x)}\hbox {d}x\hbox {d}\mathbf {z}\\&=D_{\mathrm{KL}}(p(\mathbf {z}|x)p(x)||p(\mathbf {z})p(x)) \end{aligned} \end{aligned}$$
(20)
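The identity above — mutual information as the KL divergence between the joint and the product of marginals — can be verified on a discrete toy example. The function below is an illustrative sketch, not part of the paper's method; `joint` is a hypothetical 2-D table of joint probabilities.

```python
import numpy as np

def mutual_information(joint, eps=1e-12):
    """I(X, Z) = KL( p(x,z) || p(x)p(z) ) for a discrete joint
    distribution. joint[i, j] = p(x_i, z_j), entries summing to 1."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    pz = joint.sum(axis=0, keepdims=True)   # marginal p(z)
    prod = px * pz                          # product of marginals
    mask = joint > eps                      # 0 * log 0 terms contribute 0
    return np.sum(joint[mask] * np.log(joint[mask] / prod[mask]))
```

An independent joint gives zero, while a perfectly correlated binary joint gives log 2, the entropy of either variable.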

Similarly, \(I(X,Y)\) and \(I(L_1,L'_1)\) can be expressed as follows, respectively:

$$\begin{aligned} I(X,Y)= &\, D_{\mathrm{KL}}(p(y|x)p(x)||p(y)p(x)) \end{aligned}$$
(21)
$$\begin{aligned} I(L_1,L'_1)= &\, D_{\mathrm{KL}}(p(l'_1|l_1)p(l_1)||p(l'_1)p(l_1)) \end{aligned}$$
(22)

It should be noted that the KL divergence has no upper bound, and maximizing an unbounded quantity can drive the objective to infinity. To optimize more effectively, we observe that maximizing MI amounts to widening the gap between \(p (\mathbf {z} | x) p (x)\) and \(p (\mathbf {z}) p (x)\); accordingly, instead of the KL divergence, we switch to the Jensen–Shannon divergence (JSD), a measure with an upper bound, defined as follows:

$$\begin{aligned} D_{JS}(P,Q)=\frac{1}{2}D_{\mathrm{KL}}\left(P||\frac{P+Q}{2}\right)+\frac{1}{2}D_{\mathrm{KL}}\left(Q||\frac{P+Q}{2}\right) \end{aligned}$$
(23)
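The boundedness claim is easy to check numerically: with the natural logarithm, \(D_{JS}\) lies in \([0, \log 2]\). The function below is a small illustrative sketch for discrete distributions (not part of the paper's implementation).

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence of Eq. 23 for discrete distributions,
    using the natural log, so values lie in [0, log 2]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # clip to avoid log(0); zero-probability terms contribute ~0
        a = np.clip(a, eps, None)
        b = np.clip(b, eps, None)
        return np.sum(a * np.log(a / b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give 0, and distributions with disjoint support attain the upper bound log 2 — exactly the property that makes JSD a safer target for maximization than KL.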

The loss function according to Eq. 7 can be rewritten as follows:

$$\begin{aligned} \begin{aligned} L_{\mathrm{MMI}}&=\lambda _{\mathrm{KLD}}E_{x \sim p(x)}[D_{\mathrm{KL}}(P(Z|X)||Q(Z))]\\&\quad -\lambda _{L} \cdot ( E_{(x,\mathbf {z}) \sim p(\mathbf {z}|x)p(x)}[\hbox {log}H(x,\mathbf {z})]\\&\quad + E_{(x,\mathbf {z}) \sim p(\mathbf {z})p(x)}[\hbox {log}(1-H(x,\mathbf {z}))])\\&\quad -\lambda _{O} \cdot ( E_{(x,y) \sim p(y|x)p(x)}[\hbox {log}H(x,y)]\\&\quad + E_{(x,y) \sim p(y)p(x)}[\hbox {log}(1-H(x,y))])\\&\quad -\lambda _{H} \cdot ( E_{(l_1,l'_1) \sim p(l'_1|l_1)p(l_1)}[\hbox {log}H(l_1,l'_1)]\\&\quad +E_{(l_1,l'_1) \sim p(l'_1)p(l_1)}[\hbox {log}(1-H(l_1,l'_1))]) \end{aligned} \end{aligned}$$
(24)

where \(H(\cdot )=\frac{1}{1+\hbox {exp}(-v(\cdot ))}\) is the sigmoid of the score function \(v(\cdot )\), an objective function defined from the proposed MI criterion following [46].
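Each pair of expectations in Eq. 24 is a JSD-style discriminator estimate: the score \(v\) is pushed up on samples drawn from the joint and down on samples drawn from the product of marginals. The sketch below illustrates this estimate given precomputed scores; the function name and the use of raw score arrays (rather than the paper's trained discriminator networks) are assumptions of this illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-np.asarray(v, dtype=float)))

def jsd_mi_estimate(scores_joint, scores_marginal):
    """One bracketed term of Eq. 24:
    E_joint[log H(v)] + E_marginal[log(1 - H(v))],
    where H is the sigmoid of the discriminator score v."""
    h_joint = sigmoid(scores_joint)       # scores on paired (joint) samples
    h_marg = sigmoid(scores_marginal)     # scores on shuffled (marginal) samples
    return np.mean(np.log(h_joint)) + np.mean(np.log(1.0 - h_marg))
```

An uninformative discriminator (all scores zero) yields \(-\log 4\), the minimum of this bound, while a discriminator that separates joint from marginal pairs pushes the estimate toward 0, reflecting higher estimated mutual information.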

Appendix 2

See Tables 9, 10, 11 and 12.

Table 9 Performance comparison of CVAE-MMI and the state-of-the-art methods on each individual class in terms of AUC concerning CIFAR10
Table 10 Performance comparison of CVAE-MMI and the state-of-the-art methods on each individual class in terms of AUC concerning CIFAR100
Table 11 Performance comparison of CVAE-MMI and the state-of-the-art methods on each individual class in terms of AUC concerning STL-10
Table 12 Performance comparison of CVAE-MMI and the state-of-the-art methods on each individual class in terms of AUC concerning IMAGENET


About this article


Cite this article

Yu, Q., Kavitha, M.S. & Kurita, T. Extensive framework based on novel convolutional and variational autoencoder based on maximization of mutual information for anomaly detection. Neural Comput & Applic 33, 13785–13807 (2021). https://doi.org/10.1007/s00521-021-06017-3
