1 Introduction

In data mining, anomaly detection is the identification of samples that do not follow the pattern or behavior exhibited by most of the elements in a dataset. Anomaly detection techniques have become important tools in several real-world applications such as malicious activity detection [1], intrusion detection [2,3,4], fraud detection [5, 6], surveillance [7, 8], and others. For instance, in businesses where customers execute financial transactions between accounts, such as banks or telecommunication companies, an anomaly detection system is very useful because it alerts on unusual account behavior over a period of time [5, 9].

In anomaly detection, one of the most successful methods is the autoencoder, from Deep Learning. Autoencoder networks learn a compressed representation of the input data, providing an efficient reconstructed output by reducing the input dimensionality [10]. Moreover, for anomaly detection the relevant quantity is not the output itself but the difference between the input and the output, known as the reconstruction error. Given the characteristics of autoencoders, a high reconstruction error (or score) indicates the occurrence of an anomaly [10].

Samples with high scores are flagged as anomalies. To do so, a score threshold to compare against must be selected [4, 8, 10]. Usually this value is either set by a human expert or estimated statistically by assuming a theoretical distribution. For this reason, this paper introduces an iterative training method that estimates the anomaly score threshold.

This paper is organized as follows: in Sect. 2, the related works are discussed. In Sect. 3, the proposed autoencoder pipeline method is introduced. In Sect. 4, the experimental results are shown and discussed. Finally, in Sect. 5, the conclusions and future work directions are presented.

2 Related Works

The rule-based approach is one of the mainstream techniques in the field of anomaly detection, including malicious activity, intrusion, or fraud detection [1]. Nowadays, a common scenario requires processing a large volume of unlabeled data, where rule-based approaches fail and only unsupervised algorithms are able to support anomaly detection systems. The expansion of computational power boosts the application of Deep Learning, whose unsupervised algorithms can be used to process large volumes of unlabeled data. These algorithms include autoencoder networks [6, 11].

2.1 Anomaly Detection with Autoencoders

Autoencoders are a good option in the absence of ground truth. Since the introduction of the replicator neural network as an outlier detection tool [12], autoencoder networks have been used to solve anomaly detection problems [6, 10, 11]. This kind of network consists of two parts: the encoder, shaped like a funnel, and the decoder, which expands back out to the full input dimensionality at the output layer [10].

The autoencoder structure allows the network to learn a compressed, lower-dimensional representation of the input data. The output of an autoencoder is a reconstruction of the input data computed in the most efficient way [10]. One of the most interesting characteristics of autoencoders, as a variant of feed-forward neural networks, is the presence of an extra bias that allows the network to recognize normal regions in the feature space and to compute the reconstruction error [10, 11]. As a consequence, a high reconstruction error indicates an anomaly.
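To make the notion concrete, the reconstruction error of a sample can be computed as the mean squared difference between the input vector and the autoencoder's output. The following minimal Java helper is an illustrative sketch, not code from the cited works:

// Hypothetical helper (not from the cited works): reconstruction error of one
// sample as the mean squared difference between input and reconstruction.
final class ReconstructionError {
    static double of(double[] input, double[] reconstruction) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double diff = input[i] - reconstruction[i];
            sum += diff * diff;
        }
        return sum / input.length; // a high value indicates a likely anomaly
    }
}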

There is a probabilistic version of the autoencoder, known as the Variational Autoencoder (VAE) [13]. The main advantage of a VAE over a plain autoencoder network is its probabilistic output: a reconstruction probability is used as the anomaly score instead of a reconstruction error. As stated in the literature, probabilities do not require model-specific thresholds for flagging an evaluated sample as an anomaly, since they provide a more principled and objective measure [13]. However, a threshold is still required to identify the boundaries and to judge properly what "high" means.

Searching for the best anomaly score threshold for automatic anomaly recognition is not a trivial task. Common approaches include setting the anomaly score threshold by a human expert or estimating it from a heuristic (e.g. the three-sigma rule [14]) under the assumption that the dataset fits a theoretical distribution [10].
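For reference, the three-sigma rule sets the threshold at the mean of the scores plus three standard deviations, under the assumption that the scores follow an approximately normal distribution. A minimal sketch of this heuristic (illustrative only) is:

// Sketch of the three-sigma heuristic: threshold = mean + 3 * standard deviation,
// assuming the anomaly scores roughly follow a normal distribution.
final class ThreeSigma {
    static double threshold(double[] scores) {
        double mean = 0.0;
        for (double s : scores) mean += s;
        mean /= scores.length;

        double variance = 0.0;
        for (double s : scores) variance += (s - mean) * (s - mean);
        variance /= scores.length;

        return mean + 3.0 * Math.sqrt(variance);
    }
}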

In the literature, a method for network intrusion detection that attempts to compute the anomaly score threshold was reported [4]. Its training process uses normal samples only, and each autoencoder in the ensemble computes its own anomaly score threshold by selecting the maximum score observed over the training samples.

The reviewed applications of autoencoders are focused on detecting anomalies in specific datasets: for example, the classical detection of outlier digits in the MNIST database [10], anomaly detection in accounting data [6], in continuous video streams [7], or in network intrusion detection [15]. In all of these applications, the anomaly score threshold is a parameter, and estimating its value is an expert task.

At this point, we conclude that there are no reported solutions (nor even an exploration) for automatically obtaining the anomaly score threshold from the autoencoders themselves. This paper introduces the Autoencoders Pipeline as a valid method to estimate the normality limits.

3 Proposed Method

The goal of the method is to compute the anomaly score threshold. The idea is to arrange and train the autoencoders in sequence, resulting in an iterative training method from which the anomaly score threshold can be obtained. This approach is called “Autoencoders Pipeline” (AEP).

3.1 Autoencoders Pipeline

AEP starts as a regular training: the dataset is split into a training set and an evaluation set. At each iteration, a new autoencoder network is trained while the normal samples remain in the evaluation set. A normal sample is defined as an evaluated sample whose score is below the expected anomaly score threshold for the iteration. All anomaly candidates are reintegrated for reprocessing in upcoming iterations.

In the first iteration, the scoreThreshold is initialized as follows:

$$\begin{aligned} scoreThreshold = min(score_0) \end{aligned}$$
(1)

and the scoreIncrement computed as follows:

$$\begin{aligned} scoreIncrement = \frac{max(score_0) - min(score_0)}{100} \end{aligned}$$
(2)

where \(score_0\) is the vector of scores of the evaluated samples in the first iteration.

On each iteration, the score of every evaluated sample is compared with \(scoreThreshold + scoreIncrement\). Every sample with a score greater than this value is considered an anomaly candidate. If anomaly candidates are collected at the end of the iteration, the scoreThreshold is updated as follows:

$$\begin{aligned} scoreThreshold = scoreThreshold + scoreIncrement. \end{aligned}$$
(3)

When all evaluated samples in the iteration are anomaly candidates, the stop condition of the algorithm is reached and the final scoreThreshold is computed as follows:

$$\begin{aligned} scoreThreshold = \frac{scoreThreshold + min(score_l)}{2} \end{aligned}$$
(4)

where \(l\) indicates the last iteration.

If the stop condition is not reached, all anomaly candidates are merged back into the training set, which is split again at random to train a new autoencoder and start a new iteration. The output of the method is the best-trained autoencoder network from the first iteration and the anomaly score threshold. Figure 1 depicts an overview of how AEP works.

Fig. 1. Overview of the AEP algorithm.

Notice that as the anomaly candidates are merged back with the previous training set, a new set with a higher proportion of anomalies (less normality) is obtained. This smoothly degrades the learning capacity of the autoencoders trained on the new datasets, until the last autoencoder considers all evaluated samples to be anomalies. When the algorithm reaches this condition, the anomaly score threshold is close to the normality limits of the dataset.
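For clarity, the complete loop can be summarized in code. The following Java sketch follows Eqs. (1)-(4) and the description above; the Autoencoder interface and the exact re-splitting of the merged set are assumptions introduced for illustration and do not reproduce the original implementation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Placeholder interface standing in for any trained autoencoder; it is not a
// Deeplearning4j class, only an assumption made for this sketch.
interface Autoencoder {
    void fit(List<double[]> trainingSet);   // train on the current training set
    double score(double[] sample);          // reconstruction error of a sample
}

final class AepSketch {
    // Returns the estimated anomaly score threshold following Eqs. (1)-(4).
    static double estimateThreshold(List<double[]> dataset, Supplier<Autoencoder> newAutoencoder) {
        List<double[]> pool = new ArrayList<>(dataset);
        Collections.shuffle(pool);
        int half = pool.size() / 2;
        List<double[]> training = new ArrayList<>(pool.subList(0, half));
        List<double[]> evaluation = new ArrayList<>(pool.subList(half, pool.size()));

        double scoreThreshold = Double.NaN;
        double scoreIncrement = Double.NaN;

        while (true) {
            Autoencoder ae = newAutoencoder.get();
            ae.fit(training);

            double[] scores = new double[evaluation.size()];
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < evaluation.size(); i++) {
                scores[i] = ae.score(evaluation.get(i));
                min = Math.min(min, scores[i]);
                max = Math.max(max, scores[i]);
            }

            if (Double.isNaN(scoreThreshold)) {           // first iteration: Eqs. (1) and (2)
                scoreThreshold = min;
                scoreIncrement = (max - min) / 100.0;
            }

            // Samples scoring above scoreThreshold + scoreIncrement are anomaly candidates.
            List<double[]> candidates = new ArrayList<>();
            List<double[]> normals = new ArrayList<>();
            for (int i = 0; i < evaluation.size(); i++) {
                if (scores[i] > scoreThreshold + scoreIncrement) candidates.add(evaluation.get(i));
                else normals.add(evaluation.get(i));
            }

            if (normals.isEmpty()) {                      // stop condition: Eq. (4)
                return (scoreThreshold + min) / 2.0;
            }

            scoreThreshold += scoreIncrement;             // Eq. (3)

            // One reading of the re-splitting step: candidates are merged back into
            // the training pool and re-split at random, while the samples judged
            // normal remain in the evaluation set.
            List<double[]> merged = new ArrayList<>(training);
            merged.addAll(candidates);
            Collections.shuffle(merged);
            int cut = merged.size() / 2;
            training = new ArrayList<>(merged.subList(0, cut));
            evaluation = new ArrayList<>(merged.subList(cut, merged.size()));
            evaluation.addAll(normals);
        }
    }
}

In this sketch the score increment is computed only once, from the scores of the first iteration, as prescribed by Eq. (2).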

4 Experiments

In this section, the experimental results obtained over two outlier datasets, following the training method explained in Sect. 3, are shown. All experiments were carried out on a personal computer with an Intel(R) Core(TM) i3-2100 CPU @ 3.10 GHz and 16 GB of RAM. The algorithm was implemented in Java, powered by Deeplearning4j, and executed on the Microsoft Windows 10 Professional operating system.

The datasets used in the experiments come from the Outlier Detection DataSets (ODDS) library. Each dataset was split into equal parts (50% for training and 50% for evaluation) as the experiment design approach. Each experiment comprises fifteen executions of the method, looking for a tendency or similarity in the estimated anomaly score thresholds that allows us to validate its effectiveness.

For the scope of the experiments, similar network configurations were used for the autoencoders. The main hyperparameters are Stochastic Gradient Descent (SGD) as the optimization algorithm, Xavier as the weight initializer, Rectified Linear Units (ReLU) as the activation function, RMSProp as the updater for each layer, and Mean Squared Error (MSE) as the network loss function. The input and output layer sizes depend on the dataset dimensionality. The encoder reduces the dimensionality to 75% of the previous layer at each hidden layer, reaching a bottleneck with a size close to 33% of the input size. The decoder increases the dimensionality back using the same values as the encoder in reverse order.
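As an illustration of this sizing rule, the following sketch derives the layer sizes from the input dimensionality; the rounding choices are assumptions and may differ from the configuration actually used in the experiments.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative layer-size calculation (not the authors' code): every hidden layer
// keeps roughly 75% of the previous layer's units until the bottleneck, fixed at
// about 33% of the input size, is reached; the decoder mirrors the encoder.
final class LayerSizing {
    static List<Integer> sizes(int inputSize) {
        int bottleneck = (int) Math.ceil(inputSize * 0.33);
        List<Integer> encoder = new ArrayList<>();
        encoder.add(inputSize);
        int size = inputSize;
        while (size > bottleneck) {
            size = Math.max(bottleneck, (int) Math.round(size * 0.75));
            encoder.add(size);
        }
        List<Integer> decoder = new ArrayList<>(encoder.subList(0, encoder.size() - 1));
        Collections.reverse(decoder);
        List<Integer> all = new ArrayList<>(encoder);
        all.addAll(decoder);
        return all;   // e.g. for a 30-dimensional input: [30, 23, 17, 13, 10, 13, 17, 23, 30]
    }
}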

The experiment results are shown in a table that includes the following columns: Anomaly Score Threshold (AST), Detected Anomalies (DA), True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN), Accuracy (AC), Precision (P), Recall (R), and F-measure (F1). Each row of the table represents an isolated execution of the method.

The anomaly score thresholds output by the individual executions are overall good estimations, with a few exceptions. From all these results, a better anomaly score threshold can be computed and, according to this value, an associated network can be selected. In the experimental results, the selection of a trained network by its proximity to the mean of the densest cluster is included. This cluster is an output of the single-linkage clustering algorithm [16], together with a heuristic that scores each cluster created in the process.
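One possible reading of this selection step is sketched below. Since the thresholds are one-dimensional, single-linkage clustering reduces to sorting the values and cutting at large gaps; the cut level and the cluster-scoring heuristic used here are assumptions, as the paper does not specify them.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: choose one of the fifteen estimated thresholds by
// clustering them, scoring clusters by size, and returning the member of the
// densest cluster that is closest to that cluster's mean.
final class ThresholdSelection {
    static double select(double[] thresholds) {
        double[] sorted = thresholds.clone();
        Arrays.sort(sorted);

        // Mean gap between consecutive values acts as the single-linkage cut level.
        double meanGap = (sorted[sorted.length - 1] - sorted[0]) / (sorted.length - 1);

        List<List<Double>> clusters = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        current.add(sorted[0]);
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] - sorted[i - 1] > meanGap) {   // large gap: start a new cluster
                clusters.add(current);
                current = new ArrayList<>();
            }
            current.add(sorted[i]);
        }
        clusters.add(current);

        // Pick the densest (largest) cluster and return its member closest to its mean.
        List<Double> best = clusters.get(0);
        for (List<Double> c : clusters) if (c.size() > best.size()) best = c;

        double mean = 0.0;
        for (double t : best) mean += t;
        mean /= best.size();

        double chosen = best.get(0);
        for (double t : best) if (Math.abs(t - mean) < Math.abs(chosen - mean)) chosen = t;
        return chosen;
    }
}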

4.1 Arrhythmia Dataset Experiment

“The arrhythmia dataset is a multi-class classification dataset with dimensionality 279. There are five categorical attributes which are discarded here, totalling 274 attributes. The smallest classes, i.e., 3, 4, 5, 7, 8, 9, 14, 15 are combined to form the outliers class and the rest of the classes are combined to form the inliers class”.

In Table 1a, the composition of the arrhythmia dataset used in this experiment is presented. Table 1b enumerates the input and output layer sizes of the network. Furthermore, Table 1c summarizes the results of the executions. The best anomaly score threshold is 430.6269 but the algorithm selects . This value is the closest to the mean of the cluster from to . This determines the selection of the trained network associated with the selected execution.

Table 1. Arrhythmia dataset experiment

This is the result of a very poorly tuned network; notice that the F1 values are under 0.5. As mentioned before, the algorithm is not looking for the best network configuration for the dataset, but for the best possible anomaly score threshold. The average number of iterations to convergence was 62 (min: 37, max: 154).

4.2 Wisconsin-Breast Cancer Dataset Experiment

“The Wisconsin-Breast Cancer (Diagnostics) dataset (WBC) is a classification dataset with dimensionality 30, which records the measurements for breast cancer cases. There are two classes, benign and malignant. The malignant class of this dataset is downsampled to 21 points, which are considered as outliers, while points in the benign class are considered inliers”.

In Table 2a, the composition of the WBC dataset used in this experiment is presented. Table 2b enumerates the input and output layer sizes of the network. Furthermore, the results of the executions are listed in Table 2c. The best anomaly score threshold is 0.05666 but the unsupervised selection is due to its proximity to the mean of the cluster from to . This selection also includes the associated network.

Table 2. WBC dataset experiment

This is the result of a better-trained network than the one presented in Sect. 4.1. The configuration and training are not the best possible, but they are sufficient for the purpose of this paper. F1 values greater than 0.7 were achieved in some executions. The average number of iterations to convergence was 49 (min: 28, max: 101).

5 Conclusions

In this paper, the Autoencoders Pipeline was introduced as an iterative training method to find the anomaly score threshold of a dataset according to the network configuration, training, and tuning. The method was evaluated over two well-known datasets.

The reliability of the method has been demonstrated through a pair of experiments, and the results are encouraging. Based on the experiments, we conclude that it is possible to automatically compute the anomaly score threshold using the autoencoders themselves. In essence, an arrangement of autoencoders in a pipeline is required, along with a smooth degradation of the normality of the training set.

As future work, the unsupervised selection of the best network from several executions could be improved. Ideally, all networks from the cluster of best executions should work together, consolidating the anomaly criterion.