Abstract
Anomaly detection is an important problem in many applications, ranging from medical informatics to network security. Various distribution-based techniques have been proposed for this problem: they learn the probability distribution of conventional behavior and treat observations with low density as anomalies. For categorical observations, multinomial or Dirichlet compound multinomial distributions have been adopted as statistical models of conventional samples. However, when faced with a small-scale data set of multivariate categorical samples, these models suffer from the curse of dimensionality and fail to capture the statistical properties of conventional behavior, since only a small proportion of the possible categorical configurations appears in the training data. The categorical latent Gaussian process, a Bayesian non-parametric technique, can model small-scale categorical data by learning a continuous latent space for multivariate categorical samples with a Gaussian process. On this basis, we propose an anomaly detection technique for multivariate categorical observations, in which the categorical latent Gaussian process captures the probability distribution of conventional categorical samples. Experimental results on categorical data show that our method can effectively detect anomalous categorical observations and achieves better detection performance than other anomaly detection techniques.
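The distribution-based idea described in the abstract can be illustrated with a minimal sketch. It is not the proposed categorical latent Gaussian process method; it is only the multinomial-style baseline the abstract contrasts against, here a joint multinomial over full configurations with a symmetric Dirichlet pseudo-count. The function names, the pseudo-count `alpha`, the toy data, and the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def fit_joint_multinomial(samples, cardinalities, alpha=1.0):
    """Fit a smoothed multinomial over full categorical configurations.

    alpha is a symmetric Dirichlet pseudo-count (an assumed value, not from the paper).
    Returns a function giving the log predictive probability of a configuration,
    plus the fraction of possible configurations seen in training.
    """
    counts = Counter(tuple(s) for s in samples)
    n_configs = math.prod(cardinalities)  # number of possible configurations
    total = len(samples) + alpha * n_configs

    def log_density(x):
        # Unseen configurations receive only the pseudo-count mass.
        return math.log((counts[tuple(x)] + alpha) / total)

    coverage = len(counts) / n_configs
    return log_density, coverage

def detect(log_density, observations, threshold):
    """Flag observations whose log density falls below the threshold."""
    return [log_density(x) < threshold for x in observations]

# Toy training set: three binary variables, eight possible configurations.
train = [(0, 0, 0)] * 6 + [(0, 0, 1)] * 2
log_density, coverage = fit_joint_multinomial(train, cardinalities=[2, 2, 2])

# Hypothetical threshold: flag anything less likely than 2/16.
flags = detect(log_density, [(0, 0, 0), (1, 1, 1), (0, 0, 1)], math.log(2 / 16))
print(flags, coverage)
```

The `coverage` value shows the problem the abstract points to: the number of configurations grows multiplicatively with the number of variables, so a small training set covers only a small fraction of them, and every unseen configuration gets the same smoothed density whether or not it is anomalous.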
Acknowledgments
This work was supported by the National Natural Science Foundation of China under grants No. 61572109, No. 11461006 and No. 61402080. The authors would like to thank the anonymous reviewers for their helpful and constructive comments.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lv, F., Yang, G., Wu, J., Liu, C., Yang, Y. (2017). Anomaly Detection for Categorical Observations Using Latent Gaussian Process. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10638. Springer, Cham. https://doi.org/10.1007/978-3-319-70139-4_29
DOI: https://doi.org/10.1007/978-3-319-70139-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70138-7
Online ISBN: 978-3-319-70139-4
eBook Packages: Computer Science, Computer Science (R0)