Abstract
The worldwide diffusion of the Internet has made sharing information very easy. This raises the problem of protecting a huge amount of information (private data, commercial or strategic information) from unauthorized parties. One way of protecting it is to adopt intrusion detection systems, which reveal whether an attacker is violating an information system. In this work, an algorithm for identifying anomalies in network traffic has been studied and developed. It is based on the unsupervised fitting of a set of network data by means of finite Gaussian mixture models. Its key feature is the online selection of the number of mixture components together with the fitting parameters of each component. The best compromise between description accuracy (many components) and computational complexity (few components) is given by a derivation of the minimum message length criterion. Normal network behavior is assumed to be represented by the component with the largest covariance matrix, while the other, smaller components are considered to represent anomalies. We tested our technique on the well-known KDD Cup 99 data set, in order to compare our findings clearly with the state of the art. Our results show the effectiveness of this approach and encourage further improvements.
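The labeling rule described in the abstract (largest-covariance component is normal, the rest are anomalies) can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes one-dimensional data, a fixed number of components `k` instead of the online minimum-message-length selection, plain EM with a deterministic initialisation, and synthetic toy data. The function names `fit_gmm_1d` and `flag_anomalies` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Assumption: k >= 2 is fixed by the caller; no model selection is done.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    # Deterministic initialisation: means spread evenly over the data range.
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, means, variances


def flag_anomalies(data, weights, means, variances):
    """Flag samples whose most likely component is not the widest one."""
    # The component with the largest variance is taken as normal behavior.
    normal = max(range(len(variances)), key=lambda j: variances[j])
    flags = []
    for x in data:
        dens = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        best = max(range(len(dens)), key=lambda j: dens[j])
        flags.append(best != normal)
    return flags


# Synthetic toy data (an assumption, not KDD Cup 99 records):
# 40 widely spread "normal" values and 10 tightly clustered "anomalous" ones.
data = [0.25 * i for i in range(40)] + [50.0 + 0.1 * i for i in range(10)]
weights, means, variances = fit_gmm_1d(data, k=2)
flags = flag_anomalies(data, weights, means, variances)
print(sum(flags), "samples flagged as anomalous")
```

In this toy setting the widely spread group ends up in the larger-variance component and is treated as normal, while the tight cluster is flagged. A multivariate version would need a scalar measure of covariance size (for example the determinant or trace) to pick the normal component; which measure the paper uses is not stated in the abstract.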
Acknowledgements
We thank Marco Ivaldi and @ Mediaservice.net s.r.l. for their support.
Ethics declarations
Conflict of interest
Nicola Greggio declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Greggio, N. Anomaly Detection in IDSs by means of unsupervised greedy learning of finite mixture models. Soft Comput 22, 3357–3372 (2018). https://doi.org/10.1007/s00500-017-2581-z
DOI: https://doi.org/10.1007/s00500-017-2581-z