Anomaly Detection in IDSs by means of unsupervised greedy learning of finite mixture models

Soft Computing · Methodologies and Application

Abstract

Nowadays, the worldwide diffusion of the Internet has made sharing information easy. This raises the problem of protecting a huge amount of information (private data, commercial or strategic information) from unauthorized parties. One way to protect it is to adopt intrusion detection systems (IDSs), which detect whether an attacker is violating an information system. In this work, an algorithm for identifying anomalies in network traffic has been studied and developed. It is based on the unsupervised fitting of network data by means of finite Gaussian mixture models. Its key feature is the online selection of the number of mixture components together with the fitting parameters of each component. The best compromise between description accuracy (many components) and computational complexity (few components) is given by a criterion derived from the minimum message length (MML). Normal network behavior is assumed to be represented by the component with the largest covariance matrix, while the other, smaller components are considered to represent anomalies. We tested our technique on the well-known KDD Cup 99 data set, so that our findings can be compared directly with the state of the art. Our results show the effectiveness of this approach and encourage further improvements.
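
The pipeline described in the abstract (fit a Gaussian mixture without supervision, choose the number of components with an MML-type criterion, then treat the largest-covariance component as "normal") can be illustrated with a minimal one-dimensional sketch. This is not the author's implementation. It assumes the message-length formula of Figueiredo and Jain (2002) rather than the paper's own derivation, a hypothetical greedy step that seeds each new component on the worst-explained sample, a variance floor of 1% of the overall variance, and scalar variances in place of covariance matrices. The names `greedy_fit`, `mml_cost`, `label` and the synthetic data are illustrative only.

```python
import math
from statistics import NormalDist


def gauss_pdf(x, mu, var):
    """Density of the 1-D Gaussian N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of the whole mixture at x."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def em(data, weights, means, variances, min_var, iters=100):
    """Plain EM for a 1-D Gaussian mixture; variances are floored at min_var."""
    n, k = len(data), len(weights)
    for _ in range(iters):
        # E-step: resp[i][j] = posterior probability that data[i] came from component j
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weight, mean and variance of every component
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


def mml_cost(data, weights, means, variances):
    """Message length in the form of Figueiredo & Jain (2002), assumed here:
    (N/2) * sum_j log(n*w_j/12) + (k/2) * log(n/12) + k*(N+1)/2 - log-likelihood,
    with N = 2 free parameters (mean, variance) per 1-D component."""
    n, npar, k = len(data), 2, len(weights)
    log_lik = sum(math.log(mixture_pdf(x, weights, means, variances) or 1e-300) for x in data)
    return (npar / 2 * sum(math.log(n * w / 12) for w in weights)
            + k / 2 * math.log(n / 12)
            + k * (npar + 1) / 2
            - log_lik)


def greedy_fit(data, k_max=5):
    """Grow the mixture one component at a time; stop when the MML cost stops falling."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    min_var = 0.01 * var  # assumption: floor relative to the overall spread
    model = em(data, [1.0], [mu], [var], min_var)
    cost = mml_cost(data, *model)
    while len(model[0]) < k_max:
        weights, means, variances = (list(p) for p in model)
        # Hypothetical greedy step: seed a new component on the worst-explained sample.
        worst = min(data, key=lambda x: mixture_pdf(x, weights, means, variances))
        weights = [w * 0.9 for w in weights] + [0.1]
        means.append(worst)
        variances.append(var / 10)
        candidate = em(data, weights, means, variances, min_var)
        new_cost = mml_cost(data, *candidate)
        if new_cost >= cost:
            break  # the extra component does not pay for its description cost
        model, cost = candidate, new_cost
    return model


def label(x, model):
    """Largest-variance component = normal traffic; every other component = anomaly."""
    weights, means, variances = model
    post = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    best = max(range(len(post)), key=post.__getitem__)
    normal = max(range(len(variances)), key=variances.__getitem__)
    return "normal" if best == normal else "anomaly"


# Synthetic 1-D feature (hypothetical): 300 normal values near 0, 15 attack values near 20.
normal_traffic = [NormalDist(0, 3).inv_cdf((i + 0.5) / 300) for i in range(300)]
attacks = [NormalDist(20, 0.3).inv_cdf((i + 0.5) / 15) for i in range(15)]
model = greedy_fit(normal_traffic + attacks)
print(len(model[0]), "components:", label(0.0, model), label(20.0, model))
```

Because the criterion trades log-likelihood against a per-component description cost, a new component is kept only when it pays for itself; on the synthetic data the tight cluster near 20 is expected to get its own component and be flagged as anomalous, while the broad cluster near 0 stays "normal". KDD Cup 99 records are multivariate, so a closer analogue would need full covariance matrices and a rule for which matrix counts as "largest" (for example, the largest determinant, which is itself an assumption).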

Acknowledgements

We thank Marco Ivaldi and @ Mediaservice.net s.r.l. for their support.

Author information

Correspondence to Nicola Greggio.

Ethics declarations

Conflict of interest

Nicola Greggio declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Greggio, N. Anomaly Detection in IDSs by means of unsupervised greedy learning of finite mixture models. Soft Comput 22, 3357–3372 (2018). https://doi.org/10.1007/s00500-017-2581-z

  • DOI: https://doi.org/10.1007/s00500-017-2581-z
