A GP-based ensemble classification framework for time-changing streams of intrusion detection data

Folino, Gianluigi; Pisani, Francesco Sergio; Pontieri, Luigi

doi:10.1007/s00500-020-05200-3

Focus
Published: 01 August 2020

Volume 24, pages 17541–17560, (2020)
Soft Computing

Gianluigi Folino ORCID: orcid.org/0000-0002-8139-3445¹,
Francesco Sergio Pisani¹ &
Luigi Pontieri¹

388 Accesses
14 Citations

Abstract

Intrusion detection tools have largely benefitted from the use of supervised classification methods developed in the field of data mining. However, the data produced by modern system/network logs pose many problems, such as their streaming and non-stationary nature, their volume and velocity, and the presence of imbalanced classes. Classifier ensembles appear to be a valid solution for this scenario, owing to their flexibility and scalability. In particular, data-driven schemes for combining the predictions of multiple classifiers have been shown to be superior to traditional fixed aggregation criteria (e.g., averaging of predictions and weighted voting). In intrusion detection settings, however, such schemes must be devised in an efficient way, since (part of) the ensemble may need to be re-trained frequently. A novel ensemble-based framework is proposed here for online intrusion detection, where the ensemble is updated through an incremental stream-oriented learning scheme, in response to the detection of concept drifts. Unlike mainstream ensemble-based approaches in the field, our proposal relies on deriving, through an efficient genetic programming (GP) method, an expressive kind of combiner function defined in terms of (non-trainable) aggregation functions. This approach is supported by a system architecture that integrates different kinds of functionalities, ranging from drift detection, to the induction and replacement of base classifiers, up to the distributed computation of GP-based combiners. Experiments on both artificial and real-life datasets confirmed the validity of the approach.
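To make the idea of a combiner "defined in terms of (non-trainable) aggregation functions" more concrete, the following is a minimal sketch of how such a combiner could be represented as an expression tree and evaluated. It is not the authors' implementation: the tree encoding, the names `AGGREGATORS`, `evaluate` and `predict_class`, and the choice of average, maximum, minimum and median as the function set are all assumptions made for illustration. The GP search that would evolve such trees is not shown.

```python
from statistics import median
from typing import List, Sequence, Tuple, Union

# Hypothetical encoding: a tree is either the index of a base classifier (leaf)
# or a pair (aggregator_name, [child_trees]).
Tree = Union[int, Tuple[str, list]]


def _elementwise(fn, vectors: Sequence[Sequence[float]]) -> List[float]:
    """Apply fn to the per-class scores collected across all input vectors."""
    return [fn(column) for column in zip(*vectors)]


# Non-trainable aggregation functions (an assumed function set).
AGGREGATORS = {
    "avg": lambda vs: _elementwise(lambda c: sum(c) / len(c), vs),
    "max": lambda vs: _elementwise(max, vs),
    "min": lambda vs: _elementwise(min, vs),
    "median": lambda vs: _elementwise(median, vs),
}


def evaluate(tree: Tree, predictions: Sequence[Sequence[float]]) -> List[float]:
    """Recursively combine the base classifiers' per-class scores."""
    if isinstance(tree, int):
        return list(predictions[tree])
    name, children = tree
    return AGGREGATORS[name]([evaluate(child, predictions) for child in children])


def predict_class(tree: Tree, predictions: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest combined score."""
    scores = evaluate(tree, predictions)
    return max(range(len(scores)), key=scores.__getitem__)


# Per-class scores of three hypothetical base classifiers for one instance.
preds = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.2, 0.8],
]
# avg(max(c0, c1), c2): one candidate combiner a GP search might produce.
combiner = ("avg", [("max", [0, 1]), 2])
scores = evaluate(combiner, preds)
label = predict_class(combiner, preds)
```

Because the functions at the internal nodes have no parameters to fit, only the tree structure needs to be searched, which is consistent with the abstract's emphasis on efficient re-derivation of the combiner.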

Notes

The only aggregation function, among those tested in Kuncheva (2004), that has not been integrated in our framework is the product, which was shown not to perform well enough in the general case of multi-class classification settings.
In fact, our MOA-based implementations of such models only take account of the class labels, and disregard the feature vector x.
The choice of using fixed-size windows is mainly for the sake of concreteness and of presentation. In fact, our approach can be easily extended to deal with other data-segmentation schemes.
http://moa.cms.waikato.ac.nz/.
http://mutrics.iitis.pl/flowcalc.
http://www.cs.waikato.ac.nz/ml/weka.
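
The fixed-size windowing mentioned in the notes above can be sketched as a small generator. This is an illustrative assumption rather than the framework's code: the function name `fixed_size_windows` and the decision to emit a final, shorter window are hypothetical. Another data-segmentation scheme could be substituted by replacing this generator with one that yields windows under a different rule.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_size_windows(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Split a (possibly unbounded) stream into consecutive windows of `size` items.

    The last window may be shorter if the stream ends early (an assumption).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(stream)
    while True:
        window = list(islice(iterator, size))
        if not window:
            return
        yield window


# Seven items split into windows of three.
windows = list(fixed_size_windows(range(7), 3))
```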

References

Aburomman AA, Reaz MBI (2017) A novel weighted support vector machines multiclass classifier based on differential evolution for intrusion detection systems. Inf Sci 414:225–246
Acosta-Mendoza N, Morales-Reyes A, Escalante HJ, Gago-Alonso A (2014) Learning to assemble classifiers via genetic programming. IJPRAI 28(7)
Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: SDM, vol 7. SIAM
Bifet A, Frank E, Holmes G, Pfahringer B (2012) Ensembles of restricted hoeffding trees. ACM Trans Intell Syst Technol (TIST) 3(2):1–20
Borji A (2007) Combining heterogeneous classifiers for network intrusion detection. In: Cervesato I (ed) Advances in computer science—ASIAN 2007. Computer and network security, vol 4846. Lecture notes in computer science. Springer, Berlin, pp 254–260
Buczak AL, Guven E (2016) A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun Surv Tutor 18(2):1153–1176
CERT Australia (2012) Cyber crime and security survey report. Technical report, 2012
Costa VS, Farias ADS, Bedregal B, Santiago RHN, de P Canuto AM, Magaly de A (2018) Combining multiple algorithms in classifier ensembles using generalized mixture functions. Neurocomputing 313:402–414
Cruz RMO, Sabourin R, Cavalcanti GDC (2018) Dynamic classifier selection: recent advances and perspectives. Inf Fusion 41:195–216
de Oliveira DF, Canuto AMP, de Souto MCP (2009) Use of multi-objective genetic algorithms to investigate the diversity/accuracy dilemma in heterogeneous ensembles. In: International joint conference on neural networks. IEEE, pp 2339–2346
De Stefano C, Folino G, Fontanella F, Scotto di Freca A (2014) Using bayesian networks for selecting classifiers in GP ensembles. Inf Sci 258:200–216
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Folino G, Sabatino P (2016) Ensemble based collaborative and distributed intrusion detection systems: a survey. J Netw Comput Appl 66(C):1–16
Folino G, Pizzuti C, Spezzano G (2003) A scalable cellular implementation of parallel genetic programming. IEEE Trans Evol Comput 7(1):37–53
Folino G, Pizzuti C, Spezzano G (2008) Training distributed GP ensemble with a selective algorithm based on clustering and pruning for pattern classification. IEEE Trans Evol Comput 12(4):458–468
Folino G, Pisani FS, Sabatino P (2016a) A distributed intrusion detection framework based on evolved specialized ensembles of classifiers. In: Applications of evolutionary computation—19th European conference, EvoApplications 2016, Porto, Portugal, 30 March–1 April 2016, Proceedings, Part I, pp 315–331
Folino G, Pisani FS, Sabatino P (2016b) An incremental ensemble evolved by using genetic programming to efficiently detect drifts in cyber security datasets. In: Genetic and evolutionary computation conference, GECCO 2016, Denver, CO, USA, 20–24 July 2016, Companion material proceedings, pp 1103–1110
Folino G, Pisani FS, Pontieri L (2019) A cybersecurity framework for classifying non stationary data streams exploiting genetic programming and ensemble learning. In: Numerical computations: theory and algorithms—3rd international conference, NUMTA 2019, Crotone, Italy, 15–21 June 2019, Revised Selected Papers, Part I, volume 11973 of Lecture notes in computer science, pp 269–277
Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: SBIA Brazilian symposium on artificial intelligence. Springer, pp 286–295
Gao X, Shan C, Hu C, Niu Z, Liu Z (2019) An adaptive ensemble machine learning model for intrusion detection. IEEE Access 7:82512–82521
García S, Herrera F (2009) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Gonçalves PM Jr, de Carvalho Santos SGT, de Barros RSM, De Lima Vieira DC (2014) A comparative study on concept drift detectors. Expert Syst Appl 41(18):8144–8156
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 97–106
Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge
Kumar G (2020) An improved ensemble approach for effective intrusion detection. J Supercomput 76(1):275–291
Kuncheva L (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience, New York
Nishida K, Yamauchi K (2007) Detecting concept drift using statistical testing. In: Discovery science. Springer, pp 264–269
Oza N (2001) Online bagging and boosting. Proc Artif Intell Stat 2001:105–112
Perdisci R, Ariu D, Fogla P, Giacinto G, Lee W (2009) Mcpad: A multiple classifier system for accurate payload-based anomaly detection. Comput Netw 53(6):864–881 (Traffic classification and its applications to modern networks)
Schapire RE (1990) The strength of weak learnability. Mach Learn 5(2):197–227
Schapire RE (1995) Boosting a weak learning by majority. Inf Comput 121(2):256–285
Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA (2012) Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 31(3):357–374
Sindhu SSS, Geetha S, Kannan A (2012) Decision tree based light weight intrusion detection using a wrapper approach. Expert Syst Appl 39(1):129–141
Sylvester J, Chawla NV (2005) Evolutionary ensembles: combining learning agents using genetic algorithms. In: AAAI workshop on multiagent learning, pp 46–51
Tavallaee M, Stakhanova N, Ghorbani AA (2010) Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Trans Syst Man Cybern Part C Appl Rev 40(5):516–524
Žliobaitė I, Bifet A, Read J, Pfahringer B, Holmes G (2015) Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach Learn 98(3):455–482

Author information

Authors and Affiliations

ICAR-CNR, Rende, Italy
Gianluigi Folino, Francesco Sergio Pisani & Luigi Pontieri

Corresponding author

Correspondence to Gianluigi Folino.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Yaroslav D. Sergeyev.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Folino, G., Pisani, F.S. & Pontieri, L. A GP-based ensemble classification framework for time-changing streams of intrusion detection data. Soft Comput 24, 17541–17560 (2020). https://doi.org/10.1007/s00500-020-05200-3

Published: 01 August 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s00500-020-05200-3
