Optimum-path forest stacking-based ensemble for intrusion detection

Bertoni, Mateus A.; Rosa, Gustavo H. de; Brega, Jose R. F.

doi:10.1007/s12065-021-00609-7

Research Paper
Published: 12 May 2021

Volume 15, pages 2037–2054, (2022)

Evolutionary Intelligence

Mateus A. Bertoni¹,
Gustavo H. de Rosa ORCID: orcid.org/0000-0002-6442-8343¹ &
Jose R. F. Brega¹

Abstract

Machine learning techniques have been extensively researched in recent years, mainly because of their effectiveness in recognition and classification applications. A machine learning system can automate routine work, save human effort, and produce useful insights for decision-making tasks. This paper introduces and validates a stacking-based ensemble approach that uses Optimum-Path Forest classifiers for intrusion detection. Instead of relying only on the well-known NSL-KDD dataset, we propose a new dataset called uneSPY, which we believe will help fill the gap in recent intrusion detection datasets. Several classifiers were evaluated on both datasets, including Logistic Regression, Decision Trees, Support Vector Machines, and Optimum-Path Forests, and compared against Optimum-Path Forest stacking-based ensembles. Experimental results indicate that Optimum-Path Forest stacking-based ensembles are suitable for this classification task, particularly given their ability to generalize over large volumes of data while sustaining their performance.
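
As a rough illustration of the stacking idea named in the abstract, the sketch below trains several level-0 classifiers, collects their out-of-fold predictions, and fits a level-1 classifier on those predictions. It is not the authors' implementation: a plain 1-nearest-neighbour rule stands in for the Optimum-Path Forest, diversity comes from hypothetical feature subsets, and the names `fit_1nn`, `project`, `fit_stacking` and the toy data are assumptions made for this example.

```python
def fit_1nn(X, y):
    """Train a 1-nearest-neighbour classifier (a stand-in for OPF, by assumption)."""
    def predict(x):
        nearest = min(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
        )
        return y[nearest]
    return predict


def project(X, cols):
    """Keep only the listed feature columns of every sample."""
    return [[row[c] for c in cols] for row in X]


def fit_stacking(X, y, feature_subsets, n_folds=2):
    """Fit level-0 learners per feature subset and a level-1 learner on their outputs."""
    n = len(X)
    meta_X = [[None] * len(feature_subsets) for _ in range(n)]
    # Out-of-fold predictions: each sample is predicted by models that never saw it.
    for k in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == k]
        train_idx = [i for i in range(n) if i % n_folds != k]
        for j, cols in enumerate(feature_subsets):
            Xj = project(X, cols)
            base = fit_1nn([Xj[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                meta_X[i][j] = base(Xj[i])
    # Level-1 (meta) classifier learns from the level-0 predictions.
    meta = fit_1nn(meta_X, y)
    # Level-0 classifiers are refitted on the full training set for inference.
    bases = [fit_1nn(project(X, cols), y) for cols in feature_subsets]

    def predict(x):
        level0 = [base([x[c] for c in cols]) for base, cols in zip(bases, feature_subsets)]
        return meta(level0)
    return predict


# Toy data (hypothetical): class 0 near the origin, class 1 near (1, 1).
X_train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3],
           [1.0, 0.9], [0.9, 1.1], [1.1, 1.0], [1.0, 1.2]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_stacking(X_train, y_train, feature_subsets=[[0], [1], [0, 1]])
print(ensemble([0.1, 0.1]), ensemble([1.0, 1.0]))
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for samples the base learners were fitted on, is what keeps the level-1 learner from simply trusting overfitted level-0 outputs.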

Notes

Prototypes are master nodes that represent a specific class and conquer other nodes.
MSTs (minimum spanning trees) are subgraphs that connect all nodes within the same set at the minimum possible cost.
Regions more prone to classification mistakes.
Described in Appendix 1 and available at http://recogna.tech/files/datasets/unespy.rar.
Wi-Fi adapter model: TP-Link TL-WN725N V2.
Available at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
https://scikit-learn.org.
The source code is available at https://github.com/malvesbertoni/ensemble_opf.
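
The notes on prototypes and MSTs can be made concrete with a small sketch. It assumes the standard supervised Optimum-Path Forest formulation, in which a minimum spanning tree is built over the training samples and the endpoints of tree edges joining different classes become prototypes; whether the paper uses exactly this procedure is not confirmed by the text above. The function names and toy points are hypothetical.

```python
import math


def prim_mst(points):
    """Return MST edges (i, j, weight) of the complete Euclidean graph (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best_cost = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best_cost[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_cost[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best_cost[u]))
        # Relax the connection cost of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_cost[v]:
                    best_cost[v] = d
                    parent[v] = u
    return edges


def find_prototypes(points, labels):
    """Indices of samples joined by an MST edge to a sample of another class."""
    prototypes = set()
    for i, j, _ in prim_mst(points):
        if labels[i] != labels[j]:
            prototypes.update((i, j))
    return sorted(prototypes)


# Toy data (hypothetical): two classes laid out on a line.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0)]
labels = [0, 0, 0, 1, 1]
print(find_prototypes(points, labels))
```

Under this formulation the prototypes sit on the class boundary, which matches the note describing those regions as the ones most prone to classification mistakes.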

Acknowledgements

The authors are grateful to Bruna de Camargo Rubio for her assistance in creating the uneSPY dataset.

Author information

Authors and Affiliations

School of Sciences, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01, Bauru, SP, 17033-360, Brazil
Mateus A. Bertoni, Gustavo H. de Rosa & Jose R. F. Brega

Authors

Mateus A. Bertoni
Gustavo H. de Rosa
Jose R. F. Brega

Corresponding author

Correspondence to Gustavo H. de Rosa.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors are grateful to the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001, and to the São Paulo Research Foundation (FAPESP), grant #2019/02205-5.

Appendices

Appendix 1

The OPF classifier requires numeric input, so the dataset had to be converted into numeric data. Because preserving the data’s structure and patterns was crucial, a codification table was created for every alphanumeric field that was transformed. The dataset therefore retains its information and can be used without losing its essence. As mentioned in Sect. 5.1.1, the uneSPY dataset has 23 features, which are presented in Table 4. The original uneSPY dataset has three features that were ignored: “Source”, “Destination” and “Info”. They were removed because they are unsuitable for the OPF classifier. In addition, Table 5 presents the data regarding “IGMP Types”; Table 6 describes the codified DHCPv6 messages; Table 7 introduces the classification of the packet types; and Table 8 displays the codified protocols of the dataset.
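
The codification step described above can be sketched as follows. This is a minimal example under stated assumptions: the integer codes are assigned in order of first appearance, which need not match the codes in Tables 5 to 8, and the function names, the record layout and the sample values are hypothetical.

```python
IGNORED_FEATURES = ("Source", "Destination", "Info")  # features dropped per the text


def build_codification_table(values):
    """Map each distinct alphanumeric value to an integer code (first-seen order)."""
    table = {}
    for value in values:
        if value not in table:
            table[value] = len(table)
    return table


def encode_column(values, table):
    """Replace every value with its integer code."""
    return [table[value] for value in values]


def decode_column(codes, table):
    """Invert the codification, showing that no information is lost."""
    inverse = {code: value for value, code in table.items()}
    return [inverse[code] for code in codes]


def drop_ignored(record):
    """Remove the ignored features from one packet record (a dict)."""
    return {key: value for key, value in record.items() if key not in IGNORED_FEATURES}


# Hypothetical sample values for a protocol column.
protocols = ["TCP", "UDP", "TCP", "ICMP"]
protocol_table = build_codification_table(protocols)
encoded = encode_column(protocols, protocol_table)
print(protocol_table, encoded)
```

Keeping the table alongside the encoded data is what allows the numeric dataset to be mapped back to the original values when results are inspected.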

Table 4 Description of features employed in the uneSPY dataset

Table 5 Description of IGMP types employed in the uneSPY dataset

Table 6 Description of DHCPv6 messages codification employed in the uneSPY dataset

Table 7 Description of packet types codification employed in the uneSPY dataset

Table 8 Description of protocols codification employed in the uneSPY dataset

Appendix 2

This appendix presents all results obtained in the experiments described in Sect. 6. For the uneSPY dataset, Table 9 reports the single-classifier experiment, Table 10 the homogeneous stacking-based ensembles, and Table 11 the heterogeneous stacking-based ensembles. For the NSL-KDD dataset, Table 12 reports the single-classifier experiment, Table 13 the homogeneous stacking-based ensembles, and Table 14 the heterogeneous stacking-based ensembles.
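
The tables in this appendix report a mean and a standard deviation per metric. A minimal sketch of that aggregation is given below, assuming each experiment is repeated several times and each run yields one value per metric. The metric names, the numbers, and the choice of the sample standard deviation are assumptions for illustration, not values or settings taken from the paper.

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Aggregate per-run metrics into {metric: (mean, sample standard deviation)}."""
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = (mean(values), stdev(values))
    return summary


# Hypothetical results from three repetitions of one classifier.
runs = [
    {"accuracy": 0.90, "f1": 0.88},
    {"accuracy": 0.92, "f1": 0.90},
    {"accuracy": 0.94, "f1": 0.92},
]
for metric, (avg, std) in summarize_runs(runs).items():
    print(f"{metric}: {avg:.4f} +/- {std:.4f}")
```

If the population standard deviation is wanted instead, `statistics.pstdev` can replace `stdev` without other changes.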

Table 9 Mean and standard deviation metrics from single classifiers trained over uneSPY

Table 10 Mean and standard deviation metrics from homogeneous ensembles trained over uneSPY

Table 11 Mean and standard deviation metrics from heterogeneous ensembles trained over uneSPY

Table 12 Mean and standard deviation metrics from single classifiers trained over NSL-KDD

Table 13 Mean and standard deviation metrics from homogeneous ensembles trained over NSL-KDD

Table 14 Mean and standard deviation metrics from heterogeneous ensembles trained over NSL-KDD

About this article

Cite this article

Bertoni, M.A., Rosa, G.H.d. & Brega, J.R.F. Optimum-path forest stacking-based ensemble for intrusion detection. Evol. Intel. 15, 2037–2054 (2022). https://doi.org/10.1007/s12065-021-00609-7

Received: 16 October 2020
Revised: 24 March 2021
Accepted: 15 April 2021
Published: 12 May 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s12065-021-00609-7
