
Optimum-path forest stacking-based ensemble for intrusion detection

  • Research Paper
Evolutionary Intelligence

Abstract

Machine learning techniques have been extensively researched in recent years, mainly because of their effectiveness in recognition and classification applications. Typically, a machine learning system can be used to automate routines, save human effort, and provide valuable insights for decision-making tasks. This paper introduces and validates a stacking-based ensemble approach using Optimum-Path Forest classifiers for intrusion detection tasks. In addition to the well-known NSL-KDD dataset, we propose a new dataset called uneSPY, which we believe helps fill the gap concerning new intrusion detection datasets. Both datasets were evaluated with several classifiers, including Logistic Regression, Decision Trees, Support Vector Machines, and Optimum-Path Forests, which were compared against Optimum-Path Forest stacking-based ensembles. Experimental results showed the suitability of Optimum-Path Forest stacking-based ensembles for classification, particularly their ability to generalize over large volumes of data while sustaining their performance.
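Stacking (Wolpert's stacked generalization) trains several level-0 classifiers, uses their out-of-fold predictions as features, and fits a level-1 (meta) classifier on those features. The sketch below illustrates only that general mechanism in plain Python. It is not the paper's pipeline: the two toy learners (nearest centroid and 1-nearest-neighbour) are hypothetical stand-ins for the OPF, SVM, Decision Tree, and Logistic Regression models, the fold scheme is an assumption, and class labels are assumed to be integers so that predictions can be fed directly to the meta-classifier.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Toy learner (hypothetical): predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            sums[label] = [a + v for a, v in zip(acc, x)]
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


class OneNN:
    """Toy learner (hypothetical): copies the label of the nearest training sample."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: dist(x, self.X[i]))] for x in X]


def stack_fit_predict(base_factories, meta_factory, X, y, X_test, k=2):
    """Out-of-fold level-0 predictions become the training features of a level-1 model."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved k-fold split
    meta_X = [[None] * len(base_factories) for _ in range(n)]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, pred in zip(fold, model.predict([X[i] for i in fold])):
                meta_X[i][j] = pred
    meta = meta_factory().fit(meta_X, y)
    # Level-0 models are refit on all training data to build test-time meta-features.
    bases = [make().fit(X, y) for make in base_factories]
    meta_test = [list(row) for row in zip(*(b.predict(X_test) for b in bases))]
    return meta.predict(meta_test)


X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(stack_fit_predict([NearestCentroid, OneNN], NearestCentroid, X, y, [[0.5, 0.5], [10.5, 10.5]]))
# [0, 1]
```

Using out-of-fold predictions, rather than predictions on the samples each base model was trained on, keeps the meta-classifier from learning to trust over-fitted level-0 outputs.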




Notes

  1. Prototypes are master nodes that represent a specific class and conquer the other nodes.

  2. Minimum spanning trees (MSTs) are subgraphs that connect all nodes of the same set with the minimum possible total cost.

  3. Regions more prone to classification mistakes.

  4. Described in Appendix 1 and available at http://recogna.tech/files/datasets/unespy.rar.

  5. Wi-Fi adapter model: TP-Link TL-WN725N V2.

  6. Available at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

  7. https://scikit-learn.org.

  8. The source code is available at https://github.com/malvesbertoni/ensemble_opf.
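Notes 1 to 3 summarize how the supervised OPF classifier works: an MST over the training samples exposes the class-boundary regions, the endpoints of MST edges linking different classes become prototypes, and prototypes then compete to conquer the remaining nodes. The plain-Python sketch below follows the standard supervised OPF formulation (Prim's MST, the f_max path cost, which takes the largest arc weight along a path, and cheapest-path classification). It is not the authors' code from Note 8; the brute-force classification loop (instead of a cost-sorted early stop) and the single-class fallback are simplifications assumed for this sketch.

```python
import heapq
from math import dist, inf


def mst_prototypes(X, y):
    """Prim's MST over the complete graph; inter-class MST edges mark prototypes."""
    n = len(X)
    in_tree, best, parent = [False] * n, [inf] * n, [-1] * n
    best[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1 and y[parent[u]] != y[u]:
            prototypes.update((parent[u], u))  # both endpoints of the boundary edge
        for v in range(n):
            if not in_tree[v] and dist(X[u], X[v]) < best[v]:
                best[v], parent[v] = dist(X[u], X[v]), u
    # Assumption: with a single class there is no boundary edge, so seed one node.
    return prototypes or {0}


def opf_fit(X, y):
    """Prototypes start at cost 0 and conquer the other nodes under f_max."""
    n = len(X)
    prototypes = mst_prototypes(X, y)
    cost = [0.0 if i in prototypes else inf for i in range(n)]
    label, done = list(y), [False] * n
    heap = [(0.0, i) for i in prototypes]
    heapq.heapify(heap)
    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c > cost[s]:
            continue  # stale queue entry
        done[s] = True
        for t in range(n):
            if not done[t]:
                path_cost = max(cost[s], dist(X[s], X[t]))  # f_max: largest arc on the path
                if path_cost < cost[t]:
                    cost[t], label[t] = path_cost, label[s]
                    heapq.heappush(heap, (path_cost, t))
    return X, cost, label


def opf_predict(model, X_test):
    """Each test sample takes the label of the training node offering the cheapest path."""
    X, cost, label = model
    return [label[min(range(len(X)), key=lambda s: max(cost[s], dist(X[s], x)))] for x in X_test]


X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = ["a", "a", "b", "b"]
model = opf_fit(X, y)
print(sorted(mst_prototypes(X, y)))                  # [1, 2]
print(opf_predict(model, [[0.2, 0.3], [5.1, 5.4]]))  # ['a', 'b']
```

In the toy example, the MST edge between samples 1 and 2 is the only one crossing the class boundary, so exactly those two samples become prototypes (Note 3's "regions more prone to classification mistakes").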


Acknowledgements

The authors are grateful to Bruna de Camargo Rubio for her assistance in building the uneSPY dataset.

Author information


Corresponding author

Correspondence to Gustavo H. de Rosa.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors are grateful to Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001 and São Paulo Research Foundation (FAPESP) grant #2019/02205-5.

Appendices

Appendix 1

Since the OPF classifier requires numeric inputs, the dataset had to be transformed into numeric data. Because preserving control over the data and its patterns was crucial, a codification table was created for every alphanumeric attribute that was transformed. Accordingly, the dataset preserves its information and can be used without losing its essence. As mentioned in Sect. 5.1.1, the uneSPY dataset has 23 features, which are presented in Table 4. The original uneSPY dataset also has three features that were discarded: “Source”, “Destination”, and “Info”. They were removed because they are unsuitable for the OPF classifier. In addition, Table 5 presents the data regarding “IGMP Types”; Table 6 describes the codified DHCPv6 messages; Table 7 introduces the classification of the packet types; and Table 8 lists the codified protocols of the dataset.
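One way to build such codification tables is sketched below in plain Python. Only the dropped columns “Source”, “Destination”, and “Info” come from the text; the “Time” and “Protocol” column names, the row values, and the rule of assigning codes in order of first appearance are hypothetical, so the resulting codes need not match Tables 5 to 8.

```python
def codify(rows, drop=("Source", "Destination", "Info")):
    """Encode alphanumeric values as integer codes, one codification table per column.

    `rows` is a list of dicts (e.g. parsed from a packet-capture CSV export).
    Values that parse as numbers are kept as floats; anything else gets the next
    unused integer code for its column, assigned in order of first appearance.
    """
    tables = {}   # column name -> {original string: integer code}
    encoded = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in drop:
                continue  # features discarded as unsuitable for OPF
            try:
                new_row[column] = float(value)
            except ValueError:
                table = tables.setdefault(column, {})
                new_row[column] = table.setdefault(value, len(table))
        encoded.append(new_row)
    return encoded, tables


def decode(tables, column, code):
    """Recover the original value of a codified column through its table."""
    inverse = {c: v for v, c in tables[column].items()}
    return inverse[code]


rows = [  # hypothetical capture rows; "Time" and "Protocol" are illustrative names
    {"Time": "0.10", "Source": "host-a", "Destination": "host-b", "Protocol": "TCP", "Info": "payload"},
    {"Time": "0.25", "Source": "host-b", "Destination": "host-a", "Protocol": "ARP", "Info": "payload"},
    {"Time": "0.40", "Source": "host-a", "Destination": "host-c", "Protocol": "TCP", "Info": "payload"},
]
encoded, tables = codify(rows)
print(encoded[1])                     # {'Time': 0.25, 'Protocol': 1}
print(decode(tables, "Protocol", 1))  # ARP
```

Keeping the tables alongside the encoded data is what allows every code to be traced back to its original value.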

Table 4 Description of features employed in the uneSPY dataset
Table 5 Description of IGMP types employed in the uneSPY dataset
Table 6 Description of DHCPv6 messages codification employed in the uneSPY dataset
Table 7 Description of packet types codification employed in the uneSPY dataset
Table 8 Description of protocols codification employed in the uneSPY dataset

Appendix 2

This appendix presents all results obtained in the experiments described in Sect. 6. Regarding the uneSPY dataset, Table 9 refers to the single classifiers’ experiment, Table 10 displays the homogeneous stacking-based ensembles’ results, and Table 11 shows the heterogeneous stacking-based ensembles’ results. Concerning the NSL-KDD dataset, Table 12 shows the single classifiers’ results, Table 13 presents the homogeneous stacking-based ensembles’ results, and Table 14 presents the results achieved by the heterogeneous stacking-based ensembles.
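Tables 9 to 14 report each metric as a mean and standard deviation over repeated runs. The short sketch below shows that aggregation step. The appendix does not state which estimator was used, so this sketch assumes the sample standard deviation (`statistics.stdev`, with an n - 1 denominator); the metric names and numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean, stdev


def summarize(runs):
    """Aggregate per-run metrics into (mean, standard deviation) pairs.

    `runs` is a list of dicts, one per repetition, mapping metric name -> value.
    stdev() is the sample standard deviation; this choice is an assumption.
    """
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = (mean(values), stdev(values))
    return summary


# Made-up numbers for illustration only, not results from the paper.
runs = [{"accuracy": 0.90, "f1": 0.88}, {"accuracy": 0.92, "f1": 0.90}, {"accuracy": 0.94, "f1": 0.92}]
for metric, (mu, sigma) in summarize(runs).items():
    print(f"{metric}: {mu:.4f} ± {sigma:.4f}")
# accuracy: 0.9200 ± 0.0200
# f1: 0.9000 ± 0.0200
```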

Table 9 Mean and standard deviation metrics from single classifiers trained over uneSPY
Table 10 Mean and standard deviation metrics from homogeneous ensembles trained over uneSPY
Table 11 Mean and standard deviation metrics from heterogeneous ensembles trained over uneSPY
Table 12 Mean and standard deviation metrics from single classifiers trained over NSL-KDD
Table 13 Mean and standard deviation metrics from homogeneous ensembles trained over NSL-KDD
Table 14 Mean and standard deviation metrics from heterogeneous ensembles trained over NSL-KDD


About this article


Cite this article

Bertoni, M.A., Rosa, G.H.d. & Brega, J.R.F. Optimum-path forest stacking-based ensemble for intrusion detection. Evol. Intel. 15, 2037–2054 (2022). https://doi.org/10.1007/s12065-021-00609-7


  • DOI: https://doi.org/10.1007/s12065-021-00609-7
