Abstract
Machine learning techniques have been extensively researched in recent years, mainly because of their effectiveness in recognition and classification applications. Typically, a machine learning system is used to automate routine tasks, save human effort, and provide insight for decision-making. This paper introduces and validates a stacking-based ensemble approach that uses Optimum-Path Forest classifiers for intrusion detection. Rather than relying only on the well-known NSL-KDD dataset, we propose a new dataset called uneSPY, which we believe will help fill the gap in recent intrusion detection datasets. Both datasets were evaluated with several classifiers, including Logistic Regression, Decision Trees, Support Vector Machines, and Optimum-Path Forests, which were compared against Optimum-Path Forest stacking-based ensembles. Experimental results showed that the Optimum-Path Forest stacking-based ensemble is suitable for classification, particularly given its ability to generalize over large volumes of data while sustaining its performance.
Notes
Prototypes are master nodes that represent a specific class and conquer other nodes.
MSTs are subgraphs that connect all nodes within the same set using the minimum possible cost.
Regions more prone to classification mistakes.
Described in Appendix 1 and available at http://recogna.tech/files/datasets/unespy.rar.
Wi-Fi adapter model: TP-Link TL-WN725N V2.
Available at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
The source code is available at https://github.com/malvesbertoni/ensemble_opf.
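The notes on prototypes and MSTs can be illustrated with a small sketch. It assumes the usual supervised OPF formulation, in which prototypes are the samples joined by an MST edge to a sample of a different class, so they sit on the class boundaries where mistakes are most likely. The function names and the toy points are hypothetical, and this is not the code from the repository above.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best_cost = [math.inf] * n
    parent = [-1] * n
    best_cost[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_cost[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax the connection cost of every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_cost[v]:
                    best_cost[v] = d
                    parent[v] = u
    return edges


def find_prototypes(points, labels):
    """Return indices of samples linked by an MST edge to a different class."""
    prototypes = set()
    for i, j in minimum_spanning_tree(points):
        if labels[i] != labels[j]:
            prototypes.update((i, j))
    return sorted(prototypes)


# Toy data (hypothetical): two classes laid out along a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
labels = [0, 0, 0, 1, 1]
edges = minimum_spanning_tree(points)
prototypes = find_prototypes(points, labels)
```

In this toy set, the only MST edge that crosses classes joins samples 2 and 3, so those two samples become the prototypes.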
Acknowledgements
The authors are grateful to Bruna de Camargo Rubio for her assistance in creating the uneSPY dataset.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
The authors are grateful to Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001 and São Paulo Research Foundation (FAPESP) grant #2019/02205-5.
Appendices
Appendix 1
To meet the OPF classifier's requirements, the dataset had to be transformed into numeric data. Because preserving the data's structure and patterns was crucial, a codification table was created for every alphanumeric field that was transformed. The dataset therefore preserves its information and can be used without losing its essence. As mentioned in Sect. 5.1.1, the uneSPY dataset has 23 features, which are presented in Table 4. The original uneSPY dataset contains three features that were ignored: “Source”, “Destination”, and “Info”. They were removed because they are unsuitable for the OPF classifier. In addition, Table 5 presents the data regarding “IGMP Types”; Table 6 describes the codified DHCPv6 messages; Table 7 introduces the classification of the packet types; and Table 8 displays the codified protocols of the dataset.
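The codification step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build uneSPY: the helper names, the “Protocol” and “Length” columns, the sample rows, and the integer codes are hypothetical, and the actual codes are those listed in Tables 5 to 8.

```python
def build_codification_table(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    table = {}
    for value in values:
        if value not in table:
            table[value] = len(table) + 1
    return table


def encode_rows(rows, categorical_columns, dropped_columns):
    """Drop unsuitable columns and turn the remaining ones into numeric features."""
    kept = [c for c in rows[0] if c not in dropped_columns]
    # One codification table per alphanumeric column, so codes stay traceable.
    tables = {
        c: build_codification_table(row[c] for row in rows)
        for c in categorical_columns
    }
    encoded = [
        [float(tables[c][row[c]]) if c in tables else float(row[c]) for c in kept]
        for row in rows
    ]
    return kept, tables, encoded


# Hypothetical packet records; only the three dropped names come from the text.
rows = [
    {"Source": "10.0.0.2", "Destination": "10.0.0.1", "Protocol": "TCP", "Length": "60", "Info": "SYN"},
    {"Source": "fe80::1", "Destination": "ff02::1:2", "Protocol": "DHCPv6", "Length": "146", "Info": "Solicit"},
    {"Source": "10.0.0.1", "Destination": "10.0.0.2", "Protocol": "TCP", "Length": "54", "Info": "ACK"},
]
kept, tables, encoded = encode_rows(
    rows,
    categorical_columns=["Protocol"],
    dropped_columns={"Source", "Destination", "Info"},
)
```

Keeping the tables alongside the encoded matrix means each numeric code can be mapped back to its original value, which is the property the codification tables are meant to preserve.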
Appendix 2
This appendix presents all results obtained in the experiments described in Sect. 6. For the uneSPY dataset, Table 9 reports the single-classifier experiment, Table 10 the homogeneous stacking-based ensembles, and Table 11 the heterogeneous stacking-based ensembles. For the NSL-KDD dataset, Table 12 reports the single-classifier experiment, Table 13 the homogeneous stacking-based ensembles, and Table 14 the heterogeneous stacking-based ensembles.
About this article
Cite this article
Bertoni, M.A., Rosa, G.H.d. & Brega, J.R.F. Optimum-path forest stacking-based ensemble for intrusion detection. Evol. Intel. 15, 2037–2054 (2022). https://doi.org/10.1007/s12065-021-00609-7
DOI: https://doi.org/10.1007/s12065-021-00609-7