Abstract
Big data brings new challenges for network intrusion detection because it involves large-scale data containing a variety of sophisticated attacks (e.g., malware, advanced persistent threats (APTs), and zero-day attacks). Consequently, the demand for new tools and approaches specialized in big data analytics is increasing. In addition, the false alarm rate of anomaly-based intrusion detection systems (IDS) is a major concern. Most existing methods for large-scale network intrusion detection suffer from a high false-positive rate (FPR) because of the class imbalance of large-scale intrusion datasets, which can adversely affect the network. The critical challenge is therefore to reduce the FPR with the smallest possible decrease in the true-positive rate (TPR), so that detection quality remains at an acceptable level. To address these challenges, we propose a new network intrusion detection system for big network intrusion data based on the negative selection principle and big data frameworks. One of the promising negative selection methods of the artificial immune system (AIS) for network intrusion detection is the variable-sized detector (V-detector) algorithm. Unfortunately, this algorithm cannot analyze big datasets, because the radius of each detector is computed with respect to the self-space, and this computation becomes more costly as the self-space grows. Furthermore, the search for new detectors is random, and the generated detectors do not achieve maximum coverage of the self and non-self spaces. To overcome these shortcomings, we propose an extended V-detector algorithm that is built using clonal selection and fuzzy rules and is implemented on Apache Spark. The proposed algorithm is scalable and more efficient when applied to large-scale imbalanced datasets. The proposed framework is implemented on a fully distributed cluster of Apache Spark workers and evaluated on the KDDcup99 benchmark dataset, on the large, up-to-date CICIDS2017 dataset, and on large-scale synthetic datasets.
Results reveal that the proposed algorithm outperforms state-of-the-art baselines, achieving high detection accuracies of 0.9984 and 0.9994 and very low false-positive rates of 0.0002 and 0.0001, with comparable detection rates, on the KDDcup99 dataset and the imbalanced CICIDS2017 dataset, respectively. Moreover, it improves scalability and execution time, which are key for real-time intrusion detection on big data.
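To make the limitation described above concrete, the following is a minimal single-machine sketch of the baseline variable-sized detector idea, not the extended fuzzy and clonal-selection algorithm proposed in the paper and not its Apache Spark implementation. It assumes points normalized to the unit hypercube and Euclidean distance; the function names, parameters, and toy data are hypothetical and chosen only for illustration. The radius computation scans every self sample, which is the step that becomes costly as the self-space grows, and candidates are drawn purely at random.

```python
import math
import random


def generate_detectors(self_samples, self_radius, max_detectors, max_tries, rng):
    """Illustrative baseline V-detector generation in the unit hypercube."""
    dim = len(self_samples[0])
    detectors = []  # each detector is a (center, radius) pair
    for _ in range(max_tries):
        if len(detectors) >= max_detectors:
            break
        # Candidates are drawn uniformly at random (the random search step).
        candidate = [rng.random() for _ in range(dim)]
        # Skip candidates that an existing detector already covers.
        if any(math.dist(candidate, c) <= r for c, r in detectors):
            continue
        # Radius = distance to the nearest self sample minus the self radius.
        # This full scan of the self samples grows with the size of the self-space.
        radius = min(math.dist(candidate, s) for s in self_samples) - self_radius
        if radius > 0:
            detectors.append((candidate, radius))
    return detectors


def is_anomalous(point, detectors):
    """A point is flagged as non-self if any detector covers it."""
    return any(math.dist(point, c) <= r for c, r in detectors)


def rates(normal_points, attack_points, detectors):
    """Return (FPR, TPR): flagged fraction of normal and of attack points."""
    fp = sum(is_anomalous(p, detectors) for p in normal_points)
    tp = sum(is_anomalous(p, detectors) for p in attack_points)
    return fp / len(normal_points), tp / len(attack_points)


# Toy data (assumption): self samples clustered in one corner of the unit square.
rng = random.Random(0)
self_samples = [[0.2 + 0.05 * rng.random(), 0.2 + 0.05 * rng.random()]
                for _ in range(50)]
detectors = generate_detectors(self_samples, 0.05, 100, 2000, rng)
fpr, tpr = rates(self_samples, [[0.9, 0.9], [0.8, 0.1]], detectors)
```

By construction, no detector in this sketch covers a training self sample, so the FPR measured on the training self set is zero; the FPR on unseen normal traffic and the TPR depend on how well the randomly placed detectors cover the non-self space.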
Data availability statement
The datasets that support the findings of this study are available from the corresponding author upon reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Kourid, A., Chikhi, S. & Recupero, D.R. Fuzzy optimized V-detector algorithm on Apache Spark for class imbalance issue of intrusion detection in big data. Neural Comput & Applic 35, 19821–19845 (2023). https://doi.org/10.1007/s00521-023-08783-8
DOI: https://doi.org/10.1007/s00521-023-08783-8