
Fuzzy optimized V-detector algorithm on Apache Spark for class imbalance issue of intrusion detection in big data

  • Original Article
Neural Computing and Applications

Abstract

Big data brings new challenges to network intrusion detection, because it involves large-scale data containing a variety of sophisticated attacks (e.g., malware, advanced persistent threats (APTs), and zero-day attacks). As a result, the demand for new tools and approaches specialized in big data analytics is increasing. In addition, the false alarm rate of anomaly-based intrusion detection systems (IDS) is a major concern. Most existing methods for large-scale network intrusion detection suffer from a high false-positive rate (FPR) caused by the class imbalance of large-scale intrusion datasets, and these false alarms can adversely affect the network. The critical challenge is therefore to reduce the FPR with the smallest possible decrease in the true-positive rate (TPR), so that detection quality remains at a feasible level. To address these challenges, we propose a new intrusion detection system for big network data based on the negative selection principle and big data frameworks. One promising negative selection method of the artificial immune system (AIS) for network intrusion detection is the variable-sized detector (V-detector) algorithm. Unfortunately, this algorithm cannot analyze big datasets, because the radius of each detector is derived from the self-space, and this computation becomes more complex as the self-space grows. Furthermore, new detectors are searched for randomly, so the generated detectors do not achieve maximum coverage of the self and non-self space. To overcome these shortcomings, we propose an extended V-detector algorithm that is built using clonal selection and fuzzy rules and implemented on Apache Spark. The proposed algorithm is scalable and more efficient when applied to large-scale imbalanced datasets. The proposed framework is implemented on a fully distributed cluster of Apache Spark workers and evaluated on the KDDcup99 benchmark dataset, on the large, up-to-date CICIDS2017 dataset, and on large-scale synthetic datasets.
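The baseline V-detector step that the abstract identifies as the bottleneck can be sketched as follows. This is a minimal, single-machine sketch of the classic real-valued V-detector of Ji and Dasgupta, not the authors' fuzzy, clonal-selection-optimized Spark version. It assumes records normalized to the unit hypercube, Euclidean distance, a fixed self radius `self_radius`, and a simplified stopping rule (stop after a run of consecutive random candidates that are already covered). All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Detector = Tuple[Point, float]  # (centre, radius)


def covered(x: Sequence[float], detectors: List[Detector]) -> bool:
    """True if x lies inside at least one detector hypersphere."""
    return any(math.dist(x, c) <= r for c, r in detectors)


def v_detector(self_set: List[Point], self_radius: float, dim: int,
               max_detectors: int = 100, target_coverage: float = 0.99,
               max_tries: int = 100_000, seed: int = 0) -> List[Detector]:
    """Simplified real-valued V-detector generation in the unit hypercube."""
    rng = random.Random(seed)
    # Stop once this many consecutive random candidates are already covered,
    # a rough proxy for having reached the target non-self coverage.
    stop_after = math.ceil(1.0 / (1.0 - target_coverage))
    detectors: List[Detector] = []
    covered_run = 0
    for _ in range(max_tries):
        if len(detectors) >= max_detectors or covered_run >= stop_after:
            break
        x = tuple(rng.random() for _ in range(dim))  # random candidate centre
        if covered(x, detectors):
            covered_run += 1
            continue
        covered_run = 0
        # Variable radius: distance to the nearest self sample minus the
        # self radius. This full scan of the self set is what becomes
        # expensive when the self-space is very large.
        r = min(math.dist(x, s) for s in self_set) - self_radius
        if r > 0:  # candidates inside the self region are discarded
            detectors.append((x, r))
    return detectors


def is_anomalous(x: Sequence[float], detectors: List[Detector]) -> bool:
    """Negative selection: a record is non-self if any detector matches it."""
    return covered(x, detectors)


if __name__ == "__main__":
    # Toy normal ("self") records clustered around the centre of [0, 1]^2.
    normal = [(0.50, 0.50), (0.52, 0.48), (0.47, 0.51)]
    dets = v_detector(normal, self_radius=0.05, dim=2, max_detectors=50)
    print(len(dets), is_anomalous((0.5, 0.5), dets))
```

The `min` over the self set is the part whose cost grows with the self-space. Because `min` is associative, one plausible way to distribute it is to compute a per-partition minimum and then reduce the partial results, which maps naturally onto a Spark reduce; this is an illustrative assumption, not a description of the authors' implementation.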
Results reveal that the proposed algorithm outperforms state-of-the-art baselines, achieving detection accuracies of 0.9984 and 0.9994 and very low false-positive rates of 0.0002 and 0.0001, with comparable detection rates, on the KDDcup99 dataset and the imbalanced CICIDS2017 dataset, respectively. Moreover, it improves scalability and execution time, which are key for real-time big data intrusion detection analysis.
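The metrics reported in the results can be computed from a confusion matrix, with attacks as the positive class. The sketch below assumes, as is common, that the detection rate is the TPR; the counts in the usage example are hypothetical. It also shows why accuracy alone is misleading under class imbalance: a detector that labels every record as normal still scores 0.99 accuracy on a test set with 1% attacks, while its TPR is 0.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attack records flagged as attacks
    fn: int  # attack records missed
    fp: int  # normal records flagged as attacks (false alarms)
    tn: int  # normal records correctly passed


def accuracy(c: Confusion) -> float:
    """Share of all records classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def tpr(c: Confusion) -> float:
    """True-positive rate (detection rate): share of attacks detected."""
    return c.tp / (c.tp + c.fn)


def fpr(c: Confusion) -> float:
    """False-positive rate: share of normal records raised as alarms."""
    return c.fp / (c.fp + c.tn)


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 100 attacks, 9,900 normal records.
    detector = Confusion(tp=90, fn=10, fp=5, tn=9895)
    always_normal = Confusion(tp=0, fn=100, fp=0, tn=9900)
    for name, c in [("detector", detector), ("always_normal", always_normal)]:
        print(f"{name}: acc={accuracy(c):.4f} tpr={tpr(c):.4f} fpr={fpr(c):.4f}")
```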


Figs. 1–18


Data availability statement

The datasets that support the findings of this study are available from the corresponding author upon reasonable request.


Author information

Corresponding author

Correspondence to Ahlam Kourid.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kourid, A., Chikhi, S. & Recupero, D.R. Fuzzy optimized V-detector algorithm on Apache Spark for class imbalance issue of intrusion detection in big data. Neural Comput & Applic 35, 19821–19845 (2023). https://doi.org/10.1007/s00521-023-08783-8

  • DOI: https://doi.org/10.1007/s00521-023-08783-8
