Skip to main content
Log in

Toward an efficient fuzziness based instance selection methodology for intrusion detection system

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics Aims and scope Submit manuscript

Abstract

Building a high quality classifier is one of the key problems in the field of machine learning (ML) and pattern recognition. Many ML algorithms have suffered from high computational power in the presence of large scale data sets. This paper proposes a fuzziness based instance selection technique for the large data sets to increase the efficiency of supervised learning algorithms by improving the shortcomings of designing an effective intrusion detection system (IDS). The proposed methodology is dependent on a new kind of single layer feed-forward neural network (SLFN), called random weight neural network (RWNN). At the first stage, a membership vector corresponding to every training instance is obtained by using RWNN for computing the fuzziness. Secondly, the training instances (along with their fuzziness values) according to the actual class labels are grouped separately. After this, the instances having low fuzziness values in each group are extracted, which are used to build a reduced data set. The instances outputted by the proposed method are used as an input for ML classifiers, which result in reducing the learning time and also increasing the learning capability. The proposed methodology exhibits that the reduced data set can easily learn the boundaries between class labels. The most obvious finding from this study is a considerable increase in the accuracy rate with unseen examples when compared with other instance selection method, i.e., IB2. The proposed method provides the better generalization and fast learning capability. The reasonability of the proposed methodology is theoretically explained and experiments on well known ID data sets support its usefulness.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Aamir Raza Ashfaq R, Wang X, Huang J, Abbas H, He Y (2016) Fuzziness based semisupervised learning approach for intrusion detection system, Information Sciences. in press, doi: 10.1016/j.ins.2016.04.019

  2. Aha D, Kibler D, Albert M (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66

    Google Scholar 

  3. Anand K, Ganapathy S, Kulothungan K, Yogesh P, Kannan A (2012) A rule based approach for attribute selection and intrusion detection in wireless sensor networks. Proc Eng 38:1658–1664

    Article  Google Scholar 

  4. Anderson P (1980) Computer security threat monitoring and surveillance, technical report. James P Anderson Co., Fort Washington

    Google Scholar 

  5. Bezdek J, Kuncheva L (2001) Nearest prototype classifier designs: an experimental study. Int J Intell Syst 16(12):1445–1473

    Article  MATH  Google Scholar 

  6. Caises Y, Gonzalez A, Leyva E, Prez R (2009) SCIS: combining instance selection methods to increase their effectiveness over a wide range of domains. Intell Data Eng Autom Learn IDEAL 2009:17–24

    Google Scholar 

  7. Cao FL, Ye HL, Wang DH (2015) A probabilistic learning algorithm for robust modeling using neural networks with random weights. Inf Sci 313:62–78

    Article  Google Scholar 

  8. Chen W, Hsu S, Shen H (2005) Application of SVM and ANN for intrusion detection. Comput Oper Res 32(10):2617–2634

    Article  MATH  Google Scholar 

  9. Chou C, Kuo B, Chang F (2006) The generalized condensed nearest neighbor rule as a data reduction method.  In: Proceedings of the 18th international conference on pattern recognition (ICPR’06), vol 2, pp 556–559

  10. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27

    Article  MATH  Google Scholar 

  11. De Luca A, Termini S (1972) A definition of a non-probabilistic entropy in the setting of fuzzy sets theory. Inf Control 20(4):301–312

    Article  MATH  Google Scholar 

  12. Denning D (1987) An intrusion-detection model. IEEE Trans Softw Eng 13(2):222–232

    Article  Google Scholar 

  13. Devijver P, Kittler J (1980) On the edited nearest neighbor rule. In: Proceedings of the 5th international conference on pattern recognition. Pattern Recognition Society, Los Alamitos, CA, pp 72–80

  14. Elbasiony R, Sallam E, Eltobely T, Fahmy M (2013) A hybrid network intrusion detection framework based on random forests and weighted k-means. Ain Shams Eng J 4(4):753–762

    Article  Google Scholar 

  15. Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14(3):515–516

    Article  Google Scholar 

  16. He S, Chen H, Zhu Z, Ward D, Cooper H, Viant M, Heath J, Yao X (2015) Robust twin boosting for feature selection from high-dimensional omics data with label noise. Inf Sci 291:1–18

    Article  Google Scholar 

  17. He YL, Wang XZ, Huang JZX (2016) Fuzzy nonlinear regression analysis using a random weight network. Inf Sci 364-365:222–240

    Article  Google Scholar 

  18. Hofmann A, Horeis T, Sick B (2004) Feature selection for intrusion detection: an evolutionary wrapper approach. In: Proceedings of the 2004 IEEE international joint conference on neural networks, vol 2, pp 1563–1568

  19. Igelnik B, Pao Yoh-Han (1995) Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw 6(6):1320–1329

    Article  Google Scholar 

  20. KDDCup 1999 Data, 2016. Available at: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

  21. Keller J, Gray M, Givens J (1985) A fuzzy K-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 15(4):580–585

    Article  Google Scholar 

  22. Kemmerer R, Vigna G (2002) Intrusion detection: a brief history and overview. Computer 35(4):27–30

    Article  Google Scholar 

  23. Li Y, Hu Z, Cai Y, Zhang W (2005) Support vector based prototype selection method for nearest neighbor rules. In: Wang L, Chen K, Ong YS (eds) Advances in natural computation. Lecture notes in computer science, vol 3610. Springer, Berlin, Heidelberg, pp 528–535

  24. Liao Y, Vemuri V (2002) Use of K-Nearest Neighbor classifier for intrusion detection. Comput Secur 21(5):439–448

    Article  Google Scholar 

  25. Liu H, Motoda H (2002) On issues of instance selection. Data Min Knowl Discov 6(2):115–130

    Article  MathSciNet  Google Scholar 

  26. Liu Q, Yin J, Leung V, Zhai J, Cai Z, Lin J (2014) Applying a new localized generalization error model to design neural networks. Neural Comput Appl 27(1):59–66

    Article  Google Scholar 

  27. Liu F, Zhang D, Shen LL (2015) Study on novel curvature features for 3D fingerprint recognition. Neurocomputing 168:599–608

    Article  Google Scholar 

  28. Mukherjee S, Sharma N (2012) Intrusion detection using naive bayes classifier with feature reduction. Proc Technol 4:119–128

    Article  Google Scholar 

  29. Neter J (1996) Applied linear statistical models. WCB/MacGraw-Hill, Boston

    Google Scholar 

  30. ISCX NSL-KDD dataset | UNB. Available at: http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDD-dataset.html

  31. Pereira C, Nakamura R, Costa K, Papa J (2012) An optimum-path forest framework for intrusion detection in computer networks. Eng Appl Artif Intell 25(6):1226–1234

    Article  Google Scholar 

  32. Qiu M, Zhang L, Ming Z, Chen Z, Qin X, Yang L (2013) Security-aware optimization for ubiquitous computing systems with SEAT graph approach. J Comput Syst Sci 79(5):518–529

    Article  MATH  MathSciNet  Google Scholar 

  33. Sanchez D, Trillas E (2012) Measures of fuzziness under different uses of fuzzy sets. Commun Comput Inf Sci 298:25–34

    MATH  Google Scholar 

  34. Schmidt W, Kraaijveld M, Duin R (1992) Feedforward neural networks with random weights. In: Proceedings of 11th IAPR international conference on pattern recognition, conference B: pattern recognition methodology and systems, pp 1–4

  35. Schultz M, Eskin E, Zadok F, Stolfo S (2001) Data mining methods for detection of new malicious executables. In: Proceedings of the 2001 IEEE symposium on security and privacy, pp 38–49

  36. Shi J, Jiang Q, Mao R, Lu M, Wang T (2015) FR-KECA: fuzzy robust kernel entropy component analysis. Neurocomputing 149:1415–1423

    Article  Google Scholar 

  37. Spillmann B, Neuhaus M, Bunke H, Pkalska E, Duin R (2006) Transforming strings to vector spaces using prototype selection. Lecture notes in computer science, pp 287–296

  38. Tavallaee M, Bagheri E, Lu W, Ghorbani A (2009) A detailed analysis of the KDD CUP 99 data set. In: Proceedings of the 2009 IEEE symposium on computational intelligence for security and defense applications. Available at: http://nparc.cisti-icist.nrc-cnrc.gc.ca/eng/view/accepted/?id=649fb606-4a97-47d0-b373-082cb3ac0259

  39. Te Braake H, Van Straten G (1995) Random activation weight neural net (RAWN) for east non-iterative training. Eng Appl Artif Intell 8(1):71–80

    Article  Google Scholar 

  40. Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern 6(6):448–452

    MATH  MathSciNet  Google Scholar 

  41. Vapnik V (1995) The nature of statistical learning theory. Springer, New York

    Book  MATH  Google Scholar 

  42. Wang XZ, Aamir R, Fu A (2015) Fuzziness based sample categorization for classifier performance improvement. J Intell Fuzzy Syst 29(3):1185–1196

    Article  MathSciNet  Google Scholar 

  43. Wang XZ, Miao Q, Zhai M, Zhai J (2012) Instance selection based on sample entropy for efficient data classification with ELM. In: Proceedings of the 2012 IEEE international conference on systems, man, and cybernetics (SMC), pp 970–974

  44. Wang XZ (2015) Learning from big data with uncertainty-editorial. J Intell Fuzzy Syst 28(5):2329–2330

    Article  MathSciNet  Google Scholar 

  45. Wang XZ, Xing HJ, Li Y, Hua Q, Dong CR, Pedrycz W (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654

    Article  Google Scholar 

  46. Wilson D (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421

    Article  MATH  MathSciNet  Google Scholar 

  47. Xie J, Hone K, Xie W, Gao X, Shi Y, Liu X (2013) Extending twin support vector machine classifier for multi-category classification problems. Intell Data Anal 17(4):649–664

    Google Scholar 

  48. Yan Q, Yu F (2015) Distributed denial of service attacks in software-defined networking with cloud computing. IEEE Commun Mag 53(4):52–59

    Article  Google Scholar 

  49. Yang M, Zhu PF, Liu F, Shen LL (2015) Joint representation and pattern learning for robust face recognition. Neurocomputing 168:70–80

    Article  Google Scholar 

  50. Yao Y, Wei Y, Gao FX, Ge Y (2006) Anomaly intrusion detection approach using hybrid MLP/CNN neural network. In: Sixth international conference on intelligent systems design and applications, vol 2, pp 1095–1102

  51. You ZH, Lei YK, Zhu L, Xia JF, Wang B (2013) Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinf 14(Suppl 8):S10

    Article  Google Scholar 

  52. You ZH, Yu JZ, Zhu L, Li S, Wen ZK (2014) A mapreduce based parallel SVM for large-scale predicting proteinprotein interactions. Neurocomputing 145:37–43

    Article  Google Scholar 

  53. Zadeh L (1968) Probability measures of fuzzy events. J Math Anal Appl 23(2):421–427

    Article  MATH  MathSciNet  Google Scholar 

  54. Zhang Z, Shen H (2005) Application of online-training SVMs for real-time intrusion detection with different considerations. Comput Commun 28(12):1428–1442

    Article  Google Scholar 

  55. Zhao W, Wang ZH, Cao FL, Wang DH (2015) A local learning algorithm for random weights networks. Knowl Based Syst 74:159–166

    Article  Google Scholar 

Download references

Acknowledgments

This research is supported by China Postdoctoral Science Foundations (2015M572361 and 2016T90799), Basic Research Project of Knowledge Innovation Program in Shenzhen (JCYJ20150324140036825), and National Natural Science Foundations of China (61503252 and 71371063).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yu-lin He.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ashfaq, R.A.R., He, Yl. & Chen, Dg. Toward an efficient fuzziness based instance selection methodology for intrusion detection system. Int. J. Mach. Learn. & Cyber. 8, 1767–1776 (2017). https://doi.org/10.1007/s13042-016-0557-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13042-016-0557-4

Keywords

Navigation