Abstract
Building a high-quality classifier is one of the key problems in machine learning (ML) and pattern recognition. Many ML algorithms incur a high computational cost on large-scale data sets. This paper proposes a fuzziness-based instance selection technique for large data sets that increases the efficiency of supervised learning algorithms, with the aim of addressing shortcomings in the design of an effective intrusion detection system (IDS). The proposed methodology relies on a new kind of single layer feed-forward neural network (SLFN), called the random weight neural network (RWNN). First, RWNN is used to obtain a membership vector for every training instance, from which the fuzziness of that instance is computed. Second, the training instances, together with their fuzziness values, are grouped by their actual class labels. The instances with low fuzziness values in each group are then extracted and used to build a reduced data set. The instances output by the proposed method serve as input to ML classifiers, which reduces the learning time and also improves the learning capability. The results indicate that the boundaries between class labels can be learned easily from the reduced data set. The most notable finding of this study is a considerable increase in accuracy on unseen examples compared with another instance selection method, IB2. The proposed method provides better generalization and faster learning. The soundness of the proposed methodology is explained theoretically, and experiments on well-known intrusion detection (ID) data sets support its usefulness.
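The selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the sigmoid hidden layer, the softmax used to turn network outputs into memberships, the De Luca-Termini form of the fuzziness measure, and the hypothetical names and parameters `rwnn_memberships`, `fuzziness`, `select_low_fuzziness`, `n_hidden`, and `keep_fraction`.

```python
import numpy as np


def rwnn_memberships(X, y, n_hidden=50, seed=0):
    """Fit a random weight network and return one membership vector per row of X."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Input weights and biases are drawn at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))               # sigmoid hidden layer (assumption)
    T = (y[:, None] == classes[None, :]).astype(float)   # one-hot targets
    beta = np.linalg.pinv(H) @ T                         # least-squares output weights
    out = H @ beta
    # Assumption: softmax maps raw outputs to memberships in (0, 1).
    e = np.exp(out - out.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def fuzziness(mu, eps=1e-12):
    """De Luca-Termini style fuzziness of each membership vector (row of mu)."""
    mu = np.clip(mu, eps, 1.0 - eps)
    return -np.mean(mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu), axis=1)


def select_low_fuzziness(X, y, keep_fraction=0.5, n_hidden=50, seed=0):
    """Return sorted indices of the lowest-fuzziness instances within each class."""
    f = fuzziness(rwnn_memberships(X, y, n_hidden, seed))
    keep = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)                     # group by actual class label
        n_keep = max(1, int(np.ceil(keep_fraction * idx.size)))
        keep.extend(idx[np.argsort(f[idx])[:n_keep]])    # lowest fuzziness first
    return np.sort(np.array(keep))


# Toy usage on synthetic two-class data (not an intrusion detection data set).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(4.0, 1.0, (60, 2))])
y = np.array([0] * 40 + [1] * 60)
idx = select_low_fuzziness(X, y, keep_fraction=0.25)
X_reduced, y_reduced = X[idx], y[idx]                    # reduced training set
```

Selecting within each class separately, rather than over the whole training set, keeps every class represented in the reduced data set even when one class is on average much fuzzier than the others. The reduced arrays can then be passed to any supervised classifier in place of the full training set.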
Acknowledgments
This research is supported by the China Postdoctoral Science Foundation (2015M572361 and 2016T90799), the Basic Research Project of Knowledge Innovation Program in Shenzhen (JCYJ20150324140036825), and the National Natural Science Foundation of China (61503252 and 71371063).
About this article
Cite this article
Ashfaq, R.A.R., He, Yl. & Chen, Dg. Toward an efficient fuzziness based instance selection methodology for intrusion detection system. Int. J. Mach. Learn. & Cyber. 8, 1767–1776 (2017). https://doi.org/10.1007/s13042-016-0557-4
DOI: https://doi.org/10.1007/s13042-016-0557-4