Abstract
To reduce the loss of majority-class information during resampling, this paper proposes a two-level selective ensemble learning algorithm for imbalanced datasets (TLSE-ID) that combines the distribution of class samples with the characteristics of ensemble learning. First, the algorithm under-samples the majority class to construct multiple training subsets. On each subset, AdaBoost generates multiple base classifiers; some of these are selected according to the maximum-relevance, minimum-redundancy criterion and combined by weighted integration into a sub-classifier. Repeating this over all training subsets yields multiple sub-classifiers, from which some are again selected by the same criterion. The weights of the selected sub-classifiers are then calculated from the F-measure or G-mean, and the final ensemble classifier is obtained by weighted voting. Experimental results on UCI datasets show that the method can improve classification performance to a certain extent, especially on imbalanced datasets. Finally, the algorithm is applied to network anomaly detection for the Internet of Things; simulation results on the KDDCUP99 dataset show that TLSE-ID achieves a low missed-detection rate and high precision.
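As a rough illustration of the outer steps described in the abstract, the following minimal sketch builds balanced training subsets by under-sampling, scores classifiers by G-mean or F-measure, and combines their predictions by weighted voting. It is not the authors' implementation: all function names are hypothetical, label 1 is assumed to mark the minority (anomaly) class, AdaBoost training and the selection step are omitted, and using the raw G-mean as the voting weight is an assumption, since the abstract does not give the exact weighting formula.

```python
import math
import random


def undersample_subsets(majority, minority, n_subsets, seed=0):
    """Build balanced subsets: every minority sample plus an equally
    sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    return [rng.sample(majority, k) + list(minority) for _ in range(n_subsets)]


def confusion(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) with `positive` as the minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fn, fp, tn = confusion(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall."""
    tp, fn, fp, _ = confusion(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_vote(predictions, weights, positive=1, negative=0):
    """Label a sample positive when the classifiers voting positive
    hold more than half of the total weight."""
    total = sum(weights)
    result = []
    for votes in zip(*predictions):
        pos_weight = sum(w for v, w in zip(votes, weights) if v == positive)
        result.append(positive if pos_weight > total / 2 else negative)
    return result


# Toy validation labels and three hypothetical sub-classifier outputs.
y_val = [1, 1, 0, 0]
preds = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
weights = [g_mean(y_val, p) for p in preds]  # assumed: weight = G-mean
ensemble = weighted_vote(preds, weights)
```

In this toy case the third classifier misclassifies every sample, so its G-mean weight is zero and it has no influence on the vote. Passing `f_measure` in place of `g_mean` gives the alternative weighting the abstract mentions.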
Acknowledgements
This work was supported by the Natural Science Foundation Research Project of Shaanxi Province Foundation of China (No. 2019KRM095); Science and Technology Plan Project of Shangluo of China (No. SK2019-84); Science and Technology Research Project of Shangluo University (No. 18SKY014); Science and Technology Innovation Team Building Project of Shangluo University (No. 18SCX002); and Key Discipline Construction Project of Shangluo University, Subject Name: Mathematics.
About this article
Cite this article
Du, H., Zhang, Y. Network anomaly detection based on selective ensemble algorithm. J Supercomput 77, 2875–2896 (2021). https://doi.org/10.1007/s11227-020-03374-z
DOI: https://doi.org/10.1007/s11227-020-03374-z