Abstract
Imbalanced data classification is a problem that arises when classifier learning algorithms are applied in business and industry. This paper proposes a methodology to improve classification performance on imbalanced data. We balance the data sets with the synthetic minority oversampling technique (SMOTE) and remove the noise introduced by the newly generated samples with Tomek links (T-Links). Support vector machine (SVM), k-nearest neighbor (KNN), and logistic regression (LR) are selected as base classifiers and combined by stacked generalization, and the final result is produced by weighted voting. In experiments on six UCI datasets, the results show that the method is highly representative and can effectively improve classification ability.
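The resampling and voting steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`smote`, `remove_tomek_links`, `weighted_vote`), the choice to drop both members of each Tomek link, the toy data, and the vote weights are all assumptions made for illustration, and the stacking stage with SVM, KNN, and LR is omitted.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample `minority` up to the size of the largest class.

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority neighbours. Assumes at least two
    minority samples exist.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    target = max(y.count(label) for label in set(y))
    X_new, y_new = list(X), list(y)
    while y_new.count(minority) < target:
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, min_idx, k))
        gap = rng.random()
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of opposite-class samples that are each other's
    nearest neighbour. Removing both members is an assumption of this sketch;
    other variants remove only the majority-class member.
    """
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    linked = {i for i in everyone if y[i] != y[nn[i]] and nn[nn[i]] == i}
    keep = [i for i in everyone if i not in linked]
    return [X[i] for i in keep], [y[i] for i in keep]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Toy data: four majority samples (label 0), two minority samples (label 1).
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = smote(X, y, minority=1, k=1)
    X_clean, y_clean = remove_tomek_links(X_bal, y_bal)
    print(y_clean.count(0), y_clean.count(1))
    # Hypothetical per-classifier predictions and weights for one test sample.
    print(weighted_vote([1, 0, 0], [0.6, 0.25, 0.25]))
```

In practice the weights passed to `weighted_vote` would presumably come from each classifier's validation performance; the abstract does not say how they are chosen, so the values above are placeholders.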
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xiang, Y., Xie, Y. (2020). Imbalanced Data Classification Method Based on Ensemble Learning. In: Liang, Q., Liu, X., Na, Z., Wang, W., Mu, J., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2018. Lecture Notes in Electrical Engineering, vol 517. Springer, Singapore. https://doi.org/10.1007/978-981-13-6508-9_3
DOI: https://doi.org/10.1007/978-981-13-6508-9_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6507-2
Online ISBN: 978-981-13-6508-9
eBook Packages: Engineering, Engineering (R0)