Imbalanced Data Classification Method Based on Ensemble Learning

Xiang, Yu; Xie, Yongping

doi:10.1007/978-981-13-6508-9_3

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 517)

Included in the following conference series:

International Conference in Communications, Signal Processing, and Systems

Abstract

Imbalanced data classification is a problem that arises when classifier learning algorithms are applied in business and industry. This paper proposes a method to improve classification performance on imbalanced data. We balance the data sets with the synthetic minority oversampling technique (SMOTE) and remove the noise introduced by the newly generated samples with Tomek links (T-Links). Support vector machine (SVM), k-nearest neighbor (KNN), and logistic regression (LR) are selected as the base classifiers and combined by stacked generalization, and the final result is produced by weighted voting. Experiments on six UCI data sets show that the method is highly representative and can effectively improve classification ability.
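
The resampling and voting steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`smote`, `tomek_links`, `remove_tomek_links`, `weighted_vote`), the neighbour count `k`, the seed, and the choice to drop both members of each Tomek link are assumptions made for illustration. The stacked SVM, KNN, and LR classifiers are not reproduced here; `weighted_vote` simply takes their predicted labels as input.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (assumes at least two minority points)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (an assumed cleaning policy;
    removing only the majority-class member is another option)."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def weighted_vote(predictions, weights):
    """Combine one label per classifier by summing the classifier weights
    per label and returning the label with the largest total."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)
```

In a full pipeline along these lines, the synthetic points from `smote` would be appended to the training set with the minority label, `remove_tomek_links` would then clean the combined set, and the weights passed to `weighted_vote` would have to come from some validation measure, which the abstract does not specify.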

Author information

Authors and Affiliations

School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning, China
Yu Xiang & Yongping Xie

Corresponding author

Correspondence to Yu Xiang.

Editor information

Editors and Affiliations

Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Qilian Liang
School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
Xin Liu
School of Information Science and Technology, Dalian Maritime University, Dalian, China
Zhenyu Na
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Wei Wang
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Jiasong Mu
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Baoju Zhang

About this paper

Cite this paper

Xiang, Y., Xie, Y. (2020). Imbalanced Data Classification Method Based on Ensemble Learning. In: Liang, Q., Liu, X., Na, Z., Wang, W., Mu, J., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2018. Lecture Notes in Electrical Engineering, vol 517. Springer, Singapore. https://doi.org/10.1007/978-981-13-6508-9_3

DOI: https://doi.org/10.1007/978-981-13-6508-9_3
Published: 14 June 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6507-2
Online ISBN: 978-981-13-6508-9
eBook Packages: Engineering, Engineering (R0)
