Abstract
Class imbalance, along with other characteristics of the data such as the feature set and class separability, can degrade the performance of most machine learning algorithms. This can be attributed to the algorithms' underlying assumptions that the classes are balanced and that all misclassification errors carry equal weight. Class imbalance can be handled with both data-level and algorithm-level methods. In this paper, we focus on cost-sensitive learning, an algorithm-level approach, and use the Cost-Sensitive Logistic Regression (CSLR) algorithm as a reference. We propose a methodology to empirically evaluate the performance of cost-sensitive algorithms over varying degrees of data imbalance. Cost-sensitive learning incorporates a cost matrix, which specifies the weights of the different misclassification errors, into the algorithm's training process. This forces the model to penalize misclassification errors according to the skewness of the data, reducing the model's learning bias towards the majority class. We present empirical evaluations of the reference model on four popular datasets and analyse its behaviour using the mean absolute error (MAE) and Kappa values.
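The idea of feeding a cost matrix into training can be illustrated with a minimal sketch. It assumes a binary problem with label 1 as the minority class, a class-dependent cost matrix C[actual, predicted] whose false-negative cost is set to the imbalance ratio (a common heuristic, assumed here and not necessarily the weighting scheme used in the paper), and a weighted cross-entropy loss minimised by batch gradient descent. It also assumes that "Kappa" refers to Cohen's kappa. All function names (`cost_matrix_from_imbalance`, `fit_logreg`, etc.) and the synthetic 19:1 dataset are hypothetical and for illustration only.

```python
import numpy as np


def cost_matrix_from_imbalance(y):
    """Hypothetical heuristic cost matrix C[actual, predicted]:
    correct predictions cost 0, a false positive costs 1, and a false
    negative costs n_majority / n_minority (label 1 assumed minority)."""
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    return np.array([[0.0, 1.0],
                     [n_neg / n_pos, 0.0]])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, cost=None, lr=0.1, n_iter=3000):
    """Batch gradient descent on the (optionally cost-weighted) cross-entropy.
    With a cost matrix, each example is weighted by the cost of misclassifying
    its true class: C[0, 1] for negatives, C[1, 0] for positives."""
    if cost is None:
        sample_w = np.ones(len(y))
    else:
        sample_w = np.where(y == 1, cost[1, 0], cost[0, 1])
    w = np.zeros(X.shape[1])
    b = 0.0
    total = sample_w.sum()
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        residual = sample_w * (p - y)  # per-example weighted gradient term
        w -= lr * (X.T @ residual) / total
        b -= lr * residual.sum() / total
    return w, b


def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


def mae(y_true, y_pred):
    """Mean absolute error; for 0/1 labels this equals the error rate."""
    return float(np.mean(np.abs(y_true - y_pred)))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    labels = np.union1d(y_true, y_pred)
    p_o = np.mean(y_true == y_pred)
    p_e = sum(np.mean(y_true == k) * np.mean(y_pred == k) for k in labels)
    return float((p_o - p_e) / (1 - p_e))


# Synthetic, illustrative data with a 19:1 class imbalance.
rng = np.random.default_rng(42)
n_neg, n_pos = 950, 50
X = np.vstack([rng.normal(0.0, 1.0, size=(n_neg, 2)),
               rng.normal(1.5, 1.0, size=(n_pos, 2))])
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])

C = cost_matrix_from_imbalance(y)
w_plain, b_plain = fit_logreg(X, y)
w_cs, b_cs = fit_logreg(X, y, cost=C)

for name, (w, b) in [("plain LR", (w_plain, b_plain)),
                     ("cost-sensitive LR", (w_cs, b_cs))]:
    y_hat = predict(X, w, b)
    recall = np.mean(y_hat[y == 1] == 1)
    print(f"{name}: MAE={mae(y, y_hat):.3f} "
          f"kappa={cohen_kappa(y, y_hat):.3f} minority recall={recall:.2f}")
```

Because the false-negative cost scales with the imbalance ratio, the weighted loss shifts the decision boundary towards the majority class. This typically raises minority-class recall at the price of more false positives, which is why a chance-corrected measure such as kappa is reported alongside MAE. Repeating the fit while varying `n_neg / n_pos` is one way to probe behaviour over different degrees of imbalance.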
Copyright information
© 2023 IFIP International Federation for Information Processing
About this paper
Cite this paper
Tangudu, S.T., Kumar, R. (2023). Analysis of Cost-Sensitive Algorithms for Degree of Imbalancing. In: Chandran K R, S., N, S., A, B., Hamead H, S. (eds) Computational Intelligence in Data Science. ICCIDS 2023. IFIP Advances in Information and Communication Technology, vol 673. Springer, Cham. https://doi.org/10.1007/978-3-031-38296-3_6
DOI: https://doi.org/10.1007/978-3-031-38296-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38295-6
Online ISBN: 978-3-031-38296-3
eBook Packages: Computer Science, Computer Science (R0)