Analysis of Cost-Sensitive Algorithms for Degree of Imbalancing

Tangudu, Sai Teja; Kumar, Rajeev

doi:10.1007/978-3-031-38296-3_6

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 673)

Included in the following conference series:

International Conference on Computational Intelligence in Data Science

Abstract

Class imbalance, along with other characteristics of the data such as the feature set and class separability, can degrade the performance of most machine learning algorithms. This is because such algorithms typically assume that the data are class-balanced and that all misclassification errors carry equal weight. Class imbalance can be handled with data-level methods, algorithm-level methods, or both. In this paper, we study cost-sensitive learning applied at the algorithmic level, using the Cost-Sensitive Logistic Regression (CSLR) algorithm as a reference. We propose a methodology to empirically evaluate the performance of cost-sensitive algorithms over varying degrees of class imbalance. Cost-sensitive learning introduces a cost matrix, which encodes the weights assigned to the different misclassification errors, into the algorithm’s training process. This forces the model to penalize misclassification errors according to the skewness of the data, reducing its learning bias towards the majority class. We present empirical evaluations of the reference model on four popular datasets and analyse its behaviour using mean absolute error (MAE) and Kappa values.
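
As an illustration of the idea described in the abstract, the following is a minimal sketch of cost-weighted logistic regression evaluated at several degrees of imbalance. It is not the authors' implementation. The synthetic one-dimensional data, the choice of setting the false-negative cost equal to the imbalance ratio, the function names, and the computation of MAE on hard 0/1 predictions are all assumptions made for this example.

```python
import math
import random


def make_data(n_major, n_minor, seed=0):
    """Synthetic 1-D data (assumption): class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_major)]
    xs += [rng.gauss(2.0, 1.0) for _ in range(n_minor)]
    ys = [0] * n_major + [1] * n_minor
    return xs, ys


def fit_weighted_lr(xs, ys, cost_fn, cost_fp, lr=0.1, epochs=300):
    """Batch gradient descent on a cost-weighted log-loss.

    cost_fn scales the loss of class-1 examples (false negatives),
    cost_fp scales the loss of class-0 examples (false positives).
    """
    w, b = 0.0, 0.0
    total = sum(cost_fn if t == 1 else cost_fp for t in ys)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            c = cost_fn if t == 1 else cost_fp
            grad_w += c * (p - t) * x
            grad_b += c * (p - t)
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b


def predict(xs, w, b):
    """Hard 0/1 predictions at the 0.5 probability threshold."""
    return [1 if w * x + b > 0 else 0 for x in xs]


def mae(ys, yhat):
    """Mean absolute error; on 0/1 labels this equals the error rate."""
    return sum(abs(a - p) for a, p in zip(ys, yhat)) / len(ys)


def cohen_kappa(ys, yhat):
    """Cohen's kappa for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = len(ys)
    p_o = sum(a == p for a, p in zip(ys, yhat)) / n
    p_e = sum((ys.count(k) / n) * (yhat.count(k) / n) for k in (0, 1))
    if p_e == 1.0:  # degenerate single-class case; 0.0 is a convention chosen here
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Vary the degree of imbalance and compare unit costs with ratio-based costs.
N_MAJOR = 400
results = []
for n_minor in (400, 80, 20):
    xs, ys = make_data(N_MAJOR, n_minor)
    ratio = N_MAJOR / n_minor
    for name, cost_fn in (("plain", 1.0), ("cost-sensitive", ratio)):
        w, b = fit_weighted_lr(xs, ys, cost_fn=cost_fn, cost_fp=1.0)
        yhat = predict(xs, w, b)
        results.append((ratio, name, mae(ys, yhat), cohen_kappa(ys, yhat)))

for ratio, name, m, k in results:
    print(f"ratio={ratio:5.1f}  {name:14s}  MAE={m:.3f}  kappa={k:.3f}")
```

The loop evaluates on the training data for brevity; a held-out split or cross-validation would be needed for any real comparison.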

Notes

1. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Author information

Authors and Affiliations

Data to Knowledge (D2K) Lab, School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, 110 067, India
Sai Teja Tangudu & Rajeev Kumar

Corresponding author

Correspondence to Sai Teja Tangudu.

Editor information

Editors and Affiliations

Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Sarath Chandran K R
Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Sujaudeen N
Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Beulah A
Sri Sivasubramaniya Nadar College of Engineering, Chennai, India
Shahul Hamead H

About this paper

Cite this paper

Tangudu, S.T., Kumar, R. (2023). Analysis of Cost-Sensitive Algorithms for Degree of Imbalancing. In: Chandran K R, S., N, S., A, B., Hamead H, S. (eds) Computational Intelligence in Data Science. ICCIDS 2023. IFIP Advances in Information and Communication Technology, vol 673. Springer, Cham. https://doi.org/10.1007/978-3-031-38296-3_6

DOI: https://doi.org/10.1007/978-3-031-38296-3_6
Published: 22 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38295-6
Online ISBN: 978-3-031-38296-3
eBook Packages: Computer Science, Computer Science (R0)
