Abstract
Classification of imbalanced data is one of the most challenging aspects of machine learning. Despite over two decades of progress, there is still a need for new techniques capable of overcoming the numerous difficulties embedded in the nature of imbalanced datasets. In this paper, we propose Locally Linear Support Vector Machines (LL-SVMs) for effectively handling imbalanced datasets. LL-SVM is a lazy learning approach that trains a local classifier for each new test instance using its k nearest neighbors. This way, we are able to maximize the margin in the original input feature space and adapt better to complex class boundaries. We combine LL-SVMs with local oversampling and cost-sensitive approaches to make them skew-insensitive. Working only in the local neighborhood significantly improves generalization over the minority class and tackles instance-level difficulties such as class overlap, borderline and noisy instances, and small disjuncts. An extensive experimental study shows that our local models are able to outperform their global counterparts, especially when handling difficult, borderline, and noisy imbalanced datasets.
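The lazy, local scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`k_nearest`, `train_linear_svm`, `ll_svm_predict`), the subgradient-descent solver, the inverse-frequency class costs, and the toy data are all assumptions made for illustration, and the local oversampling variant is omitted. The sketch only shows the general idea of fitting a cost-sensitive linear SVM on the k nearest neighbors of each test instance.

```python
import math

def k_nearest(X, y, query, k):
    """Return the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return [X[i] for i in order], [y[i] for i in order]

def train_linear_svm(X, y, weights, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2*||w||^2 + mean(weights[y_i] * hinge_i) by subgradient descent.

    Labels are +1/-1; `weights` maps each label to its misclassification cost.
    The solver and its hyperparameters are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin: hinge loss is active
                c = weights[yi] / n
                grad_w = [g - c * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= c * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def ll_svm_predict(X, y, query, k=3):
    """Lazy prediction: fit a cost-sensitive linear SVM on the k nearest neighbours."""
    Xk, yk = k_nearest(X, y, query, k)
    if len(set(yk)) == 1:  # pure neighbourhood, nothing to separate
        return yk[0]
    # Assumed cost scheme: inverse local class frequency, so the locally
    # rarer class is not outweighed by the locally dominant one.
    weights = {c: len(yk) / (2 * yk.count(c)) for c in (-1, 1)}
    w, b = train_linear_svm(Xk, yk, weights)
    score = sum(wj * xj for wj, xj in zip(w, query)) + b
    return 1 if score >= 0 else -1

# Toy 1-D data: six majority points (-1) and a single minority point (+1).
X = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0], [3.0]]
y = [-1, -1, -1, -1, -1, -1, 1]
print(ll_svm_predict(X, y, [2.8]))  # neighbourhood mixes both classes
print(ll_svm_predict(X, y, [0.5]))  # neighbourhood is purely majority
```

In this toy setup the three nearest neighbors of the query at 2.8 are one minority point and two majority points, so a plain majority vote over them would pick the majority class. The local linear model instead places its boundary between the two groups and assigns the query to the minority class, which is the kind of local adaptation the abstract refers to.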
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Krawczyk, B., Cano, A. (2021). Locally Linear Support Vector Machines for Imbalanced Data Classification. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_49
DOI: https://doi.org/10.1007/978-3-030-75762-5_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer Science, Computer Science (R0)