
Locally Linear Support Vector Machines for Imbalanced Data Classification

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12712)


Abstract

Classification of imbalanced data is one of the most challenging aspects of machine learning. Despite over two decades of progress, there is still a need for new techniques capable of overcoming the numerous difficulties embedded in the nature of imbalanced datasets. In this paper, we propose Locally Linear Support Vector Machines (LL-SVMs) for effectively handling imbalanced datasets. LL-SVM is a lazy learning approach that trains a local classifier for each new test instance using its k nearest neighbors. This way, we are able to maximize the margin in the original input feature space and obtain a better adaptation to complex class boundaries. We combine LL-SVMs with local oversampling and cost-sensitive approaches to make them skew-insensitive. Working only in the local neighborhood significantly improves generalization over the minority class and tackles instance-level difficulties, such as class overlapping, borderline and noisy instances, and small disjuncts. An extensive experimental study shows that our local models are able to outperform their global counterparts, especially when handling difficult, borderline, and noisy imbalanced datasets.
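The lazy, per-instance scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary labels encoded as +1 (minority) and -1 (majority), uses only a cost-sensitive variant (no local oversampling), and replaces a proper SVM solver with plain batch subgradient descent on a class-weighted hinge loss. The function names `k_nearest`, `fit_weighted_linear_svm`, and `predict_local`, as well as the defaults for `k`, `lam`, `lr`, and `epochs`, are hypothetical choices made for illustration.

```python
import math

def k_nearest(X, y, query, k):
    """Return the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return [X[i] for i in order], [y[i] for i in order]

def fit_weighted_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear model by batch subgradient descent on a class-weighted
    hinge loss with L2 regularization. Labels must be +1 or -1.
    Assumption: a stand-in for a real SVM solver, for illustration only."""
    n, d = len(X), len(X[0])
    # Cost-sensitive weights, inversely proportional to the local class
    # frequency, so the locally rarer class is penalized more per error.
    counts = {c: y.count(c) for c in set(y)}
    cost = {c: n / (len(counts) * counts[c]) for c in counts}
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # the point violates the margin
                for j in range(d):
                    grad_w[j] -= cost[yi] * yi * xi[j] / n
                grad_b -= cost[yi] * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_local(X, y, query, k=5):
    """Lazy prediction: train a linear model on the query's k nearest
    neighbors only, then classify the query with that local model."""
    Xk, yk = k_nearest(X, y, query, k)
    if len(set(yk)) == 1:
        # The neighborhood contains a single class, so no model is needed.
        return yk[0]
    w, b = fit_weighted_linear_svm(Xk, yk)
    score = sum(wj * xj for wj, xj in zip(w, query)) + b
    return 1 if score >= 0 else -1

# Toy imbalanced data: 8 majority points (-1) and 3 minority points (+1).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5), (0, 0.5), (0.5, 0),
     (4, 4), (4, 5), (5, 4)]
y = [-1] * 8 + [1] * 3

for q in [(0.2, 0.2), (4.5, 4.5), (3.9, 3.9)]:
    print(q, predict_local(X, y, q, k=5))
```

Because the model is rebuilt for every query, the class weights are computed from the neighborhood rather than from the whole training set, which is one simple way to keep a globally rare class from being outvoted near its own region. In practice one would likely scale the features and use an established SVM solver instead of this hand-written optimizer.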




Corresponding author

Correspondence to Bartosz Krawczyk.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Krawczyk, B., Cano, A. (2021). Locally Linear Support Vector Machines for Imbalanced Data Classification. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_49


  • DOI: https://doi.org/10.1007/978-3-030-75762-5_49


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75761-8

  • Online ISBN: 978-3-030-75762-5

  • eBook Packages: Computer Science, Computer Science (R0)
