Equation of SVM-rebalancing: the point-normal form of a plane for class imbalance problem

  • Original Article
  • Neural Computing and Applications

Abstract

The task of detecting a rare but important class has been studied extensively in the machine learning community. It is commonly agreed that traditional classifiers are limited on imbalanced datasets and do not perform well on them. A number of solutions to the problem have been proposed at both the data level and the algorithmic level. We propose SVM-rebalancing, built on the point-normal form of a plane, as a solution of the second (algorithmic) type. In this learning process, the assumption of pseudo-prior probabilities, inspired by Bayesian decision theory, provides a rebalancing recipe for countering the imbalance. We therefore pose a rebalancing programming problem that incorporates a rebalancing heuristic into the fitting of the model to raise the class separability. In addition, various measures are used to characterize the performance of the classifiers. Compared with several popular decision tree splitting criteria and cost-sensitive learning, the proposed method gives comparable separability for the minority class while avoiding a heavy bias toward the majority class.
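
The abstract does not name the performance measures it uses. As an illustration only, the sketch below computes measures that are commonly reported for imbalanced binary problems (recall, specificity, precision, F-measure and G-mean) next to plain accuracy. The function name `imbalance_metrics`, the label convention (1 for the rare class) and the toy data are assumptions made for this example and are not taken from the article.

```python
from math import sqrt


def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix measures for a binary problem; `positive` marks the rare class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # true positive rate on the rare class
    specificity = ratio(tn, tn + fp)   # true negative rate on the majority class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": sqrt(recall * specificity),
    }


# Hypothetical data: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100
metrics = imbalance_metrics(y_true, always_majority)

# A classifier that finds 4 of the 5 rare examples at the cost of 5 false alarms.
y_pred = [1] * 5 + [0] * 90 + [1] * 4 + [0]
metrics_better = imbalance_metrics(y_true, y_pred)
```

On this toy data the always-majority classifier keeps a high accuracy while its recall and G-mean are zero, which is why accuracy alone says little about how well the rare class is detected.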


Author information

Corresponding author

Correspondence to Che-Chang Hsu.

Ethics declarations

Conflict of interest

We confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

About this article

Cite this article

Hsu, CC., Wang, KS., Chung, HY. et al. Equation of SVM-rebalancing: the point-normal form of a plane for class imbalance problem. Neural Comput & Applic 31, 6013–6025 (2019). https://doi.org/10.1007/s00521-018-3419-z

  • DOI: https://doi.org/10.1007/s00521-018-3419-z
