Abstract
The task of detecting a rare but important class has been studied extensively in the machine learning community. It is widely agreed that traditional classifiers are limited on imbalanced datasets and perform poorly on them. A number of solutions have been proposed at both the data and algorithmic levels. We propose SVM-rebalancing, formulated as the point-normal form of a plane, which belongs to the second (algorithmic) type. In its learning process, the assumption of pseudo-prior probabilities, inspired by Bayesian decision theory, provides a rebalancing recipe for countering the imbalance. We therefore set up a rebalancing programming problem that incorporates a rebalancing heuristic into the model fitting to raise class separability. In addition, various measures are used to characterize classifier performance. Compared with several popular decision tree splitting criteria and cost-sensitive learning, the proposed method gives comparable separability for the minority class and avoids heavy bias toward the majority class.
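To make the role of prior probabilities in a point-normal decision plane concrete, the following is a minimal sketch. It is not the SVM-rebalancing formulation itself, which the abstract does not spell out. It assumes two Gaussian classes with identity covariance, for which the Bayes decision boundary can be written as n . (x - p) = 0, and it shows how replacing the empirical priors with equal pseudo-priors moves the point p back to the midpoint between the class means. The function names, class means, and prior values are hypothetical and chosen only for illustration.

```python
import math


def boundary_point(mu_neg, mu_pos, prior_neg, prior_pos):
    """Return (normal, point) of the Bayes decision plane n . (x - p) = 0.

    Assumption (not from the paper): both classes are Gaussian with
    identity covariance, so the log posterior ratio is linear in x.
    """
    normal = [p - n for n, p in zip(mu_neg, mu_pos)]
    midpoint = [(n + p) / 2 for n, p in zip(mu_neg, mu_pos)]
    norm_sq = sum(c * c for c in normal)
    # The log prior ratio slides the plane along its normal.
    shift = math.log(prior_pos / prior_neg) / norm_sq
    point = [m - shift * c for m, c in zip(midpoint, normal)]
    return normal, point


def decide(x, normal, point):
    """Return 1 (minority/positive) if x lies on the positive side of the plane."""
    score = sum(n * (xi - pi) for n, xi, pi in zip(normal, x, point))
    return 1 if score > 0 else 0


# Hypothetical class means and a 95:5 class imbalance.
mu_neg, mu_pos = [0.0, 0.0], [2.0, 0.0]

# Empirical priors push the plane past the minority mean.
n_emp, p_emp = boundary_point(mu_neg, mu_pos, 0.95, 0.05)

# Equal pseudo-priors place the plane at the midpoint of the means.
n_bal, p_bal = boundary_point(mu_neg, mu_pos, 0.5, 0.5)

x = [1.5, 0.0]  # a point closer to the minority mean
label_emp = decide(x, n_emp, p_emp)
label_bal = decide(x, n_bal, p_bal)
```

In this toy setting, the sample at x is assigned to the majority class under the empirical priors and to the minority class under the equal pseudo-priors, which illustrates the kind of bias toward the majority class that a rebalanced prior is meant to counter.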
Ethics declarations
Conflict of interest
We confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Cite this article
Hsu, CC., Wang, KS., Chung, HY. et al. Equation of SVM-rebalancing: the point-normal form of a plane for class imbalance problem. Neural Comput & Applic 31, 6013–6025 (2019). https://doi.org/10.1007/s00521-018-3419-z
DOI: https://doi.org/10.1007/s00521-018-3419-z