SVM classification for imbalanced data sets using a multiobjective optimization framework

Annals of Operations Research

Abstract

Classification of imbalanced data sets, in which negative instances outnumber positive instances, is a significant challenge. Such data sets are commonly encountered in real-life problems, yet the performance of well-known classifiers is limited in these cases. Various solution approaches have been proposed for the class imbalance problem, using either data-level or algorithm-level modifications. Support Vector Machines (SVMs), despite their solid theoretical background, also suffer a dramatic decrease in performance when the data distribution is imbalanced. In this study, we propose an L1-norm SVM approach based on a three-objective optimization problem, so that the error sums for the two classes enter the formulation independently. Motivated by the inherent multiobjective nature of SVMs, the solution approach reduces the problem to two-criteria formulations and investigates the efficient frontier systematically. The results indicate that a comprehensive treatment of distinct positive and negative error levels may lead to performance improvements, at the cost of varying degrees of increased computational effort.
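
To make the three objectives concrete, the following is a minimal sketch, not the authors' formulation. It assumes the three criteria are the L1 norm of the weight vector, the sum of margin errors on the positive class, and the sum of margin errors on the negative class, and it combines them with a plain weighted-sum scalarization rather than the two-criteria reduction described above. The function name `l1_svm_weighted`, the weights `c_pos` and `c_neg`, the toy data, and the use of `scipy.optimize.linprog` are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def l1_svm_weighted(X, y, c_pos, c_neg):
    """Solve min ||w||_1 + c_pos * sum(xi_pos) + c_neg * sum(xi_neg) as an LP.

    Hypothetical helper: labels y must be +1 / -1.
    Returns (w, b, positive-class error sum, negative-class error sum).
    """
    n, d = X.shape
    # Variable layout: [u (d), v (d), b (1), xi (n)], with w = u - v and u, v >= 0.
    cost = np.concatenate([
        np.ones(2 * d),                  # ||w||_1 equals sum(u + v) at an optimum
        [0.0],                           # the bias term is not penalized
        np.where(y > 0, c_pos, c_neg),   # class-specific error weights
    ])
    # Margin constraints y_i (w . x_i + b) >= 1 - xi_i, rewritten as A_ub z <= b_ub.
    yx = y[:, None] * X
    A_ub = np.hstack([-yx, yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d]
    xi = z[2 * d + 1:]
    return w, b, xi[y > 0].sum(), xi[y < 0].sum()


# Toy imbalanced data (assumed for illustration): 2 positives, 6 negatives.
X = np.array([[2, 2], [3, 3], [0, 0], [-1, 0],
              [0, -1], [-1, -1], [0, -2], [-2, 0]], dtype=float)
y = np.array([1, 1, -1, -1, -1, -1, -1, -1], dtype=float)

# Sweep the positive-class weight to see how the three criteria trade off.
for c_pos in (0.1, 1.0, 10.0):
    w, b, err_pos, err_neg = l1_svm_weighted(X, y, c_pos=c_pos, c_neg=1.0)
    print(c_pos, np.abs(w).sum(), err_pos, err_neg)
```

Each solve yields one point in the space of (norm, positive error, negative error). A weighted sum reaches only supported efficient solutions and gives no direct control over where on the frontier a solution lands, which is one reason a systematic exploration of the efficient frontier, as proposed in the paper, can be preferable.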

Author information

Correspondence to Serpil Sayın.

About this article

Cite this article

Aşkan, A., Sayın, S. SVM classification for imbalanced data sets using a multiobjective optimization framework. Ann Oper Res 216, 191–203 (2014). https://doi.org/10.1007/s10479-012-1300-5

  • DOI: https://doi.org/10.1007/s10479-012-1300-5
