Abstract
Classifying imbalanced data sets, in which negative instances outnumber positive instances, is a significant challenge. Such data sets are common in real-life problems, yet well-known classifiers have limited performance on them. Various approaches to the class imbalance problem have been proposed, using either data-level or algorithm-level modifications. Support Vector Machines (SVMs), despite their solid theoretical background, also suffer a dramatic decrease in performance when the data distribution is imbalanced. In this study, we propose an L1-norm SVM approach based on a three-objective optimization problem, so that the error sums for the two classes enter the formulation independently. Motivated by the inherently multiobjective nature of SVMs, the solution approach reduces the problem to two-criteria formulations and investigates the efficient frontier systematically. The results indicate that treating positive and negative error levels separately may improve performance, at the cost of varying degrees of additional computational effort.
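To make the three objectives concrete, the following is a minimal sketch and not the authors' formulation or solution method. It assumes a linear classifier (w, b), labels in {+1, -1}, and the usual soft-margin slack max(0, 1 - y(w·x + b)) as the per-instance error. The function names, the toy data, and the hand-picked candidate classifiers are all hypothetical. The sketch evaluates the L1 norm of w and the error sums of the positive and negative classes separately, then keeps only the candidates that are not dominated in all three objectives.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, float]  # (L1 norm, positive error sum, negative error sum)


def three_objectives(w: Sequence[float], b: float,
                     X: Sequence[Sequence[float]], y: Sequence[int]) -> Objectives:
    """Evaluate the three criteria for a linear classifier (w, b).

    Assumption: the per-instance error is the soft-margin slack
    max(0, 1 - y * (w.x + b)), summed separately for each class.
    """
    l1_norm = sum(abs(wj) for wj in w)
    pos_err = 0.0
    neg_err = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slack = max(0.0, 1.0 - margin)
        if yi > 0:
            pos_err += slack
        else:
            neg_err += slack
    return (l1_norm, pos_err, neg_err)


def dominates(p: Objectives, q: Objectives) -> bool:
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= c for a, c in zip(p, q)) and any(a < c for a, c in zip(p, q))


def efficient_points(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical one-dimensional toy data: one positive, three negatives.
X = [[2.0], [0.0], [-1.0], [-2.0]]
y = [1, -1, -1, -1]

# Hand-picked candidate classifiers (w, b); these are not optimization results.
candidates = [([1.0], 0.0), ([1.0], -1.0), ([2.0], -1.0), ([0.5], 0.0), ([0.5], -1.0)]

points = [three_objectives(w, b, X, y) for w, b in candidates]
frontier = efficient_points(points)

for point in frontier:
    print(point)
```

Among these candidates, the surviving points differ in how they split the error between the two classes for a given norm, which is the kind of trade-off that a single combined error term would hide. An actual study of the efficient frontier would generate candidates by solving optimization problems rather than by enumeration.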
Cite this article
Aşkan, A., Sayın, S. SVM classification for imbalanced data sets using a multiobjective optimization framework. Ann Oper Res 216, 191–203 (2014). https://doi.org/10.1007/s10479-012-1300-5
DOI: https://doi.org/10.1007/s10479-012-1300-5