SVM classification for imbalanced data sets using a multiobjective optimization framework

Aşkan, Ayşegül; Sayın, Serpil

doi:10.1007/s10479-012-1300-5

Published: 15 January 2013

Volume 216, pages 191–203 (2014)

Annals of Operations Research

Ayşegül Aşkan¹ &
Serpil Sayın²

Abstract

Classification of imbalanced data sets, in which negative instances outnumber positive instances, is a significant challenge. Such data sets are commonly encountered in real-life problems, yet the performance of well-known classifiers is limited in these cases. Various solution approaches have been proposed for the class imbalance problem, using either data-level or algorithm-level modifications. Support Vector Machines (SVMs), which have a solid theoretical background, also suffer a dramatic decrease in performance when the data distribution is imbalanced. In this study, we propose an L₁-norm SVM approach based on a three-objective optimization problem, so that the error sums for the two classes enter the formulation independently. Motivated by the inherent multiobjective nature of SVMs, the solution approach reduces the problem to two-criteria formulations and investigates the efficient frontier systematically. The results indicate that a comprehensive treatment of distinct positive and negative error levels may lead to performance improvements, at the cost of varying degrees of increased computational effort.
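
To make the idea concrete, the following is a minimal sketch of what a three-objective L₁-norm soft-margin SVM of this kind could look like. It is not taken from the article: the notation (training points x_i with labels y_i in {+1, -1}, index sets I_+ and I_- for the positive and negative classes, slack variables ξ_i, and the bound ε) is assumed here for illustration. The second problem shows one standard way to reduce three objectives to two, the ε-constraint approach, which may differ from the reduction the authors actually use.

```latex
% Three-objective L1-norm SVM (illustrative notation, not the authors' own)
\begin{align*}
\min_{w,\,b,\,\xi}\quad
  & \Bigl( \lVert w \rVert_1,\;
           \sum_{i \in I_+} \xi_i,\;
           \sum_{i \in I_-} \xi_i \Bigr) \\
\text{s.t.}\quad
  & y_i \,(w^\top x_i + b) \ge 1 - \xi_i, && i = 1,\dots,m, \\
  & \xi_i \ge 0,                          && i = 1,\dots,m.
\end{align*}

% One possible two-criteria reduction (assumption): bound the negative-class
% error sum by a parameter \varepsilon and keep the other two objectives.
\begin{align*}
\min_{w,\,b,\,\xi}\quad
  & \Bigl( \lVert w \rVert_1,\; \sum_{i \in I_+} \xi_i \Bigr) \\
\text{s.t.}\quad
  & \sum_{i \in I_-} \xi_i \le \varepsilon, \\
  & y_i \,(w^\top x_i + b) \ge 1 - \xi_i, && i = 1,\dots,m, \\
  & \xi_i \ge 0,                          && i = 1,\dots,m.
\end{align*}
```

Under this sketch, writing w = u - v with u, v ≥ 0 and replacing ‖w‖₁ by the sum of the components of u + v turns every objective and constraint into a linear expression, so each fixed value of ε gives a bicriteria linear program whose efficient frontier can be traced before moving to the next ε.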

Author information

Authors and Affiliations

Garanti Teknoloji, Evren Mahallesi, Koçman Caddesi No:34 Güneşli, 34212, İstanbul, Turkey
Ayşegül Aşkan
College of Administrative Sciences and Economics, Koç University, Sarıyer, 34450, İstanbul, Turkey
Serpil Sayın

Corresponding author

Correspondence to Serpil Sayın.

About this article

Cite this article

Aşkan, A., Sayın, S. SVM classification for imbalanced data sets using a multiobjective optimization framework. Ann Oper Res 216, 191–203 (2014). https://doi.org/10.1007/s10479-012-1300-5

Issue Date: May 2014
DOI: https://doi.org/10.1007/s10479-012-1300-5
