
Weighted relaxed support vector machines

Published in: Annals of Operations Research

Abstract

Classification of imbalanced data is challenging when outliers exist. In this paper, we propose a supervised learning method that simultaneously classifies imbalanced data and reduces the influence of outliers. The proposed method is a cost-sensitive extension of the relaxed support vector machine (RSVM), in which the restricted penalty-free slack is split independently between the two classes, in proportion to the number of samples in each class, with different weights; hence the name weighted relaxed support vector machine (WRSVM). We compare the classification results of WRSVM with SVM, WSVM and RSVM on public benchmark datasets with imbalanced classes and outlier noise, and show that WRSVM produces more accurate and robust classification results.
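The core idea of class-proportional misclassification costs can be illustrated with a minimal sketch. The code below trains a linear soft-margin SVM by subgradient descent on the hinge loss, with a separate cost per class chosen in proportion to class size. This is only a generic cost-sensitive approximation in the spirit of WSVM: the paper's WRSVM additionally distributes a restricted pool of penalty-free slack between the classes, which is not modeled here, and the function name and cost choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_svm_subgradient(X, y, C_pos, C_neg, lr=0.02, epochs=500):
    """Linear soft-margin SVM fit by subgradient descent on the hinge
    loss, with separate misclassification costs per class
    (C_pos for the +1 class, C_neg for the -1 class)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    C = np.where(y == 1, C_pos, C_neg)   # per-sample cost by class label
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1               # samples violating the margin
        # Subgradient of 0.5*||w||^2 + sum_i C_i * max(0, 1 - y_i f(x_i))
        grad_w = w - (C[viol] * y[viol]) @ X[viol]
        grad_b = -np.sum(C[viol] * y[viol])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy imbalanced, linearly separable data: 3 positives vs. 5 negatives.
X = np.array([[2.0, 0.0], [3.0, 0.5], [2.5, -0.5],
              [-2.0, 0.0], [-3.0, 0.5], [-2.5, -0.5],
              [-2.0, 1.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, -1.0])

# Class-proportional costs, mirroring the idea of weighting by class size.
C_pos = len(y) / (2 * np.sum(y == 1))
C_neg = len(y) / (2 * np.sum(y == -1))
w, b = weighted_svm_subgradient(X, y, C_pos, C_neg)
pred = np.sign(X @ w + b)
```

With these costs, the minority class contributes the same total misclassification weight as the majority class, which is the basic mechanism by which cost-sensitive SVM variants counteract class imbalance.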





Author information

Correspondence to Petros Xanthopoulos.

Sensitivity, specificity and accuracy tables

In Tables 6, 7, 8, 9, 10 and 11 we provide the breakdown of sensitivity, specificity and accuracy for all datasets.

Table 6 Comparative sensitivity results for WRSVM against WSVM, FSVM, RSVM, SVM, NB, C4.5 and 5NN on UCI datasets for different imbalance cases with low outlier ratio [average (SD)]
Table 7 Comparative specificity results for WRSVM against WSVM, FSVM, RSVM, SVM, NB, C4.5 and 5NN on UCI datasets for different imbalance cases with low outlier ratio
Table 8 Comparative accuracy results for WRSVM against WSVM, FSVM, RSVM, SVM, NB, C4.5 and 5NN on UCI datasets for different imbalance cases with low outlier ratio
Table 9 Comparative sensitivity results for WRSVM against WSVM, FSVM, RSVM, SVM, NB, C4.5 and 5NN on UCI datasets for different imbalance cases with high outlier ratio
Table 10 Comparative specificity results for WRSVM against WSVM, FSVM, RSVM, SVM, NB, C4.5 and 5NN on UCI datasets for different imbalance cases with high outlier ratio
Table 11 Comparative accuracy results for WRSVM against WSVM, FSVM, RSVM, SVM, NB, C4.5 and 5NN on UCI datasets for different imbalance cases with high outlier ratio
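For reference, the three metrics tabulated above can be computed from a binary confusion matrix as follows. This is a generic sketch; treating the +1 label as the positive (minority) class is an assumption for illustration, not a detail taken from the tables.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for labels in {+1, -1},
    treating +1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate
    accuracy = (tp + tn) / len(y_true)  # overall fraction correct
    return sensitivity, specificity, accuracy

# Example: one positive missed, all negatives correct.
sens, spec, acc = binary_metrics([1, 1, -1, -1], [1, -1, -1, -1])
```

Reporting sensitivity and specificity separately, as the tables do, matters for imbalanced data: accuracy alone can look high even when the minority class is misclassified.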


About this article


Cite this article

Şeref, O., Razzaghi, T. & Xanthopoulos, P. Weighted relaxed support vector machines. Ann Oper Res 249, 235–271 (2017). https://doi.org/10.1007/s10479-014-1711-6

