
Slack-Factor-Based Fuzzy Support Vector Machine for Class Imbalance Problems

Published: 01 March 2023

Abstract

Class imbalance and noisy data are common in real-world problems, and it is difficult for the support vector machine (SVM) to construct good classifiers on such data. Fuzzy SVMs (FSVMs), as variants of SVM, use a fuzzy membership function both to reflect the importance of samples and to remove the impact of noise, and employ cost-sensitive techniques to address class imbalance. They can handle noise and class imbalance in many cases; however, the fuzzy membership functions are themselves often affected by the imbalanced data, leading to inaccurate measures of sample importance and degrading the performance of FSVMs. To solve this problem, we design a new fuzzy membership function and combine it with cost-sensitive learning to deal with class imbalance in noisy data; the resulting method is named Slack-Factor-based FSVM (SFFSVM). In SFFSVM, the relative distances between samples and an estimated hyperplane, called slack factors, are used to define the fuzzy membership function. To eliminate the impact of class imbalance on the function and obtain a more accurate measure of sample importance, we rectify the importance according to the positional relationship between the estimated hyperplane and the optimal hyperplane of the problem, and according to the slack factors of the samples. Comprehensive experiments on artificial and real-world datasets demonstrate that SFFSVM outperforms the compared methods on the F1, MCC, and AUC-PR metrics.
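The abstract does not give the SFFSVM membership function or its rectification step, so the following is only a rough sketch of the general idea of weighting samples by slack and by class cost, not the authors' method. It assumes a linear estimated hyperplane (`w`, `b`), a hinge-style slack in place of the paper's relative distance, a hypothetical decaying membership `1 / (1 + decay * slack)`, and an inverse-frequency class cost. All function names and the `decay` parameter are illustrative.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def slack_factors(X: Sequence[Vector], y: Sequence[int], w: Vector, b: float) -> List[float]:
    """Hinge-style slack of each sample w.r.t. an estimated hyperplane (assumed form)."""
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        # Zero for samples on the correct side of the margin, growing with the violation.
        slacks.append(max(0.0, 1.0 - y_i * score))
    return slacks


def fuzzy_memberships(slacks: Sequence[float], decay: float = 1.0) -> List[float]:
    """Hypothetical membership in (0, 1]: larger slack gives lower importance."""
    return [1.0 / (1.0 + decay * s) for s in slacks]


def cost_sensitive_weights(memberships: Sequence[float], y: Sequence[int]) -> List[float]:
    """Scale each membership by an inverse-frequency class cost (assumed heuristic)."""
    n = len(y)
    counts: Dict[int, int] = {label: sum(1 for y_i in y if y_i == label) for label in set(y)}
    class_cost = {label: n / (len(counts) * c) for label, c in counts.items()}
    return [m * class_cost[y_i] for m, y_i in zip(memberships, y)]


if __name__ == "__main__":
    # Toy 1-D data: three minority samples (+1) and four majority samples (-1);
    # the last majority sample lies on the wrong side and plays the role of noise.
    X = [[2.0], [1.5], [0.2], [-1.0], [-2.0], [-3.0], [0.5]]
    y = [1, 1, 1, -1, -1, -1, -1]
    w, b = [1.0], 0.0  # estimated hyperplane, assumed to be given

    slacks = slack_factors(X, y, w, b)
    weights = cost_sensitive_weights(fuzzy_memberships(slacks), y)
    for x_i, y_i, s, wt in zip(X, y, slacks, weights):
        print(x_i, y_i, round(s, 3), round(wt, 3))
```

Weights of this kind could then be passed as per-sample weights to a standard SVM trainer, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, which rescales the penalty parameter per sample.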


    Information

    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 17, Issue 6
    July 2023
    392 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3582889

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 March 2023
    Online AM: 03 January 2023
    Accepted: 29 December 2022
    Revised: 13 November 2022
    Received: 14 September 2022
    Published in TKDD Volume 17, Issue 6


    Author Tags

    1. Cost-sensitive learning
    2. class imbalance
    3. fuzzy support vector machine
    4. decision hyperplane
    5. fuzzy membership function

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Joint Funds of the National Natural Science Foundation of China
    • Provincial Science and Technology Program of Gansu
