Abstract
In supervised learning, a class-imbalanced dataset is one in which the class distribution is not uniform. Most standard classifiers fail to properly identify patterns that belong to the minority class because they are built to minimize the overall error rate. As a result, a biased classification model is to be expected, since a high accuracy score can be achieved by the majority class alone. To tackle this problem, several methods have been proposed, mainly to reduce the classifier’s bias, such as resampling the dataset, modifying the classifier’s optimization problem, or introducing a new optimization task on top of the classifier. Our proposal is based on a new optimization task on top of a classifier, incorporated into the learning process. Specifically, a hybrid classifier based on genetic programming and support vector machines is proposed. In the experiments carried out, our classifier has been shown to be competitive with several variations of support vector machines in solving imbalanced classification problems.
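The point that accuracy can be carried by the majority class alone can be illustrated with a minimal sketch. The 95/5 class split, the always-majority "classifier", and the helper names `accuracy` and `recall` are assumptions made for illustration only; they are not taken from the paper's experiments.

```python
# Hypothetical toy labels: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# A trivial baseline that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    total = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / total if total else 0.0


print("accuracy:       ", accuracy(y_true, y_pred))
print("majority recall:", recall(y_true, y_pred, positive=0))
print("minority recall:", recall(y_true, y_pred, positive=1))
```

Under these assumptions the baseline scores high on accuracy while never detecting a single minority sample, which is why per-class measures are usually reported alongside accuracy for imbalanced problems.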



Acknowledgements
This research is supported by the Malaysia Ministry of Higher Education under Grant FRGS-RACER/1/2019/SS09/UUM//2.
Cite this article
Pozi, M.S.M., Azhar, N.A., Raziff, A.R.A. et al. SVGPM: evolving SVM decision function by using genetic programming to solve imbalanced classification problem. Prog Artif Intell 11, 65–77 (2022). https://doi.org/10.1007/s13748-021-00260-4
DOI: https://doi.org/10.1007/s13748-021-00260-4